1. Genome

The genome is the genetic material of an organism. It consists of DNA, or of RNA in RNA viruses. Specifically, the following elements all belong to the genome:

  • DNA
    • protein-coding genes
    • pseudogenes
    • transposons (TE, transposable elements)
      • DNA elements that can ‘jump’ to a new genomic location
  • RNA
    • rRNA (ribosomal RNA)
    • tRNA (transfer RNA)
    • short non-coding RNA
      • miRNA (microRNA)
        • MicroRNAs usually induce gene silencing by binding to target sites found within the 3’UTR of the targeted mRNA. This interaction prevents protein production by suppressing protein synthesis and/or by initiating mRNA degradation.
        • MicroRNAs are partially complementary to one or more messenger RNA (mRNA) molecules, and their main function is to downregulate gene expression in a variety of manners, including translational repression, mRNA cleavage, and deadenylation.
      • siRNA (short interfering RNA, double-stranded)
        • may come from outside the cell (e.g. from a virus); endogenous siRNA (endo-siRNA), transcribed from the cell’s own DNA, has also been discovered
        • the most commonly used RNA interference (RNAi) tool for inducing short-term silencing of protein-coding genes
        • as a lab tool, siRNA is a synthetic RNA duplex designed to target a particular mRNA for degradation
    • lncRNA (long non-coding RNA)
      • longer than 200 nt
      • variety of functions

DNA and RNA are really names of molecules, so the relationship between the genome and DNA/RNA is roughly that between “a lump of charcoal” and “carbon”.

2. Chromosome / Chromatin / Nucleosome / Chromatid / DNA / Double Strands

  • DNA is the name of the macromolecule; the double strand (or double helix) is its physical structure. It is built from two kinds of components:
    • Nucleotides ([‘nju:klɪətaɪd]), distinguished by their bases:
      • Adenine ([‘ædənɪn])
      • Cytosine ([‘saɪtəʊsi:n])
      • Guanine ([‘gwɑ:ni:n])
      • Thymine ([‘θaɪmi:n])
    • Sugar-Phosphate Backbone
  • The complex of DNA and histones as a whole is called chromatin
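  • When the cell is not dividing, one can think of a single further-condensed chromatin fiber as one chromosome
    • By function, chromosomes are classified as allosomes ([‘ælʊsəʊm], sex chromosomes) or autosomes ([‘ɔ:təsəʊm], non-sex chromosomes)
    • Human autosomes are numbered roughly from longest to shortest: chr1 is the longest and chr22 is among the shortest (chr21 is in fact slightly shorter)
    • A name like chr18 refers to a single chromosome; we say every human cell has a pair of chr18’s
      • The two chromosomes that pair up are called homologous chromosomes (or homologs for short). The term originally meant “identical to one another in shape and size”, which clearly does not fit the X/Y pair
      • “Identical in shape and size” clearly does not imply “identical in sequence”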
      • We can specify maternal chr18 and paternal chr18 to indicate from which parent it is inherited.
  • When the cell is dividing (more precisely, in the S phase of the cell cycle), each chromatin fiber is replicated into two chromatids
    • Every chromatid has a short p-arm (“p” for “petit”) and a long q-arm (“q” for “queue”)
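    • The two chromatids are joined at a centromere.
    • When studying a single chromosome, we still use the p-arm, q-arm and centromere to subdivide its structure
    • Note: the chromosome pictures we usually see show an X-shaped structure with 4 arms. That is actually two chromatids in a dividing cell, not the normal form of a chromosome; the human body does not contain 23 pairs of such X-shaped chromosomes (I was badly misled by that picture!)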

Counts:

  • Every human cell has 23 pairs == 46 chromosomes
  • 1 chromosome == 2 strands
  • So every human cell has 92 strands

3. Coordinates Systems / Upstream & Downstream / TSS Distance

So far we have zoomed in from DNA to the chromosome; now we zoom in further to the strand:

  • chromStart and chromEnd are columns from table snp142 of database hg19 in UCSC Genome Browser
  • txStart and txEnd are columns from table ensGene of database hg19 in UCSC Genome Browser
  • A 0-based coordinate system is used here

3.1 Coordinates Systems

See Tutorial: Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems:

  • 0-based:
    • UCSC Genome Browser
    • BED, BAM formats
  • 1-based:
    • HGMD
    • Ensembl
    • GFF, SAM, VCF formats
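  • Conversion rules (0-based intervals are half-open and 1-based intervals are closed, so only the start shifts):
    • $\text{chromStart}_0 = \text{chromStart}_1 - 1$
    • $\text{chromEnd}_0 = \text{chromEnd}_1$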
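
The conversion rules above can be sketched in Python. This is only a minimal sketch: it assumes 0-based intervals are half-open (BED / UCSC table style) and 1-based intervals are closed (GFF / VCF style), and the function names `to_one_based` and `to_zero_based` are hypothetical, not taken from any library.

```python
from typing import Tuple


def to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval [start0, end0) to 1-based closed [start1, end1]."""
    # Zero-length features (start0 == end0) are out of scope for this sketch.
    if not 0 <= start0 < end0:
        raise ValueError("expected 0 <= start0 < end0")
    return start0 + 1, end0  # only the start shifts


def to_zero_based(start1: int, end1: int) -> Tuple[int, int]:
    """Convert a 1-based closed interval [start1, end1] to 0-based half-open [start0, end0)."""
    if not 1 <= start1 <= end1:
        raise ValueError("expected 1 <= start1 <= end1")
    return start1 - 1, end1  # only the start shifts


# The first base of a chromosome, written both ways.
print(to_one_based(0, 1))   # (1, 1)
print(to_zero_based(1, 1))  # (0, 1)
```

In both systems the end value is the same number, which is why the length of a feature is `end0 - start0` in 0-based coordinates but `end1 - start1 + 1` in 1-based coordinates.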

3.2 Upstream & Downstream

3.3 Directions of Strands

  • See Question: Forward And Reverse Strand Conventions
  • The designation of (+)/(-) strand is arbitrary.
    • Once fixed, the (+) strand determines the direction of the coordinate axis;
    • the (-) strand then runs in the opposite direction.
  • A gene g can be on the (+) strand or (-) strand:
    • The strand that g is on is called g’s coding strand (a.k.a. its sense strand).
    • The other strand is called g’s template strand (a.k.a. its antisense strand)
    • If $\text{strand}(g) = (+)$, then $\text{TSS}(g) = g.\text{txStart}$
    • If $\text{strand}(g) = (-)$, then $\text{TSS}(g) = g.\text{txEnd}$
      • N.B. $g.\text{txStart} < g.\text{txEnd}$ always holds
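
As a rough illustration of the rule above, the sketch below picks the TSS from a gene record according to its strand. The `Gene` record and its field names are hypothetical stand-ins for the `strand`, `txStart` and `txEnd` columns; the coordinates are made up.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not a real library type)."""
    name: str
    strand: str    # "+" or "-"
    tx_start: int  # always smaller than tx_end, whichever strand the gene is on
    tx_end: int


def tss(gene: Gene) -> int:
    """Return the transcription start site following the strand rule above."""
    if gene.tx_start >= gene.tx_end:
        raise ValueError("expected tx_start < tx_end")
    if gene.strand == "+":
        return gene.tx_start  # transcription runs along the coordinate axis
    if gene.strand == "-":
        return gene.tx_end    # transcription runs against the coordinate axis
    raise ValueError("strand must be '+' or '-'")


forward = Gene("geneA", "+", 1000, 5000)
reverse = Gene("geneB", "-", 1000, 5000)
print(tss(forward))  # 1000
print(tss(reverse))  # 5000
```

Two genes with identical `txStart` and `txEnd` therefore start transcription at opposite ends when they sit on opposite strands.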

3.4 TSS Distance

  • TSS: Transcription Start Site
    • TSS distance: actually means “distance to TSS”
  • Given a SNP s and a gene g:
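    • If $\text{strand}(g) = (+)$, $\text{TSS-dist}(s,g) = s.\text{chromStart} - g.\text{txStart}$;
    • If $\text{strand}(g) = (-)$, $\text{TSS-dist}(s,g) = g.\text{txEnd} - s.\text{chromStart}$.
    • In other words, suppose:
      • $\text{strand}(g) = \begin{cases} 1 & g \text{ is on (+) strand} \\ -1 & \text{otherwise} \end{cases}$
      • $\text{TSS}(g) = \begin{cases} g.\text{txStart} & g \text{ is on (+) strand} \\ g.\text{txEnd} & \text{otherwise} \end{cases}$
        • $\text{TSS-dist}(s,g) = (s.\text{chromStart} - \text{TSS}(g)) \times \text{strand}(g)$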
    • If s is in the upstream of g, TSS-dist(s,g)<0, whichever strand g is on
    • If s is in the downstream of g, TSS-dist(s,g)>0, whichever strand g is on
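
The signed distance can be checked with a few lines of Python. This is a minimal sketch of the formula in this section, not code from any genomics library; the function name `tss_distance`, its parameter names and the example coordinates are all made up for illustration.

```python
def tss_distance(snp_start: int, tx_start: int, tx_end: int, strand: str) -> int:
    """Signed distance from a SNP to a gene's TSS: negative upstream, positive downstream."""
    if strand == "+":
        sign, tss = 1, tx_start
    elif strand == "-":
        sign, tss = -1, tx_end
    else:
        raise ValueError("strand must be '+' or '-'")
    return (snp_start - tss) * sign


# A hypothetical gene with txStart = 1000 and txEnd = 5000.
print(tss_distance(900, 1000, 5000, "+"))   # -100: upstream on the (+) strand
print(tss_distance(1200, 1000, 5000, "+"))  # 200: downstream on the (+) strand
print(tss_distance(5100, 1000, 5000, "-"))  # -100: upstream on the (-) strand
print(tss_distance(4800, 1000, 5000, "-"))  # 200: downstream on the (-) strand
```

Multiplying by the strand sign is what keeps the "upstream is negative" convention the same on both strands, even though upstream lies at smaller coordinates for a (+) gene and at larger coordinates for a (-) gene.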
