WD kernel with shifts
Original paper: RASE: recognition of alternatively spliced exons in C. elegans.
1. Introduction
The authors first list a series of recent feature-engineering results and point out:
- these features are indeed very discriminative
- but many of them only apply to conserved sequences
- human exons are frequently not conserved, making conservational features not available
So the authors propose using features that are always available to classify constitutively spliced versus alternatively spliced exons. Note that they name two kinds of feature:
- features derived from the exon and intron lengths, and
- features based on the pre-mRNA sequence.
Both kinds reappear later in the combined kernel.
2. Methods
2.1 A database of alternatively and constitutively spliced exons for Caenorhabditis elegans
Data collection.
Note the second paragraph:
In the following step we identified pairs of sequences in our set that share the same 3’ and 5’ boundaries of the upstream and downstream exon, respectively, where one sequence contains an internal exon and the other does not (i.e. shows evidence of alternative exon usage with the same flanking exon boundaries). This way, we identified 487 exons for which ESTs show evidence for alternative splicing.
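What this means: sequences with identical upstream and downstream exon boundaries are grouped into a small set (the "pair of sequences" in the original). They should be identical, but some contain an internal exon and some do not, which shows that this exon is alternatively spliced.
These 487 exons are the positive training examples (alternatively spliced). The 2531 that follow are the negative training examples (constitutively spliced). They are identified by selecting [intron, exon, intron] exon triples, and then: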
… exon triples that did not show evidence for alternative splicing. We considered this as sufficiently likely when the internal exon and the flanking introns were confirmed by at least two different EST sequences.
ESTs are not covered further here.
That gives 3018 training examples in total (487 + 2531), so training, cross-validation and the rest can begin.
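The pairing step for the positive examples can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the EST-derived transcripts are already aligned and represented as sorted lists of `(start, end)` exon coordinates, and the function name `find_skipped_exons` is hypothetical.

```python
def find_skipped_exons(transcripts):
    """Return internal exons that some other transcript skips.

    transcripts: list of exon lists; each exon is a (start, end) tuple and
    each list is sorted by position (an assumed representation).
    """
    # Every intron seen in any transcript, stored as
    # (end of upstream exon, start of downstream exon).
    introns = set()
    for exons in transcripts:
        for up, down in zip(exons, exons[1:]):
            introns.add((up[1], down[0]))

    skipped = set()
    for exons in transcripts:
        # Slide over [upstream, internal, downstream] exon triples.
        for up, mid, down in zip(exons, exons[1:], exons[2:]):
            # Evidence of skipping: another transcript joins the same
            # flanking boundaries directly, without the internal exon.
            if (up[1], down[0]) in introns:
                skipped.add(mid)
    return skipped
```

The negative set would need an extra check that the internal exon and its flanking introns are confirmed by at least two different ESTs, which this sketch leaves out.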
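2.2 The weighted degree (WD) kernel
The WD kernel of order $d$ compares two sequences $s_i$ and $s_j$ of length $L$ by counting exact $k$-mer matches at corresponding positions, weighting matches of length $k$ by $\beta_k$:
$$k(s_i, s_j) = \sum_{k=1}^{d} \beta_k \sum_{l=1}^{L-k+1} I\big(u_{k,l}(s_i) = u_{k,l}(s_j)\big)$$
$I(\cdot)$ is the indicator function, and $u_{k,l}(s)$ is the oligomer (a short polymer) of length $k$ starting at position $l$ of the sequence $s$. In other words, it is just a $k$-mer.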
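A direct, unoptimized reading of the WD kernel as code may help. This is a sketch under assumptions: the weights $\beta_k = 2(d-k+1)/(d(d+1))$ are the usual choice for this kernel and are not stated in these notes, and the function name `wd_kernel` is made up for illustration.

```python
def wd_kernel(s1, s2, d):
    """Weighted degree kernel of order d for two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    L = len(s1)
    total = 0.0
    for k in range(1, d + 1):
        # Assumed weighting: longer matches get smaller weights.
        beta = 2.0 * (d - k + 1) / (d * (d + 1))
        # Count start positions where the two k-mers match exactly.
        matches = sum(s1[l:l + k] == s2[l:l + k] for l in range(L - k + 1))
        total += beta * matches
    return total
```

Comparing `"ACGT"` with `"TACG"` gives 0 even though the two strings share the motif `ACG`, because the motif sits one position apart. That is the strict-alignment limitation of this kernel.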
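This kernel requires strict alignment and cannot handle shifts, so the authors propose the WD kernel with shifts in order to find sequence motifs which are less precisely localized:
$$k(s_i, s_j) = \sum_{k=1}^{d} \beta_k \sum_{l=1}^{L-k+1} \gamma_l \sum_{\substack{s=0 \\ s+l \le L}}^{S(l)} \delta_s \, \mu_{k,l,s,s_i,s_j}$$
$$\mu_{k,l,s,s_i,s_j} = I\big(u_{k,l+s}(s_i) = u_{k,l}(s_j)\big) + I\big(u_{k,l}(s_i) = u_{k,l+s}(s_j)\big)$$
- $l$ is the position cursor, i.e. the current position.
- $s$ is the shift length.
- The right-hand part ($\mu$) means: first shift $s_i$ by $s$ positions and compare it with $s_j$, then shift $s_j$ by $s$ positions and compare it with $s_i$.
- $\delta_s$ is the weight assigned to shifts (in either direction) of extent $s$.
- $S(l)$ determines the shift range at position $l$.
- $\gamma_l$ is a weighting over the position in the sequence. Section 2.3.3 (MKL for interpretation) covers it in detail.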
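The shifted comparison can be sketched in the same naive style. Assumptions, none of which come from these notes: a constant shift range `max_shift` for every position, uniform position weights unless `gamma` is given, shift weights $\delta_s = 1/(2(s+1))$, and the same $\beta_k$ as is usual for the WD kernel. The name `wds_kernel` is hypothetical.

```python
def wds_kernel(s1, s2, d, max_shift, gamma=None):
    """WD kernel with shifts, naive version for two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    L = len(s1)
    if gamma is None:
        gamma = [1.0] * L  # assumed: uniform weighting over positions
    total = 0.0
    for k in range(1, d + 1):
        beta = 2.0 * (d - k + 1) / (d * (d + 1))  # assumed k-mer weight
        for l in range(L - k + 1):
            for s in range(max_shift + 1):
                if l + s + k > L:
                    break  # the shifted k-mer would run past the end
                delta = 1.0 / (2 * (s + 1))  # assumed shift weight
                # Shift s1 against s2, then s2 against s1.
                mu = (s1[l + s:l + s + k] == s2[l:l + k]) \
                    + (s1[l:l + k] == s2[l + s:l + s + k])
                total += beta * gamma[l] * delta * mu
    return total
```

With `max_shift=0` both indicator terms coincide and $\delta_0 = 1/2$, so the value reduces to the plain WD kernel. With `max_shift=1`, `"ACGT"` and `"TACG"` now get a positive similarity.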
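The paper then proves that this kernel is a valid kernel; reading that part alongside Ng's notes is enough.
It then explains how the kernel differs from, and relates to, the oligo kernel.
2.3 Distinguishing alternatively from constitutively spliced exons
2.3.1 Overview
The first paragraph just repeats the introduction. Apparently a paper can be written this way...
The second paragraph says: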
We define a 201 nt window of (−100,+100) around the acceptor and donor splice sites, respectively, and extract a pair of subsequences […] for each exon […].
In other words, each exon yields two sequences, one at its start and one at its end. When two exons are compared, start is compared with start and end with end. This achieves:
captures positional information relative to the start and the end of the exon (particularly in the intronic regions upstream and downstream and the exonic sequence near the boundaries of the exon)
This leads to a combined kernel. Its last term is a linear kernel over additional features of the exon, characterizing:
- the exon length
- the upstream intron length
- the downstream intron length
- in which of the three frames of the exon stop codons appear
The rest is not covered here.
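How the pieces fit together can be sketched as below. Everything here is an assumption made for illustration: the 0-based site indices, the dict layout with keys `acc`, `don` and `feat`, the plain sum of the three terms, and the placeholder `match_count`, which stands in for the WD kernel with shifts.

```python
def extract_windows(pre_mrna, acceptor, donor, flank=100):
    """Return the (-flank, +flank) windows around the acceptor and donor sites.

    acceptor and donor are 0-based indices into pre_mrna (assumed convention).
    """
    for site in (acceptor, donor):
        if site - flank < 0 or site + flank >= len(pre_mrna):
            raise ValueError("window does not fit inside the sequence")
    acc = pre_mrna[acceptor - flank:acceptor + flank + 1]
    don = pre_mrna[donor - flank:donor + flank + 1]
    return acc, don


def match_count(s1, s2):
    """Placeholder sequence kernel: number of positions with the same base."""
    return float(sum(a == b for a, b in zip(s1, s2)))


def combined_kernel(x, y, seq_kernel=match_count):
    """Acceptor term + donor term + linear kernel on the extra features.

    x and y are dicts with keys "acc" and "don" (window strings) and
    "feat" (a list of numbers); this layout is hypothetical.
    """
    k_acc = seq_kernel(x["acc"], y["acc"])  # start compared with start
    k_don = seq_kernel(x["don"], y["don"])  # end compared with end
    k_lin = sum(a * b for a, b in zip(x["feat"], y["feat"]))
    return k_acc + k_don + k_lin
```

With the default `flank=100` each window is 201 nt long, matching the quoted window size.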
2.3.2 Model selection
Cross-validation starts here. What the authors call model selection appears to be just parameter tuning.
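2.3.3 MKL for interpretation
MKL stands for Multiple Kernel Learning. The idea is this: given a linear combined kernel such as $k = \sum_j w_j k_j$ with $w_j \ge 0$, MKL learns the weights $w_j$ together with the classifier instead of fixing them beforehand.
MKL can also be used for interpretation. For example, a kernel that is a weighted sum over sequence positions has one weight per position, and the learned weights show which positions matter for the classification.
We discuss MKL here in order to compute the position weights used in Section 3.1.2.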
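The decomposition that MKL works on can be illustrated without the optimizer. The sketch below only splits a plain WD kernel (no shifts, for brevity) into one sub-kernel per position and recombines the pieces with hand-set weights. Learning the weights is the actual MKL step and is not shown. The function names and the $\beta_k$ weighting are assumptions.

```python
def position_kernels(s1, s2, d):
    """Return one sub-kernel value per start position l."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    L = len(s1)
    values = []
    for l in range(L):
        v = 0.0
        for k in range(1, d + 1):
            if l + k > L:
                break  # k-mer would run past the end of the sequence
            beta = 2.0 * (d - k + 1) / (d * (d + 1))  # assumed k-mer weight
            v += beta * (s1[l:l + k] == s2[l:l + k])
        values.append(v)
    return values


def weighted_kernel(s1, s2, d, gamma):
    """Linear combination of the per-position sub-kernels."""
    return sum(g * v for g, v in zip(gamma, position_kernels(s1, s2, d)))
```

With all weights set to 1 this is the ordinary WD kernel. A weight vector that is large in only a few places makes the kernel depend mostly on those positions, which is what makes learned weights readable.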
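2.4 Finding skipped exons within introns
This appears to mean first predicting the splice sites yourself, i.e. locating the exon yourself, and then deciding whether it is alternatively spliced. The same kernel as above is used, so it is not explored further here.
This section also suggests that the input is always an exon triple. If a single exon were given as input, there would be no need to determine the splice sites.
2.5 Material and methods for the biological confirmation experiment
Mostly biology, skip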
3. Results and Discussion
3.1 Recognition of alternatively spliced exons
3.1.1 Simulation experiment
Describes the experimental data.
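3.1.2 Understanding the SVM classifier
Earlier, MKL gave us a weight for each sequence position, which picks out a few important regions.
For those regions we then count hexamer frequencies, which yields potential motifs.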
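The counting step is simple enough to sketch. The region boundaries `start` and `end` are assumed to come from the highly weighted positions, and the function name is hypothetical.

```python
from collections import Counter


def hexamer_counts(sequences, start, end, k=6):
    """Count k-mers lying fully inside the region [start, end) of each sequence."""
    counts = Counter()
    for seq in sequences:
        region = seq[start:end]
        for i in range(len(region) - k + 1):
            counts[region[i:i + k]] += 1
    return counts
```

Calling `hexamer_counts(seqs, start, end).most_common(10)` then lists the most frequent hexamers in the region. Comparing the counts between the alternatively and constitutively spliced sets is one plausible way to shortlist candidate motifs.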
The approach in this section is worth learning from.
3.1.3 Biological validation
skip
3.2 Finding skipped exons within introns
skip
4. Conclusion
skip