What does effect size mean in GWAS?
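Put statistics and biology together and you get a disaster: definitions that refuse to speak plainly, that describe things only sideways, that never give a formula, plus all kinds of overloaded terminology. And now here comes another one: effect size.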
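It’s the Effect Size, Stupid
It’s the Effect Size, Stupid is the classic article on effect size; a Google search will usually turn it up.
It says: suppose a population is split into an experimental group and a control group. The effect size is then

$$\text{effect size} = \frac{\text{mean of experimental group} - \text{mean of control group}}{\text{standard deviation}}$$

I think it simply uses the standard deviation as a unit to quantify the difference between the two groups (compare with the Gaussian distribution and the Z-score).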
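There are two issues:
- Which of your two groups is experimental and which is control is entirely your call, so consider taking the absolute value
- The standard deviation has to be estimated; see the article for details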
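A minimal sketch of this definition, to make both issues concrete. It assumes the standard deviation is estimated by pooling the two groups (one common choice, not necessarily the one the article prefers); the function name and the scores are made up for illustration.

```python
import math
import statistics


def standardized_mean_difference(experimental, control):
    """Mean difference expressed in units of a pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    var_e = statistics.variance(experimental)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    # Assumption: estimate the population SD by pooling both groups.
    pooled_sd = math.sqrt(((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2))
    return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd


experimental = [2.0, 4.0, 6.0]  # made-up scores
control = [1.0, 3.0, 5.0]       # made-up scores

d = standardized_mean_difference(experimental, control)          # 0.5
d_swapped = standardized_mean_difference(control, experimental)  # -0.5
print(d, d_swapped, abs(d_swapped))
```

Swapping the group labels only flips the sign, which is why taking the absolute value removes the dependence on which group you call "experimental".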
It’s NOT the Only Effect Size
According to StatisticsSolutions: Effect Size, there are many kinds of effect size! The definition above is the Standardized mean difference, which is just one of them. All of the following count as effect sizes:
- Regression coefficient (e.g. $\beta_1$ in $y = \beta_0 + \beta_1 x$)
- Pearson correlation coefficient (i.e. Pearson’s $r$)
- Odds ratio (see Explaining Odds Ratios)
  - The exposure can be the genotype
  - The outcome can be the phenotype
- Cohen’s $d$ effect size
- Cohen’s $f^2$ effect size
- Glass’s $\Delta$ effect size
- Hedges’ $g$ effect size
- Cramer’s $\varphi$ effect size (a.k.a. Cramer’s $V$)
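To make the odds ratio item concrete, here is a small sketch that treats "carries the allele" as the exposure and "has the disease" as the outcome. The function name and all counts are hypothetical, chosen only for illustration.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed


# Hypothetical 2x2 table: rows = allele carrier or not, columns = case or control.
carriers_with_disease = 30
carriers_without_disease = 70
noncarriers_with_disease = 10
noncarriers_without_disease = 90

or_value = odds_ratio(
    carriers_with_disease,
    carriers_without_disease,
    noncarriers_with_disease,
    noncarriers_without_disease,
)
print(round(or_value, 3))  # 3.857
```

An odds ratio of 1 would mean the allele makes no difference to the odds of disease; values above 1 mean carriers have higher odds.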
WTF is Effect Size?
So you may ask: then WTF is effect size? What does it actually measure?
According to StatisticsSolutions: Effect Size:
Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.
I think this sentence tells you nothing.
In statistics, an effect size is a quantitative measure of the magnitude of a phenomenon.
And a phenomenon is defined as "an observable fact or event", which also says nothing, but I think it can be unpacked a little:
- If your phenomenon is the difference between two groups, then the Standardized mean difference is the magnitude of that difference
- If your phenomenon is the correlation between two groups, then the Pearson correlation coefficient is the magnitude of that correlation
- And so on: as long as your two groups of data form a phenomenon, the effect size measures the magnitude of that phenomenon
Effect Size in GWAS
In GWAS you actually observe two things: genotypes (or SNP alleles) and phenotypes. Split each into experimental and control and you end up with 4 groups of data (assuming the SNP is bi-allelic and the phenotype takes only two values). See CMU: Genomes and Complex Diseases.
So how do you compute the effect size of an allele on the phenotype?
- Standardized mean difference is clearly wrong
- Pearson correlation coefficient does not seem right either
- Odds ratio looks workable
- ...
But what we actually use is the regression coefficient! You will want to ask: how is this regression done? See CMU: Genomes and Complex Diseases:
- Here the phenotype is continuous, but the discrete (categorical) case is similar
Here we ignore the intercept and take only the slope.
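A rough sketch of that regression, assuming the genotype is coded as the number of copies of one allele (0, 1 or 2) and the phenotype is continuous. The coding, the function name and the data are my own assumptions for illustration; real GWAS tools also adjust for covariates, which is skipped here.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: allele counts per individual and a continuous phenotype.
genotype = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 1.4, 1.8, 2.1, 2.3]

slope, intercept = ols_slope_intercept(genotype, phenotype)
effect_size = slope  # drop the intercept, keep only the slope
print(round(effect_size, 3))  # 0.55
```

Under this coding the slope reads as the expected change in phenotype per extra copy of the allele, and its sign gives the direction of the allelic effect.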
Related Concepts: Penetrance / Direction of an Allelic Effect
Penetrance is the probability of developing a particular disease given a particular genotype, i.e. $P(\text{disease} \mid \text{genotype})$.
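As a quick illustration, penetrance can be estimated from counts as the fraction of affected individuals within each genotype group. The genotype labels and counts below are hypothetical.

```python
# Hypothetical counts per genotype: (number affected, total number of individuals).
counts = {
    "AA": (2, 100),
    "Aa": (10, 100),
    "aa": (40, 100),
}

# Estimated penetrance: affected / total within each genotype group.
penetrance_by_genotype = {
    genotype: affected / total for genotype, (affected, total) in counts.items()
}

for genotype, value in penetrance_by_genotype.items():
    print(genotype, value)
```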