
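Statistics combined with biology is a disaster: definitions that aren't written in plain language, that only describe things indirectly, that come without a mathematical formula, plus overloaded terminology everywhere. Great, here comes another one: effect size.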

It’s the Effect Size, Stupid

It’s the Effect Size, Stupid is the classic article on effect size; if you google the term, this article usually comes up.

It goes like this: suppose you have a population X, with an experimental group $X_e$ and a control group $X_c$. Then

$$ES = \frac{\mu(X_e) - \mu(X_c)}{SD(X)}$$
  • ES is the effect size
  • SD is the standard deviation
  • In my view it simply uses the standard deviation as a unit to quantify the difference between the two groups (think of the Gaussian distribution and the Z-score)
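To make the "SD as a unit" reading concrete, here is a minimal Python sketch. The numbers are hypothetical, SD(X) is assumed to be known, and the Gaussian interpretation only holds if the control group is roughly normal with that SD.

```python
from statistics import NormalDist


def standardized_mean_difference(mean_e: float, mean_c: float, sd: float) -> float:
    """ES = (mu(X_e) - mu(X_c)) / SD(X): the mean difference measured in SD units."""
    return (mean_e - mean_c) / sd


# Hypothetical numbers: experimental mean 105, control mean 100, SD(X) = 10 (assumed known).
es = standardized_mean_difference(105.0, 100.0, 10.0)

# If the control group is Gaussian with this SD, ES is the Z-score of the experimental
# mean within the control distribution, so Phi(ES) is the share of the control group
# that falls below the experimental mean.
share_below = NormalDist().cdf(es)

print(es, round(share_below, 3))  # 0.5 0.691
```

So an ES of 0.5 says the average experimental subject sits half an SD above the control mean, i.e. above roughly 69% of the control group under the normality assumption.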

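There are two issues:

  • Which of your two groups is "experimental" and which is "control" is entirely your call, so you may want to take the absolute value
  • SD(X) has to be estimated; see the article for details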
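A minimal sketch of both points. Estimating SD(X) with the pooled standard deviation of the two groups is one common choice and an assumption here (the article discusses the alternatives); taking the absolute value makes the result independent of the group labels. The data are made up.

```python
import math
from statistics import mean, variance


def pooled_sd(xe: list[float], xc: list[float]) -> float:
    """Pooled SD: each group's sample variance weighted by its degrees of freedom."""
    ne, nc = len(xe), len(xc)
    pooled_var = ((ne - 1) * variance(xe) + (nc - 1) * variance(xc)) / (ne + nc - 2)
    return math.sqrt(pooled_var)


def effect_size(xe: list[float], xc: list[float], absolute: bool = True) -> float:
    """Standardized mean difference with SD(X) estimated by the pooled SD."""
    es = (mean(xe) - mean(xc)) / pooled_sd(xe, xc)
    # abs() makes the result independent of which group you call "experimental".
    return abs(es) if absolute else es


# Hypothetical measurements.
xe = [6.0, 7.0, 8.0, 9.0, 10.0]
xc = [4.0, 5.0, 6.0, 7.0, 8.0]

print(round(effect_size(xe, xc), 3))                  # 1.265
print(round(effect_size(xc, xe, absolute=False), 3))  # -1.265
```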

It’s NOT the Only Effect Size

According to StatisticsSolutions: Effect Size, there are many kinds of effect size! The definition above is the standardized mean difference, which is just one of them. All of the following count as effect sizes:

  • Regression coefficient (e.g. β in $X_e = \beta X_c + \epsilon$)
  • Pearson correlation coefficient (i.e. Pearson’s r)
  • Odds ratio (see Explaining Odds Ratios)
    • The exposure can be a genotype
    • The outcome can be a phenotype
  • Cohen’s d effect size
  • Cohen’s f2 effect size
  • Glass’s δ effect size
  • Hedges’ g effect size
  • Cramer’s V effect size (a.k.a. Cramer’s φ)
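A few entries from this list, sketched in Python under their usual textbook definitions (treat the exact conventions as assumptions, since sources differ): Glass's δ is the standardized mean difference with only the control group's SD in the denominator, Cohen's f² is R²/(1 - R²) for a regression, and Cramér's V is sqrt(χ²/(n·(k - 1))) for an r×c contingency table with k = min(r, c). The data are made up.

```python
import math
from statistics import mean, stdev


def glass_delta(xe: list[float], xc: list[float]) -> float:
    """Glass's delta: mean difference scaled by the control group's SD only."""
    return (mean(xe) - mean(xc)) / stdev(xc)


def cohens_f2(r_squared: float) -> float:
    """Cohen's f^2 from the R^2 of a regression model."""
    return r_squared / (1 - r_squared)


def cramers_v(table: list[list[int]]) -> float:
    """Cramer's V for an r x c table: sqrt(chi2 / (n * (k - 1))) with k = min(r, c)."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical data.
xe = [6.0, 7.0, 8.0, 9.0, 10.0]
xc = [5.0, 5.5, 6.0, 6.5, 7.0]
print(round(glass_delta(xe, xc), 3))               # 2.53
print(round(cohens_f2(0.2), 3))                    # 0.25
print(round(cramers_v([[60, 40], [40, 60]]), 3))   # 0.2
```

Each function returns a different number because each one measures a different phenomenon on its own scale.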

WTF is Effect Size?

So you might ask: then WTF is effect size? What does it actually measure?

According to StatisticsSolutions: Effect Size:

Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale.

I think this sentence says nothing.

According to Wikipedia: Effect Size:

In statistics, an effect size is a quantitative measure of the magnitude of a phenomenon.

And "phenomenon" is defined as "an observable fact or event", which says nothing either, but I think it can be unpacked a bit:

  • If your phenomenon is the difference between two groups, the standardized mean difference is the magnitude of that difference
  • If your phenomenon is the correlation between two variables, the Pearson correlation coefficient is the magnitude of that correlation
  • And so on: as long as your two sets of data make up a phenomenon, the effect size measures the magnitude of that phenomenon
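For the correlation case, the magnitude is Pearson's r. A minimal Python sketch for paired observations (x[i], y[i]); the result is unit-free and lies in [-1, 1]. The data are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's r for paired observations (x[i], y[i])."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors of the covariance and the two SDs cancel, so raw sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(pearson_r(x, y), 3))  # 0.8
```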

Effect Size in GWAS

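In GWAS you actually observe two things: genotypes (or SNP alleles) and phenotypes. Split them further into an experimental (case) group and a control group, and you effectively have 4 groups of data (assuming the SNP is bi-allelic and the phenotype takes only two values). See CMU: Genomes and Complex Diseases.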

So how do you compute the effect size of an allele on the phenotype?

  • The standardized mean difference clearly doesn't fit
  • The Pearson correlation coefficient doesn't seem right either
  • The odds ratio looks like it could work
  • …
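The 4 groups above form a 2×2 table (allele × case/control), and the odds ratio falls straight out of it. A minimal sketch, assuming an allele-count table (rather than genotype counts) and made-up numbers; the function and parameter names are hypothetical.

```python
import math


def allelic_odds_ratio(effect_case: int, other_case: int,
                       effect_control: int, other_control: int) -> float:
    """Odds of the effect allele in cases divided by the same odds in controls.

    (effect_case / other_case) / (effect_control / other_control), written as a
    cross-product ratio.
    """
    return (effect_case * other_control) / (other_case * effect_control)


# Hypothetical allele counts from the 2x2 table (allele x case/control).
odds_ratio = allelic_odds_ratio(effect_case=60, other_case=40,
                                effect_control=40, other_control=60)
log_odds_ratio = math.log(odds_ratio)  # the scale a logistic-regression beta lives on
print(odds_ratio)  # 2.25
```

An odds ratio above 1 means the effect allele is enriched among cases; its log is what a logistic regression on the same data would estimate.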

But what we actually use is the regression coefficient! You may wonder how this regression is done. See CMU: Genomes and Complex Diseases.

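  • Here the phenotype is continuous, but the discrete (categorical) case is analogous (a binary phenotype is typically handled with logistic regression, where β is a log odds ratio)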

Ignoring the intercept and keeping only the slope β gives the most common effect size of an allele on the phenotype in GWAS.
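A minimal sketch of that regression for a continuous phenotype, assuming additive coding (genotype encoded as the number of copies of the effect allele: 0, 1 or 2; common, but an assumption here) and no covariates. Ordinary least squares fits phenotype = α + β·dosage; β is kept and α discarded. The data and the function name are hypothetical.

```python
def allelic_effect(dosage: list[int], phenotype: list[float]) -> tuple[float, float]:
    """OLS fit of phenotype = alpha + beta * dosage; returns (beta, alpha)."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, phenotype))
    sxx = sum((x - mx) ** 2 for x in dosage)
    beta = sxy / sxx            # slope: the per-allele effect size
    alpha = my - beta * mx      # intercept: discarded for effect-size purposes
    return beta, alpha


# Hypothetical individuals: dosage = copies of the effect allele (additive coding, assumed).
dosage = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 1.5, 2.0, 1.2, 1.7, 2.2]
beta, alpha = allelic_effect(dosage, phenotype)
print(round(beta, 3))  # 0.5
```

Here each extra copy of the effect allele shifts the phenotype by about 0.5 units.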

Penetrance is the probability of developing a particular disease given a particular genotype, i.e. P(Disease | Genotype). Penetrance is sometimes also counted as an effect size, which makes sense given its definition.
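Penetrance is just a conditional proportion, so a sketch is short. The counts below are hypothetical and assumed to come from a population sample; in a case-control sample the share of cases is fixed by design, so these ratios would not estimate penetrance.

```python
def penetrance(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """P(Disease | genotype) from (n_affected, n_total) counts per genotype."""
    return {genotype: affected / total for genotype, (affected, total) in counts.items()}


# Hypothetical population sample, counts per genotype: (affected, total).
counts = {"AA": (30, 100), "Aa": (15, 100), "aa": (5, 100)}
print(penetrance(counts))  # {'AA': 0.3, 'Aa': 0.15, 'aa': 0.05}
```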

The direction of an allelic effect is the direction (increase or decrease) in which an allele shifts a phenotype; in the regression above, it is the sign of β.
