Summarized from:

  • Chi: [kaɪ]

There is also the chi-square test for variance in a normal population, as well as the chi-squared distribution itself; neither is covered here.


What is a chi-square test?

A chi-square test is also written as the $\chi^2$ test (in LaTeX, the symbol χ is written \chi).

The test is applied when you have two categorical variables from a single population. It is used to determine whether these two categorical variables are independent.

Digression: What is a categorical variable?

Variables can be classified as categorical (aka, qualitative) or quantitative (aka, numerical).

  • Categorical variables take on values that are names or labels. E.g.
    • the color of a ball (red, green, blue, etc.)
    • the breed of a dog (collie, shepherd, terrier, etc.)
  • Quantitative variables represent a measurable quantity. E.g.
    • the population of a city

When to Use Chi-Square Test for Independence

The test procedure is appropriate when the following conditions are met:

  • The sampling method is simple random sampling.
  • The variables under study are each categorical.
  • If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
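The third condition can be checked mechanically once the contingency table is known. The following is a minimal sketch, assuming the table is given as a list of rows of observed counts and that the expected count of a cell is (row total × column total) / n. The function names and the sample table are made up for illustration.

```python
def expected_counts(observed):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_expected_count_rule(observed, minimum=5):
    """Return True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Hypothetical 2x2 table, invented for illustration only.
small_table = [[3, 7], [6, 4]]
print(expected_counts(small_table))            # expected counts per cell
print(meets_expected_count_rule(small_table))  # whether the rule of 5 holds
```

Note that the rule applies to the expected counts, not the observed ones, so a table can fail it even when every observed cell looks reasonably filled.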

State the Hypotheses

Given variable A (which has r levels), and variable B (which has c levels),

  • H0: variable A and variable B are independent.
  • Ha: variable A and variable B are not independent.

Analyze Sample Data

  • Degrees of freedom: $DF = (r - 1)(c - 1)$
  • Expected frequencies: $E_{r,c} = (n_r \cdot n_c) / n$
    • $E_{r,c}$ is the expected frequency count for level r of variable A and level c of variable B
    • $n_r$ is the total number of sample observations at level r of variable A
    • $n_c$ is the total number of sample observations at level c of variable B
    • $n$ is the total sample size
  • Test statistic: $\chi^2 = \sum \left[ (O_{r,c} - E_{r,c})^2 / E_{r,c} \right]$, summed over all cells of the table
    • $O_{r,c}$ is the observed frequency count for level r of variable A and level c of variable B
  • p-value: computed from the two values $DF$ and $\chi^2$; you can use the Chi-Square Calculator: Online Statistical Table
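The four quantities above can also be computed without a lookup table. Below is a minimal sketch using only the Python standard library. The function names `chi2_sf` and `chi_square_independence` and the sample table are assumptions made for illustration. The p-value comes from a series expansion of the regularized incomplete gamma function, which is adequate for ordinary table sizes but loses precision for extremely small p-values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with `df` degrees of freedom.

    Computes the regularized lower incomplete gamma P(a, z) by its power series,
    with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    log_prefix = a * math.log(z) - z - math.lgamma(a)
    return max(0.0, 1.0 - total * math.exp(log_prefix))


def chi_square_independence(observed):
    """Return (DF, chi-square statistic, p-value) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_{r,c}
            statistic += (o - e) ** 2 / e
    return df, statistic, chi2_sf(statistic, df)


# Hypothetical 2x2 table, invented for illustration only.
df, statistic, p_value = chi_square_independence([[20, 30], [30, 20]])
print(df, statistic, p_value)
```

If SciPy is available, `scipy.stats.chi2_contingency` covers the same ground. Be aware that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the plain formula used here.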

Example

Question: Is there a gender gap? Do the men’s voting preferences differ significantly from the women’s preferences?

  • H0: “Gender” and “Voting Preference” are independent.
  • Ha: “Gender” and “Voting Preference” are not independent.
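
The calculation uses observed counts of 200, 150, 50 in row 1 (row total 400) and 250, 300, 50 in row 2 (row total 600); the column totals are 450, 450, 100 and n = 1000.

$$
\begin{aligned}
DF &= (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2 \\
E_{1,1} &= (400 \cdot 450) / 1000 = 180000 / 1000 = 180 \\
E_{1,2} &= (400 \cdot 450) / 1000 = 180000 / 1000 = 180 \\
E_{1,3} &= (400 \cdot 100) / 1000 = 40000 / 1000 = 40 \\
E_{2,1} &= (600 \cdot 450) / 1000 = 270000 / 1000 = 270 \\
E_{2,2} &= (600 \cdot 450) / 1000 = 270000 / 1000 = 270 \\
E_{2,3} &= (600 \cdot 100) / 1000 = 60000 / 1000 = 60 \\
\chi^2 &= (200 - 180)^2 / 180 + (150 - 180)^2 / 180 + (50 - 40)^2 / 40 \\
&\quad + (250 - 270)^2 / 270 + (300 - 270)^2 / 270 + (50 - 60)^2 / 60 \\
&= 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2
\end{aligned}
$$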

From the table, with $DF = 2$, $P(\chi^2 > 16.2) = 0.0003$.

Since the p-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
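The arithmetic of this example can be reproduced in a few lines. This is a sketch, assuming the observed counts implied by the calculation (200, 150, 50 in the first row and 250, 300, 50 in the second). It relies on the fact that for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x / 2), so no table lookup is needed.

```python
import math

# Observed counts as used in the worked calculation.
observed = [
    [200, 150, 50],  # row 1, row total 400
    [250, 300, 50],  # row 2, row total 600
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count for this cell
        statistic += (o - e) ** 2 / e

# Closed form valid only when DF == 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(df, round(statistic, 1), round(p_value, 4))  # prints: 2 16.2 0.0003
```

The closed form is specific to DF = 2; for other degrees of freedom a statistical table or a general chi-square distribution function is needed.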
