Calculate p-values from Z, T, Chi-square, or F test statistics. Choose a one-tailed or two-tailed test. The tool shows a shaded distribution canvas, an APA-style significance statement, and the decision at your chosen α level (reject or fail to reject H₀).
Choose Z-test (large samples, known σ), T-test (small samples, enter degrees of freedom), Chi-square test (goodness of fit or independence, enter df), or F-test (ANOVA, enter numerator and denominator df). Enter the computed test statistic value.
Select one-tailed left, one-tailed right, or two-tailed. Then choose your α level (0.10, 0.05, 0.01, or 0.001). The p-value is calculated and compared to α. The result clearly states: Reject H₀ (p < α, statistically significant) or Fail to reject H₀ (p ≥ α).
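The tail selection and the α comparison described above can be sketched in a few lines for the Z-test case. This is only an illustration under stated assumptions, not the calculator's actual code: the function names `z_p_value` and `decide` are hypothetical, and only the standard normal distribution is covered (t, χ², and F need their own distribution functions).

```python
import math

def z_p_value(z: float, tail: str = "two") -> float:
    """P-value for a z statistic under the standard normal distribution.

    `tail` is one of "left", "right", or "two" (hypothetical labels).
    """
    if tail == "right":
        # P(Z >= z)
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "left":
        # P(Z <= z)
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if tail == "two":
        # P(|Z| >= |z|)
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p < alpha."""
    return "Reject H0" if p < alpha else "Fail to reject H0"

p = z_p_value(2.10, "two")
print(f"p = {p:.4f}: {decide(p, alpha=0.05)}")
```

Note that the comparison is strict: a p-value exactly equal to α falls on the "fail to reject" side, matching the rule p ≥ α stated above.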
A canvas draws the probability distribution with the critical region shaded in red and the test statistic marked with a vertical line. APA-format reporting is shown (e.g. "t(19) = 2.45, p = .024, two-tailed"). Copy the statement for your report.
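A statement in this style could be assembled as follows. The sketch assumes the common APA conventions of two decimals for the statistic, three decimals for p with no leading zero, and "p < .001" for very small values. The helper names `format_p_apa` and `apa_statement` are hypothetical, and the p-value is passed in rather than computed.

```python
def format_p_apa(p: float) -> str:
    """Format a p-value APA-style: three decimals, no leading zero."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def apa_statement(symbol: str, stat: float, p: float, dfs=(), tail=None) -> str:
    """Build a string such as 'F(2, 27) = 4.10, p = .028'.

    `dfs` is an empty tuple for a z statistic, one value for t or
    chi-square, and two values (numerator, denominator) for F.
    """
    df_part = "(" + ", ".join(str(d) for d in dfs) + ")" if dfs else ""
    parts = [f"{symbol}{df_part} = {stat:.2f}", format_p_apa(p)]
    if tail:
        parts.append(tail)
    return ", ".join(parts)

print(apa_statement("z", 2.10, 0.0357, tail="two-tailed"))
```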
A p-value is the probability of observing a test statistic at least as extreme as the one computed from your data, assuming the null hypothesis (H₀) is true. It is NOT the probability that H₀ is true. A small p-value (e.g. p = 0.02) means your result would be rare if H₀ were true, which counts as evidence against H₀. A large p-value (e.g. p = 0.45) means your result is consistent with H₀, so you cannot reject it; it does not show that H₀ is true. The p-value is a conditional probability: P(data this extreme | H₀ is true).
A two-tailed test checks whether a parameter differs from the null value in either direction (greater or less). It splits α across both tails: for α = 0.05, the critical region is 2.5% in each tail. Use it when the direction of the effect is not predicted in advance. A one-tailed test checks whether the parameter is specifically greater than (right tail) or less than (left tail) the null value. It puts all of α in one tail (5% for α = 0.05). Use it only when you have a strong directional hypothesis BEFORE data collection. One-tailed tests have more power to detect an effect in the predicted direction, but they cannot detect an effect in the opposite direction and are less conservative, so two-tailed tests are standard in most fields.
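The way α is split across tails shows up directly in the critical values. The sketch below uses the standard normal distribution only, and `critical_z` is a hypothetical helper name. For α = 0.05 it gives a cutoff of about ±1.96 for a two-tailed test and about 1.645 for a right-tailed test.

```python
from statistics import NormalDist

def critical_z(alpha: float, tail: str = "two") -> float:
    """Critical z value for a given alpha and tail choice.

    For "two", the returned value c defines the rejection region |z| > c.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha/2 in each tail
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        # all of alpha in the upper tail
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # all of alpha in the lower tail (negative cutoff)
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("two", "right", "left"):
    print(tail, round(critical_z(0.05, tail), 3))
```

Because the one-tailed cutoff is closer to zero, the same statistic in the predicted direction reaches significance more easily, which is where the extra power comes from.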
α = 0.05 (5%) is the most widely used conventional threshold in social sciences, biology, and medicine. It means you accept a 5% risk of falsely rejecting a true H₀ (Type I error). α = 0.01 (1%) is used when a false positive is costly, as in clinical drug trials or engineering safety. α = 0.001 (0.1%) or stricter is used where many hypotheses are tested or where discovery claims carry high stakes: particle physics requires "5σ" for a discovery (one-tailed p ≈ 2.87×10⁻⁷), and genome-wide association studies conventionally use p < 5×10⁻⁸. α = 0.10 (10%) is sometimes used in exploratory social science or economics research. The α level should always be set BEFORE data collection, not chosen based on results.
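The "5σ" figure can be checked by converting a number of standard deviations into a tail probability of the standard normal distribution. This is a small sketch; `sigma_to_p` is a hypothetical name, and the one-tailed convention is assumed because it reproduces the 2.87×10⁻⁷ value quoted above.

```python
import math

def sigma_to_p(n_sigma: float, two_tailed: bool = False) -> float:
    """Tail probability beyond n_sigma standard deviations of a normal."""
    one_tail = 0.5 * math.erfc(n_sigma / math.sqrt(2))
    return 2 * one_tail if two_tailed else one_tail

for k in (2, 3, 5):
    print(f"{k} sigma: one-tailed p = {sigma_to_p(k):.3g}")
```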
Statistical significance (p < α) only means the result is unlikely under H₀ — it says nothing about whether the effect is large or important. With a very large sample size, even a tiny, meaningless difference can be statistically significant. Practical significance (effect size) measures how large the effect is: Cohen's d for mean differences (0.2 = small, 0.5 = medium, 0.8 = large), R² for regression, odds ratio for proportions. Always report both the p-value AND the effect size. A significant p-value with a tiny effect size rarely justifies real-world action.
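As one way to report an effect size alongside the p-value, Cohen's d for two independent groups can be computed from the pooled standard deviation. The sketch below assumes that pooled-variance definition. The function names and the "below small" label for |d| < 0.2 are illustrative choices, not part of Cohen's benchmarks.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def effect_size_label(d: float) -> str:
    """Map |d| onto the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # illustrative label, not one of Cohen's categories

d = cohens_d([2, 4, 6], [1, 3, 5])
print(round(d, 2), effect_size_label(d))
```

The benchmarks are rough conventions, so the label is a starting point for interpretation, not a substitute for judging the effect in context.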
Z-test: one or two means with known σ, or large samples (n ≥ 30) where the sample standard deviation is a reasonable stand-in for σ; uses the standard normal distribution. T-test: one or two means with unknown σ, which matters most for small samples; uses the t-distribution with n−1 df (one sample) or pooled or Welch df (two samples). Chi-square (χ²): categorical data, either goodness-of-fit (does the data fit a distribution?) or independence (are two categorical variables associated?). F-test: comparing variances of two groups, or overall significance in ANOVA (are the means of 3+ groups all equal?) or regression. Each test has its own assumptions (such as independence, normality, or equal variances) that should be checked before applying it.
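Degrees of freedom (df) represent the number of independent pieces of information in a dataset available to estimate a parameter. For a t-test with one sample of size n: df = n−1 (because once you know n−1 deviations from the mean, the last one is determined). For a two-sample t-test with pooled (equal) variances: df = n₁ + n₂ − 2. Welch's t-test, which does not assume equal variances, uses the Welch-Satterthwaite approximation instead; its df is usually non-integer and never larger than n₁ + n₂ − 2. For chi-square goodness-of-fit: df = categories − 1 − (estimated parameters). As df increases, the t-distribution approaches the standard normal distribution, and the χ² distribution becomes more symmetric and approximately normal. Fewer df means heavier tails for the t-distribution, so a larger test statistic is needed to reach significance.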
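The df rules above can be written out as small helper functions. This is a sketch with hypothetical function names. The Welch-Satterthwaite formula is the standard approximation used by Welch's t-test and takes the two sample variances and sample sizes as inputs.

```python
def df_one_sample(n: int) -> int:
    """One-sample t-test: n - 1."""
    return n - 1

def df_pooled(n1: int, n2: int) -> int:
    """Two-sample t-test assuming equal variances: n1 + n2 - 2."""
    return n1 + n2 - 2

def df_welch(var1: float, n1: int, var2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation (unequal variances allowed)."""
    a = var1 / n1  # squared standard error contributed by group 1
    b = var2 / n2  # squared standard error contributed by group 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def df_chi_square_gof(categories: int, estimated_params: int = 0) -> int:
    """Chi-square goodness-of-fit: categories - 1 - estimated parameters."""
    return categories - 1 - estimated_params

print(df_one_sample(20))               # 19
print(df_pooled(10, 10))               # 18
print(round(df_welch(4.0, 10, 4.0, 10), 2))   # equal variances and sizes: matches pooled df
print(round(df_welch(1.0, 10, 16.0, 10), 2))  # unequal variances: fewer df
print(df_chi_square_gof(6))            # 5
```

With equal variances and equal group sizes the Welch value coincides with the pooled df; as the variances diverge it drops toward the smaller group's n − 1.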