30  Quantitative and Qualitative Data

30.1 What the Syllabus Covers

The most basic distinction in data analysis is between quantitative (numerical) and qualitative (categorical) data. The distinction determines which descriptive statistics, graphs, and inferential tests are appropriate.

Previous-year question (PYQ) patterns: (a) classify a given variable as quantitative or qualitative; (b) sub-classify it as discrete or continuous (quantitative) or as nominal or ordinal (qualitative); (c) match each data type to its measure of central tendency (mode, median, or mean); and (d) identify the Stevens scale (NOIR).

30.2 Quantitative Data

Quantitative data are numerical measurements — values that can be added, averaged, and ordered.

30.2.1 Two Types

Tip: Discrete vs Continuous

| Type | Definition | Examples |
|---|---|---|
| Discrete | Countable integers only | Number of students, number of cars, defective items |
| Continuous | Any value on a scale, including fractions | Height (cm), weight (kg), time (s), temperature |

30.2.2 Stevens’ Scales — Recap

Tip: Stevens’ NOIR Scales

| Scale | Example | Permitted operations | Permitted statistics |
|---|---|---|---|
| Nominal | Gender, religion | =, ≠ | Mode, χ² |
| Ordinal | Rank, Likert | <, > | Median, Spearman’s ρ |
| Interval | Temperature °C, IQ | +, − | Mean, SD, Pearson’s r, t-test |
| Ratio | Height, weight, income | +, −, ×, ÷ | All of the above, plus geometric mean and CV |

Nominal and Ordinal are qualitative; Interval and Ratio are quantitative.

30.2.3 Central Tendency, Dispersion, Shape

Tip: Three Properties of Quantitative Data
  • Central tendency: Mean (arithmetic, geometric, harmonic) · Median · Mode.
  • Dispersion: Range · Quartile deviation · Mean deviation · Variance · Standard deviation · Coefficient of variation.
  • Shape: Skewness (asymmetry) · Kurtosis (peakedness and tail weight).

30.2.4 Three Means

Tip: Arithmetic / Geometric / Harmonic Means
  • Arithmetic Mean (AM) = Σx / n. The most widely used mean; sensitive to outliers.
  • Geometric Mean (GM) = ⁿ√(x₁ × x₂ × … × xₙ). Used for growth rates and ratios; defined for positive values.
  • Harmonic Mean (HM) = n / Σ(1/x). Used for averaging rates (e.g., average speed over equal distances).
  • Order: AM ≥ GM ≥ HM for positive numbers, with equality only when all values are equal.
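
The ordering of the three means can be checked numerically. The sketch below uses Python's standard `statistics` module; the data values and the two speeds are made up purely for illustration.

```python
import statistics

# Hypothetical positive data set, chosen only for illustration.
values = [2, 4, 8]

am = statistics.mean(values)            # (2 + 4 + 8) / 3
gm = statistics.geometric_mean(values)  # cube root of 2 * 4 * 8
hm = statistics.harmonic_mean(values)   # 3 / (1/2 + 1/4 + 1/8)

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")

# Average speed over two equal distances is a harmonic mean:
# 40 km/h out and 60 km/h back give 48 km/h, not the arithmetic 50 km/h.
average_speed = statistics.harmonic_mean([40, 60])
print(f"Average speed = {average_speed:.1f} km/h")
```

The arithmetic mean of the two speeds overstates the result because more time is spent travelling at the slower speed.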

30.2.5 Median, Mode, and Quartiles

Tip: Median, Mode, Quartiles
  • Median = middle value of the ordered data. For odd n: the ((n+1)/2)th term. For even n: the average of the (n/2)th and (n/2 + 1)th terms.
  • Mode = most frequently occurring value. A data set may have no mode, one mode (unimodal), two (bimodal), or more (multimodal).
  • Quartiles: Q1 (25th percentile), Q2 (50th percentile = median), Q3 (75th percentile).
  • Percentiles divide the ordered data into 100 equal parts; deciles divide it into 10.
  • Empirical relation: Mode ≈ 3 × Median − 2 × Mean (for moderately skewed distributions).
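
A short sketch of these measures with Python's `statistics` module follows. The marks are hypothetical. Textbooks and software use slightly different quartile conventions; `statistics.quantiles` applies its default "exclusive" method here, so hand-computed quartiles may differ a little.

```python
import statistics

# Hypothetical marks for nine students, chosen only for illustration.
marks = [12, 15, 15, 18, 20, 22, 15, 30, 25]

mean = statistics.mean(marks)
median = statistics.median(marks)              # middle value of the sorted data
modes = statistics.multimode(marks)            # list of all most-frequent values
q1, q2, q3 = statistics.quantiles(marks, n=4)  # three quartile cut points

# Empirical relation for moderately skewed data (an approximation only).
estimated_mode = 3 * median - 2 * mean

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"3 * median - 2 * mean = {estimated_mode:.2f}")
```

For this mildly right-skewed set the empirical estimate lands near the observed mode of 15, but it is a rule of thumb, not an identity.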

30.2.6 Standard Deviation and Variance

Tip: SD and Variance Formulas
  • Population variance: σ² = Σ(xᵢ − μ)² / N. Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1).
  • Standard deviation = √Variance (σ for a population, s for a sample).
  • Coefficient of Variation CV = (σ / x̄) × 100 %. Useful for comparing relative dispersion across datasets with different units or very different means.
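
The difference between the n and n − 1 divisors, and the use of CV to compare variables measured in different units, can be sketched as follows. The heights and weights are hypothetical, and `coefficient_of_variation` is an illustrative helper, not a library function.

```python
import statistics

# Hypothetical heights (cm) and weights (kg) for the same five people.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 58, 63, 70, 84]

def coefficient_of_variation(data):
    """Return the sample CV as a percentage: (s / mean) * 100."""
    return statistics.stdev(data) / statistics.mean(data) * 100

pop_var = statistics.pvariance(heights_cm)     # divides by n
sample_var = statistics.variance(heights_cm)   # divides by n - 1
sample_sd = statistics.stdev(heights_cm)       # square root of sample variance

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"population variance = {pop_var}, sample variance = {sample_var}")
print(f"sample SD = {sample_sd:.2f}")
print(f"CV height = {cv_height:.1f} %, CV weight = {cv_weight:.1f} %")
```

The SDs in cm and kg cannot be compared directly; the unit-free CVs suggest that weight is relatively more dispersed than height in this made-up sample.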

30.2.7 Empirical / Normal Distribution Rule

Tip: 68-95-99.7 Rule

In a normal distribution:
  • ~68 % of values lie within μ ± 1σ.
  • ~95 % lie within μ ± 2σ.
  • ~99.7 % lie within μ ± 3σ.
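
The three percentages follow from the normal cumulative distribution function. As a minimal check, `statistics.NormalDist` from the standard library can reproduce them; `share_within` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within +/- {k} SD: {share:.2%}")
```

Because any normal distribution can be standardised, the same shares hold for every μ and σ.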

30.2.8 Skewness and Kurtosis

Tip: Skewness and Kurtosis
  • Skewness: asymmetry. Positive (long right tail; typically Mean > Median > Mode). Negative (long left tail; typically Mean < Median < Mode). Symmetric (Mean = Median = Mode).
  • Kurtosis: peakedness and tail weight relative to the normal curve. Mesokurtic (normal), Leptokurtic (sharp peak, heavy tails), Platykurtic (flat peak, light tails).
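
One common way to quantify shape is through moment-based coefficients: skewness as m₃ / m₂^1.5 and excess kurtosis as m₄ / m₂² − 3, where mₖ is the k-th central moment. The sketch below assumes these population-style formulas (statistical packages often apply small-sample corrections) and uses made-up data.

```python
import statistics

def central_moment(data, order):
    """k-th central moment, dividing by n (population form)."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """m4 / m2 ** 2 - 3, so that a normal curve scores 0 (mesokurtic)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical data sets, chosen only for illustration.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 13]   # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]

print(f"skewness (right-skewed) = {skewness(right_skewed):.2f}")
print(f"skewness (symmetric)    = {skewness(symmetric):.2f}")
print(f"excess kurtosis (symmetric) = {excess_kurtosis(symmetric):.2f}")
```

In the right-skewed set the mean (4) exceeds the median (2.5), which exceeds the mode (2), matching the positive-skew pattern; the flat five-point set has negative excess kurtosis, i.e. it is platykurtic.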

30.3 Qualitative Data

Qualitative data are categorical — values that label categories, not amounts.

30.3.1 Two Types

Tip: Nominal vs Ordinal

| Type | Definition | Examples |
|---|---|---|
| Nominal | Categories with no inherent order | Religion, blood group, gender, state |
| Ordinal | Categories with a meaningful order | Likert (strongly agree → strongly disagree), education level, severity |

30.3.2 Statistics for Qualitative Data

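Tip: Statistics for Qualitative Data
  • Central tendency: Mode (nominal); Median (ordinal; the mode is the more conservative choice).
  • Dispersion: frequency distribution, percentages, and diversity measures such as the variation ratio (share of cases outside the modal category).
  • Association: Cramér’s V, phi (φ) coefficient, Goodman-Kruskal lambda (nominal); Kendall’s tau (ordinal).
  • Tests: Chi-square (χ²) and Fisher’s exact test (small samples) for nominal data; Mann-Whitney U (two independent groups), Wilcoxon signed-rank (paired samples), and Kruskal-Wallis (three or more groups) for ordinal data.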
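
To make the nominal-data statistics concrete, the sketch below computes the Pearson chi-square statistic and Cramér's V for a 2 × 2 contingency table by hand. The counts are hypothetical, no continuity (Yates) correction is applied, and the function names are illustrative, not taken from any library.

```python
import math

# Hypothetical 2 x 2 table: rows = gender, columns = voting choice.
observed = [[30, 20],
            [15, 35]]

def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (k - 1))), k = smaller table dimension."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_statistic(table) / (n * (k - 1)))

chi2 = chi_square_statistic(observed)
v = cramers_v(observed)
print(f"chi-square = {chi2:.3f}, Cramer's V = {v:.3f}")
```

For a 2 × 2 table, Cramér's V equals the absolute value of the phi coefficient. Turning the statistic into a p-value requires the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom, which is left to a statistics package.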

30.3.3 Qualitative Research vs Qualitative Data

Tip: A Useful Distinction
  • Qualitative DATA (this sub-unit): categorical values, usually summarised as counts (e.g., “55 men, 45 women”).
  • Qualitative RESEARCH: a depth-oriented approach that uses words, narratives, and observations as data (Topic 8).

Both are “qualitative” but in different senses: the first is about measurement scale; the second is about research approach.

30.4 Mixed Data and Coding

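Tip: Coding Qualitative Data for Quantitative Analysis
  • Dummy coding: a categorical variable with k categories becomes 0/1 indicator variables (usually k − 1, with one category as the reference).
  • Effect coding: like dummy coding, but the reference category is coded −1 on every indicator, so values are −1/0/1.
  • One-hot encoding: one 0/1 indicator per category (k in total); common in ML.
  • Likert scales are ordinal but are commonly treated as interval for parametric tests.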
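
The two 0/1 schemes can be sketched in plain Python. This assumes the common convention that one-hot encoding keeps one indicator per category while dummy coding drops a reference category; the function names, the `is_` prefix, and the choice of the alphabetically first category as the default reference are all illustrative.

```python
def one_hot(values):
    """One 0/1 indicator per category (k indicators for k categories)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def dummy(values, reference=None):
    """k - 1 indicators; the reference category is coded as all zeros."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]   # assumption: first category alphabetically
    kept = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

cities = ["Mumbai", "Delhi", "Chennai", "Delhi"]

for city, row in zip(cities, one_hot(cities)):
    print("one-hot", city, row)
for city, row in zip(cities, dummy(cities)):
    print("dummy  ", city, row)
```

Dropping the reference category avoids perfectly collinear indicators in a regression with an intercept, which is why regression texts tend to use k − 1 dummies.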

30.5 Comparing the Two — Side by Side

Tip: Quantitative vs Qualitative, Side by Side

| Dimension | Quantitative | Qualitative |
|---|---|---|
| Data | Numbers | Categories |
| Scales | Interval, Ratio | Nominal, Ordinal |
| Operations | Arithmetic | Equality / order only |
| Central tendency | Mean, Median, Mode | Mode, Median (ordinal) |
| Dispersion | SD, Variance, CV | Distribution, % |
| Charts | Histogram, line, scatter | Bar, pie |
| Test for association | Pearson’s r, regression | χ², Cramér’s V |
| Tests for difference | t-test, ANOVA | χ², Fisher’s exact |
| Software examples | SPSS, R, Stata | NVivo, ATLAS.ti (qual research) |

flowchart TB
  D{Data} --> QN[Quantitative<br/>Numerical]
  D --> QL[Qualitative<br/>Categorical]
  QN --> DC[Discrete<br/>Count]
  QN --> CO[Continuous<br/>Measurement]
  QL --> NM[Nominal<br/>No order]
  QL --> OR[Ordinal<br/>Order, no equal intervals]
  DC --> S1[Interval/Ratio]
  CO --> S1
  NM --> S2[Nominal]
  OR --> S3[Ordinal]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

30.6 Worked Examples — Classify the Variable

Tip: Classify Each Variable
  • Blood group (A, B, AB, O): Qualitative · Nominal.
  • Education level (primary, secondary, tertiary): Qualitative · Ordinal.
  • Age in years: Quantitative · Ratio · Continuous (often reported as discrete).
  • Number of children: Quantitative · Ratio · Discrete.
  • Temperature in °C: Quantitative · Interval · Continuous.
  • Temperature in Kelvin: Quantitative · Ratio · Continuous.
  • Likert satisfaction (1–5): Qualitative · Ordinal (often treated as interval).
  • Marks out of 100: Quantitative · Interval (some argue ratio) · Discrete in practice (whole marks), but usually analysed as continuous.

30.7 Choosing the Right Statistic

Tip: Stat by Scale and Question

| Scale | Central tendency | Dispersion | Association | Test |
|---|---|---|---|---|
| Nominal | Mode | Diversity index | Cramér’s V, φ | χ², Fisher’s exact |
| Ordinal | Median (Mode) | IQR | Spearman’s ρ, Kendall’s τ | Mann-Whitney, Wilcoxon, Kruskal-Wallis |
| Interval | Mean | SD | Pearson’s r | t-test, ANOVA |
| Ratio | Mean (GM, HM where appropriate) | SD, CV | Pearson’s r | t-test, ANOVA, regression |
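
The central-tendency column of this table can be read as a simple decision rule. A minimal sketch follows, with a hypothetical helper `central_tendency` and made-up data; for ordinal codes it uses `statistics.median_low` so that the result is always an observed category.

```python
import statistics

def central_tendency(values, scale):
    """Pick a measure of central tendency from the scale of measurement."""
    scale = scale.lower()
    if scale == "nominal":
        # Unordered categories: only the mode is meaningful.
        return statistics.multimode(values)
    if scale == "ordinal":
        # Ordered codes: the low median is always one of the observed values.
        return statistics.median_low(values)
    if scale in ("interval", "ratio"):
        # Equal intervals make the arithmetic mean meaningful.
        return statistics.fmean(values)
    raise ValueError(f"unknown scale: {scale!r}")

blood_groups = ["A", "O", "B", "O", "AB", "O"]   # nominal
satisfaction = [1, 2, 2, 3, 4, 4, 5, 5, 5]       # ordinal Likert codes, 1 to 5
temperatures_c = [20.5, 22.0, 21.5]              # interval

print(central_tendency(blood_groups, "nominal"))
print(central_tendency(satisfaction, "ordinal"))
print(round(central_tendency(temperatures_c, "interval"), 2))
```

Making the scale an explicit argument forces the analyst to state it, which is the habit the table is meant to build.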

30.8 Common Mistakes

Tip: Common Mistakes
  • Using the mean on ordinal data without justification (e.g., averaging satisfaction codes).
  • Using a t-test on nominal data (use χ² instead).
  • Confusing discrete quantitative (count) with ordinal (rank).
  • Treating a 0 in Celsius as “no temperature” — Celsius has no true zero; Kelvin does.
  • Ignoring outliers when reporting the mean.
  • Confusing qualitative data with qualitative research.
  • Using bar charts for continuous data (use histograms: bars touch in a histogram but are separated by gaps in a bar chart).
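
The Celsius and Kelvin point in the list above can be checked with a little arithmetic. The sketch below uses two hypothetical temperatures: differences agree on both scales, but only the Kelvin ratio is physically meaningful.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has a true zero; 0 degrees C is 273.15 K."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0   # hypothetical readings in degrees Celsius

# Differences are meaningful on an interval scale and agree across scales.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on a ratio scale.
ratio_celsius = warm_c / cold_c                                        # looks like "twice as hot"
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)   # roughly 3.5 % warmer

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"ratio in Celsius = {ratio_celsius:.2f}, ratio in Kelvin = {ratio_kelvin:.3f}")
```

The Celsius ratio changes if the zero point is moved, which is why statements such as "twice as hot" do not hold for interval scales.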

30.9 Theory Anchors

Tip: Concepts and Persons

| Person | Year | Contribution |
|---|---|---|
| S.S. Stevens | 1946 | NOIR scales |
| Karl Pearson | early 20th c. | Correlation; chi-square |
| R.A. Fisher | 1925, 1935 | ANOVA, F-test, design of experiments |
| Charles Spearman | 1904 | Rank correlation (ordinal) |
| G. Udny Yule | early 20th c. | Yule’s Q (qualitative association) |
| Maurice Kendall | 1938 | Kendall’s tau (ordinal) |
| Harald Cramér | 1946 | Cramér’s V (nominal association) |
| John W. Tukey | 1977 | Exploratory Data Analysis (EDA); boxplot |
| C.R. Rao | 20th c. | Cramér-Rao bound; Indian statistician |
| Florence Nightingale | 1858 | Polar (coxcomb) charts; pioneer of statistical visualisation |

30.10 Practice Questions

Q 01 Classify Easy

Which of the following is QUALITATIVE data?

  • A. Height in cm
  • B. Number of cars
  • C. Religion
  • D. Temperature
Correct Option: C
Religion — category, no inherent order = nominal qualitative.
Q 02 Sub-type Medium

"Number of defective items in a batch" is:

  • A. Discrete quantitative
  • B. Continuous quantitative
  • C. Nominal qualitative
  • D. Ordinal qualitative
Correct Option: A
Countable integers = discrete.
Q 03 Scale Medium

Likert-scale satisfaction (1 = strongly disagree to 5 = strongly agree) is:

  • A. Nominal
  • B. Ordinal
  • C. Interval
  • D. Ratio
Correct Option: B
Ordinal: the categories are ordered, but the "psychological" distances between them are not necessarily equal. (Often treated as interval for parametric tests.)
Q 04 Stevens Medium

Temperature measured in Celsius is on which Stevens scale?

  • A. Nominal
  • B. Ordinal
  • C. Interval
  • D. Ratio
Correct Option: C
Equal intervals, but 0 °C ≠ absence of temperature → interval. Kelvin would be ratio.
Q 05 Stat Easy

The MOST appropriate measure of central tendency for nominal data is:

  • A. Mean
  • B. Median
  • C. Mode
  • D. Variance
Correct Option: C
Mode — the only measure that makes sense for unordered categories.
Q 06 Means Hard

For a set of positive numbers, the relationship among the three means is:

  • A. AM ≥ GM ≥ HM
  • B. HM > GM > AM
  • C. GM > AM > HM
  • D. AM = GM = HM
Correct Option: A
For positive numbers: AM ≥ GM ≥ HM, with equality only when all values are equal.
Q 07 Empirical Medium

In a normal distribution, approximately what % of values fall within μ ± 2σ?

  • A. 50 %
  • B. 68 %
  • C. 95 %
  • D. 99.7 %
Correct Option: C
68-95-99.7 rule. 1σ → 68 %, 2σ → 95 %, 3σ → 99.7 %.
Q 08 Skewness Hard

In a positively-skewed distribution:

  • A. Mean > Median > Mode
  • B. Mean < Median < Mode
  • C. Mean = Median = Mode
  • D. Mean = Mode > Median
Correct Option: A
Long right tail pulls the mean to the right of the median, which is to the right of the mode: Mean > Median > Mode.
Q 09 Mode-Median-Mean Hard

The empirical relationship for a moderately-skewed distribution is:

  • A. Mode = 3 Median − 2 Mean
  • B. Mean = 3 Mode − 2 Median
  • C. Median = Mean × Mode
  • D. Mode = Mean × Median
Correct Option: A
Mode ≈ 3 × Median − 2 × Mean for moderately skewed distributions.
Q 10 Kurtosis Hard

A distribution with a sharper peak than the normal curve is called:

  • A. Mesokurtic
  • B. Leptokurtic
  • C. Platykurtic
  • D. Skewed
Correct Option: B
Leptokurtic = sharper peak. Mesokurtic = normal; Platykurtic = flatter.
Q 11 Test Medium

To test the association between two NOMINAL variables (e.g., gender and voting choice), the appropriate test is:

  • A. t-test
  • B. Pearson's r
  • C. Chi-square
  • D. ANOVA
Correct Option: C
Chi-square (χ²) for categorical-categorical association.
Q 12 Correlation Medium

Spearman's rank correlation (ρ) is most appropriate for:

  • A. Two nominal variables
  • B. Two ordinal variables
  • C. Two interval / ratio variables
  • D. One nominal and one ratio variable
Correct Option: B
Spearman's ρ uses rank order — perfect for ordinal data or non-linear monotonic relationships.
Q 13 Dispersion Medium

The Coefficient of Variation (CV) is defined as:

  • A. σ / x̄
  • B. (σ / x̄) × 100 %
  • C. x̄ / σ
  • D. σ²
Correct Option: B
CV = (σ / x̄) × 100 %. Allows comparison of relative dispersion across datasets with different units.
Q 14 Mean Calc Medium

The arithmetic mean of 5, 8, 12, 15, 20 is:

  • A. 10
  • B. 12
  • C. 13
  • D. 15
Correct Option: B
(5 + 8 + 12 + 15 + 20) / 5 = 60 / 5 = 12.
Q 15 Median Calc Medium

The median of 12, 5, 18, 7, 22 is:

  • A. 7
  • B. 12
  • C. 13
  • D. 18
Correct Option: B
Sorted: 5, 7, 12, 18, 22 → middle = 12.
Q 16 Mode Easy

A distribution with two distinct modes is called:

  • A. Unimodal
  • B. Bimodal
  • C. Multimodal
  • D. Amodal
Correct Option: B
Bimodal = two modes; multimodal = more than two.
Q 17 Encoding Hard

Converting a qualitative variable like "city = Mumbai/Delhi/Chennai" into three 0/1 indicator variables is called:

  • A. Likert scaling
  • B. One-hot / dummy encoding
  • C. Normalisation
  • D. Standardisation
Correct Option: B
One-hot / dummy encoding — categorical → indicator variables for use in quantitative models.
Q 18 Chart Medium

To visualise the distribution of a CONTINUOUS quantitative variable, the BEST chart is:

  • A. Bar chart
  • B. Pie chart
  • C. Histogram
  • D. Word cloud
Correct Option: C
Histogram — bars touch, representing continuous intervals. Bar charts (gaps between bars) are for categorical data.
Q 19 Stat Medium

For ordinal data, the appropriate test for two independent groups is:

  • A. Independent t-test
  • B. Mann-Whitney U
  • C. Chi-square
  • D. Pearson's r
Correct Option: B
Mann-Whitney U — non-parametric, for ordinal or non-normal interval data.
Q 20 Match Hard

Match each scale with its appropriate central-tendency measure:

| Scale | Measure |
|---|---|
| (i) Nominal | (a) Mean |
| (ii) Ordinal | (b) Mean (incl. GM, HM) |
| (iii) Interval | (c) Mode |
| (iv) Ratio | (d) Median |

  • A. (i)-c, (ii)-d, (iii)-a, (iv)-b
  • B. (i)-a, (ii)-b, (iii)-c, (iv)-d
  • C. (i)-b, (ii)-c, (iii)-d, (iv)-a
  • D. (i)-d, (ii)-a, (iii)-b, (iv)-c
Correct Option: A
Nominal → Mode; Ordinal → Median; Interval → Mean; Ratio → Mean (including GM, HM).

30.11 Quick Recall

Important: Quick recall
  • Quantitative: numerical (Interval, Ratio); Qualitative: categorical (Nominal, Ordinal).
  • Quantitative sub-types: Discrete (count) · Continuous (measurement).
  • Qualitative sub-types: Nominal (no order) · Ordinal (order).
  • Stevens' scales, mnemonic NOIR: Nominal · Ordinal · Interval · Ratio.
  • 3 properties of quantitative data: Central tendency · Dispersion · Shape.
  • 3 Means: AM (= Σx/n) · GM (=ⁿ√Πx) · HM (= n/Σ(1/x)); AM ≥ GM ≥ HM.
  • Median, Mode, Quartiles. Empirical: Mode ≈ 3 Median − 2 Mean (moderate skew).
  • Variance σ² · SD σ · CV = σ/x̄ × 100 %.
  • 68-95-99.7 rule for normal distribution.
  • Skewness: Positive (Mean > Median > Mode) · Negative (Mean < Median < Mode) · Symmetric (Mean = Median = Mode).
  • Kurtosis: Mesokurtic (normal) · Leptokurtic (sharp) · Platykurtic (flat).
  • Central tendency by scale: Nominal → Mode · Ordinal → Median · Interval/Ratio → Mean.
  • Correlation by scale: Nominal → Cramér’s V/φ · Ordinal → Spearman’s ρ, Kendall’s τ · Interval/Ratio → Pearson’s r.
  • Tests by scale: Nominal → χ², Fisher’s exact · Ordinal → Mann-Whitney, Wilcoxon, Kruskal-Wallis · Interval/Ratio → t-test, ANOVA, regression.
  • Coding: Dummy/one-hot encoding for categorical → quantitative analysis.
  • Bar chart = categorical (gaps between bars). Histogram = continuous (bars touch).
  • Qualitative DATA ≠ Qualitative RESEARCH — different senses of the word.
  • Indian statisticians: P.C. Mahalanobis (founded the ISI, 1931); C.R. Rao (Cramér-Rao bound).