What the Syllabus Covers
This sub-unit has three examined heads:
- Sources of data: primary vs secondary; internal vs external.
- Acquisition: methods of collecting or observing data.
- Classification: organising raw data into structured form.
The most-repeated previous-year question (PYQ) patterns are: (a) distinguishing primary vs secondary, (b) matching the right acquisition method to a research question, (c) recognising the four bases of classification (geographic, chronological, qualitative, quantitative), and (d) naming Indian statistical agencies (NSO, NSSO, CSO, MoSPI, RBI, RGI, NITI Aayog).
What “Data” Is
Data are facts, observations or measurements recorded for analysis. They become information only after processing and interpretation. Statistical work moves through: Source → Acquisition → Classification → Analysis → Interpretation → Presentation.
This progression is the DIKW hierarchy (Russell Ackoff, 1989): Data → Information (processed data) → Knowledge (information + context + meaning) → Wisdom (applied knowledge).
Sources of Data
Primary vs Secondary
| Type | Meaning | Examples | Strengths | Limitations |
|---|---|---|---|---|
| Primary | Collected first-hand by the researcher | Survey, interview, observation, experiment | Tailored, recent, controlled | Expensive, slow |
| Secondary | Collected by someone else for another purpose | Census, government reports, journal articles, databases | Cheap, fast, large scale | May not fit the research need; may be outdated |
Internal vs External
- Internal: sourced from within the organisation (sales records, HR data, accounts).
- External: sourced from outside (government, industry bodies, market research firms).
Major Indian Sources of Secondary Data
| Agency / source | Full name | Main data or role |
|---|---|---|
| MoSPI | Ministry of Statistics and Programme Implementation | Apex statistical ministry |
| NSO (2019 merger) | National Statistical Office | Merged CSO + NSSO under MoSPI |
| CSO | Central Statistical Office (now part of NSO) | National accounts, GDP, IIP, CPI |
| NSSO | National Sample Survey Office (now part of NSO) | Household consumption, employment, health surveys |
| RGI / ORG | Registrar General & Census Commissioner, India (Office of the Registrar General) | Census (every 10 years); Vital Statistics; Sample Registration System (SRS) |
| RBI | Reserve Bank of India | Banking, monetary, balance-of-payments data |
| NITI Aayog | National Institution for Transforming India | SDG India Index, policy data |
| NCRB | National Crime Records Bureau | Crime statistics |
| MoHFW | Ministry of Health and Family Welfare | Health & demographic data; NFHS (National Family Health Survey) |
| DGCIS | Directorate General of Commercial Intelligence and Statistics | Foreign trade data |
| IIP | Index of Industrial Production (an index compiled by NSO, formerly CSO; not an agency) | Industrial output |
| AISHE | All India Survey on Higher Education | Higher-education data (MoE) |
| NIRF | National Institutional Ranking Framework | Rankings of higher education institutions (HEIs) |
| IMD | India Meteorological Department | Weather, climate data |
| ISRO Bhuvan | ISRO's geoportal | Geospatial data |
| EAC-PM | Economic Advisory Council to the Prime Minister | Economic analysis |
| OGD Platform | Open Government Data Platform (data.gov.in) | Open datasets |
International Sources
- UN Statistics Division (UNSD): World Statistics Pocketbook.
- World Bank: World Development Indicators.
- IMF: World Economic Outlook, Government Finance Statistics.
- WHO: Global Health Observatory.
- OECD: Education at a Glance; PISA.
- UNESCO Institute for Statistics: education and culture.
- ILO: labour statistics.
- FAO: agriculture and food.
Acquisition (Methods of Data Collection)
Primary Data — Six Standard Methods
| Method | How it works | Typical use |
|---|---|---|
| Direct personal investigation | Researcher meets each respondent personally | Small samples; sensitive topics |
| Indirect oral investigation | Witnesses or third parties are questioned | When the respondent is unavailable |
| Schedules through enumerators | Trained field workers carry the form | Census, large rural surveys |
| Mailed / online questionnaire | Respondent fills in the form | Large, literate samples |
| Local correspondents | Reporters in different localities send information | Regular feed (e.g., agricultural prices) |
| Observation | Watching behaviour or events | Ethnography, classroom interaction |
Survey vs Experiment vs Observation
- Survey: descriptive; respondents are asked about themselves.
- Experiment: manipulative; the independent variable (IV) is manipulated under controlled conditions.
- Observation: non-intrusive; the researcher watches.
(Detailed coverage in Topic 8.)
Modes of Data Collection (CAPI/CATI/CAWI)
- CAPI: Computer-Assisted Personal Interviewing (tablet in the field).
- CATI: Computer-Assisted Telephone Interviewing.
- CAWI: Computer-Assisted Web Interviewing (e.g., Google Forms, SurveyMonkey).
- CASI: Computer-Assisted Self-Interviewing (useful for sensitive topics).
- PAPI: Paper-and-Pencil Interviewing.
Sampling — Quick Recap
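(Detailed in Topic 9.) Probability sampling (simple random, stratified, systematic, cluster, multi-stage, PPS, i.e., probability proportional to size) gives every unit a known, non-zero chance of selection, which is what allows statistical generalisation to the population. Non-probability sampling (convenience, purposive, quota, snowball) does not.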
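Three common probability designs can be sketched in a few lines of Python using only the standard `random` module. This is an illustrative sketch, not a survey-grade tool: the function names are hypothetical, the population is assumed to be a list of unit IDs, and stratified allocation is assumed to be proportional to stratum size (rounding can make stratum shares add up to slightly more or less than n).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling without replacement: every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Systematic sampling: random start within the first interval, then every k-th unit."""
    k = len(population) // n              # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)              # random start in [0, k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Stratified random sampling with proportional allocation.

    strata maps a stratum name to its list of units; each stratum gets
    round(n * stratum_size / total) units, drawn by simple random sampling.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(n * len(units) / total))
        for name, units in strata.items()
    }

# Usage with a hypothetical population of 1,000 households (600 urban, 400 rural)
households = list(range(1000))
strata = {"urban": households[:600], "rural": households[600:]}
print(simple_random_sample(households, 5, seed=1))
print(systematic_sample(households, 10, seed=1))
print({name: len(units) for name, units in stratified_sample(strata, 50, seed=1).items()})
# {'urban': 30, 'rural': 20}
```

Proportional allocation keeps each stratum's share of the sample equal to its share of the population, which is the simplest choice; survey designs may instead over-sample small strata.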
Classification of Data
Classification is the systematic arrangement of raw data into classes or categories with common characteristics.
Four Bases of Classification
| Basis | Classified by | Example |
|---|---|---|
| Geographical / Spatial | Place | State-wise literacy |
| Chronological / Temporal | Time | Population by census year (1991, 2001, 2011) |
| Qualitative | Attribute (non-numeric) | Gender, religion, marital status |
| Quantitative / Numerical | Numeric value | Income brackets, marks |
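The four bases are easy to compare side by side in code. A minimal Python sketch, assuming a handful of made-up census-style records (the records, the helper names `classify_by` and `income_bracket`, and the bracket edges are all hypothetical): the same rows are grouped by place, by year, by an attribute and by numeric brackets.

```python
from collections import defaultdict

# Hypothetical records: (state, census_year, gender, monthly_income)
records = [
    ("Kerala", 2001, "F", 12000),
    ("Kerala", 2011, "M", 25000),
    ("Bihar", 2001, "M", 8000),
    ("Bihar", 2011, "F", 15000),
    ("Punjab", 2011, "M", 31000),
]

def classify_by(rows, key):
    """Group rows into classes using the value returned by key(row)."""
    classes = defaultdict(list)
    for row in rows:
        classes[key(row)].append(row)
    return dict(classes)

def income_bracket(income):
    """Quantitative basis: exclusive brackets (lower limit included, upper excluded)."""
    if income < 10000:
        return "below 10,000"
    if income < 20000:
        return "10,000-20,000"
    return "20,000 and above"  # open-ended class

geographical = classify_by(records, lambda r: r[0])   # by place
chronological = classify_by(records, lambda r: r[1])  # by time
qualitative = classify_by(records, lambda r: r[2])    # by attribute
quantitative = classify_by(records, lambda r: income_bracket(r[3]))  # by value

for basis, classes in [("Geographical", geographical), ("Chronological", chronological),
                       ("Qualitative", qualitative), ("Quantitative", quantitative)]:
    print(basis, {label: len(rows) for label, rows in classes.items()})
```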
Qualitative Classification — Simple vs Manifold
- Simple (dichotomous): single attribute, two categories (e.g., male / female).
- Manifold (multi-attribute): multiple attributes combined (e.g., gender × literacy × urban-rural).
Quantitative Classification — Discrete vs Continuous
- Discrete: values are countable integers (number of children).
- Continuous: values fall anywhere on a continuum (height, weight, temperature).
Frequency Distribution
A frequency distribution organises quantitative data into classes (intervals) and shows the count in each.
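- Class interval: e.g., 10–20, 20–30.
- Class limits: the lower and upper values that define a class (in 10–19, these are 10 and 19).
- Class boundaries: the true limits that close the gap between adjacent inclusive classes; for 10–19 and 20–29 they are 9.5–19.5 and 19.5–29.5.
- Class width: upper boundary − lower boundary (for exclusive classes such as 10–20, simply upper limit − lower limit = 10).
- Class mark / midpoint: (upper limit + lower limit) / 2.
- Inclusive (10–19, 20–29) vs exclusive (10–20, 20–30) class methods; in the exclusive method a value equal to the upper limit goes to the next class.
- Open-ended class: e.g., “above 90”.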
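The arithmetic linking limits, boundaries, width and midpoint can be checked with a short Python sketch. It assumes whole-number data in inclusive classes, so adjacent classes are 1 apart and each boundary sits 0.5 beyond the limit; the helper names are illustrative. A second helper applies the exclusive rule, where a value equal to an upper limit falls into the next class.

```python
def inclusive_class_stats(lower, upper, gap=1):
    """Boundaries, width and midpoint for an inclusive class such as 10-19.

    gap is the distance between this class's upper limit and the next class's
    lower limit (1 for whole-number data); half of it is the boundary correction.
    """
    half = gap / 2
    lower_boundary = lower - half
    upper_boundary = upper + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "midpoint": (lower + upper) / 2,
    }

def exclusive_class_of(value, edges):
    """Exclusive method: value belongs to [edges[i], edges[i+1]); the upper limit is excluded."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return (lo, hi)
    return None  # outside all classes (would need an open-ended class)

print(inclusive_class_stats(10, 19))
# {'boundaries': (9.5, 19.5), 'width': 10.0, 'midpoint': 14.5}
print(exclusive_class_of(30, [10, 20, 30, 40]))
# (30, 40)
```

Note that the width of the inclusive class 10–19 is 10 (boundary to boundary), not 19 − 10 = 9.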
Number of Classes — Sturges’ Rule
Herbert Sturges (1926): k = 1 + log₂ N ≈ 1 + 3.322 log₁₀(N), where k = number of classes and N = number of observations. It is a rough rule of thumb, and k is rounded to a whole number.
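Since 1/log₁₀ 2 ≈ 3.322, the two forms of the rule are equivalent. A minimal Python sketch of the 3.322 form, assuming k is rounded up to the next whole number (textbooks differ on rounding up or to the nearest integer); `sturges_classes` is an illustrative name.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + 3.322 * log10(n), rounded up."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(1 + 3.322 * math.log10(n))

for n in (50, 100, 1000):
    print(n, sturges_classes(n))
# 50 7
# 100 8
# 1000 11
```

For N = 100 the rule gives 1 + 3.322 × 2 = 7.644, i.e. 8 classes when rounded up. Because 3.322 is itself rounded, exact powers of two (e.g., N = 64, where 1 + log₂ N = 7) can come out one class higher under this form.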
Frequencies
- Absolute frequency: raw count in each class.
- Relative frequency: class count / total count.
- Cumulative frequency: running total of counts; plotted as a less-than or more-than ogive.
- Percentage frequency: relative frequency × 100.
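All four kinds of frequency can be computed together from a list of class counts. A sketch in Python with hypothetical marks data (40 students in four exclusive classes); the function name `frequency_table` is illustrative. It also returns the more-than cumulative frequency, which is what a more-than ogive plots.

```python
from itertools import accumulate

def frequency_table(labels, counts):
    """Absolute, relative, percentage and cumulative (less-than and more-than) frequencies."""
    total = sum(counts)
    less_than = list(accumulate(counts))        # running total from the first class
    rows = []
    for label, f, cf in zip(labels, counts, less_than):
        relative = f / total
        rows.append({
            "class": label,
            "absolute": f,
            "relative": relative,
            "percentage": relative * 100,
            "less_than": cf,                    # count in this class and all below it
            "more_than": total - cf + f,        # count in this class and all above it
        })
    return rows

# Hypothetical marks of 40 students in exclusive classes
for row in frequency_table(["0-10", "10-20", "20-30", "30-40"], [5, 12, 18, 5]):
    print(row["class"], row["absolute"], round(row["relative"], 3),
          round(row["percentage"], 1), row["less_than"], row["more_than"])
```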
Tabulation
Tabulation is the orderly arrangement of classified data into rows and columns.
- Table number.
- Title: concise, clear.
- Head-note: additional explanatory note.
- Stub: row labels.
- Caption: column labels.
- Body: actual data.
- Source note: credits the data source.
- Footnote: clarifications.
Types of Table
- Simple / one-way: one characteristic.
- Two-way: two characteristics cross-classified.
- Manifold: three or more characteristics.
- Reference table: general purpose, large.
- Summary table: derived measures (means, totals).
- Frequency table: frequency distribution.
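A two-way table is essentially a count of attribute pairs laid out with a stub (row labels) and a caption (column labels). A plain-Python sketch, assuming hypothetical survey responses classified by gender and area; the helper names are illustrative, and row and column totals are appended as in a summary row and column.

```python
from collections import Counter

def two_way_table(pairs):
    """Cross-classify (row_attribute, column_attribute) pairs into a table of counts."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})    # stub: row labels
    col_labels = sorted({c for _, c in pairs})    # caption: column labels
    body = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    return row_labels, col_labels, body

def print_table(stub_heading, row_labels, col_labels, body, width=8):
    """Print the caption row, body rows with row totals, and a column-totals row."""
    print(stub_heading.ljust(width) + "".join(c.rjust(width) for c in col_labels)
          + "Total".rjust(width))
    for r in row_labels:
        cells = [body[r][c] for c in col_labels]
        print(r.ljust(width) + "".join(str(v).rjust(width) for v in cells)
              + str(sum(cells)).rjust(width))
    col_totals = [sum(body[r][c] for r in row_labels) for c in col_labels]
    print("Total".ljust(width) + "".join(str(v).rjust(width) for v in col_totals)
          + str(sum(col_totals)).rjust(width))

# Hypothetical respondents classified by gender (rows) and area (columns)
responses = [("F", "Urban"), ("M", "Rural"), ("F", "Rural"),
             ("M", "Urban"), ("F", "Urban"), ("M", "Rural")]
print_table("Gender", *two_way_table(responses))
```

A manifold table follows the same pattern: count triples such as (gender, literacy, area) instead of pairs.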
Stevens’ Scales — Recap
(Detailed in Topic 8.)
Nominal · Ordinal · Interval · Ratio — increasing in informational richness.
Data Quality and Errors
- Validity: measures what it claims to measure.
- Reliability: produces consistent results on repeated measurement.
- Completeness, accuracy, timeliness.
- Sampling error: the random difference between a sample statistic and the population value it estimates, arising because only a sample is observed; it shrinks as sample size grows.
- Non-sampling error: instrument bias, coverage failure, non-response, data-entry mistakes. Cannot be cured by larger samples.
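A small simulation makes the contrast concrete. It is a sketch under stated assumptions: the population is synthetic (normally distributed incomes generated in the code, not real data), and non-sampling error is modelled as a hypothetical 10% under-reporting by every respondent. The function name `mean_abs_error` is illustrative.

```python
import random
import statistics

def mean_abs_error(n, under_report=0.0, trials=100, seed=42):
    """Average |sample mean - population mean| over repeated samples of size n.

    under_report is the fraction by which every respondent understates the value,
    a simple stand-in for a systematic (non-sampling) error such as response bias.
    """
    rng = random.Random(seed)
    # Synthetic population of 20,000 incomes (normal, mean 50,000, sd 10,000)
    population = [rng.gauss(50_000, 10_000) for _ in range(20_000)]
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        reported = [x * (1 - under_report) for x in sample]
        errors.append(abs(statistics.fmean(reported) - true_mean))
    return statistics.fmean(errors)

for n in (25, 250, 2500):
    print(n, round(mean_abs_error(n)), round(mean_abs_error(n, under_report=0.10)))
```

Without the bias, the average error falls steadily as n rises (sampling error). With the bias, it levels off near 10% of the mean however large n becomes, which is why non-sampling error has to be fixed through better instruments, coverage and fieldwork rather than bigger samples.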
Big Data Vocabulary (Brief)
Volume · Velocity · Variety · Veracity · Value.
- Volume: size of the data set.
- Velocity: speed of generation.
- Variety: structured, semi-structured, unstructured.
- Veracity: trustworthiness.
- Value: usefulness.
Some lists add Variability and Visualisation.
Data Visualisation Preview
(Detailed in Topic 30.) Quick names: bar chart, pie chart, line graph, histogram, ogive, scatter plot, box plot, heatmap, GIS map, dashboard. Tools: Excel, Tableau, Power BI, R/ggplot2, Python/matplotlib, D3.js, QGIS, Bhuvan (ISRO).
Theory Anchors
| Name | Year | Contribution |
|---|---|---|
| C.R. Kothari | 1985 | Standard Indian textbook on data collection (Research Methodology: Methods and Techniques) |
| Croxton & Cowden | mid-20th c. | Four bases of classification |
| Herbert Sturges | 1926 | Sturges’ formula for the number of classes |
| Stanley Smith Stevens | 1946 | NOIR scales (nominal, ordinal, interval, ratio) |
| Russell Ackoff | 1989 | DIKW pyramid |
| R.A. Fisher | 1925, 1935 | Statistical methodology: Statistical Methods for Research Workers (1925); The Design of Experiments (1935) |
| Indian Statistical System | 1949 onward | Mahalanobis, ISI; NSS, CSO |
| P.C. Mahalanobis | 1931, 1950 | Father of Indian statistics; founded ISI (1931); initiated the National Sample Survey (1950) |
| MoSPI | 1999 (re-org) | Indian statistical ministry |
| NSO | 2019 | CSO + NSSO merger |
| OGD Platform | 2012 | data.gov.in |
Practice Questions
Data collected first-hand by the researcher for a specific purpose are called:
A. Primary data
B. Secondary data
C. Tertiary data
D. Reference data
Correct Option: A
Primary data — collected by the researcher for the research at hand.
The Census of India is an example of:
A. Primary data
B. Secondary data
C. Tertiary data
D. Experimental data
Correct Option: B
For researchers reusing it, the Census is secondary data. (For RGI conducting it, it is primary.)
The Census of India is conducted by:
A. NSSO
B. CSO
C. Registrar General of India
D. RBI
Correct Option: C
RGI (Registrar General & Census Commissioner of India). Census conducted every 10 years.
The NSO (National Statistical Office) was formed in 2019 by merging:
A. RBI and CSO
B. CSO and NSSO
C. NSSO and RGI
D. NITI Aayog and MoSPI
Correct Option: B
CSO + NSSO → NSO under MoSPI, 2019.
MoSPI stands for:
A. Ministry of Science and Public Implementation
B. Ministry of Statistics and Programme Implementation
C. Ministry of Statistics and Planning Information
D. Ministry of Sample Survey and Population Index
Correct Option: B
Ministry of Statistics and Programme Implementation — apex statistical ministry, Government of India.
Classifying data state-wise (Maharashtra, Karnataka, Tamil Nadu …) is a:
A. Geographical classification
B. Chronological classification
C. Qualitative classification
D. Quantitative classification
Correct Option: A
By place / region = Geographical / spatial.
Data classified by religion or marital status is a:
A. Geographical classification
B. Chronological classification
C. Qualitative classification
D. Quantitative classification
Correct Option: C
By attribute = Qualitative. By value = quantitative; by time = chronological.
"Number of children in a family" is a:
A. Qualitative variable
B. Discrete quantitative variable
C. Continuous quantitative variable
D. Nominal variable
Correct Option: B
Counted in integers; no fractional values → discrete quantitative.
In an exclusive classification, the class "20–30" includes a value of:
Correct Option: C
In the exclusive method, the upper limit is EXCLUDED. The class 20–30 includes 20 to 29.99…; 30 goes to the next class.
Sturges' rule for the number of classes in a frequency distribution is:
A. k = √N
B. k = 1 + 3.322 log₁₀ N
C. k = N/10
D. k = 2 log₂ N
Correct Option: B
k = 1 + 3.322 log₁₀(N) (Sturges 1926).
Croxton and Cowden recognise how many bases of classification?
Correct Option: C
Four: Geographical · Chronological · Qualitative · Quantitative.
A door-to-door enumerator with a tablet asking questions and recording answers in real time is using:
Correct Option: B
CAPI = Computer-Assisted Personal Interviewing.
In the DIKW pyramid, the layer immediately above "Data" is:
A. Information
B. Wisdom
C. Knowledge
D. Analytics
Correct Option: A
DIKW: Data → Information → Knowledge → Wisdom (Russell Ackoff, 1989).
The "5 Vs" of big data are:
A. Volume, Velocity, Variety, Veracity, Value
B. Validity, Variety, Volume, Vector, Velocity
C. Vision, Velocity, Value, Verification, Volume
D. Validation, Visualisation, Velocity, Volume, Variety
Correct Option: A
Volume · Velocity · Variety · Veracity · Value.
India's foreign trade statistics are published by:
Correct Option: A
DGCIS = Directorate General of Commercial Intelligence and Statistics, Kolkata.
India's Open Government Data platform is hosted at:
A. data.gov.in
B. india.gov.in
C. mca.gov.in
D. digitalindia.gov.in
Correct Option: A
data.gov.in — OGD platform, launched 2012 by MeitY + NIC.
Which is NOT probability sampling?
A. Simple random
B. Stratified random
C. Cluster sampling
D. Quota sampling
Correct Option: D
Quota is non-probability. Random/stratified/cluster are probability.
"The Father of Indian Statistics", founder of the Indian Statistical Institute (1931), is:
A. C.R. Rao
B. P.C. Mahalanobis
C. R.A. Fisher
D. P.V. Sukhatme
Correct Option: B
Prasanta Chandra Mahalanobis — founder of ISI (Kolkata, 1931); architect of the Mahalanobis Plan (Second Five-Year Plan).
In a statistical table, the labels of ROWS are called:
A. Stub
B. Caption
C. Title
D. Body
Correct Option: A
Stub = row labels. Caption = column labels.
Match each Indian agency with its primary output:
| List I (Agency) | List II (Primary output) |
|---|---|
| (i) RGI | (a) Household surveys |
| (ii) RBI | (b) Census |
| (iii) NSSO | (c) Crime statistics |
| (iv) NCRB | (d) Banking and BoP |

A. (i)-b, (ii)-d, (iii)-a, (iv)-c
B. (i)-a, (ii)-b, (iii)-c, (iv)-d
C. (i)-c, (ii)-d, (iii)-a, (iv)-b
D. (i)-d, (ii)-c, (iii)-b, (iv)-a
Correct Option: A
RGI → Census; RBI → Banking/BoP; NSSO → Household surveys; NCRB → Crime statistics.
Quick Recall
- Data = raw facts; processed → Information → Knowledge → Wisdom (DIKW, Ackoff 1989).
- Sources: Primary (first-hand) vs Secondary (already collected). Internal vs External.
- Indian sources: MoSPI (apex); NSO = CSO + NSSO (2019 merger). RGI = Census; RBI = monetary/BoP; NCRB = crime; DGCIS = foreign trade; NFHS (MoHFW); AISHE (MoE); NIRF; NITI Aayog; IMD weather; Bhuvan (ISRO); OGD data.gov.in.
- International: UN Statistics Division · World Bank · IMF · WHO · OECD · UNESCO · ILO · FAO.
- 6 primary methods: Direct personal · Indirect oral · Schedules through enumerators · Mailed/online questionnaire · Local correspondents · Observation.
- Modes: PAPI · CAPI · CATI · CAWI · CASI.
- 3 approaches: Survey · Experiment · Observation.
- 4 bases of classification (Croxton & Cowden): Geographical · Chronological · Qualitative · Quantitative.
- Qualitative: simple (dichotomous) vs manifold. Quantitative: discrete vs continuous.
- Frequency distribution: class interval, limits, boundaries, width, mark; inclusive vs exclusive; open-ended.
- Sturges’ rule (1926): k = 1 + 3.322 log₁₀ N.
- Frequencies: absolute · relative · cumulative · percentage.
- Table parts (8): number · title · head-note · stub · caption · body · source note · footnote.
- Stevens’ scales: Nominal · Ordinal · Interval · Ratio (NOIR).
- Data quality: Validity · Reliability · Completeness · Accuracy · Timeliness.
- Errors: Sampling (random; shrinks as N grows) vs Non-sampling (bias; not reduced by larger N).
- Big data 5 Vs: Volume · Velocity · Variety · Veracity · Value.
- P.C. Mahalanobis: Father of Indian Statistics; founded ISI Kolkata 1931.