28  Sources, Acquisition and Classification of Data

Data are facts, observations or measurements recorded for analysis. The path from raw data to useful information runs through three stages: identifying sources, choosing methods of acquisition, and classifying the data so that they can be analysed.

28.1 Sources of Data

Tip: Primary vs Secondary Sources
Source | Definition | Examples | Strength | Limitation
Primary | Collected first-hand by the researcher | Survey, experiment, observation, interview | Tailored to the question; full control over quality | Time-consuming; expensive
Secondary | Collected by someone else and repurposed | Census, NSSO, AISHE, journal articles, RBI reports | Quick, low-cost, often large-scale | Original purpose may not match the new question; quality outside the researcher's control

Tip: Major Indian Secondary-Data Sources
  • Census of India: decennial demographic data (Office of the Registrar General and Census Commissioner).
  • National Sample Survey Office (NSSO): household consumption, employment, health, education.
  • National Statistical Office (NSO): successor body, formed in 2019 by merging the NSSO with the Central Statistics Office (CSO), that consolidates several survey programmes.
  • All India Survey on Higher Education (AISHE): annual data on higher education institutions.
  • Reserve Bank of India (RBI): financial and monetary statistics.
  • Open Government Data Platform (data.gov.in): central government datasets.
  • State of Forest Report (Forest Survey of India): forest cover and biodiversity.
  • Sample Registration System (SRS): births, deaths, fertility.
  • Periodic Labour Force Survey (PLFS): labour-market indicators.
  • National Family Health Survey (NFHS): health, nutrition, fertility.

28.2 Methods of Data Acquisition

Tip: Six Working Methods of Data Acquisition
Method | What it captures | Best when
Survey / Questionnaire | Self-reported opinions, behaviour, demographics | Large samples, literate populations
Interview | Spoken responses; structured / semi-structured / unstructured | Sensitive topics, low-literacy contexts
Observation | Behaviour as it happens | Classroom interaction, child development
Experiment | Manipulated cause and observed effect | Causal claims
Document analysis | Existing records | Historical, policy work
Sensor / device data | Real-time machine readings | Traffic, weather, IoT, fitness trackers

28.2.1 Census vs Sample Survey

Tip: Census vs Sample
Approach | Coverage | Cost | Accuracy | Use
Census | Every unit in the population | High | No sampling error, though non-sampling errors (coverage, response, processing) can remain | Decennial population count, foundational planning data
Sample survey | A representative subset | Lower | Subject to sampling error, which can be estimated from the sample design | Routine household, business and labour surveys
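
The contrast between a census and a sample survey can be illustrated with a small simulation. This is only a sketch: the population of household incomes, its size, and the sample size of 500 are all invented for illustration, and simple random sampling is assumed.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: monthly incomes (in rupees) of 10,000 households.
population = [random.gauss(25_000, 6_000) for _ in range(10_000)]

# Census: every unit is measured, so the mean carries no sampling error.
census_mean = statistics.fmean(population)

# Sample survey: a simple random sample of 500 households (assumed design).
sample = random.sample(population, 500)
sample_mean = statistics.fmean(sample)

# Sampling error: how far the sample estimate falls from the census value.
sampling_error = sample_mean - census_mean

print(f"Census mean   : {census_mean:,.0f}")
print(f"Sample mean   : {sample_mean:,.0f}")
print(f"Sampling error: {sampling_error:+,.0f}")
```

Re-running with a different seed gives a different sampling error each time, while the census mean of a fixed population never changes. Non-sampling errors, such as misreported incomes, would affect both figures and are not modelled here.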

28.3 Classification of Data

Data may be classified along several independent dimensions.

28.3.1 By Source

  • Internal — generated by an organisation itself (sales records, payroll).
  • External — drawn from sources outside the organisation (government reports, market research).

28.3.2 By Nature

Tip: Quantitative vs Qualitative
Type | Description | Examples
Quantitative | Numerical measurements | Height, marks, income
Qualitative / Categorical | Categories or attributes | Gender, religion, occupation

28.3.3 By Measurement Level (Stevens 1946)

Tip: Four Levels of Measurement
Level | What it permits | Example
Nominal | Naming and counting | Gender, religion
Ordinal | Rank ordering | Educational attainment, satisfaction (Likert)
Interval | Equal intervals; no absolute zero | Temperature in Celsius, year
Ratio | Equal intervals; absolute zero | Income, height, weight, age
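
Because each level keeps everything the previous level permits and adds something new, the hierarchy can be sketched as a cumulative lookup. The statistic named for each level follows the conventional reading of Stevens's scheme; the names `LEVELS`, `NEW_STATISTIC` and `permitted_statistics` are hypothetical and chosen only for this example.

```python
# Levels in increasing order of measurement strength.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The summary statistic that first becomes meaningful at each level
# (conventional reading of Stevens's scheme; an illustrative simplification).
NEW_STATISTIC = {
    "nominal": "mode",
    "ordinal": "median",
    "interval": "arithmetic mean",
    "ratio": "coefficient of variation",
}


def permitted_statistics(level: str) -> list[str]:
    """Return the statistics allowed at `level`, including inherited ones."""
    position = LEVELS.index(level.lower())  # raises ValueError if unknown
    return [NEW_STATISTIC[name] for name in LEVELS[: position + 1]]


print(permitted_statistics("ordinal"))  # ['mode', 'median']
print(permitted_statistics("ratio"))
```

Under this reading, a median is defensible for an ordinal Likert item, whereas an arithmetic mean assumes interval-level measurement.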

28.3.4 By Time

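Tip: Time-Based Classification
Type | Description | Example
Time-series | Same indicator measured at successive time points | Monthly inflation, annual GDP
Cross-section | Different units observed at the same time | Household survey on a single date
Panel / Longitudinal | Same units observed at successive time points | Households re-interviewed across rounds, e.g. the rotational panel of the PLFS. NFHS rounds draw a fresh sample each time and are better described as repeated cross-sections.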
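
One way to check which time structure a dataset has is to count distinct units and distinct periods. The sketch below assumes each observation is a hypothetical `(unit, period)` pair; the function name and the extra label "repeated cross-section" (different units in different periods) are assumptions added for illustration.

```python
from collections import defaultdict


def time_structure(records):
    """Classify (unit, period) observations by their time structure."""
    periods_by_unit = defaultdict(set)
    for unit, period in records:
        periods_by_unit[unit].add(period)
    if not periods_by_unit:
        raise ValueError("no observations supplied")

    all_periods = set().union(*periods_by_unit.values())
    if len(periods_by_unit) == 1:
        # One unit followed over time, or a lone data point.
        return "time-series" if len(all_periods) > 1 else "single observation"
    if len(all_periods) == 1:
        return "cross-section"
    # Several units and several periods: a panel only if units are re-observed.
    if all(len(periods) > 1 for periods in periods_by_unit.values()):
        return "panel"
    return "repeated cross-section"


gdp = [("India", 1991), ("India", 1992), ("India", 1993)]
survey = [("HH1", 2024), ("HH2", 2024), ("HH3", 2024)]
panel = [("HH1", 2023), ("HH1", 2024), ("HH2", 2023), ("HH2", 2024)]
rounds = [("HH1", 2019), ("HH2", 2019), ("HH3", 2021), ("HH4", 2021)]

for name, data in [("gdp", gdp), ("survey", survey), ("panel", panel), ("rounds", rounds)]:
    print(name, "->", time_structure(data))
```

The rule used for "panel" here is strict (every unit must appear in at least two periods); real panels often tolerate some attrition, so a threshold could be substituted.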

28.3.5 By Frequency

  • Discrete — countable values (number of children).
  • Continuous — values on a continuum (height, time).

flowchart TB
  D[Data Classification] --> S[By Source]
  D --> N[By Nature]
  D --> M[By Measurement Level]
  D --> T[By Time]
  D --> F[By Frequency]
  S --> S1[Internal]
  S --> S2[External]
  N --> N1[Quantitative]
  N --> N2[Qualitative]
  M --> M1[Nominal · Ordinal · Interval · Ratio]
  T --> T1[Time-series]
  T --> T2[Cross-section]
  T --> T3[Panel]
  F --> F1[Discrete]
  F --> F2[Continuous]
  classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

28.4 Working Stages: Editing, Coding, Tabulation

After collection, raw data pass through three working stages before analysis.

Tip: Three Stages of Data Preparation
Stage | Purpose
Editing | Identify and correct errors, missing values, inconsistencies
Coding | Assign numeric codes to categorical responses (e.g., M = 1, F = 2)
Tabulation | Arrange in tables, frequency distributions, cross-tabs
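
The three stages can be sketched on a handful of invented survey responses. Everything here is an assumption made for illustration: the raw records, the plausibility rule for age (0 to 120), and the choice to drop bad records rather than correct them. Only the coding scheme M = 1, F = 2 comes from the table above.

```python
from collections import Counter

# Hypothetical raw responses as (gender, age) strings typed by enumerators.
raw = [("M", "34"), ("f", "29"), ("F", ""), ("M", "210"), (" m ", "41"), ("F", "29")]

GENDER_CODES = {"M": 1, "F": 2}  # coding scheme used in the text


def edit(record):
    """Editing: normalise the text and reject missing or implausible values."""
    gender, age = record[0].strip().upper(), record[1].strip()
    if gender not in GENDER_CODES or not age.isdigit():
        return None  # missing or unrecognised entry
    age = int(age)
    return (gender, age) if 0 <= age <= 120 else None


# Stage 1: editing (invalid records are dropped in this simple sketch).
edited = [r for r in map(edit, raw) if r is not None]

# Stage 2: coding (replace each category with its numeric code).
coded = [(GENDER_CODES[gender], age) for gender, age in edited]

# Stage 3: tabulation (frequency of each gender code).
table = Counter(code for code, _ in coded)

print(edited)
print(coded)
print(dict(table))
```

In practice, editing often means querying or correcting a doubtful entry instead of discarding it; dropping records is simply the shortest rule to show.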

28.5 Frequency Distribution

A frequency distribution counts how often each value (or class of values) appears in the data.

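Tip: Frequency Distribution Vocabulary
  • Class interval: a range of values (e.g., 50–60).
  • Class boundary: the exact value separating adjacent classes. For exclusive classes (50–60, 60–70) the boundaries equal the stated limits (50 and 60); for inclusive classes (50–59, 60–69) they fall halfway between adjacent limits (49.5 and 59.5).
  • Class width: upper boundary − lower boundary (60 − 50 = 10).
  • Class mark / mid-point: average of the upper and lower limits ((50 + 60) / 2 = 55).
  • Frequency (f): number of observations in the class.
  • Cumulative frequency: sum of frequencies up to and including the class.
  • Relative frequency: frequency as a proportion of the total.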
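
The vocabulary above can be tied together by building a small frequency table. The marks below are invented, and the sketch assumes exclusive classes of width 10 starting at 50, so that a value equal to an upper limit is counted in the next class.

```python
# Hypothetical marks of 12 students.
marks = [52, 67, 58, 73, 61, 55, 79, 64, 68, 50, 70, 66]

# Assumed exclusive classes: 50-60, 60-70, 70-80.
lower, width, n_classes = 50, 10, 3

rows = []
cumulative = 0
for i in range(n_classes):
    lo = lower + i * width
    hi = lo + width
    # Exclusive convention: lower limit included, upper limit excluded.
    f = sum(1 for m in marks if lo <= m < hi)
    cumulative += f
    rows.append({
        "class": f"{lo}-{hi}",
        "mid": (lo + hi) / 2,      # class mark
        "f": f,                    # frequency
        "cum_f": cumulative,       # cumulative frequency
        "rel_f": f / len(marks),   # relative frequency
    })

for row in rows:
    print(row)
```

The last cumulative frequency equals the number of observations, and the relative frequencies sum to 1, which makes a quick check on any hand-built table.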

28.6 Practice Questions

Q 01. Primary vs Secondary (Easy)

A researcher conducts her own household survey on consumption patterns. The data she collects is classified as:

  • A. Primary data
  • B. Secondary data
  • C. Tertiary data
  • D. Archival data

Correct Option: A
Data collected first-hand by the researcher is primary data.

Q 02. Census (Easy)

The Census of India is conducted at intervals of:

  • A. Five years
  • B. Ten years (decennial)
  • C. Twenty years
  • D. Annually

Correct Option: B
The Census of India is decennial: it is conducted every 10 years.

Q 03. Levels of Measurement (Medium)

A data set lists "religion" as Hindu / Muslim / Christian / Other. This is at which level of measurement?

  • A. Nominal
  • B. Ordinal
  • C. Interval
  • D. Ratio

Correct Option: A
Categories with no inherent order are nominal.

Q 04. AISHE (Medium)

AISHE (the All India Survey on Higher Education) is conducted by:

  • A. Reserve Bank of India
  • B. Ministry of Education, Government of India
  • C. UNICEF
  • D. World Bank

Correct Option: B
AISHE is conducted by the Ministry of Education, Government of India.

Q 05. Time Classification (Medium)

Annual GDP figures of India from 1991 to 2024 are an example of:

  • A. Cross-section data
  • B. Time-series data
  • C. Panel data
  • D. Categorical data

Correct Option: B
The same indicator measured at successive time points is time-series data.

Q 06. Discrete vs Continuous (Easy)

Which of the following is an example of *continuous* quantitative data?

  • A. Number of children in a household
  • B. Height of a person
  • C. Type of religion
  • D. Marital status

Correct Option: B
Height can take any value on a continuum. Number of children is discrete; religion and marital status are categorical.

Q 07. Class Width (Medium)

In the class interval 50–60, the class width is:

  • A. 5
  • B. 10
  • C. 11
  • D. 55

Correct Option: B
Treating 50–60 as an exclusive class (adjacent classes 40–50 and 60–70), class width = upper limit − lower limit = 60 − 50 = 10.

Q 08. Stages of Preparation (Easy)

Assigning the labels "M = 1, F = 2" to gender responses is which stage of data preparation?

  • A. Editing
  • B. Coding
  • C. Tabulation
  • D. Sampling

Correct Option: B
Coding means assigning numeric labels to categorical responses.

Important: Quick recall
  • Two sources: Primary (first-hand) vs Secondary (re-used).
  • Indian secondary sources: Census, NSSO/NSO, AISHE, RBI, NFHS, PLFS, data.gov.in.
  • Six acquisition methods: Survey, Interview, Observation, Experiment, Document analysis, Sensor data.
  • Classification dimensions: Source, Nature, Measurement level, Time, Frequency.
  • Stevens’s levels: Nominal · Ordinal · Interval · Ratio (NOIR).
  • Time forms: Time-series · Cross-section · Panel.
  • Three preparation stages: Editing · Coding · Tabulation.