```mermaid
flowchart TB
D[Data Classification] --> S[By Source]
D --> N[By Nature]
D --> M[By Measurement Level]
D --> T[By Time]
S --> S1[Internal]
S --> S2[External]
N --> N1[Quantitative]
N --> N2[Qualitative]
M --> M1[Nominal · Ordinal · Interval · Ratio]
T --> T1[Time-series]
T --> T2[Cross-section]
T --> T3[Panel]
classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;
```
28 Sources, Acquisition and Classification of Data
Data are facts, observations or measurements recorded for analysis. Turning raw data into useful information involves three stages: identifying sources, choosing methods of acquisition, and classifying the data so that they can be analysed.
28.1 Sources of Data
| Source | Definition | Examples | Strength | Limitation |
|---|---|---|---|---|
| Primary | Collected first-hand by the researcher | Survey, experiment, observation, interview | Tailored to the question; direct control over quality | Time-consuming; expensive |
| Secondary | Collected by someone else, repurposed | Census, NSSO, AISHE, journal articles, RBI reports | Quick, low-cost, often large-scale | Original purpose may not match the new question; quality outside researcher’s control |
Key Indian secondary sources:
- Census of India — decennial demographic data (Office of the Registrar General and Census Commissioner).
- National Sample Survey Office (NSSO) — household consumption, employment, health, education.
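- National Statistical Office (NSO) — formed in 2019 by merging NSSO with the Central Statistics Office (CSO) under the Ministry of Statistics and Programme Implementation (MoSPI); now runs the national survey programmes.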
- All India Survey on Higher Education (AISHE) — annual data on higher education institutions.
- Reserve Bank of India (RBI) — financial and monetary statistics.
- Open Government Data Platform (data.gov.in) — central government datasets.
- India State of Forest Report (Forest Survey of India) — biennial assessment of forest and tree cover.
- Sample Registration System (SRS) — births, deaths, fertility.
- Periodic Labour Force Survey (PLFS) — labour-market indicators.
- National Family Health Survey (NFHS) — health, nutrition, fertility.
28.2 Methods of Data Acquisition
| Method | What it captures | Best when |
|---|---|---|
| Survey / Questionnaire | Self-reported opinions, behaviour, demographics | Large samples, literate populations |
| Interview | Spoken responses; structured / semi-structured / unstructured | Sensitive topics, low-literacy contexts |
| Observation | Behaviour as it happens | Behaviour people cannot or will not report accurately (e.g., classroom interaction, child development) |
| Experiment | Effect of a deliberately manipulated cause | Testing cause-and-effect claims |
| Document analysis | Existing records | Historical and policy research |
| Sensor / device data | Real-time machine readings | Traffic, weather, IoT, fitness trackers |
28.2.1 Census vs Sample Survey
| Approach | Coverage | Cost | Accuracy | Use |
|---|---|---|---|---|
| Census | Every unit in the population | High | No sampling error (non-sampling errors such as misreporting remain possible) | Decennial population count, foundational planning data |
| Sample survey | A representative subset | Lower | Subject to sampling error as well as non-sampling error | Routine household, business and labour surveys |
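The difference between the two rows can be sketched with a small simulation. It assumes a hypothetical population of 10,000 household incomes generated at random (not real data) and a simple random sample of 500 households. The census mean equals the population value by construction, while the sample mean generally differs from it; that gap is the sampling error.

```python
import random
import statistics

# Hypothetical population: monthly incomes (rupees) of 10,000 households.
# Randomly generated for illustration only; not real survey data.
rng = random.Random(42)
population = [rng.randint(5_000, 50_000) for _ in range(10_000)]

# Census: every unit is measured, so the result IS the population value.
census_mean = statistics.mean(population)

# Sample survey: simple random sample of 500 households, without replacement.
sample = rng.sample(population, 500)
sample_mean = statistics.mean(sample)

# Sampling error = sample estimate minus the true population value.
sampling_error = sample_mean - census_mean

print(f"Census mean:    {census_mean:,.0f}")
print(f"Sample mean:    {sample_mean:,.0f}")
print(f"Sampling error: {sampling_error:+,.0f}")
```

Changing the seed gives a different sample and a different error, which is the variability a census avoids. The sketch does not model non-sampling errors, which affect both approaches.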
28.3 Classification of Data
Data may be classified along several independent dimensions.
28.3.1 By Source
- Internal — generated by an organisation itself (sales records, payroll).
- External — drawn from sources outside the organisation (government reports, market research).
28.3.2 By Nature
| Type | Description | Examples |
|---|---|---|
| Quantitative | Numerical measurements | Height, marks, income |
| Qualitative / Categorical | Categories or attributes | Gender, religion, occupation |
28.3.3 By Measurement Level (Stevens 1946)
| Level | What it permits | Example |
|---|---|---|
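| Nominal | Classification and counting only (mode) | Gender, religion |
| Ordinal | Rank ordering; gaps between ranks need not be equal (median, percentiles) | Educational attainment, satisfaction (Likert) |
| Interval | Equal intervals, so differences are meaningful; no true zero, so ratios are not | Temperature in Celsius, calendar year |
| Ratio | Equal intervals and a true zero, so ratios are meaningful | Income, height, weight, age |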
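The practical difference between the interval and ratio levels can be checked numerically. In this minimal sketch, the helper `celsius_to_fahrenheit` and all values are illustrative. A ratio of Celsius temperatures changes when the same readings are expressed in Fahrenheit. A ratio of temperature differences does not change, and neither does a ratio of incomes.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Both scales are interval scales: their zero points are arbitrary."""
    return c * 9 / 5 + 32


warm, cool = 20.0, 10.0

# Interval level: "20 °C is twice 10 °C" is not meaningful,
# because the ratio changes with the unit (68 / 50 in Fahrenheit).
ratio_c = warm / cool
ratio_f = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)

# Differences ARE meaningful: the ratio of two differences is unit-free.
diff_ratio_c = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_f = ((celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0))
                / (celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)))

# Ratio level: income has a true zero, so "twice as much" survives
# any change of unit (rupees vs thousands of rupees).
income_a, income_b = 40_000, 20_000
ratio_rupees = income_a / income_b
ratio_thousands = (income_a / 1000) / (income_b / 1000)

print(ratio_c, ratio_f)             # temperature ratio depends on unit
print(diff_ratio_c, diff_ratio_f)   # difference ratio does not
print(ratio_rupees, ratio_thousands)
```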
28.3.4 By Time
| Type | Description | Example |
|---|---|---|
| Time-series | Same indicator measured at successive time points | Monthly inflation, annual GDP |
| Cross-section | Different units at the same point or period in time | One round of a household survey |
| Panel / Longitudinal | Same units at successive time points | India Human Development Survey (same households re-interviewed across rounds) |
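The three time forms differ only in how units and periods combine, so the form of a dataset can be read off its (unit, period) pairs. This is a rough Python sketch. The function `classify_layout` and the household records are hypothetical. The extra "repeated cross-section" label covers surveys that draw a fresh sample in each round, which is what separates them from a true panel.

```python
from collections.abc import Hashable

# One observation: (unit, period, value). All records below are hypothetical.
Record = tuple[Hashable, int, float]


def classify_layout(records: list[Record]) -> str:
    """Label a dataset by how its units and time periods combine."""
    units = {unit for unit, _, _ in records}
    periods = {period for _, period, _ in records}

    if len(units) == 1 and len(periods) > 1:
        return "time-series"          # one unit, many periods
    if len(periods) == 1:
        return "cross-section"        # many units, one period

    # Many units and many periods: a panel only if some unit recurs.
    periods_by_unit: dict[Hashable, set[int]] = {}
    for unit, period, _ in records:
        periods_by_unit.setdefault(unit, set()).add(period)
    if any(len(p) > 1 for p in periods_by_unit.values()):
        return "panel"
    return "repeated cross-section"   # fresh units in every round


time_series = [("HH1", 2021, 10.0), ("HH1", 2022, 11.0), ("HH1", 2023, 12.0)]
cross_section = [("HH1", 2023, 10.0), ("HH2", 2023, 14.0), ("HH3", 2023, 9.0)]
panel = [("HH1", 2021, 10.0), ("HH1", 2023, 12.0),
         ("HH2", 2021, 14.0), ("HH2", 2023, 15.0)]
repeated = [("HH1", 2021, 10.0), ("HH2", 2021, 14.0),
            ("HH3", 2023, 9.0), ("HH4", 2023, 13.0)]

for name, data in [("time_series", time_series), ("cross_section", cross_section),
                   ("panel", panel), ("repeated", repeated)]:
    print(name, "->", classify_layout(data))
```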
28.3.5 By Frequency
- Discrete — countable, separate values, typically whole numbers (number of children).
- Continuous — any value within a range, limited only by measurement precision (height, time).
28.4 Working Stages: Editing, Coding, Tabulation
After collection, raw data goes through three working stages before analysis.
| Stage | Purpose |
|---|---|
| Editing | Identify and correct errors, missing values, inconsistencies |
| Coding | Assign numeric labels to categorical responses (M = 1, F = 2) |
| Tabulation | Arrange in tables, frequency distributions, cross-tabs |
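The three stages can be sketched as a small pipeline over one survey question. The raw responses, the alias table inside `edit` and the rule of setting unusable entries aside are illustrative assumptions. Only the M = 1, F = 2 coding scheme comes from the table above.

```python
from collections import Counter
from typing import Optional

# Hypothetical raw answers to a "gender" question, as entered in the field.
raw_responses = ["M", "f", " F", "male", "", "M", "F", "x"]


def edit(response: str) -> Optional[str]:
    """Editing: normalise case/spacing and map known variants; None flags unusable entries."""
    aliases = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}
    return aliases.get(response.strip().upper())


# Stage 1: editing
edited = [edit(r) for r in raw_responses]
missing = edited.count(None)

# Stage 2: coding, using the scheme M = 1, F = 2
CODES = {"M": 1, "F": 2}
coded = [CODES[r] for r in edited if r is not None]

# Stage 3: tabulation as a frequency table
table = Counter(coded)

print("Unusable entries:", missing)
for code, count in sorted(table.items()):
    print(f"code {code}: {count}")
```

In real fieldwork, flagged entries would usually be checked against the questionnaire or re-asked rather than simply dropped. The sketch drops them only to keep the example short.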
28.5 Frequency Distribution
A frequency distribution counts how often each value (or class of values) appears in the data.
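- Class interval — a range of values (e.g., 50–60).
- Class boundary — the exact limits separating adjacent classes. In exclusive classes (50–60, 60–70, etc.) the stated limits are themselves the boundaries. In inclusive classes of whole numbers (50–59, 60–69, etc.) the boundaries extend half a unit beyond the stated limits (49.5–59.5).
- Class width — upper boundary − lower boundary (60 − 50 = 10).
- Class mark / mid-point — (lower limit + upper limit) / 2, e.g. (50 + 60) / 2 = 55.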
- Frequency (f) — number of observations in the class.
- Cumulative frequency — sum of frequencies up to and including the class.
- Relative frequency — frequency as a proportion of total.
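The terms above can be computed for a small set of made-up marks. This sketch assumes exclusive classes (lower limit included, upper limit excluded), for which the stated limits are also the class boundaries. `inclusive_boundaries` is a hypothetical helper for the inclusive case, where whole-number classes such as 50–59 extend half a unit on each side.

```python
# Hypothetical marks of 12 students (illustrative, not real data).
marks = [42, 47, 51, 55, 58, 50, 63, 67, 60, 45, 59, 69]

# Exclusive classes: a value equal to an upper limit (e.g. 50) goes to the next class.
classes = [(40, 50), (50, 60), (60, 70)]

n = len(marks)
cumulative = 0
rows = []
for lower, upper in classes:
    f = sum(1 for m in marks if lower <= m < upper)   # frequency
    cumulative += f                                    # cumulative frequency
    rows.append({
        "class": f"{lower}-{upper}",
        "width": upper - lower,                        # class width
        "mark": (lower + upper) / 2,                   # class mark / mid-point
        "f": f,
        "cf": cumulative,
        "rel_f": round(f / n, 3),                      # relative frequency
    })

for row in rows:
    print(row)


def inclusive_boundaries(lower: float, upper: float, unit: float = 1) -> tuple[float, float]:
    """Hypothetical helper: true boundaries of an inclusive class such as 50-59,
    assuming data recorded to the nearest `unit`."""
    return lower - unit / 2, upper + unit / 2


print(inclusive_boundaries(50, 59))   # (49.5, 59.5)
```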
28.6 Practice Questions
A researcher conducts her own household survey on consumption patterns. The data she collects is classified as:
The Census of India is conducted at intervals of:
A data set lists "religion" as Hindu / Muslim / Christian / Other. This is at which level of measurement?
AISHE — the All India Survey on Higher Education — is conducted by:
Annual GDP figures of India from 1991 to 2024 are an example of:
Which of the following is an example of *continuous* quantitative data?
In the class interval 50–60, the class width is:
Assigning the labels "M = 1, F = 2" to gender responses is which stage of data preparation?
Key points
- Two sources: Primary (first-hand) vs Secondary (re-used).
- Indian secondary sources: Census, NSSO/NSO, AISHE, RBI, NFHS, PLFS, data.gov.in.
- Six acquisition methods: Survey, Interview, Observation, Experiment, Document analysis, Sensor data.
- Classification dimensions: Source, Nature, Measurement level, Time, Frequency.
- Stevens’s levels: Nominal · Ordinal · Interval · Ratio (NOIR).
- Time forms: Time-series · Cross-section · Panel.
- Three preparation stages: Editing · Coding · Tabulation.