7  Evaluation Systems: Elements and Types of evaluation, Evaluation in Choice Based Credit System in Higher education, Computer based testing, Innovations in evaluation systems

7.1 The Four Examined Sub-topics

The syllabus carves this topic into four examined heads:

  1. Elements and types of evaluation
  2. Choice Based Credit System (CBCS) evaluation
  3. Computer-Based Testing (CBT)
  4. Innovations in evaluation systems

The two most frequent previous-year question (PYQ) patterns are (a) distinguishing measurement vs assessment vs evaluation, and (b) matching a type of evaluation (formative/summative/diagnostic/placement) with its purpose.

flowchart TB
  E{Evaluation in<br/>Higher Education} --> A[Elements & Types]
  E --> B[CBCS Evaluation]
  E --> C[Computer-Based Testing]
  E --> D[Innovations<br/>Authentic · Rubrics · OBE]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

7.2 Three Layers — Measurement, Assessment, Evaluation

These three terms are routinely confused in PYQs. The classical distinction:

Tip: Measurement vs Assessment vs Evaluation

| Layer | What it does | Output | Example |
|---|---|---|---|
| Measurement | Assigns a number to a property | Numeric score | Raw test marks (62/100) |
| Assessment | Gathers and interprets evidence of learning | Description of performance | Performance grade, feedback note |
| Evaluation | Judges worth, makes a decision | Value judgement, decision | Pass/fail, promote, certify |

A common chain: measurement → assessment → evaluation. Measurement is quantitative; assessment can be quantitative or qualitative; evaluation is interpretive and decision-oriented.
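The chain above can be sketched in a few lines of code. This is only an illustration: the function names, the band labels and the 40% pass mark are assumptions made for the example, not values taken from any regulation.

```python
# Hypothetical sketch: thresholds and labels are illustrative assumptions.

def measure(correct: int, total: int) -> float:
    """Measurement: assign a number (here, a percentage score)."""
    return 100 * correct / total


def assess(score: float) -> str:
    """Assessment: interpret the number as a description of performance."""
    if score >= 75:
        return "strong grasp of the unit"
    if score >= 40:
        return "partial grasp; revise weak areas"
    return "needs substantial support"


def evaluate(score: float, pass_mark: float = 40) -> str:
    """Evaluation: make a decision from the evidence."""
    return "pass" if score >= pass_mark else "fail"


score = measure(62, 100)
print(score, "|", assess(score), "|", evaluate(score))
```

The same raw score feeds all three layers; only the last one commits to a decision.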

7.2.1 The Three Domains of Educational Outcomes

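Tip: Three Bloom Domains (All Three Must Be Evaluated)

| Domain | Lead theorist | Levels (highest first) | Tool |
|---|---|---|---|
| Cognitive | Bloom 1956 / Anderson-Krathwohl 2001 | Create · Evaluate · Analyse · Apply · Understand · Remember | Written test, viva |
| Affective | Krathwohl 1964 | Characterising · Organising · Valuing · Responding · Receiving | Attitude scale, observation |
| Psychomotor | Simpson 1972 (also Dave 1970, Harrow 1972) | Simpson's seven levels: Origination · Adaptation · Complex overt response · Mechanism · Guided response · Set · Perception | Performance test, checklist |

Dave's (1970) alternative psychomotor sequence, highest first, is Naturalisation · Articulation · Precision · Manipulation · Imitation; do not mix the two lists.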

7.3 Elements of Evaluation

A complete evaluation system has four interlocking elements:

Tip: Four Elements of Evaluation
  1. Objectives — what the learner should know, do, value (stated in Bloom verbs).
  2. Learning experiences — methods, materials, activities.
  3. Evaluation procedures — tools, tests, rubrics.
  4. Feedback / reform — using results to revise.

This is the Tyler model (1949) — Ralph W. Tyler in Basic Principles of Curriculum and Instruction. Tyler is the “father of curriculum evaluation”.

7.3.1 Essential Qualities of a Good Evaluation Tool

Tip: Six Qualities of a Good Evaluation Tool

| Quality | What it means | Worked example |
|---|---|---|
| Validity | Measures what it claims to measure | A statistics test that tests statistics, not language |
| Reliability | Consistent across raters / occasions | Two graders give the same essay similar marks |
| Objectivity | Free from grader bias | MCQ, machine-marked |
| Practicability | Feasible to administer | Time, cost, infrastructure realistic |
| Discrimination | Separates high from low achievers | Item-analysis discrimination index |
| Comprehensiveness | Covers full content & all Bloom levels | Blueprint with table of specifications |

Validity types: content, construct, criterion (concurrent & predictive), face. Reliability methods: test-retest, parallel-form, split-half, Kuder-Richardson (KR-20, KR-21), Cronbach’s α.
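Two of the statistics named above can be computed by hand on a small response matrix. The sketch below assumes dichotomous (0/1) item scores; it uses the standard KR-20 formula with the population variance of total scores, and the upper-lower group form of the discrimination index. The function names, the 27% default group fraction and the `demo` data are assumptions made for illustration.

```python
from statistics import pvariance

# Rows = examinees, columns = items scored 0 (wrong) or 1 (right).
# Hypothetical data for illustration only.
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def kr20(responses):
    """Kuder-Richardson 20: (k/(k-1)) * (1 - sum(p*q) / variance of totals)."""
    n = len(responses)
    k = len(responses[0])
    var_total = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion answering item i correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower index D = p_upper - p_lower for one item."""
    ranked = sorted(responses, key=sum, reverse=True)  # best total score first
    g = max(1, round(len(ranked) * fraction))          # size of each extreme group
    p_upper = sum(row[item] for row in ranked[:g]) / g
    p_lower = sum(row[item] for row in ranked[-g:]) / g
    return p_upper - p_lower


print(round(kr20(demo), 3))
print(discrimination_index(demo, item=0, fraction=0.4))
```

A D value near zero (or negative) flags an item that fails to separate high from low achievers; the conventional cut-offs for "good" items vary by textbook and are not fixed here.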

7.4 Types of Evaluation — The PYQ-Heavy Section

7.4.1 By Purpose

Tip: Four Types by Purpose

| Type | When | What it tells | Theorist |
|---|---|---|---|
| Placement | Before instruction | Where to place the learner in a sequence | |
| Diagnostic | Before / during instruction | Specific weaknesses, learning difficulties | |
| Formative | During instruction | Where the learner is now (for improving learning) | Michael Scriven, 1967 |
| Summative | At end of instruction | Whether the learner has achieved (for judging) | Michael Scriven, 1967 |

The formative/summative distinction is Michael Scriven’s, in The Methodology of Evaluation (1967). Memory cue: formative = forming (in progress); summative = sum (at the end).

7.4.2 By Reference

Tip: Two Types by Reference

| Type | Compares learner to | Use |
|---|---|---|
| Norm-Referenced Test (NRT) | Other learners in the group | Ranking, selection (NEET, JEE) |
| Criterion-Referenced Test (CRT) | A pre-defined standard | Mastery, certification (driving licence) |

Robert Glaser (1963) coined both terms.
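The difference between the two references shows up clearly when the same score is interpreted both ways. In this minimal sketch the function names, the group scores and the cut-score are hypothetical; percentile rank is taken here as the percentage of the group scoring strictly below the learner, which is one of several conventions.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced reading: percentage of the group scoring strictly below."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


def has_mastered(score, cut_score):
    """Criterion-referenced reading: compare with a fixed standard only."""
    return score >= cut_score


group = [40, 55, 62, 70, 90]           # hypothetical cohort
print(percentile_rank(62, group))      # depends on who else took the test
print(has_mastered(62, cut_score=60))  # depends only on the standard
```

Changing the cohort changes the first result but never the second, which is the point of the NRT/CRT distinction.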

7.4.3 By Approach / Mode

Tip: Modes of Evaluation
  • Internal (by the institution) vs External (by an outside body).
  • Continuous (CCE-style) vs Terminal (one-shot exam).
  • Oral (viva, interview) vs Written vs Practical.
  • Subjective (essay) vs Objective (MCQ).
  • Process (how the learner solves) vs Product (what the learner produces).

7.4.4 The Big Four Tests — Achievement, Aptitude, Intelligence, Personality

Tip: Four Standardised Test Families

| Test | What it measures | Examples |
|---|---|---|
| Achievement | What has been learned | School exam, UGC NET |
| Aptitude | Potential to learn / perform | GRE, JEE Advanced |
| Intelligence | General cognitive ability | Stanford-Binet, WAIS, Raven’s |
| Personality | Traits, dispositions | MMPI, NEO-PI, 16-PF |

7.5 Choice-Based Credit System (CBCS) Evaluation

7.5.1 Concept and Origin

CBCS is a credit-based, semester-based, learner-choice framework adopted by UGC in 2015 for Indian HEIs (implementation document: Choice Based Credit System, UGC, 2015). It replaces year-end-only summative evaluation with semesterised, continuous, multi-component assessment.

Tip: Three Course Types in CBCS
  • Core Courses (CC): compulsory in the discipline.
  • Elective Courses (EC): chosen by the learner; Discipline-Specific Elective (DSE) and Generic Elective (GE).
  • Ability Enhancement / Foundation Courses: Ability Enhancement Compulsory Course (AECC; language, environmental studies) and Skill Enhancement Course (SEC; value-based or skill-based).

7.5.2 The CBCS Credit Definition

Tip: Credit Calculation under CBCS
  • 1 Theory credit = 1 hour of lecture per week for one semester (~15 weeks → ~15 contact hours).
  • 1 Tutorial credit = 1 hour per week.
  • 1 Practical credit = 2 hours per week.
  • A typical UG semester carries 20–24 credits; a 3-year UG = 132–148 credits.
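The credit rules above translate directly into contact hours. The sketch below is a small calculator under the stated rules (1 hour per theory or tutorial credit, 2 hours per practical credit, roughly 15 teaching weeks); the function names and the sample course are hypothetical.

```python
# Weekly contact hours per credit, as given in the credit rules above.
HOURS_PER_CREDIT = {"theory": 1, "tutorial": 1, "practical": 2}


def weekly_contact_hours(credits):
    """credits: mapping such as {"theory": 4, "practical": 2}."""
    return sum(HOURS_PER_CREDIT[kind] * n for kind, n in credits.items())


def semester_contact_hours(credits, weeks=15):
    """Total contact hours over a semester of the given length (assumed 15 weeks)."""
    return weekly_contact_hours(credits) * weeks


course = {"theory": 4, "practical": 2}   # hypothetical 6-credit course
print(weekly_contact_hours(course))      # 4*1 + 2*2 = 8 hours per week
print(semester_contact_hours(course))    # 8 * 15 = 120 hours
```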

7.5.3 CBCS Grading — 10-Point Scale

Tip: UGC’s CBCS 10-Point Grade Scale

| Letter | Grade | Grade Point |
|---|---|---|
| O | Outstanding | 10 |
| A+ | Excellent | 9 |
| A | Very Good | 8 |
| B+ | Good | 7 |
| B | Above Average | 6 |
| C | Average | 5 |
| P | Pass | 4 |
| F | Fail | 0 |
| Ab | Absent | 0 |

7.5.4 SGPA, CGPA — How to Compute

Tip: SGPA and CGPA Formulae

SGPA (Semester Grade Point Average) for a semester:

\[\text{SGPA} = \frac{\sum (C_i \times G_i)}{\sum C_i}\]

where \(C_i\) = credit of course \(i\) and \(G_i\) = grade point earned in course \(i\).

CGPA (Cumulative Grade Point Average) across all semesters:

\[\text{CGPA} = \frac{\sum (C_i \times S_i)}{\sum C_i}\]

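where \(S_i\) = SGPA of semester \(i\) and, in this formula, \(C_i\) = total credits of semester \(i\), so each SGPA is weighted by the credits of its semester. Conversion of CGPA to a percentage is set by each university; one rule in use is % ≈ CGPA × 10 − 7.5, but the formula varies by university.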
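Both formulae are credit-weighted averages and can be checked with a few lines of code. This is a minimal sketch: the function names are made up for the example, the first data set is the three-course semester from practice question Q 20, and the second semester in the CGPA example is hypothetical.

```python
def sgpa(courses):
    """courses: list of (credits, grade_point) pairs for one semester."""
    total_credits = sum(c for c, _ in courses)
    return sum(c * g for c, g in courses) / total_credits


def cgpa(semesters):
    """semesters: list of (total_credits, sgpa) pairs, one per semester."""
    total_credits = sum(c for c, _ in semesters)
    return sum(c * s for c, s in semesters) / total_credits


semester_1 = [(3, 8), (4, 9), (3, 7)]        # data from Q 20
s1 = sgpa(semester_1)                        # (24 + 36 + 21) / 10 = 8.1
overall = cgpa([(10, s1), (20, 7.2)])        # (81 + 144) / 30 = 7.5
print(round(s1, 2), round(overall, 2))
```

Note that CGPA is not the plain mean of the SGPAs: the 20-credit semester pulls the result towards 7.2.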

7.5.5 Internal vs External Components

Typical CBCS course evaluation splits 25/75 or 30/70 or 40/60: internal (mid-semester test, assignment, attendance, project) + external/end-semester (written exam, viva, practical).
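The effect of the split is easy to see numerically. The sketch below combines an internal and an external percentage under a chosen internal weight; the function name and the sample marks are hypothetical, and actual component rules differ between universities.

```python
def composite_mark(internal_pct, external_pct, internal_weight=30):
    """Weighted course mark (0-100) for an internal/external split such as 30/70."""
    external_weight = 100 - internal_weight
    return (internal_pct * internal_weight + external_pct * external_weight) / 100


# Same learner, three of the splits mentioned above (hypothetical marks).
for w in (25, 30, 40):
    print(w, composite_mark(internal_pct=80, external_pct=60, internal_weight=w))
```

A learner who does better internally than externally gains as the internal weight rises, which is why the split itself is an evaluation-design decision.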

7.5.6 NEP 2020 + NCrF — Beyond CBCS

NEP 2020 introduces multi-disciplinary flexibility on top of CBCS:

TipNEP 2020 Evaluation Reforms
  • Academic Bank of Credits (ABC) — UGC 2021. Learner’s credits stored centrally; can be redeemed across HEIs.
  • Multiple Entry and Multiple Exit (MEME) — Certificate (1 yr) → Diploma (2 yr) → Bachelor’s (3 yr) → Bachelor’s Honours / Research (4 yr).
  • National Credit Framework (NCrF) 2022 — integrates school, HE, vocational; gives credit for learning outcome, not seat-time.
  • Four-Year Undergraduate Programme (FYUP) with research option (UGC 2022 curriculum framework).
  • PARAKH (Performance Assessment, Review, and Analysis of Knowledge for Holistic development) — NEP-mandated national assessment centre (2023, NCERT).

7.6 Computer-Based Testing (CBT)

7.6.1 What CBT Is

A Computer-Based Test is delivered, recorded, and (often) marked through a computer. The category includes online tests, kiosk-based tests, the iBT (Internet-based test), and the CAT (Computer-Adaptive Test).

Tip: Three Types of CBT
  • Linear CBT: same items in the same order for every candidate (NTA NET).
  • Linear-on-the-fly Testing (LOFT): items drawn at random from an item bank per candidate.
  • Computer-Adaptive Testing (CAT): item difficulty adjusts to the learner’s running performance using Item Response Theory (IRT). The GMAT adapts item by item; the GRE General Test adapts at section level.
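The LOFT idea (a fresh random form per candidate from a shared bank) can be sketched as below. This is a simplified illustration: the function name, the item identifiers and the use of a per-candidate seed are assumptions, and operational LOFT systems normally also balance each form against a content and difficulty blueprint, which is omitted here.

```python
import random


def loft_form(item_bank, n_items, candidate_seed):
    """Draw n_items distinct items for one candidate.

    A per-candidate seed (an assumption of this sketch) makes the form
    reproducible, so it can be regenerated later for audit or re-scoring.
    """
    rng = random.Random(candidate_seed)
    return rng.sample(item_bank, n_items)


bank = [f"ITEM-{i:03d}" for i in range(50)]   # hypothetical 50-item bank
form = loft_form(bank, n_items=10, candidate_seed=101)
print(form)
```

A linear CBT would instead return the same fixed list for everyone, while a CAT would choose each next item from the candidate's responses so far.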

7.6.2 Indian CBT Landscape

Tip: Indian Computer-Based Tests
  • NTA (National Testing Agency, 2017) conducts JEE Main, UGC-NET, CUET-UG and CUET-PG as CBTs; NEET-UG, also conducted by NTA, is a pen-and-paper (OMR) test.
  • NCERT NAS (National Achievement Survey) has moved toward CBT.
  • PARAKH is the NEP-2020 holistic-assessment centre under NCERT.
  • SWAYAM proctored exams are CBTs at TCS-iON centres.

7.6.3 CBT Strengths and Limitations

Tip: CBT Strengths and Limitations

| Strengths | Limitations |
|---|---|
| Fast scoring; instant feedback | Digital divide; rural connectivity |
| Item-bank security; randomised items | Item-bank construction is expensive |
| Multimedia items (video, simulation) | Limited for essay / creative tasks |
| Adaptive testing (CAT) shortens tests | Requires IRT-trained item-writers |
| Auto-proctoring (AI, webcam) | Privacy concerns; bias in proctoring AI |
| Data analytics on item performance | Hardware failure risk |

7.6.4 Item Response Theory (IRT)

IRT underlies modern CBTs. It models the probability of a correct response as a function of:

Tip: IRT Parameter Models
  • 1-PL (Rasch model) — only item difficulty.
  • 2-PL — item difficulty + discrimination.
  • 3-PL — adds guessing parameter (chance of correct by guess).
  • 4-PL — adds upper asymptote (carelessness).

IRT lets the same ability score come from different item sets — essential for CAT and equating across forms.
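The four models above are nested cases of one logistic curve, which the sketch below writes as a single function. The symbols are the conventional ones (θ ability, b difficulty, a discrimination, c lower asymptote for guessing, d upper asymptote for carelessness); the function names and sample values are assumptions. The item-selection helper uses the Rasch result that item information p(1 − p) peaks where difficulty matches ability; it is a toy rule, not a full CAT engine.

```python
import math


def p_correct(theta, b, a=1.0, c=0.0, d=1.0):
    """4-PL probability of a correct response.

    Defaults reduce it to simpler models: a=1, c=0, d=1 gives 1-PL (Rasch);
    freeing a gives 2-PL; freeing c gives 3-PL; freeing d gives 4-PL.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def rasch_information(theta, b):
    """Item information under the Rasch model: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1 - p)


def next_item(theta, difficulties):
    """Index of the most informative item for the current ability estimate."""
    return max(range(len(difficulties)),
               key=lambda i: rasch_information(theta, difficulties[i]))


print(p_correct(theta=0.0, b=0.0))            # ability equals difficulty
print(p_correct(theta=0.0, b=0.0, c=0.25))    # four-option MCQ guessing floor
print(next_item(0.4, [-1.0, 0.5, 2.0]))       # picks the item nearest theta
```

Because the probability depends only on θ and the item parameters, two candidates who saw different items can still be placed on the same ability scale.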

7.7 Innovations in Evaluation Systems

7.7.1 Authentic / Performance Assessment

Authentic assessment (Grant Wiggins, 1989) asks learners to perform real-world tasks. Examples: portfolio, exhibition, internship report, capstone project.

7.7.2 Portfolio Assessment

A curated collection of the learner’s work over time — drafts, finals, reflection. Captures growth, not snapshot. Used in NEP-2020 holistic assessment and PARAKH.

7.7.3 Rubrics

A scoring tool with criteria × performance levels in a matrix.

Tip: Two Types of Rubric
  • Holistic rubric — single overall score; fast.
  • Analytic rubric — separate score for each criterion; diagnostic.
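The contrast between the two rubric types can be made concrete with a small data structure. In this sketch the criteria, the weights and the 1 to 4 level scale are hypothetical choices made for illustration.

```python
# Hypothetical analytic rubric: criteria and weights are assumptions.
WEIGHTS = {"content": 0.5, "structure": 0.3, "citations": 0.2}


def analytic_score(levels, weights=WEIGHTS):
    """Weighted total of per-criterion levels (each level on a 1-4 scale)."""
    return sum(weights[c] * levels[c] for c in weights)


def weakest_criterion(levels):
    """The diagnostic payoff of an analytic rubric: where to give feedback."""
    return min(levels, key=levels.get)


essay = {"content": 4, "structure": 3, "citations": 2}
holistic_level = 3   # a holistic rubric records only this single overall level

print(round(analytic_score(essay), 2))   # 0.5*4 + 0.3*3 + 0.2*2 = 3.3
print(weakest_criterion(essay))
```

The holistic mark is quicker to assign, but only the analytic record shows that citations, not content, is the area to work on.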

7.7.4 Open-Book Examination

Learner brings prescribed reference material. Tests application and analysis, not recall. NEP-2020 endorses open-book in HE.

7.7.5 Continuous and Comprehensive Evaluation (CCE)

CCE treats evaluation as ongoing throughout the year, covering both scholastic and co-scholastic domains. Introduced by NCF 2005; embedded in CBCS internal-assessment design.

7.7.6 Outcome-Based Education (OBE) and Bloom-Anderson Verb Use

OBE (William Spady, 1994) defines course outcomes (COs) and programme outcomes (POs), written with measurable Bloom verbs, then aligns curriculum and assessment to them. Used by NAAC, NBA (engineering), and the Washington Accord (engineering global mobility).

Tip: OBE Alignment Triangle

Outcomes ↔︎ Teaching-Learning ↔︎ Assessment must all align (the constructive-alignment principle of John Biggs, 1996).

7.7.7 Online Proctoring, AI in Evaluation

Tip: AI-Era Evaluation Innovations
  • AI-proctored exams — face detection, gaze tracking, browser lockdown.
  • Automated essay scoring (AES) — e-Rater (ETS), IntelliMetric.
  • Adaptive practice systems — Khan Academy, ALEKS.
  • Learning analytics dashboards — predict at-risk learners.
  • Generative-AI in feedback — ChatGPT-assisted, formative feedback.
  • Blockchain credentialing — tamper-proof certificates (NAD: National Academic Depository).

7.7.8 360° / Peer / Self-Assessment

Peer assessment: peers grade each other (used in MOOC essays). Self-assessment: learner judges own work against a rubric. 360° combines self + peer + teacher + external.

7.7.9 Concept Mapping and Mind Mapping

Concept mapping (Joseph Novak, 1972, based on Ausubel’s subsumption theory) is now used as an evaluation tool — to surface a learner’s mental schema. Mind mapping (Tony Buzan, 1974) is its single-centre cousin.

7.8 Common Threats to Evaluation Quality

Tip: Five Common Threats

| Threat | What goes wrong |
|---|---|
| Halo effect | One impression colours rating of unrelated traits |
| Leniency / Severity | Rater consistently too generous or too harsh |
| Central tendency | Rater clusters everyone in the middle |
| Order / sequence bias | Position of paper in pile affects mark |
| Cultural / language bias | Items disadvantage a subgroup |

Defences: double-blind grading, multiple raters, moderation, rubrics, item analysis, IRT calibration.
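A simple moderation check for leniency or severity compares each rater's average with the overall average on the same set of scripts. The sketch below is one rough way to do this; the function name and the marks are hypothetical, and a real moderation exercise would also look at spread and at individual scripts.

```python
from statistics import mean


def rater_bias(marks_by_rater):
    """marks_by_rater: {rater: [marks awarded to the SAME scripts]}.

    Returns each rater's mean minus the grand mean. A clearly positive value
    suggests leniency, a clearly negative value suggests severity.
    """
    grand = mean(m for marks in marks_by_rater.values() for m in marks)
    return {rater: mean(marks) - grand for rater, marks in marks_by_rater.items()}


marks = {"R1": [60, 70, 80], "R2": [50, 60, 70]}   # hypothetical double marking
print(rater_bias(marks))   # R1 sits 5 marks above the grand mean, R2 5 below
```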

7.9 How the Pieces Fit Together

flowchart LR
  O[Learning Outcome<br/>3 domains · Bloom verbs] --> P[Plan<br/>Tyler's 4 elements]
  P --> T[Teach<br/>Method · Support · Environment]
  T --> A[Assess<br/>Formative · Summative<br/>NRT · CRT · CBT]
  A --> R{Reform}
  R -.-> O
  R -.-> P
  R -.-> T
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

Each arrow is a deliberate design decision; the dotted loop is the evaluation-into-improvement cycle that distinguishes a professional teacher from a content deliverer.

7.10 Theory Anchors at a Glance

Tip: Persons, Years and Key Ideas

| Person | Year | Contribution | PYQ hook |
|---|---|---|---|
| Ralph W. Tyler | 1949 | Curriculum evaluation; 4 elements | “Tyler model” |
| Benjamin S. Bloom | 1956 / 1968 | Taxonomy (cognitive); Mastery Learning | Verbs · OBE root |
| Anderson & Krathwohl | 2001 | Revised Bloom (Remember → Create) | Verb shift |
| D.R. Krathwohl | 1964 | Affective domain (Receiving → Characterising) | 5 levels |
| E.J. Simpson | 1972 | Psychomotor (Perception → Origination) | 7 levels |
| Michael Scriven | 1967 | Formative vs Summative | Distinction |
| Robert Glaser | 1963 | NRT vs CRT | Reference distinction |
| Grant Wiggins | 1989 | Authentic assessment | Performance tasks |
| John Biggs | 1996 | Constructive Alignment | OBE alignment |
| William Spady | 1994 | Outcome-Based Education (OBE) | NAAC/NBA root |
| Joseph Novak | 1972 | Concept mapping | Schema visualisation |
| Tony Buzan | 1974 | Mind mapping | Note-taking innovation |
| UGC | 2015 | CBCS framework, 10-point grade | Indian implementation |
| UGC / NTA / NEP | 2017–23 | NTA (2017), ABC (2021), NCrF (2022), PARAKH (2023) | National reforms |

7.11 Practice Questions

Q 01 Three Layers Easy

Which one of the following is the BROADEST in scope?

  • A. Measurement
  • B. Assessment
  • C. Evaluation
  • D. Testing
Solution
Correct Option: C
Evaluation is the broadest — it judges worth and makes decisions. Assessment gathers evidence; measurement assigns a number; testing is one tool.
Q 02 Scriven Medium

The distinction between FORMATIVE and SUMMATIVE evaluation was given by:

  • A. Bloom
  • B. Michael Scriven
  • C. Robert Glaser
  • D. Ralph Tyler
Solution
Correct Option: B
Michael Scriven, The Methodology of Evaluation, 1967. Formative = for improvement; summative = for judgement.
Q 03 NRT vs CRT Medium

In a norm-referenced test (NRT), the learner's performance is compared with:

  • A. A pre-defined criterion
  • B. The learner's own previous score
  • C. Other learners in the group
  • D. An expert standard
Solution
Correct Option: C
NRT ranks against other learners (e.g., JEE, NEET). CRT compares to a fixed standard (e.g., driving licence). Both terms by Robert Glaser, 1963.
Q 04 Types Medium

A teacher gives a short quiz at the end of each topic to identify what learners have not yet mastered. This is BEST described as:

  • A. Placement evaluation
  • B. Diagnostic evaluation
  • C. Formative evaluation
  • D. Summative evaluation
Solution
Correct Option: C
Formative evaluation happens during instruction to improve learning. Diagnostic = identify root cause of weakness; placement = pre-instruction; summative = end-of-instruction.
Q 05 Tool Quality Medium

"A test gives the same result when administered twice to the same group." This describes the test's:

  • A. Validity
  • B. Reliability
  • C. Objectivity
  • D. Practicability
Solution
Correct Option: B
Reliability = consistency across occasions/raters. Validity = measures what it claims.
Q 06 CBCS Credits Medium

Under UGC's CBCS, ONE theory credit equals:

  • A. 1 hour of lecture per week for one semester
  • B. 2 hours of lecture per week for one semester
  • C. 3 hours of lecture per week for one semester
  • D. 15 hours of lecture per semester
Solution
Correct Option: A
1 theory credit = 1 hour of lecture per week for one semester. (1 tutorial credit = 1 hour/week; 1 practical credit = 2 hours/week.)
Q 07 CBCS Grade Hard

In UGC's 10-point CBCS grade scale, the letter grade "O" stands for "Outstanding" and carries a grade point of:

  • A. 8
  • B. 9
  • C. 10
  • D. 12
Solution
Correct Option: C
O (Outstanding) = 10; A+ (Excellent) = 9; A (Very Good) = 8; B+ = 7; B = 6; C = 5; P (Pass) = 4; F (Fail) = 0.
Q 08 CBCS Types Medium

Under CBCS, "AECC" stands for:

  • A. Advanced Elective Choice Course
  • B. Ability Enhancement Compulsory Course
  • C. Academic Excellence Credit Course
  • D. Applied Engineering Core Curriculum
Solution
Correct Option: B
AECC = Ability Enhancement Compulsory Course (English, Hindi, Environmental Studies). Related course categories: DSE, GE (electives) and SEC (skill enhancement).
Q 09 ABC Medium

The Academic Bank of Credits (ABC), notified by UGC in 2021, primarily enables:

  • A. Student loans for HE
  • B. Storage and transfer of academic credits across HEIs
  • C. Salary payment for faculty
  • D. Research grants from DST
Solution
Correct Option: B
ABC is a digital bank that stores and lets learners transfer academic credits across HEIs — enabling NEP's Multiple Entry / Multiple Exit.
Q 10 PARAKH Hard

PARAKH, the national assessment centre proposed under NEP 2020 and set up in 2023, falls under:

  • A. UGC
  • B. AICTE
  • C. NCERT
  • D. CBSE
Solution
Correct Option: C
PARAKH (Performance Assessment, Review, and Analysis of Knowledge for Holistic Development) is under NCERT.
Q 11 CBT Types Medium

In Computer-Adaptive Testing (CAT), the difficulty of the next item depends on:

  • A. The candidate's prior school marks
  • B. The candidate's running performance on previous items
  • C. Random selection from a fixed bank
  • D. The teacher's manual override
Solution
Correct Option: B
CAT adjusts item difficulty in real time using Item Response Theory (IRT). Used by GRE, GMAT.
Q 12 NTA Easy

The National Testing Agency (NTA), which conducts UGC-NET, JEE Main and CUET as CBTs (and NEET-UG as a pen-and-paper test), was established in:

  • A. 2015
  • B. 2017
  • C. 2019
  • D. 2020
Solution
Correct Option: B
NTA was approved in 2017 by the Union Cabinet; began operations 2018.
Q 13 OBE Medium

Outcome-Based Education (OBE), used by NBA and Washington Accord, was systematised by:

  • A. William Spady
  • B. Benjamin Bloom
  • C. Howard Gardner
  • D. Robert Mager
Solution
Correct Option: A
William Spady, Outcome-Based Education: Critical Issues and Answers, 1994. Defines COs/POs with measurable Bloom verbs.
Q 14 Constructive Alignment Hard

"Outcomes, teaching-learning activity, and assessment must all align." This principle of constructive alignment is by:

  • A. William Spady
  • B. John Biggs
  • C. Ralph Tyler
  • D. Robert Glaser
Solution
Correct Option: B
John Biggs, "Enhancing teaching through constructive alignment", Higher Education, 1996.
Q 15 Rubric Medium

A rubric that gives ONE overall score, rather than separate scores for criteria, is called:

  • A. Analytic rubric
  • B. Holistic rubric
  • C. Generic rubric
  • D. Anchor rubric
Solution
Correct Option: B
Holistic = single overall score, fast but less diagnostic. Analytic = separate score per criterion, slower but more informative.
Q 16 Authentic Medium

"Asking learners to perform real-world tasks (portfolio, project, exhibition)" is the central idea of:

  • A. Norm-referenced testing
  • B. Authentic / Performance assessment
  • C. Diagnostic testing
  • D. Achievement testing
Solution
Correct Option: B
Authentic assessment (Grant Wiggins, 1989) — real-world tasks rather than de-contextualised items.
Q 17 Bias Hard

A rater forms an overall favourable impression and consequently grades all separate criteria (content, structure, citations) higher. This is:

  • A. Central tendency error
  • B. Halo effect
  • C. Severity error
  • D. Order bias
Solution
Correct Option: B
Halo effect = one impression colours unrelated ratings. Coined by Thorndike (1920).
Q 18 Tyler Medium

"Father of curriculum evaluation", who in 1949 proposed a four-element model of objectives, learning experiences, evaluation, and reform, is:

  • A. B.S. Bloom
  • B. R.W. Tyler
  • C. Hilda Taba
  • D. D.R. Krathwohl
Solution
Correct Option: B
Ralph W. Tyler, Basic Principles of Curriculum and Instruction, 1949.
Q 19 Sequence Hard

Arrange the four types of evaluation in their typical temporal sequence in a course:

(i) Diagnostic   (ii) Placement   (iii) Summative   (iv) Formative

  • A. (ii) → (i) → (iv) → (iii)
  • B. (i) → (ii) → (iii) → (iv)
  • C. (iv) → (i) → (ii) → (iii)
  • D. (iii) → (iv) → (ii) → (i)
Solution
Correct Option: A
Sequence: Placement (before) → Diagnostic (before/during) → Formative (during) → Summative (end).
Q 20 SGPA Hard

A student earns the following in one semester: Course X (3 credits, grade point 8), Course Y (4 credits, grade point 9), Course Z (3 credits, grade point 7). The SGPA is:

  • A. 7.5
  • B. 8.0
  • C. 8.1
  • D. 8.5
Solution
Correct Option: C
SGPA = (3×8 + 4×9 + 3×7) / (3+4+3) = (24 + 36 + 21) / 10 = 81/10 = 8.1.

7.12 Quick Recall

Important: Quick recall
  • Three layers: Measurement (number) → Assessment (description) → Evaluation (decision).
  • Three Bloom domains: Cognitive (Bloom 1956 / Anderson-Krathwohl 2001), Affective (Krathwohl 1964 — Receiving → Characterising), Psychomotor (Simpson 1972 — Perception → Origination).
  • Tyler 4 elements (1949): Objectives → Learning experiences → Evaluation → Reform. “Father of curriculum evaluation.”
  • 6 qualities of a tool: Validity · Reliability · Objectivity · Practicability · Discrimination · Comprehensiveness.
  • Reliability methods: test-retest, parallel-form, split-half, KR-20/21, Cronbach’s α.
  • Scriven (1967): Formative (improve) vs Summative (judge).
  • Glaser (1963): NRT vs CRT.
  • 4 types by purpose: Placement → Diagnostic → Formative → Summative.
  • CBCS (UGC 2015): 1 theory credit = 1 hr/wk; 10-point scale (O 10 · A+ 9 · A 8 · B+ 7 · B 6 · C 5 · P 4 · F 0). Course types: CC · DSE · GE · SEC · AECC.
  • SGPA = Σ(Cᵢ × Gᵢ) / Σ Cᵢ.
  • NEP-2020 reforms: ABC (2021), NCrF (2022), PARAKH (2023, NCERT), FYUP with research, MEME (Cert/Dip/Bach/Bach-Honours).
  • NTA (2017): conducts UGC-NET, JEE Main and CUET as CBTs; NEET-UG is pen-and-paper (OMR).
  • CBT types: Linear · LOFT · CAT (using IRT).
  • IRT models: 1-PL Rasch (difficulty) → 2-PL (+ discrimination) → 3-PL (+ guessing) → 4-PL (+ carelessness).
  • OBE (Spady 1994): COs/POs, measurable Bloom verbs. Constructive Alignment (Biggs 1996): Outcomes ↔︎ Teaching ↔︎ Assessment.
  • Authentic assessment (Wiggins 1989): real-world tasks.
  • Rubric types: Holistic (one score) vs Analytic (per criterion).
  • Rater bias: Halo, Leniency/Severity, Central tendency, Order, Cultural.
  • Concept mapping (Novak 1972, based on Ausubel) vs Mind mapping (Buzan 1974).