Core competency: Research and insights

What Room to Read's assessor training in Tanzania and Uganda taught me about the gap between evaluation design and on-the-ground truth, and why research, monitoring and evaluation staff must be present where learning happens.

When data meets real classrooms

May 07, 2026


Japheth Ogol
Manager, Literacy
Research, Monitoring & Evaluation



The best data strategies are designed in offices and perfected in classrooms.



The uncomfortable truth about tool design

In October 2024, during a Room to Read assessor training — the process of preparing data collectors to administer a learning assessment to children — for our Literacy Portfolio in Ubungo, Tanzania, what seemed to be a small confusion stopped us in our tracks.

A 6-year-old counted herself when answering the item labeled “Family members living with you.” One clarifying comment instantly corrected the child’s answer. But the moment revealed something significant: Our tools must speak the language of the communities they serve, not just linguistically, but contextually.

Had the question been more contextualized from the start — phrased as "Besides yourself, how many people live with you in your home?" or accompanied by a prompt like "Think about your mother, father, brothers, sisters... how many of them live with you?" — it would have naturally guided a child's thinking without requiring correction. A more contextually grounded question anticipates how a child understands and relates to their world, reducing the burden on data collectors to interpret or clarify, and ultimately producing more reliable data.

No desk review would have caught that. Only presence did.

Fast forward to February 2026 in Kilosa, Tanzania, the largest child assessment in Room to Read’s history. Eighty-nine assessors trained, 81 cleared for data collection with 99% average inter-rater agreement (how consistently different assessors score the same child), 90 schools over 10 days. The stakes were higher. And the lessons multiplied, each informing learning and improvement in our approach.



Room to Read in Tanzania



What you cannot learn from a protocol

During Room to Read’s September 2025 assessor training for our partner organization the Reading Association of Uganda, questions that arose from assessors present with children could not have been anticipated by a training manual:

“Can I clarify a question for a child in my own understanding?”

“What if a child asks to repeat a section?”

These aren’t footnotes; they are the line between reliable data and noise. 

These questions are not minor procedural details to be handled later. They determine whether the data collected actually reflects what a child knows, or whether it is distorted by inconsistent administration. A question clarified differently by two assessors produces two different answers from the same child, making the data unreliable and incomparable across schools or regions.

In Kilosa, I watched a language and literacy trainer use songs to reinforce phonics mastery. This matters because children pronounce letters and sounds differently depending on their linguistic background, and an assessor who does not deeply understand the full range of valid pronunciations risks marking a correct response as wrong. The songs were a creative way to ensure every assessor understood not just what a sound looks like on paper, but its history and variations, so they could recognize it accurately in a real child's voice.

I saw new assessors shadow experienced colleagues before taking over. Sequencing the training from concepts to tools, basics first and then the core activity, produced faster mastery and less anxiety in the room. These are not practices a workplan specifies; only Room to Read's research, monitoring and evaluation team in the room can observe them, document them and pass them on.





Patterns that only appear on school visits

Day 1 Data Quality Assurance (DQA) in Kilosa showed an average inter-rater agreement of 99% — an excellent start. But three assessors were flagged for oral passage reading and reading comprehension scores below the threshold (95%). Sentence Dictation showed the most variability across the whole team.

These were not failure signals. They were feedback. We recommended a targeted refresher and closer supervision. That decision was made at 10 p.m., based on same-day data, and implemented the next morning. That is what field presence makes possible.
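The Day 1 flagging logic described above can be sketched in a few lines: compare each assessor's item-by-item marks against a reference scorer, compute percent agreement per subtask, and flag anyone below the clearance threshold for a refresher. This is a hypothetical illustration, not Room to Read's actual DQA pipeline; the subtask names echo this post, the 95% threshold comes from the text, and all data and function names are invented.

```python
# Hypothetical sketch of a Day 1 DQA check: per-assessor inter-rater
# agreement against a reference ("gold standard") scorer, with flagging
# below a clearance threshold. All data below is invented.

THRESHOLD = 0.95  # minimum acceptable agreement per subtask (from the post)

def agreement(assessor_scores, gold_scores):
    """Fraction of items where the assessor's mark matches the reference scorer."""
    matches = sum(a == g for a, g in zip(assessor_scores, gold_scores))
    return matches / len(gold_scores)

def flag_assessors(results, gold, threshold=THRESHOLD):
    """Return {assessor: [subtasks below threshold]} to target refreshers."""
    flags = {}
    for assessor, subtasks in results.items():
        low = [task for task, scores in subtasks.items()
               if agreement(scores, gold[task]) < threshold]
        if low:
            flags[assessor] = low
    return flags

# Invented practice data: 1 = child response marked correct, 0 = marked incorrect.
gold = {"oral_passage_reading": [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        "sentence_dictation":   [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]}
results = {
    "assessor_A": {"oral_passage_reading": [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
                   "sentence_dictation":   [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]},
    "assessor_B": {"oral_passage_reading": [1, 1, 1, 0, 1, 1, 0, 1, 0, 1],
                   "sentence_dictation":   [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]},
}

print(flag_assessors(results, gold))  # assessor_B falls below 95% on oral passage reading
```

The same-day turnaround in Kilosa depended on exactly this kind of simple, per-subtask summary: a list of who needs a refresher on what, ready before the next morning.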



  1. Train more assessors than you need. Attrition is real: Funerals, illness, unplanned government visits. Buffer for it.
  2. Repeated practice with immediate feedback, not one-shot training, moves inter-rater reliability scores from 33% to 95%+. Assessors who practice repeatedly and get corrected in real time score children far more consistently than those who go through training only once.
  3. Sentence Dictation consistently requires more training emphasis across contexts. Build that in explicitly.
  4. Fatigue affects data quality. Assessors who travel furthest on Day 1 should get shorter routes on Day 2.
  5. Plan for disruption. Heavy rains, school rescheduling, and government visits are not edge cases; they are the field.


When a baseline is not an answer

One of the most important conversations from my 2026 Dar es Salaam meetings was about baselines: the study conducted at the very start of a project to measure where things stand before any intervention begins, a “starting point” for comparison later. There is a deeply rooted habit in development organizations to default to “baseline first” at the start of any project. This reflex is understandable, but it can be misguided.






A baseline does not prove effectiveness. It cannot answer outcome or impact questions if the Program Implementation Manual (describing how a program is to be delivered), Results Framework (mapping what the program is trying to achieve) and indicator set (defining the specific, measurable signs of effectiveness) are not in place. Conducted without these, it generates data with no clear decision-use, and becomes an accountability trap rather than a learning tool.

The right question is: What decision are we trying to inform? That determines whether you need a multi-year Impact Evaluation with a baseline and endline, a one-time Outcome Evaluation, an in-depth Program Review, or a cross-sectional study to capture a snapshot at a single point in time. Design follows decision-use. Not habit. Not government demands. Not donor deadlines.



A call for presence

Research, monitoring and evaluation staff are not auditors. They are not fault-finders. They are learning architects, and learning requires being present in the places where it happens.

In Kilosa, the consulting training team had distinct roles across technical, operations and logistics. The collaboration between Room to Read's research, monitoring and evaluation team and the senior literacy facilitator was visible and valued. Government partners were engaged well enough that school practice exercises ran smoothly, even when used as pilot sites. That quality of coordination is built in person, over days, not managed from a distance.

From Tanzania to Uganda, every field visit taught me something that no dashboard, summary report, or shared tracker could have: The texture of what happens when a real child sits down, a real assessor opens the tablet and real data collection begins.

“The journey from data collection to impact is not a pipeline. It is a conversation — one that begins in offices but must continue in the field, where the children are, where the learning is real, and where our tools meet their purpose.”



Learn more about our unique approach


