A Good Test

Posted December 31, 2006

I recently finished up a semester at BYU. The end of another semester brings with it that joyful tradition, finals. Going through six finals gave me cause to think quite a bit about tests: what are they for, and why are they useful?

I came up with three ways to measure the quality of a test:

Does it measure something valuable?
Is the test both reliable and valid?
Does it provide the correct incentives?

Does it measure something valuable? This depends on the context of the test: a juggling test is valuable for a circus school, a gum chewing test is valuable to aspiring gum gurus, etc. The important question is whether the test measures one’s understanding of, or skill in, the valuable part(s) of a subject.

Is the test both reliable and valid? Reliable means the test is administered and graded consistently for all test takers. Valid means the test measures what it intends to measure. A juggling test that asks a student to hold two balls but not throw them would be reliable, since every test taker could easily be measured the same way, but it wouldn’t be valid, as being able to hold two balls doesn’t show that one can in fact juggle them. In the same way, a professor can prepare a test that measures all students fairly but doesn’t test them on anything valuable (as defined above).

Does it provide the correct incentives? I haven’t worked this measure out satisfactorily in my mind yet, but it’s something like the following. People’s behavior is shaped by incentives. Because tests are a major part of a class grade and most students care about grades, tests motivate students to study more than they would without the incentive. So far this is fine and dandy. I for one appreciate tests for this very reason, as I know I’ve studied more, and learned more, over the years because of tests.

So how can this go wrong? Students go to school to learn, presumably. There is of course the old yarn that education is the only thing that Americans pay for and hope to be cheated out of. Many students, it seems, care more about the grade than about learning. Because learning is hard, there is a strong incentive for students to try to ‘game’ the test. Some types of tests can be passed by cramming the night before the test. Tests that only require you to memorize material don’t promote true learning. So this poorer sort of test gives students the wrong incentive, in that it pushes them toward the poorer sort of learning. A good test would push students toward the richer, and harder, sort of learning.

Now, which types of tests provide the correct incentives and which provide the wrong ones, I’m not quite sure. As I said, I haven’t completely worked this measure out. I’m not even sure it is a correct measure. If at some point I can work up more interest in this topic and clarify my thinking, I’ll write another post. Or someone else can save me the work and write it themselves (or link to a page in the comments section).

Tagged with education | learning

Kyle Mathews lives and works in Salt Lake City building useful things. You should follow him on Twitter. Co-founder at Electric.