Quality criteria for good exam questions

When is an assessment of good quality? Validity and reliability are the key concepts for judging the quality of an assessment, but there is a whole world behind these terms. In this blog we limit ourselves to the common core. For more information on reliability and validity, we refer to the explanation by Sluiter, Hemker, and Eggen (2018).

Reliable and valid assessment

An assessment is reliable if its result is not based on chance. Reliability suffers, for example, if there are disturbing factors during the administration or if there are errors in the questions or in how they are scored. A valid test measures what it is supposed to measure and is consistent with the purpose for which the test results are used. One application of this criterion is that test questions are in line with specifically formulated learning or test objectives and with the level at which the test is administered. Complaints about the validity of a test are often related to the meaning stakeholders attribute to its results.

Examination questions that meet quality criteria

In short: good-quality testing means that a candidate who has actually mastered the subject matter passes and that stakeholders perceive the testing as fair. Exam questions that meet quality criteria usually yield big gains, and meeting these criteria is usually quite easy. What matters is being aware of the criteria and being able to apply them. The criteria relate to:

  1. Relevance;
  2. Objectivity;
  3. Specificity;
  4. Efficiency.

Relevance (1)

Does the question fit the learning and testing objectives in the examination program? Does it concern relevant knowledge, or is it about details that nobody can be expected to know or that nobody will ever use? The question must be about subject matter that will be of use to a professional. An example: in an exam on product knowledge in the food trade, it may be less important to know the nutritional value of peanut butter by heart. After all, you can read it on the label.

Objectivity (2)

Is the ‘right’ answer always right, or are there conceivable situations in which it is in fact not correct? Can other answers also be counted as correct? An objective question usually does not lead to discussion. See the example below of a non-objective question.

What are the colors of the Dutch flag? Tick all the correct answers.

  A. Red
  B. Blue
  C. White
  D. Orange

Correct answers: A, B and C

The question is whether answer D, orange, should not also be counted as correct. If the flag is flown with a pennant, that pennant is orange. Answer D may not be the most relevant answer, but it is not really wrong either. In any case, the question may lead to discussion.

Specificity (3)

A question should be specific enough that someone who has mastered the material should be able to answer the question correctly and someone who has not mastered the material should not. Thus, a specific question distinguishes between “good” and “bad” candidates. See below for an example of a non-specific (open-ended) question.

Describe the leadership styles of a widely used management theory.

Answer: Hersey and Blanchard’s theory describes four styles.


  • Delegate: leaving tasks to employees, little direction and little support;
  • Support, consult: helping employees, little direction;
  • Persuade, motivate: lots of task-oriented direction and lots of support;
  • Instruct: much direction, little support.

Other answers at the discretion of the assessor.

The problem with this question is that it is not very focused: several management theories and models are widely applied, and the question does not say which one is meant. Nor does it state what requirements the description must meet. As a result, a great many answers would have to be counted as correct.
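
Whether a question actually distinguishes candidates who have mastered the material from those who have not can be checked after an exam has been administered. The sketch below is one possible way to do that, not a method described in this blog: it computes a simple upper-lower discrimination index, the share of high scorers who answered the question correctly minus the share of low scorers who did. The function name, the input format, and the group size of 27% (a commonly used convention) are assumptions made for illustration.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for a single question.

    item_correct: list of 0/1 values, one per candidate
                  (1 = this candidate answered the question correctly).
    total_scores: list of total exam scores, in the same candidate order.
    fraction:     share of candidates placed in the top and in the bottom
                  group (0.27 is an assumed, commonly used default).
    """
    if not total_scores or len(item_correct) != len(total_scores):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(total_scores)
    group_size = max(1, round(n * fraction))

    # Candidate indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group = order[:group_size]
    high_group = order[-group_size:]

    p_high = sum(item_correct[i] for i in high_group) / group_size
    p_low = sum(item_correct[i] for i in low_group) / group_size
    return p_high - p_low


# Hypothetical results for ten candidates.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
specific_question = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
unspecific_question = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(discrimination_index(specific_question, scores))
print(discrimination_index(unspecific_question, scores))
```

A value close to 1 suggests that mainly the stronger candidates answer the question correctly. A value near 0, as for a question that everybody gets right, suggests that the question hardly distinguishes between candidates.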

Efficiency (4)

To meet the criterion of efficiency, it is important to limit the information in the question to what is necessary to answer it. An example we often see is a case text that contains an entire newspaper article as background information. The advice then is to include such information in the lesson material rather than in the exam. Another example is a double negation, which forces the candidate to read the question several times to understand it properly. Negations are best set in bold or italics so that they attract attention. Language errors and complicated sentence constructions also fall under the efficiency criterion.
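
Questions with a double negation can be flagged for review before an exam is assembled. The sketch below is a rough heuristic under stated assumptions, not an established tool: the word list `NEGATIONS` and the function names are hypothetical, and the check only counts explicit negation words. It will miss negative prefixes such as "incorrect", so it supports a human reviewer rather than replacing one.

```python
import re

# Hypothetical, deliberately small word list; extend it for your own exams.
NEGATIONS = {"not", "no", "never", "none", "neither", "nor", "without", "except"}


def count_negations(question):
    """Count explicit negation words in a question text."""
    # Normalize typographic apostrophes so that "isn’t" is recognized too.
    text = question.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z']+", text)
    return sum(1 for w in words if w in NEGATIONS or w.endswith("n't"))


def needs_review(question):
    """Flag questions that contain two or more negation words."""
    return count_negations(question) >= 2


print(needs_review("Which measure does not apply if the contract is not signed?"))
print(needs_review("Which color is not part of the Dutch flag?"))
```

The first question contains two negation words and is flagged. The second contains one and is not, although its single negation would still be best set in bold or italics.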


In a good test, it is important that a candidate who has mastered the subject matter passes and that all involved have the perception that the test is fair. The above quality criteria of exam questions help to achieve this goal.


