Quality criteria for good exam questions

When is an assessment of good quality? Validity and reliability are the key concepts for judging the quality of an assessment. There is a whole world behind these terms, however, so in this blog we limit ourselves to the essentials. For more information on the concepts of reliability and validity, we refer to the explanation of Sluiter, Hemker, and Eggen (2018).

Reliable and valid assessment

Assessments are reliable when the results are not based on chance. This means that the exam should not be affected by external factors during administration, nor should there be errors in the questions or in how they are scored. A valid test measures what it is supposed to measure and is consistent with the purpose for which the test results are used. One application of this criterion is making sure that test questions are in line with specifically formulated learning or test objectives and with the level at which the test is administered. Stakeholders’ concerns about the validity of a test often relate to their interpretation of the test results.

Criteria for exam questions of high quality

In short, high-quality testing means that candidates who have mastered the subject matter should pass, and stakeholders perceive the test as fair. Exam questions that meet the quality criteria can greatly enhance the assessment process, and meeting these criteria is usually straightforward. Awareness and application of these criteria are crucial. The testing criteria used are as follows:

  1. Relevance;
  2. Objectivity;
  3. Specificity;
  4. Efficiency.

Relevance (1)

Does the question belong to the learning and testing objectives in the examination program? Does it concern relevant knowledge, or does it ask about details that nobody can be expected to know or that nobody will ever use? The question must be about subject matter that will be of use to a professional. An example: in an exam on product knowledge in the food trade, it is probably not important to know the nutritional value of peanut butter by heart. After all, you can read this on the label.

Objectivity (2)

Is the ‘right’ answer always right, or are there conceivable situations in which it is in fact not correct? Can other answers also be counted as correct? An objective question usually does not lead to discussion. See the example below of a non-objective question.

What are the colors of the Dutch flag? Tick all the correct answers.

  A. Red
  B. Blue
  C. White
  D. Orange

Correct answers: A, B and C

The question is whether answer D, orange, should not also be counted as correct. When the flag is flown with a pennant, that pennant is orange. Answer D may not be the most relevant answer, but it is not really wrong either. In any case, the question may lead to discussion.

Specificity (3)

A question should be specific enough that someone who has mastered the material should be able to answer the question correctly and someone who has not mastered the material should not. Thus, a specific question distinguishes between “good” and “bad” candidates. See below for an example of a non-specific (open-ended) question.

Describe the leadership styles of a widely used management theory.

Answer: Hersey and Blanchard’s theory describes four styles.


  • Delegate: leaving tasks to employees, little direction and little support;
  • Support, consult: helping employees, little direction;
  • Persuade, motivate: lots of task-oriented direction and lots of support;
  • Instruct: much direction, little support.

Other answers at the discretion of the assessor.

The problem with this question is that it is not very focused: there are several management theories and models that are often applied. In addition, the question does not state what requirements the description must meet. As a result, a great many answers would have to be counted as correct.
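Whether a question actually separates candidates who have mastered the material from those who have not can also be checked afterwards on exam results. The sketch below is one possible way to do this, not a method prescribed in this blog: it computes an upper-lower discrimination index, a common item-analysis statistic. The function name, the input format (one 0/1 score per candidate for the question, plus each candidate's total exam score), and the 27% group size are assumptions made for illustration.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for a single question.

    item_correct: list of 0/1 values, one per candidate (1 = answered correctly)
    total_scores: list of total exam scores, same candidate order
    fraction:     share of candidates in the top and bottom groups (assumed 27%)
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("both lists must have one entry per candidate")
    n = len(total_scores)
    if n == 0:
        raise ValueError("at least one candidate is required")

    # Size of the top and bottom groups, at least one candidate each.
    group_size = max(1, round(n * fraction))

    # Candidate indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]

    # Proportion answering this question correctly in each group.
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical results for ten candidates.
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sharp_question = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
vague_question = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(discrimination_index(sharp_question, totals))
print(discrimination_index(vague_question, totals))
```

A value close to 1 means the strongest candidates answer the question correctly and the weakest do not. A value near 0, as for a question that almost any answer satisfies, suggests the question does little to distinguish between them.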

Efficiency (4)

To meet the criterion of efficiency, limit the information in the question to what is necessary to answer it. A common example is a case text that contains an entire newspaper article as background information. The advice is to include such information in the lesson material rather than in the exam. Another example is the double negation, which forces the candidate to read the question several times to understand it properly. Negations are best set in bold or italics so that they attract attention. Language errors and complicated sentence constructions also fall under the efficiency criterion.


In a good test, it is important that a candidate who has mastered the subject matter passes and that all involved have the perception that the test is fair. The above quality criteria of exam questions help to achieve this goal. Want to know more? Stay informed about developments around (digital) testing and Optimum Assessment and follow us on LinkedIn.