# Relationship between item difficulty and discrimination

Summary: We have developed a method of quantifying multiple-choice test items in an introductory physical science course in terms of the various tasks. The dependence of the item discrimination index (D) on the item difficulty index (p), and the relationship of D and p to the phi coefficient (ϕ), are delineated. The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data.
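One way to see how D, p, and ϕ can be tied together is the sketch below. It rests on assumptions that the summary does not state: D is taken as the difference in proportion correct between an upper and a lower scoring group of equal size n (proportions p_U and p_L, both hypothetical symbols), and p is the proportion correct in the two groups combined. Under those assumptions, the 2×2 table of group by correct/incorrect gives:

```latex
\begin{aligned}
D &= p_U - p_L, \qquad p = \tfrac{1}{2}\,(p_U + p_L),\\[4pt]
\phi &= \frac{(n p_U)\,\bigl(n(1 - p_L)\bigr) - \bigl(n(1 - p_U)\bigr)\,(n p_L)}
             {\sqrt{\,n \cdot n \cdot 2np \cdot 2n(1 - p)\,}}
      = \frac{n^{2}\,(p_U - p_L)}{2 n^{2} \sqrt{p(1 - p)}}
      = \frac{D}{2\sqrt{p(1 - p)}}.
\end{aligned}
```

Because |ϕ| cannot exceed 1, this form implies |D| ≤ 2√(p(1 − p)), so D can reach 1 only when p = 0.5 and is squeezed toward 0 as p approaches 0 or 1.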

This test should not contribute heavily to the course grade, and it needs revision. This is the general form of the more commonly reported KR and can be applied to tests composed of items with different numbers of points given for different response alternatives. When coefficient alpha is applied to tests in which each item has only one correct answer and all correct answers are worth the same number of points, the resulting coefficient is identical to KR. Further discussion of test reliability can be found in J.
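The statement that coefficient alpha coincides with the KR statistic for right/wrong items of equal weight can be checked numerically. The sketch below assumes that the truncated "KR" above refers to Kuder-Richardson formula 20, and it uses population variances (`ddof=0`) throughout. The function names and the small response matrix are illustrative only.

```python
import numpy as np

def coefficient_alpha(scores):
    """Coefficient alpha for an (examinees x items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                           # number of items
    item_var_sum = scores.var(axis=0, ddof=0).sum()
    total_var = scores.sum(axis=1).var(ddof=0)    # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

def kr20(binary):
    """KR-20 (assumed meaning of 'KR') for a 0/1 scored response matrix."""
    binary = np.asarray(binary, dtype=float)
    k = binary.shape[1]
    p = binary.mean(axis=0)                       # proportion correct per item
    total_var = binary.sum(axis=1).var(ddof=0)
    return k / (k - 1) * (1 - (p * (1 - p)).sum() / total_var)

# Hypothetical data: 6 examinees, 4 items, 1 = correct, 0 = incorrect.
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
])

print(coefficient_alpha(responses), kr20(responses))
```

For 0/1 items the variance of each item is p(1 − p), which is why the two functions agree. Only `coefficient_alpha` still applies when items carry different point values.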

## Standard Error of Measurement

The standard error of measurement is directly related to the reliability of the test.

Whereas the reliability of a test always varies between 0. For example, multiplying all test scores by a constant will multiply the standard error of measurement by that same constant, but will leave the reliability coefficient unchanged. A general rule of thumb to predict the amount of change which can be expected in individual test scores is to multiply the standard error of measurement by 1.
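The scaling behaviour described above follows from the usual classical-test-theory expression for the standard error of measurement. That expression is assumed here, since the text does not print it. Writing s_X for the standard deviation of the test scores and r_XX' for the reliability coefficient, and multiplying every score by a constant c:

```latex
\begin{aligned}
\mathrm{SEM} &= s_X \sqrt{1 - r_{XX'}},\\[4pt]
X^{*} = cX \;\Longrightarrow\; s_{X^{*}} &= |c|\, s_X,
\qquad r_{X^{*}X^{*\prime}} = r_{XX'},\\[4pt]
\mathrm{SEM}^{*} &= |c|\, s_X \sqrt{1 - r_{XX'}} = |c|\,\mathrm{SEM}.
\end{aligned}
```

The reliability is a ratio of variances and so is unaffected by the rescaling, while the standard error of measurement is expressed in score units and scales with them.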

The smaller the standard error of measurement, the more accurate the measurement provided by the test. Further discussion of the standard error of measurement can be found in J.

These test questions, which were discipline-based, were developed and vetted within the departments that taught the respective disciplines, and administered by the individual departments. However, all examination papers are now multidisciplinary, with some integration across the disciplines, such as in the scenario-orientated SAQ. All examination questions are now centrally vetted and the examination papers are administered by the Office of the Dean.

Although some basic form of item analysis of the MCQ tests has been carried out routinely since the beginning of the NIC, there has been no evidence that the data generated have been used to help develop or select subsequent MCQ items. Have we maintained similar standards of MCQ tests from year to year? These are some of the questions we attempted to answer when auditing the MCQ of selected examination papers. We also examined the relationship between the item difficulty index and the item discrimination index values in these MCQ tests.

## Materials and Methods

The central vetting committee consisted of mostly senior academic staff representing each department concerned. After the final selection and vetting by the central committee, the selected MCQ items were formatted by the Office of the Dean for the examination.

Each of these examinations was carried out at the end of a term that consisted of 12 to 16 weeks of teaching. We included in this study the MCQ papers of these examinations. Each end-of-term examination covered different topics, grouped generally according to organ-systems, and also included some foundational core topics. However, some degree of overlap in the topics tested between one examination and another occurred.

### Scoring of MCQs

The MCQ paper contained 50 questions drawn from the 4 major para-clinical disciplines: Pathology, Medical Microbiology, Parasitology and Pharmacology. The MCQ paper formed part of a 3-hour written paper and was to be completed in 75 minutes. Each question consisted of a stem and 5 completing phrases, and students were required to categorise each of the phrases. However, there was no carrying over of negative marks from one question to another. Thus, the maximum total score for any one question was 5 marks while the minimum total score was 0 and not -5 marks.

In this study, the item difficulty index P refers to the percentage of the total number of correct responses to the test item. It is calculated from the number of correct responses and T, the total number of responses. The item discrimination index D, however, measures the difference between the percentage of students in the higher-scoring group and the percentage in the lower-scoring group who answered the item correctly. The higher the discrimination index, the better the item can determine the difference between these groups of students. The relationship between the item discrimination index and difficulty index values was determined for each test paper.
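The two indices can be sketched in a few lines. Several details below are assumptions rather than the study's stated procedure: the symbol R for the number of correct responses, the use of the top and bottom 27% of examinees (ranked by total score) as the comparison groups, and reporting D as a difference in proportions rather than in percentage points. Function names and data are hypothetical.

```python
import numpy as np

def difficulty_index(item_correct):
    """P = 100 * R / T: percentage of responses to the item that are correct."""
    item_correct = np.asarray(item_correct, dtype=bool)
    r = item_correct.sum()        # R: number of correct responses (assumed symbol)
    t = item_correct.size         # T: total number of responses
    return 100.0 * r / t

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D: proportion correct in the upper group minus that in the lower group.

    The 27% group size is an assumed convention, not taken from the text.
    """
    item_correct = np.asarray(item_correct, dtype=bool)
    total_scores = np.asarray(total_scores, dtype=float)
    n_group = max(1, int(round(fraction * item_correct.size)))
    order = np.argsort(total_scores, kind="stable")   # ascending total score
    lower = item_correct[order[:n_group]]
    upper = item_correct[order[-n_group:]]
    return upper.mean() - lower.mean()

# Hypothetical data: 10 examinees, listed in ascending order of total score.
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(difficulty_index(item))              # 60.0
print(discrimination_index(item, totals))  # 1.0
```

In this toy case all three examinees in the upper group answer correctly and none in the lower group do, so D takes its maximum value.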

A distractor analysis addresses the performance of the incorrect response options (the distractors) in a multiple-choice item.

### Understanding Item Analyses | Office of Educational Assessment

Just as the key, or correct response option, must be definitively correct, the distractors must be clearly incorrect or clearly not the "best" option. In addition to being clearly incorrect, the distractors must also be plausible. That is, the distractors should seem likely or reasonable to an examinee who is not sufficiently knowledgeable in the content area. If a distractor appears so unlikely that almost no examinee will select it, it is not contributing to the performance of the item.

In fact, the presence of one or more implausible distractors in a multiple choice item can make the item artificially far easier than it ought to be. In a simple approach to distractor analysis, the proportion of examinees who selected each of the response options is examined.

For the key, this proportion is equivalent to the item p-value, or difficulty. If the proportions are summed across all of an item's response options they will add up to 1. The proportion of examinees who select each of the distractors can be very informative.

For example, it can reveal an item mis-key. Whenever the proportion of examinees who selected a distractor is greater than the proportion of examinees who selected the key, the item should be examined to determine if it has been mis-keyed or double-keyed.
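The simple approach described above, tabulating the proportion of examinees choosing each option and comparing the distractors with the key, might look like the following sketch. The function names and the sample responses are hypothetical, and the code assumes every examinee selected exactly one of the listed options (omitted answers are not handled).

```python
from collections import Counter

def option_proportions(responses, options):
    """Proportion of examinees selecting each response option."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: counts.get(opt, 0) / n for opt in options}

def possible_miskey(proportions, key):
    """Distractors chosen more often than the key; these call for a review."""
    return [opt for opt, prop in proportions.items()
            if opt != key and prop > proportions[key]]

# Hypothetical item with four options, keyed "A", answered by 10 examinees.
responses = list("AABBBBCDBA")
props = option_proportions(responses, options="ABCD")

print(props)                        # {'A': 0.3, 'B': 0.5, 'C': 0.1, 'D': 0.1}
print(possible_miskey(props, "A"))  # ['B']
```

Here the proportion for the key (0.3) is the item p-value, the proportions sum to 1, and option B is flagged because more examinees chose it than the key.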

A distractor analysis can also reveal an implausible distractor. In criterion-referenced tests (CRTs), where the item p-values are typically high, the proportions of examinees selecting each of the distractors are, as a result, low.

Nevertheless, if examinees consistently fail to select a given distractor, this may be evidence that the distractor is implausible or simply too easy to rule out.

## Item Review

Once the item analysis data are available, it is useful to hold a meeting of test developers, psychometricians, and subject matter experts. During this meeting the items can be reviewed using the information provided by the item analysis statistics.

Decisions can then be made about item changes that are needed or even items that ought to be dropped from the exam.