Organizations that develop and deliver computerized tests encounter many of the same challenges as those working with traditional paper-and-pencil testing (PPT), including issues related to security, psychometric editing and legal defensibility. New issues also arise with computer-based testing (CBT), particularly as tests are more widely administered via the Internet. These include an increased risk of candidate cheating and item overexposure.
To address the ongoing challenges of testing in general, as well as the new issues surrounding CBT, organizations need to follow standard processes for test item development and psychometric editing. For example, using multiple item writers to develop test content is common practice, but it can lead to variation in item style, format and difficulty. A style guide with templates, item development standards and rules goes a long way toward improving item consistency, format and variety. In addition, content development training ensures that writers have the tools to develop credible, defensible items. Item templates that generate different variations of the same question, as sketched below, can grow the item bank in less time.
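As a rough illustration of how templates multiply bank size, the following Python sketch fills a parameterized stem with randomly chosen values. The template text, field names and arithmetic are hypothetical, not drawn from any particular testing platform.

```python
import random

# Illustrative item template: one stem with interchangeable parameters
# yields many surface variants of the same underlying question.
TEMPLATE = {
    "stem": "A server processes {rate} requests per second. How many "
            "requests does it process in {minutes} minutes?",
    "params": {"rate": [50, 80, 120], "minutes": [2, 5, 10]},
}

def generate_variant(template, rng=random):
    """Fill the template with randomly chosen parameter values."""
    values = {k: rng.choice(v) for k, v in template["params"].items()}
    stem = template["stem"].format(**values)
    key = values["rate"] * values["minutes"] * 60  # correct answer
    return {"stem": stem, "key": key}

print(generate_variant(TEMPLATE))
```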
The statistical evaluation of test items in the field enables organizations to obtain feedback on the performance and cognitive level of specific items. This intelligence supports revision of item development processes and targeted feedback for individual item writers, helping determine what is effective and how items fare in the field. It also enables the organization to make informed decisions about item retention, modification and assignment.
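In classical test theory, two field statistics typically drive these decisions: item difficulty (the proportion of candidates answering correctly) and item discrimination (how well the item separates high scorers from low scorers). The sketch below computes both from a scored response matrix; it is a simplified illustration, since the point-biserial here correlates each item with a total score that includes the item itself, which slightly inflates the value.

```python
import statistics  # statistics.correlation requires Python 3.10+

def item_statistics(responses):
    """Classical item statistics from a 0/1-scored response matrix.

    responses: list of candidate rows, responses[candidate][item] in {0, 1}.
    Returns per-item difficulty (proportion correct) and a simple
    discrimination index (point-biserial: correlation of the item
    score with the candidate's total score).
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        scores = [row[j] for row in responses]
        p = sum(scores) / len(scores)  # difficulty: proportion correct
        # Correlation is undefined when every candidate scores the same.
        rpb = statistics.correlation(scores, totals) if 0 < p < 1 else 0.0
        stats.append({"item": j, "difficulty": p, "discrimination": rpb})
    return stats
```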
Any organization developing or administering tests should be conscious of the psychometric editing process, one that includes the evaluation of item difficulty levels and takes grammar, sensitivity and style into account. Psychometric editing also provides for the review of item form and function, such as parallel options, sufficient information to answer the question and answer length.
Given the importance placed on objectivity, psychometric editing is best performed by test development professionals, not subject-matter experts or item writers. Individuals trained in the complexities of psychometric editing evaluate items in a more critical light than subject-matter experts or item writers do. It is important, however, that the final, edited item also be reviewed and approved by subject-matter experts in the appropriate field.
Items developed for CBT and PPT alike must be legally defensible in the event of a legal challenge. To ensure defensibility, organizations must implement a standard process for item development and psychometric review, as discussed above.
Evaluating legal defensibility includes a critical review of the exam from both a content and a psychometric perspective to ensure that it was developed according to the Standards for Educational and Psychological Testing; courts defer to the Standards when evaluating the credibility of the exam in question. Legal defensibility can be accomplished via several methodologies. The most important aspect of the development process is to follow and document standardized methodologies and to include appropriate test development personnel at each step. The test development process has many steps, and different methodologies can be used for each. For example, when determining the cut score for an exam, processes such as the modified Angoff method or the Bookmark method can be used to set the appropriate standard for passing. Each method uses a different technique to determine the bar a candidate must clear to receive a passing status.
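For instance, in a modified Angoff study each judge estimates the probability that a minimally competent candidate answers each item correctly, and the cut score is derived from the judges' expected raw scores. A minimal sketch, with made-up ratings:

```python
def modified_angoff_cutscore(ratings):
    """ratings[rater][item]: each judge's estimated probability that a
    minimally competent candidate answers the item correctly.
    The cut score is the mean of the per-rater sums, i.e. the expected
    raw score of a minimally competent candidate."""
    per_rater = [sum(r) for r in ratings]
    return sum(per_rater) / len(per_rater)

# Example: three judges rating a five-item test.
judges = [
    [0.6, 0.7, 0.5, 0.8, 0.9],
    [0.5, 0.8, 0.6, 0.7, 0.9],
    [0.7, 0.6, 0.5, 0.9, 0.8],
]
print(f"Cut score: {modified_angoff_cutscore(judges):.1f} of 5 items")
```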
To mitigate the risk of item overexposure, testing companies develop large item banks from which test content is routinely refreshed. Taking the lead from large test developers and administrators, organizations administering computer-based tests will want to consider expanded item banks and scheduled item refreshment so that candidates do not repeatedly see the same items or test forms, decreasing the likelihood that candidates can usefully share item content.
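One simple way to operationalize an exposure cap during item selection is to draw only from items whose observed exposure rate is below a threshold. Production programs use more sophisticated schemes (such as Sympson-Hetter exposure control), but the sketch below, with hypothetical data structures, conveys the idea.

```python
import random

def select_item(bank, counts, max_exposure=0.25):
    """Draw an item whose observed exposure rate is under the cap.

    bank: list of item ids; counts: dict mapping item id to the number
    of times it has been administered so far.
    """
    total = max(sum(counts.values()), 1)
    eligible = [i for i in bank if counts.get(i, 0) / total < max_exposure]
    choice = random.choice(eligible or bank)  # fall back if all are capped
    counts[choice] = counts.get(choice, 0) + 1
    return choice
```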
In many high-stakes testing programs, test administrators collect and examine forensic data to measure how often candidates are exposed to particular items, how long candidates spend on them on average and how candidates' responses change over time and exposure. These data support ongoing adjustment of the item development process and content to maintain credibility, legal defensibility and security.
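A minimal forensic summary along these lines might track, per item, the exposure count, the average response time and the drift in percent-correct between earlier and later administrations; the record layout below is hypothetical.

```python
from statistics import mean

def forensic_summary(events):
    """Summarize per-item forensics from response-event records.

    events: list of dicts with keys 'item', 'seconds', 'correct' (0/1)
    and 'date' (sortable). 'drift' is the change in percent-correct
    between the earlier and later halves of an item's records, a crude
    signal that content may have leaked.
    """
    summary = {}
    for item in {e["item"] for e in events}:
        recs = sorted((e for e in events if e["item"] == item),
                      key=lambda e: e["date"])
        half = len(recs) // 2
        drift = (mean(e["correct"] for e in recs[half:]) -
                 mean(e["correct"] for e in recs[:half])) if half else None
        summary[item] = {
            "exposures": len(recs),
            "avg_seconds": mean(e["seconds"] for e in recs),
            "drift": drift,
        }
    return summary
```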
Different methodologies can also be applied to candidate results after the examination. One such analysis is differential item functioning (DIF) analysis, which evaluates group performance on test items (groups may be defined by gender, ethnicity or other factors). Items that perform significantly differently across groups of candidates are then re-evaluated to determine their future use.
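A common DIF statistic is the Mantel-Haenszel odds ratio, which matches reference- and focal-group candidates on total score and compares their odds of answering the item correctly; a value far from 1 flags the item for review. A simplified sketch, assuming a flat list of candidate records:

```python
from collections import defaultdict

def mh_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one item.

    records: list of (group, total_score, correct) with group in
    {'reference', 'focal'} and correct in {0, 1}. Candidates are
    stratified by total score so that groups of comparable ability
    are compared; an odds ratio far from 1 suggests DIF.
    """
    strata = defaultdict(lambda: {"rc": 0, "rw": 0, "fc": 0, "fw": 0})
    for group, score, correct in records:
        key = ("r" if group == "reference" else "f") + ("c" if correct else "w")
        strata[score][key] += 1
    num = den = 0.0
    for s in strata.values():
        n = sum(s.values())
        num += s["rc"] * s["fw"] / n  # reference right, focal wrong
        den += s["rw"] * s["fc"] / n  # reference wrong, focal right
    return num / den if den else float("inf")
```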
The multitude of factors to consider in developing content for computer-based testing all lend credibility and integrity to the exam itself. Organizations that thoughtfully and proactively consider the design and implementation of their testing programs fare better than those that migrate to computer-based testing in a hurry. A proactive approach that accounts for item development and editing resources as well as security and IT parameters serves the organization better over the long haul: it increases test validity, improves candidate fairness and offers greater protection against legal challenges.