Hiring Using Psychometric Instruments

Ernie Stark and Jennifer Murnane
08/26/2008

Recent years have witnessed a plethora of instruments from vendors heralding their ability to assist firms in selecting the right applicant for the job and creating that much sought-after human capital. These vendors proclaim that their instruments measure human characteristics across a broad spectrum ranging from honesty to communication style to leadership ability. The prevalence of these instruments in the marketplace has become such that one author of this article has, on more than one occasion, cautioned human resources professionals against being beguiled by "silver-tongued devils" whose sole intent is to sell something to an ill-informed and unsuspecting customer.

Why such a strong caution? First, an overwhelming proportion of these instruments are psychometric in nature. That is, they attempt to measure psychological characteristics of potential employees. Many of us lack the expertise required to evaluate the properties of such instruments on our own and too often defer to the appealing claims found in the sales literature. Second, purchasing decisions made on that basis frequently work against the talent management goals and objectives of a firm or, even worse, place a firm in a legally vulnerable position. Here are some important considerations when making a decision about purchasing a psychometric instrument to be used in personnel selection.

1. Beware an over-abundance of subjective "testimonials" from satisfied customers in the sales pitch from the vendor and a glaring absence of hard, objective data. If purchase of an instrument is to prove ultimately valuable in personnel selection, it must demonstrate an ability to predict job performance beyond what one could expect if selecting an applicant for a job by simply drawing a name out of a fish bowl containing the names of all the applicants. This means that the vendor must produce evidence of an independently conducted study (one carried out by someone other than the vendor) that empirically validates the instrument’s properties (that is, produces evidence that it truly measures what it purports to measure) and its ability to predict who will be successful once placed into a job. The vendor should be able to direct you to such independently conducted validation studies, if there are any. In fact, several validity studies in differing situations are preferred over a single study. Look beyond the "trusting relationship" and ask the vendor: "Where’s the beef?"

2. Beware the vendor touting the results of a "concurrent validation study" in which scores from the vendor’s instrument correctly distinguished good performers from poor performers using existing performance evaluations. There are numerous pitfalls lurking here. First, far too many performance evaluation scores are produced by measures not based on the results of a systematic job analysis process. As such, performance evaluation measures are too frequently deficient, in that they do not measure behaviors directly linked to successful job performance, or contaminated, in that they measure behaviors and characteristics that are irrelevant to successful job performance. Second, one cannot assume that all managers administer performance evaluations consistently. Disparate levels of training in this area can lead to a wide range of rater errors attributable to human bias.

Concurrent validation studies are often touted as proof of a product’s ability to predict employee performance. However, given these pitfalls, any relationship between an employee’s score on the instrument and the results of his/her performance evaluation is likely more reflective of the rapport between the employee and his/her immediate supervisor than evidence of a relationship with actual job performance.

3. Beware a vendor’s claim that the results of a study conducted in another company at another point in time could easily be expected by your firm should you purchase the instrument. It is illogical to assume that jobs are similarly defined across all firms or that the properties of performance evaluation measures are standard across all companies. This is akin to the oft-cited problem of comparing apples and oranges. Because the characteristics of a sample used in an earlier study may not at all represent the characteristics of the jobs and employees for whom you wish to use scores from the vendor’s instrument, you should not be so naive as to expect the same results. Instead, you should consider the probability of eventually hiring a workforce different from what was intended.

4. Beware a vendor’s use of the term "statistical significance." A claim that a test of an instrument’s ability to predict job performance was statistically significant (typically tested at the .05 level) indicates only that the observed correlation between instrument scores and job performance ratings is unlikely to have arisen by chance alone. It says nothing about how strong that correlation is. The larger the number of employees in the sample studied, the more likely even a trivial relationship is to show statistical significance. One should pay more attention to the effect size, which indicates the magnitude of the relationship between job performance scores and scores produced by the instrument. In a sample of 2,000 employees, a correlation of 0.05 between scores on a selection instrument and current job performance (keep in mind that correlations range from -1, through 0, to +1) could be statistically significant, but the scores would explain only one-quarter of 1 percent of the variation in job performance (the squared correlation, 0.05 × 0.05 = 0.0025). Request the vendor’s estimate of effect size implied by the studies rather than falling for the "statistical significance" trap.
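
The gap between statistical significance and effect size can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name correlation_summary and the sample sizes are assumptions made for this example, and the p-value uses a normal approximation to the usual t-test for a correlation, which is reasonable only for large samples. It shows how the same small correlation moves toward "significance" as the sample grows while the share of variance it explains (the squared correlation) stays fixed.

```python
import math


def correlation_summary(r: float, n: int) -> dict:
    """Summarize a Pearson correlation r observed in a sample of n people.

    Hypothetical helper for illustration; not taken from any vendor's study.
    """
    # t statistic for testing whether the true correlation is zero
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Two-tailed p-value via a normal approximation (assumes large n)
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    # Effect size expressed as the share of variance explained
    return {"t": t, "p_approx": p_approx, "variance_explained": r * r}


# The same correlation of 0.05 evaluated at several assumed sample sizes
for n in (100, 1000, 2000, 10000):
    s = correlation_summary(0.05, n)
    print(
        f"n={n:>6}  t={s['t']:.2f}  p~{s['p_approx']:.3f}  "
        f"variance explained={s['variance_explained']:.2%}"
    )
```

When reading a vendor’s report, the variance-explained figure is the one to ask about, since it does not improve simply because more employees were tested.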

5. Finally, beware of vendors who report studies demonstrating a statistically significant relationship between scores on a selection instrument and job performance but who fail to investigate "differential validity." Such an investigation checks whether the correlation between test scores and performance holds within the racial, ethnic and gender groupings in the sample, not merely across the sample as a whole. Unfortunately, it is possible for scores on a selection instrument to demonstrate statistically significant correlations with performance evaluation scores across all members in a sample, but at the same time demonstrate a non-significant correlation for members of protected classes within that sample. Such a situation is termed "differential validity" because the predictive validity of the test scores differs across groupings in the sample. Using scores from instruments that exhibit differential validity to make selection decisions can quickly produce disparate impact in employment practices because the test scores are predicting group membership rather than job performance. Evidence of disparate impact in employment practices constitutes a prima facie case of discrimination when such practices are challenged in court.
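
A check for differential validity amounts to computing the score-performance correlation separately for each grouping rather than only for the pooled sample. The following is a minimal sketch under stated assumptions: the group labels "A" and "B", the records list, and the helper names pearson_r and validity_by_group are all hypothetical, and the ten data points are made up purely to show the pattern. A real study would use far larger groups and a formal test of the difference between correlations.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validity_by_group(records):
    """Return {group: correlation} for (group, test_score, performance) rows."""
    grouped = defaultdict(lambda: ([], []))
    for group, score, performance in records:
        grouped[group][0].append(score)
        grouped[group][1].append(performance)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in grouped.items()}


# Made-up illustration data: (group, instrument score, performance rating)
records = [
    ("A", 10, 1), ("A", 20, 2), ("A", 30, 3), ("A", 40, 4), ("A", 50, 5),
    ("B", 10, 3), ("B", 20, 1), ("B", 30, 4), ("B", 40, 1), ("B", 50, 3),
]

# Correlation across the whole sample, ignoring group membership
overall = pearson_r([r[1] for r in records], [r[2] for r in records])
by_group = validity_by_group(records)

print(f"pooled sample: r = {overall:.2f}")
for group, r in sorted(by_group.items()):
    print(f"group {group}: r = {r:.2f}")
```

In this toy data the pooled correlation looks respectable even though the scores carry no information about performance for group B, which is the situation a vendor’s pooled-sample result can conceal.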

There are many valuable psychometric tools on the market to support selection and promotion needs, but remember that a tool is only as valuable as the unbiased evidence behind its claims. The next time the latest and greatest selection tool is presented, proceed with caution, do your research and ask for more information to support the vendor’s assertions. Your organization’s legal department will thank you for your prudence.