Specific issues relevant to the Expert Witness, Procedural issues for Testing and Evaluation
Daubert and Frye: Neuropsychological Evaluation/General Principles:
- The battery evaluates the full range of neuropsychological functions dependent upon the brain
- The battery should include tests that relate generally to brain function and specific measurement of cortical areas of function.
- The battery should employ the time-honored methods of neuropsychology (methodology, norms, and intra- and inter-individual comparisons), allowing the neuropsychologist to rule in and to rule out possible diagnoses. Each test measure should be able to rule in and to rule out the specific deficit area.
- The use of prior research and clinical patterns of test performance as well as a comparison of both hemispheres allows for this to occur.
- Each test is carefully evaluated as to its sensitivity to brain function. Sensitivity: Does the measure correctly identify those individuals who do have deficits?
- Specificity: Does the measure correctly rule out those individuals who do not have a deficit in the specific problem area?
- Tests are used that assess the entire cortex in addition to equal presentation of the measurement of both hemispheres.
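The sensitivity and specificity questions in the list above can be made concrete with a short calculation. What follows is a minimal sketch using the conventional psychometric definitions; the function name and all counts are hypothetical, chosen only for illustration, and are not drawn from any validation study.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from validation-study counts.

    sensitivity: proportion of impaired individuals the test flags
    specificity: proportion of unimpaired individuals the test clears
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts (illustrative only, not from any published study):
# 100 impaired examinees, 80 flagged; 100 unimpaired examinees, 90 cleared.
sens, spec = sensitivity_specificity(true_pos=80, false_neg=20,
                                     true_neg=90, false_pos=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A measure with high sensitivity but low specificity rules deficits in too readily, which is why both figures should be known for each test in the battery.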
Daubert was broadened by the Kumho Tire Co. case. This resulted in assigning the court the gatekeeper role, responsible for making certain that expert testimony, if it is to be allowed in court, is based upon established theory or practice in the expert’s field.
Guidelines issued by the court for expert testimony to be allowed are as follows:
- Whether the expert will be testifying as to scientific knowledge.
- Whether the testimony will assist the trier of fact in determining the ultimate issue.
- Whether the scientific method has demonstrated validity or reliability.
Indicators of Reliability:
- Can the method be tested, or has it been tested?
- Has the method been subjected to peer review or publication, which aids in detecting flaws in the method?
- What is the known or potential rate of error?
- Are there established standards to control use of the method?
- Is the method generally accepted in the technical community?
Ziskin (1997) cites criteria regarding the following issues for forensic examination for neuropsychologists.
Rotgers and Barrett (1996) discuss the implications of Daubert for clinicians who testify as expert witnesses or complete forensic evaluations. It is noted that standards for educational and psychological testing were outlined by the APA in 1985. Standards include adequate test reliability and validity, the usage of technical manuals, the knowledge of the specific intent of test use, as well as specific guidelines for the scoring and reporting of test results and the rights of the test takers. Guidelines for the provision of services to ethnic, linguistic and/or culturally diverse populations were outlined in 1991. Specialty guidelines for Forensic Psychologists, set forth by the APA in 1991, provide an aspirational model for those psychologists who are regularly engaged as experts and who represent themselves as engaged primarily in activity serving the judicial system. There are guidelines regarding dual relationships, informed consent, potential conflicts of interest, non-use of contingency fees, methods and procedures, and public and professional communications.
Guidelines for the evaluation of dementia and age-related cognitive decline issued by the APA in 1998 suggest the use of standardized psychological and neuropsychological test instruments, baseline testing for pre-morbid indications, an awareness of sources of variability and error in psychometric performance and the provision of constructive feedback, support and education to the patient as well as to their family.
The following four guidelines have been adopted by the APA, published in their journal, and are expected to be followed when completing forensic work:
- The Forensic Examiner Should Use Theoretically and Psychometrically Adequate Data Gathering Instruments.
Obsolete Test Instruments
It is suggested that, when completing forensic evaluations, only the most scientifically reliable and valid assessment tools available should be used. This would exclude the use of instruments that have become obsolete after being replaced by a revised or later edition, such as the WMS-R, which has been replaced by the WMS-III, and the WAIS-R, which has been replaced by the WAIS-III.
The usage of obsolete measures poses the risk that the norms being applied are based upon outdated normative samples and test measures that are no longer applicable to the times. If revisions are not done, supportive data on the test may become more and more questionable over time and/or inapplicable to the manner in which the test is currently being employed.
Flynn (1984) as referenced in Ziskin (1995) focused on the Wechsler and Stanford-Binet measures, raising questions, cautions and concerns about the use of measures normed at different periods of time. Flynn indicated that obsolete norms can present confounding variables, which remain unknown or unrecognized, misleading researchers to erroneous conclusions in hundreds of studies and/or testing profiles. Variability was cited in test scores (with discrepancies ranging as high as 5 to 10 IQ points) from the predecessor to the revised measure, which can make a considerable difference shifting an individual’s intellectual quotient from average to above or below average. Additionally, numerous problems were noted with the use of batteries that employed test norms or standardization samples from different years and/or times.
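The practical effect of a 5 to 10 point norm discrepancy can be sketched as follows. This is an illustration only: the band cutoffs (90 and 110), the function names, and the example scores are assumptions introduced here, and the sketch assumes that the outdated norms yield the higher score.

```python
def classify_iq(score):
    """Map a full-scale IQ to a coarse band.

    The cutoffs (90 and 110) are an assumed, simplified convention
    used only for this illustration.
    """
    if score < 90:
        return "below average"
    if score < 110:
        return "average"
    return "above average"


def compare_norms(score_old_norms, shift_points):
    """Return the band before and after lowering a score by an assumed
    norm shift (in IQ points) between the old and revised measure."""
    adjusted = score_old_norms - shift_points
    return classify_iq(score_old_norms), classify_iq(adjusted)


# Hypothetical examinees scored on outdated norms, with a 5-point shift.
for score in (93, 112):
    before, after = compare_norms(score, 5)
    print(f"score {score}: {before} -> {after}")
```

Under these assumptions, a shift at the low end of the cited range is already enough to move a borderline score into a different classification.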
Binder and Thompson (1995) suggest that psychologists be careful in basing their conclusions and/or decisions upon tests or measures that are either obsolete or not useful for the current purpose. They stress the importance of knowing the appropriateness of the diagnostic testing procedures and process, the reliability and validity of the findings, the degree of specificity and sensitivity of each measure, and the impact of the above issues upon the degree of confidence in interpretation. Meticulous decisions ruling in and ruling out various variables such as developmental difficulties (learning disability, attentional disorders), psychological or emotional issues, use of medication, legal history and so on require the use of accurate test measurements.
- The Forensic Examiner Should Draw Conclusions Using Only Scientifically Validated Theoretical Positions.
Rotgers and Barrett (1996) maintain that data gathered in a forensic evaluation need to be integrated into a theoretical accounting of events, providing an underlying premise that is based upon a scientifically valid theoretical construct. This becomes a critical issue on those occasions when the neuropsychological evaluation provides the only objective evidence of cognitive dysfunction.
Use of Measures Specifically Designed to Measure Brain Behavior Relationships
An example of the attempt to draw conclusions based upon an invalid theoretical construct is referenced in the usage of the MMPI to assert inferences about brain damage. The MMPI was not intended for this purpose and thus provides a decidedly inferior method when compared to neuropsychological measures. Similar is the usage of academic testing, achievement batteries and/or intellectual assessment as the sole basis for drawing inferences about brain damage. While prominent neuropsychologists have indicated the usage of information from the latter tests to provide further validation and/or confirmation of brain damage, such measures are clearly not intended for the purpose of ruling in or ruling out the presence of brain impairment.
Neuropsychological Evaluation versus Radiological Assessment
The purpose of neuropsychological testing and its usefulness becomes questionable in the face of clear brain impairment/damage noted by radiological procedures such as CT scan, MRI, SPECT, EEG and so on. The role at that point appears to be more that of delineating those specific issues which impact the everyday functioning of the plaintiff in a detrimental manner. Ziskin (1995) references this issue, asking the question of why neuropsychological evaluation would even be suggested when there is already confirming evidence of brain damage. The role of the neuropsychologist, however, is to define in a specific manner the strengths, weaknesses, deficit areas and coping mechanisms being employed by the individual, to allow the implementation of a rehabilitation program targeted to their needs.
Unfortunately, it is not uncommon that a neuropsychologist will conclude that the plaintiff did not suffer any damage at all in the face of overwhelming documented radiological evidence of brain damage. Neuropsychologists commonly participating in litigation, often representing the insurance companies and/or the defendant in the matter, tend to continually dispute the discrepancy between pre- and post-morbid functioning. The actual damage documented at the time of injury via radiological evaluation is disputed as well, with complete disregard of any opposing evidence to date, no matter how valid and/or profound in its measurement of severity.
Another controversy is the use of radiological procedures as the sole means to rule in or rule out damage to the brain. The idea that the EEG identifies seizure disorder is negated by Gummit (1995), who clearly indicates that the EEG alone neither proves nor rules out epilepsy. It is noted that 1 to 2 percent of the normal population can evidence epileptiform discharges. The EEG alone is not seen as sufficient to localize the nature of the injury, and the usage of other methods of evaluation to provide conclusive evidence is suggested. The QEEG is being debated once again as a valid measure of injury to the brain. This method has a history of being quite successful in the identification and promotion of the understanding of mild head injury, as well as other developmental delay disorders and psychiatric disorders.
Duration of Impaired Consciousness and Amnesia
Studies support the presence of mild head injury even when there has been only a brief period of loss of consciousness and/or hospitalization and the GCS score ranges from 13 to 15. There is a wealth of data to support the presence of persistent deficits one to two years post injury despite the absence of any loss of consciousness and/or radiological findings. Mild TBI classified as a GCS score of 13 to 15 is viewed as an alteration in mental status. There are increased risk factors, such as skull fracture and/or seizure at the time of injury, that are viewed as predictive of subsequent brain impairment.
Plasticity Theory and the Contention that Pediatric Injury provides a better prognosis.
Research is currently disputing the plasticity theory and the idea that the younger the child, the more capable they are of recovering from early brain damage. Research is currently suggesting a process whereby there is a lack of initial deficit, which gradually worsens over time as children fail to develop age-appropriate developmental skills as quickly as their same-age counterparts. Consequently, early TBI is currently being viewed as actually having a worse outcome. The above remains consistent with the emerging research that identifies abnormal patterns of brain activity following head injury. There are specific transient reactions noted in response to traumatic brain injury, which comprise cellular changes at the site of injury and at nearby injury sites, phagocytic and astrocytic reactions at the border of injury, alteration of cerebrospinal fluid with the onset of edema or swelling, and the unmonitored and altered development of undamaged cells and newly developed axon collaterals in the region of the damaged cells. The result of the above changes is the alteration of the general functioning and the overall size of the brain.
Frontal processes, which are generally not fully developed until the age of 10 to 12 years, would suggest that the true understanding of pediatric deficits cannot be seen until this period of time. As the child becomes increasingly proficient at compensating for deficit areas, pediatric head injury often appears to be in a state of recovery just prior to the onset of adolescence. Development appears erroneously equivalent to that of peers, only to witness the child suffer a tremendous decline as the damage to frontal processes becomes more apparent in their school and home functioning. However, the proficient pediatric neuropsychologist can detect the presence of frontal deficits in young children and in so doing predict problematic functioning in the future. There are test measures available for the elementary school child which can be used to determine symptoms of word retrieval, selective attention issues, perseveration, diminished creativity, learning problems and emotional dysregulation. Symptoms of suspiciousness and paranoia can be clinically seen in these children and are significant of the increasing emotional fragility, vulnerability and moodiness that occurs with diminished functioning of the executive reasoning area. Often frontal deficits will appear more behavioral in nature in the young child and these children tend to be misdiagnosed as suffering from a conduct disorder.
Miller (1997) identified a neurosensitization syndrome whereby symptoms are both subjective and objective, lasting well beyond the time period of post-injury resolution. Symptoms remain resistant to conventional medical and psychological remediation/treatment, developing as a result of a progressively enhanced sensitivity or reactivity of the brain at a neurophysiological, biochemical and intracellular level.
Ruling Out Other Diagnoses
Cognitive deficits following head injury, commonly affecting both attention and memory, can often be misattributed to a pre-morbid attentional disorder, or vice versa. Selective attention, sensory gating and discriminative attention commonly become disturbed after head injury as a result of changes occurring within the neurotransmitter messenger systems. Specifically, the cholinergic system in the brain tends to become impacted following head injury. The diminished cholinergic projections are seen as potentially affecting the functioning of the hippocampus and its related systems.
Discriminating deficits following head injury as due to injury to the brain versus a pre-morbid disturbance of this same neurotransmitter system requires rather specific technical experience and skill in working with both populations. There are those few occasions when the two disorders overlap in the test results to the degree that they cannot be differentiated. There are usually patterns present, however, which allow the neuropsychologist to separate the cognitive effects of A.D.D. deficits from impairment due to head injury. Sleep endocrine alterations following head injury can appear similar to those of clinical depression, resulting in mis-diagnosis or under-diagnosis. Changes resulting in HPA overdrive and modulation of hypothalamic and pituitary receptors can lead to permanent sleep endocrine alterations, often seen with clinical depression, although here the direct result of head injury.
Symptoms of Post Concussive Syndrome
Post Concussive Syndrome related to head injury and/or some type of brain impairment can present symptoms that are quite similar to those of Post Traumatic Stress Disorder. Symptoms observed following head injury need to be differentiated from those that may instead be due to the accident or life-threatening event itself. Symptoms of irritability, anger outbursts, difficulty sleeping, diminished attention and concentration, emotional lability, depression and anxiety can be seen with both of the above syndromes. Pre-injury conditions such as learning disability, psychiatric syndromes, consequences of substance abuse and lifelong neurological conditions can become exacerbated with head injury, thus creating confounding variables that require distinction and separation.
Reliability of Evaluation with Children
Issues of pre-morbid syndromes, test administration variables affecting levels of attention and motivation are all seen as factors affecting the reliability of test results. Evaluation that has been used with adults may have questionable validity in its usage with children. Research has supported the reliability and validity of the Halstead Reitan Neuropsychological Test Battery in a population of children. A recent addition to the field of neuropsychological evaluation is the NEPSY, a battery of tests based upon ten years of research and measures specifically designed for children. This battery, using normative data from a ten year research study as well as nationwide norms, provides clear adaptation to the academic setting and as such offers a wealth of information regarding the child’s functioning that generalizes to their external environment.
- The Forensic Examiner Needs to Weigh and Qualify their Statements and Testimony on the Basis of the Adequate Theory and Empirical Research on the Questions Being Addressed.
The clinician is required to base his or her findings upon scientific studies and supportive literature reviews. The background research establishing the presence of injury is primarily with the use of the Halstead Reitan battery, specifically in the areas of mild head injury and exposure to toxins. The limitations for this battery are already documented and therefore clearly delineated, as opposed to some other type of battery and/or compilation of tests. When the clinician uses test batteries or measures whereby there is not an established theoretical construct they leave themselves open to questioning from opposing counsel. The expert is left with the obligation of providing a theoretical construct to explain why certain tests were picked. The clinician needs to be prepared to substantiate their decision for a specific testing regimen employed via an adequate decision tree approach, which is individualized and specific to the case in question. Smith (1994) cites the expert’s knowledge as paramount to evaluation, indicating the necessity of being highly versed in recent literature, able to differentiate the diverse types of brain insults and the factors that affect their outcome.
Mild Head Injury: Does it exist?
The concept of mild head injury as being truly head injured is a subject of great debate and much of the research supports symptoms abating within the first year. Recent research employing the Halstead Reitan Battery revealed significant deficits as being present and able to be adequately measured when symptoms of head injury persist over time. The defined mild head injured group approached the level of diffuse brain damage suggesting that symptoms which do not resolve themselves within the first year are the result of more long term issues involving disturbed brain functioning. The overall GNDS score was found to be sensitive to mild head injury and superior over individual test scores.
The mildly head-injured population has been identified as a rather diverse group who do sustain cerebral damage even when radiological findings post-injury are non-significant. It is not uncommon for radiological tests administered one year post-injury to show findings not seen at the time of injury. It is estimated that approximately 800,000 people in the United States suffer from a mild head injury each year. Mild head injury as a separate category can represent as much as 80% of the total head injured population seen in the emergency room. The subtle sequelae of this injury often create difficulty in identifying symptoms and in attributing symptomatology seen post-injury to the head injury as a causal factor. In one recent retrospective study, two-thirds of patients identified with mild head injury were scanned, and approximately one half of the patients scanned revealed intracranial abnormalities.
Mild head injury can often be overlooked and instead diagnosed as one of the pre-morbid attentional disorders as noted above. The common debate is that of organic versus psychogenic when experts attempt to explain psychological issues following head injury. The emotional distress following head injury that results in increasing moodiness, meeting the criteria of a mood disorder, is often the symptomatology that creates the appearance of malingering. Emotional symptoms can often be dismissed as malingering related to litigation, despite the presence of research which clearly documents mood disorders as following head injury in at least one-third of all head injured individuals. Research is beginning to cite dissociative symptoms following head injury, which are organic in origin and directly related to the injury. It is important for the expert to understand that head injury alone can create the emotional symptoms that can become so misleading.
- The Forensic Examiner Should be Able to Defend the Scientific Status of Their Data Gathering Methods.
The clinician should be familiar with the psychometric properties of the instruments as well as the strengths and limitations of such tests. Clinicians need to be familiar with the reliability and validity coefficients of each measure, the construct and criterion validity, as well as the specific application based upon the standardization sample. Measures designed for and standardized on the normal population carry risk when applied to the brain impaired population, and each measure used to rule in or rule out specific impairment should have available statistics regarding the predictive validity and true positive rate.
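One way to see why the predictive validity of a measure depends on the population to which it is applied is to compute the positive predictive value at different base rates. The sketch below applies Bayes' rule; the function name, the sensitivity and specificity figures, and the base rates are hypothetical values introduced here for illustration.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Probability that an examinee flagged by the test is truly impaired.

    base_rate is the proportion of impaired individuals in the population
    being examined (an assumed input, not a property of the test itself).
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical test (sensitivity 0.80, specificity 0.90) applied to
# two populations with different assumed base rates of impairment.
for base_rate in (0.10, 0.50):
    ppv = positive_predictive_value(0.80, 0.90, base_rate)
    print(f"base rate {base_rate:.0%}: PPV = {ppv:.2f}")
```

With these assumed figures, the same test is correct far less often when it flags someone in a low base rate population, which is one reason statistics derived from a normal standardization sample may not carry over to a brain impaired population.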
Anastasi (1982) presents an outline for test evaluation which includes a description of the purpose and nature of the test, its practical application, reliability and validity, and reviewer comments in such books as the Mental Measurements Yearbook or the Compendium of Neuropsychological Tests.
Fixed versus Flexible Batteries
Spreen and Strauss (1998), in their recent edition of the Compendium of Neuropsychological Tests, noted that only a small number of tests specifically designated for neuropsychological use are actually available. Excluding fixed batteries such as the Halstead Reitan, many of the tests available have not been formally published, the reliability and validity data remain scattered in various research notations, and administration instructions have become modified in the absence of an available published standardized format. Such findings would suggest the necessity of utilizing a fixed battery to meet scientific standards under Daubert. The Halstead Reitan Neuropsychological Test Battery and the Luria-Nebraska Neuropsychological Test Battery are the two batteries most frequently referenced as standardized or fixed batteries. There are pros and cons to using the standardized versus the flexible battery, including administration time, examination of a wide range versus a specific set of functions, and the issues of validation, reliability and validity.
The more active group in the Guilmette survey (1990) reported a stronger preference for the full H-R battery. Those trained in both batteries expressed a preference (four out of five) for the H-R battery over the Luria-Nebraska battery. In the Sweet and Moberg (1990) survey, for both the ABPP and non-ABPP neuropsychologists, the flexible battery was favored as the primary approach over the standardized or fixed battery. The flexible battery consisted of various measures, which fell into routine groupings to address specific disorders; however, these tests were not standardized as an entire battery. The standardized battery offers correlation coefficients to compare the validity and reliability of each measure and statistical indices to address the strength of each measure in its ability to predict and rule out specific symptomatology for predictive validity.
Reitan references the theory of biological intelligence as the adaptive ability of the individual to function in their external environment that is operationalized in the Halstead Reitan battery, consisting of 13 tests from which four factors can be extracted. A theory is proposed of brain behavior relationships derived from detailed evaluation and research of thousands of brain injured subjects suffering from a variety of neurological issues. Specific to the H-R battery is the testing of the adequacy of the theory via prediction of the correct pathological condition. In this manner the battery meets the criteria for the evaluation of an individual’s cognitive functioning as related to the biological condition of the brain. Each test has been carefully validated to meet sensitivity requirements. The battery evaluates a full range of brain behavior functions, including tests that relate to the general as well as the specific areas of cortical function.
The battery employs the time honored methods of clinical neuropsychology (assessing the level of functioning of the individual as compared with their matched counterparts as well as employing inter-individual comparisons, the use of prior research and comparisons for both sides of the body). Each test in the battery represents the entire cerebral cortex and provides an equivalent representation of both cerebral hemispheres. The Neuropsychological Deficit Scale (NDS) provides scoring for each cerebral hemisphere as well as an overall general score for level of brain impairment.
For the above reasons, to date this is the singular battery that has been upheld under Daubert in the courtrooms. Ziskin (1995), in differentiating the fixed, standardized batteries from the non-standard procedures, cited a case identifying the Halstead-Reitan Neuropsychological Test Battery as the only scientifically valid procedure that the court accepted under Daubert, noting that the battery is based upon medical, scientific evidence supporting its validity. The court focused on the methodology of the experts as opposed to their conclusions. The H-R battery has been cited as having a vast amount of research and validation supporting the viability of this standardized battery to accurately and adequately assess brain functioning.
Confounding Variables Present in Testing Situation
Examiners need to be aware of and able to address confounding variables such as fatigue, lack of sleep, the use of caffeine, low blood sugar, prior alcohol intake, anxiety, depression and additional issues such as hearing loss and visual acuity. Recently, Crews (1999), in evaluating a depressed population, did not find any reliable differences between the depressed and non-depressed populations on neuropsychological test measures, thus disputing the use of depression as a variable impacting neuropsychological test results. Given the disparity between these recent findings and past research, in addition to the fact that there are always several possible explanations for any findings, this research if anything dispels the foregone conclusion of the effects of depression upon test performance.
Research has not supported the link between migraine headaches, chronic pain and impaired cognitive functioning thus ruling out these issues as diagnostic reasons for the presence of impaired test results following head injury.
The use of normative data standardized for the general population clearly lacks validity with specific cultural or ethnic groups. The Halstead Reitan Impairment Index was found to represent an overall valid assessment for three racial/ethnic groups. Problems occurred primarily when neuropsychologists attempted to use subtests apart from the entire battery to form diagnostic conclusions. Otherwise, it does not appear to be necessary to utilize separate norms for specific ethnic/racial groups when using the Halstead Reitan battery.
Ziskin (1997) identified the following errors in the neuropsychologist’s ability to exercise adequate clinical judgement depending upon the examiner’s ability to:
- List alternative diagnoses and seek evidence for each. The expert needs to provide reasons why one diagnosis would be ruled in or ruled out. The employment of decision trees and/or hypothetical inquiry enhances this process. Experts should make lists of pros and cons, allowing them to clearly indicate what their conclusions are based upon. Too often decisions are based upon subjective analysis rather than actuarial data.
- Describe symptoms accurately and specifically. The expert maintains a reference list of symptoms identified for each specific patient and does not rely solely upon memory for this information.
- The knowledge of the patient base. The likelihood of certain disorders being observed more than other disorders as related to their patient population characteristics.
- Separate probability from statistical fact. Use of the knowledge of research and adequacy of the test measures implemented in their evaluation to state factual information about the patient.
- Employ the use of a number of measures for one particular function. Experts operate with the realization that one single examination can create problems in the validity of the conclusions drawn from the data.
- Short form versions are problematic. The expert understands that short form versions of a test measure are only estimates of an estimate and that such versions represent questionable validity when applied to forensic evaluation, specifically that of brain damage.
Guidelines issued by the American Academy of Physical Medicine and Rehabilitation’s Board of Governors focus centrally upon the idea that the function of the expert witness is to educate the court as a whole, as opposed to representing either of the parties involved. The ultimate test for accuracy and impartiality would be the willingness of the expert to prepare testimony that would remain unchanged for use by either the plaintiff or the defendant. One role of the neuropsychologist is to educate and provide feedback from the evaluation to parents and other experts involved with the patient, to provide suggestions and to offer or recommend appropriate treatment.
The focus is on the methodology of the experts as well as the conclusions they generate. The entire reasoning process must be valid to qualify for Daubert standards. Under these auspices, the standard fixed batteries are acceptable while non-standard, more flexible approaches/procedures often are not acceptable. Those procedures, which have not been subjected to adequate scientific testing and peer review, remain insufficient and non-scientific.
There are five areas identified by Ziskin (1997) which should be of concern for neuropsychologists who operate within the forensic arena.
- The absence of uniformity in the Neuropsychological assessment.
The neuropsychologist needs to be able to provide an adequate and comprehensive reasoning process as to why they administered specific tests. There is a lack of uniformity among neuropsychologists in terms of the tests they employ.
Following Explicit Test Instructions
Reitan (1993) has clearly specified throughout the years at his training programs that the clinician is to follow very explicit and specific instructions and that deviations in test material are not acceptable. Reitan implores his workshop attendees to disregard the Heaton norms (based upon economic and educational status), citing such norms as misleading.
Reitan has published very clear and specific detailed instructions as to the administration of the Halstead-Reitan Neuropsychological Test Battery, and yet clinicians commonly deviate from those instructions, despite the warning of inaccurate application of the NDS scoring system and/or problems emerging with the use of normative data. Examples of procedures employing equipment that has been altered are the Booklet Category Test and the portable TPT test. Reitan warns that the only authorized version is the one that duplicates the tests exactly as when the validation studies were completed. These issues are stressed repeatedly in any workshop that Dr. Reitan presents.
The problems with the use of shortened forms of tests increase with forensic evaluation.
Short Form Versions/Alterations in Test Administration
This would include short form versions of test measures and/or measures that employ a different format than the original, such as the Booklet Category Test versus the Category Test from the Halstead-Reitan Neuropsychological Test Battery. It is suggested for forensic evaluation that the short form versions of the intellectual assessment be avoided. As noted elsewhere in this text, forensic evaluation should maintain strict procedures for following the test manual and administering the measure in the same manner as for the normative sample. After the measure has been administered exactly as directed by the test manual, the forensic examiner can then commence with testing of the limits, which may include some alteration of the administration instructions.
Testing of the limits
An individual’s scores can vary from test to test for a number of reasons associated with state-dependent factors (mood, health, attention), structural or environmental factors, or traits of a more independent and permanent nature reflecting specific personality configurations. The examiner needs to be aware of which tests are more subject to intra-individual variability; generally, scoring is expected to remain within the range of one standard deviation to account for such factors as those listed above. Repeated evaluation with the same test or a similar test is suggested if the range increases beyond one standard deviation.
These are practices employed by those neuropsychologists utilizing the process approach (a more qualitative exploration of how the individual attained the scores). The process approach requires careful observation and a follow up of unusual approaches or errors by questioning and/or the use of ad hoc tests, including test modifications. The Boston Process approach specifically presents procedural modifications for various tests including the WAIS-R. Modifications do not tend to be standardized and would result in the inability to employ the usage of normative data. Test modifications are suggested after the test has been formally administered under standard format for this reason.
Various manuals for test administration point to the impact of practice effect and this issue has been addressed by general policy statements generated by the APA with the overall agreement of a minimum time window of six months. This means that there would need to be a six month duration for a practice effect to be minimized. An example of the impact of practice effects is noted in the test-retest stability research. A 7 to 8 point increase in the Full Scale IQ of the WISC-III, an intellectual assessment for children, was found with a short retest period and practice effects diminished over a longer test retest interval. Once an individual has been evaluated, the data serves as a baseline against future changes in cognitive functions, magnitudes, rates of cognitive change as well as response to treatment, which can be determined by follow up testing. Repeated, closely spaced testing can obscure cognitive changes or intervention effects.
Re-testing becomes increasingly impervious to practice effects with the greater length of time between evaluations. Generally it is recognized that six months is the minimum amount of time to negate the effects of practice, although a period of one year is the optimum to truly rule out practice issues. Practice effects are often not addressed by neuropsychologists involved in litigation who, for purposes of litigation and to meet court demands, administer tests within a shorter period of time than the allotted six-month window for re-testing. Data becomes particularly confounding when neuropsychologists have re-administered the same test measures numerous times. Findings become questionable when the examinee is so familiar with the test measure that they recall their response from the prior administration. Re-evaluation, if occurring often enough, becomes a means for the examinee to improve their score rather than a measure of their brain functioning. There are instances in which measures are re-administered within days to only a few months of the previous test administration. Memory testing can become particularly suspect when measures are re-administered within a short time window. The validity and reliability of the findings become questionable to an even greater degree when the clinician does not even attempt to consider the effects of practice and/or to address such issues in a debate format or decision tree regarding the utilization and relevance of their data.
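The re-test windows described above (six months as the minimum, one year as the optimum) can be expressed as a simple date check. The following is a minimal sketch, not an established clinical tool: the function names, the status labels, and the choice to count only completed calendar months are assumptions made for illustration.

```python
from datetime import date

MIN_MONTHS = 6       # minimum re-test window stated in the text
OPTIMAL_MONTHS = 12  # one-year window described as the optimum


def whole_months_between(first: date, second: date) -> int:
    """Count completed calendar months from `first` to `second`."""
    months = (second.year - first.year) * 12 + (second.month - first.month)
    if second.day < first.day:
        months -= 1  # the final month has not yet been completed
    return months


def retest_interval_status(first: date, second: date) -> str:
    """Label a re-test interval (labels are hypothetical, for illustration)."""
    if second < first:
        raise ValueError("second administration precedes the first")
    months = whole_months_between(first, second)
    if months < MIN_MONTHS:
        return "below_minimum"   # practice effects should be addressed
    if months < OPTIMAL_MONTHS:
        return "minimum_met"
    return "optimal"


if __name__ == "__main__":
    # Hypothetical administration dates.
    print(retest_interval_status(date(2020, 1, 15), date(2020, 4, 1)))
```

Counting only completed months is deliberately conservative: an interval one day short of six months is still flagged as below the minimum.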
The Use of Technicians
Smith (1994) clearly stated that “there is no substitute for personal administration of all tests by the clinical neuropsychologist” and thus found no excuse for the use of technicians.
Guilmette et al. (1990) in surveying neuropsychologists found that over half of the respondents (53%) administered all of the tests in the neuropsychological assessment themselves. Only 5% indicated that they did not perform any of the tests themselves. On the average 73% of the tests administered were performed by the psychologist, as opposed to the trained technician. Less than 2% of the tests administered were implemented with the use of a computer.
When technicians are used extensively for test administration, this will tend to detract from the time that the expert has spent with the examinee. The expert may emphasize the importance of observing testing behaviors or qualitative features that the examinee exhibited during the evaluation. Factors such as the manner in which the examinee approached the test items can be used to aid in the determination of brain injury or to determine if the examinee cooperated with the testing procedures. The neuropsychologist who relies upon such data to form conclusions without having made the observations personally, depending instead upon second-hand information conveyed by a technician, becomes readily vulnerable to cross examination.
- The lack of a relationship between experience and accuracy for the neuropsychologist.
The neuropsychological field has been a diverse population, a rapidly growing field, resulting in the absence of specifically defined guidelines for evaluation. One of the ways to provide uniformity has been the introduction of specific requirements, training and expertise to become included in a specialized population of neuropsychologists. Typically, the attainment of diplomate status via the American Board of Professional Psychology has been used in court as a means to identify those individuals with more extensive training and experience in the field. However, the practices of the ABPP were not seen as substantially different from the practice of the non-ABPP. More experience does not always mean more accuracy.
A survey completed by Sweet and Moberg (1990) compared the practices of the ABPP versus the non-ABPP neuropsychologist (member of Division 40 of the APA), which were not found to be substantially different. There was a general consensus between the two groups regarding the type of information to be gathered in a neuropsychological evaluation, the types of journals most frequently read, the use of a personal interview and the number of hours necessary to complete a clinical examination. Differences between the two groups identified the non-ABPP professional as emerging from the private practice setting versus the ABPP professional who was based in medical centers, research sites or universities and more likely to have a greater percentage of time invested in research and teaching. Journals read by both groups were similar: JCEN, JCCP, ACN, IJCN, and Journal of Head Trauma Rehabilitation. The ABPP group had access to more medically based journals (TCN, Annals of Neurology, Neurology, Cortex, Journal of Neurology, Neurosurgery & Psychiatry and Brain) as a result of their work setting. The ABPP group tended to use the service of a technician more and did not personally observe as much of the evaluation. Of the ABPP group, 30% did not observe any evaluation at all versus 13% of the non-ABPP group. Nevertheless, the amount of clinical time spent doing neuropsychological assessment was greater for the ABPP group (71% versus 43%).
Those neuropsychologists who hold diplomate status were not found to differ significantly from their non-diplomate counterparts in methods and practice. There do appear to be some general universal beliefs and specific practices followed by the majority of neuropsychologists. This provides useful information to the attorney, who can question those practitioners whose methods fall outside of common practices and deviate considerably from these averages. Involvement in neuropsychological activity versus other areas of psychology identified the least active psychologist as spending less than 10% versus the most active psychologist who spends at least 30% of their time in activities devoted strictly to neuropsychology.
- The limited capacity to predict everyday functioning.
Unfortunately research does not suggest a high correlation between test performance and everyday functioning. Historical information becomes necessary to provide the full picture as well as talking to family members about the examinee’s functioning. Ziskin (1995) references the Dunn et al. study (1990) regarding the predictions of functioning in everyday life based upon performance on neuropsychological tests as failing to establish significant validity at this point in time. There was an absence of scientific data demonstrating the existence of valid relationships between neuropsychological test performance and functioning in everyday life.
It is only by comparing past or prior functioning to current deficits that the examiner is truly able to assess the individual’s losses in an accurate and relevant manner. The method to combat the problem of everyday functioning in the present time frame is the assessment of past functioning to determine whether a change has occurred. The assessment of past or pre-morbid functioning provides the key to the prediction of future functioning. However, research cites an absence of validated methods to accurately determine pre-morbid functioning and limited power of such techniques as the use of pre-morbid intellectual indicators such as the NART-R. It becomes the task of the clinician to seek out concrete evidence regarding pre-morbid functioning from collateral sources, school records, achievement testing and occupational records. Failure to do so can lead to false conclusions about prior functioning and an over- or under-estimation of the effects of injury via comparisons of pre- and post-injury functioning. The use of demographic variables was found to be only slightly better than chance.
Reitan and Wolfson (1999) found their classification of mild impairment on the NDS scoring system represented only a 10% overlap between their groups of brain damage and controls. The NDS score of 26 represented a percentile ranking of 10% of the population with 90% of the population attaining a score of 25 or better. The General NDS score was found to be an excellent cutoff for predicting brain damage at a 75% rate. This score is representative of 1.5 standard deviations below the normal mean.
Pattern analysis on the intellectual assessment scaled scores is another controversial issue among neuropsychologists who indicate that discrepancies must range from 4 to 5 points or more between subtests to provide a significant indication of subtest discrepancy. However, the manuals for the WISC-III and the WAIS-III cite the difference of 3 to 4 points between various subtests as generally significant at the .01 probability level. This would mean that there is only one chance in 100 of statistical error in predicting a significant relationship in the pattern of the subtest scores when there are discrepancies of 4 or more points.
Measurement of Change: From Pre-Morbid to Post-Morbid Status
A common debate in neuropsychological assessment is the measurement of change from the pre-morbid to the post-injury state. Pre-morbid IQ indicators are one type of indirect method used to assess pre-morbid/pre-injury status. However, a more reliable and direct method would be the measurement over time on the same neuropsychological instruments. The major concern with re-testing has been the issue of practice effects.
The concept of using the neuropsychological evaluation itself to determine pre-injury abilities is noted by Guilmette (1994) who indicated that examination of the pattern of test results can determine the relative strengths and weaknesses of the patient.
Despite the varied tests available to detect malingering, there is no guarantee of being able to rule it in or rule it out. Many experts do not use methods shown to have some degree of scientific validity. Warning the individual that attempts to malinger on neuropsychological testing will be detected has not been seen as an effective way to reduce malingering. Rather, it is suggested that asking the person to do their best and encouraging maximum effort, consistent with accepted testing practices and manual instruction, is the optimal means to address this issue.
Parker (1994) cites the concept of malingering as being highly complex and indicated the likelihood that malingering has been over-diagnosed given the unreliability of diagnosis with the use of a single test or test battery. “The assertion that a medical condition is faked or exaggerated is equivalent to calling the patient a ‘thief, neurotic or lazy’ and should be bound by the same high standards as any other professional conclusion.” One of the behavioral indices that alerts the examiner to subjective claims is a histrionic or attention-getting demeanor. However, the attempt to differentiate organic disease from hysterical conversion reaction is seen as a rather difficult challenge and, furthermore, individuals diagnosed with hysteria can also have neurological deficit. Experts tend to cite the diagnosis of hysteria when the protocol is different from known physiological patterns. Finally, the warning is issued that the one-time decision of a forensic expert can have far-reaching consequences on a patient’s life.
Mossman and Hart (1996), referenced by Ziskin (1997), indicated three methods to assess malingering: inconsistent test performance, differences between test performance and claimed disability, and inconsistency between test performance and history. The clinician arguing against self-report measures as being misleading would tend to reinforce the need to obtain collateral or outside sources of information to confirm the data reported by the examinee.
The MMPI can misattribute the somatic symptoms often seen with PCS following head injury: symptoms are seen as representative of an individual suffering from some type of somatic personality syndrome rather than as actual symptoms resulting from brain impairment and the cascade of events involved with continuing cellular changes. The problem of testing for malingering is that such tests may only address issues of motivation and/or emotional factors present at the time of testing without addressing the issues of the individual who may be acting without conscious intent. Accurate diagnosis is confirmed as being based on examination of both test and extra-test behavior as well as a thorough evaluation of the patient’s history and pertinent reports, including injury characteristics.
Interviewing approaches are suggested to assess malingering. Indicators are that of a more dramatic presentation, deliberateness and carefulness in speech patterns, inconsistency of self-report and endorsement of a high number of symptoms that are not necessarily relevant and/or unusual and inconsistent with the diagnostic criteria.
Potential indicators of malingering are comprised of: pre-morbid indicators (personality traits, borderline or antisocial, prior injury, work record and prior claims); behavioral indicators during examination (uncooperative or inconsistent, suspicious, evasive); test performance variables (responses of “I don’t know”, missing random items, giving up easily, inconsistent profile); the patient’s presenting post-morbid complaints (absurd reporting of events, over-idealized functioning prior to accident, inconsistencies in functioning, endorsement of unusually large numbers of symptoms, great detail in explanation/description of accident); and reported or observed activities of daily living (activities that are not consistent with reported deficits, refusal of employment, discrepant capacity between work and recreation). In addition there are issues to consider such as: lack of reasonable follow-through on available treatments, attributing all of life’s problems to the accident, blaming others for all of life’s problems, seeking unnecessary examinations and treatments without consulting experts, and resistance to or lack of seeking reasonable remedies. It is suggested that the evaluator review co-morbid symptoms looking for interactions; re-evaluate and re-test, thus strengthening reliability of the findings; review pre-morbid medical and scholastic records; interview the patient, family and friends; determine if secondary gain is present; and ascertain the independence of the data measures.
- How data is used to form conclusions.
Do Tests Measure What They Purport to Measure?
Tests that purport to localize to discrete brain regions have met with only limited success. The Wisconsin Card Sorting Test is an excellent example of the controversy regarding those measures assumed to be differentially sensitive to frontal lobe damage. Mountain and Snow (1993) concluded that the clinical utility of this measure as an indicator of frontal lobe dysfunction is not supported, nor is the use of this test as a marker of frontal dysfunction for research purposes.
The sensitivity of the test becomes defined as predictive validity, the ability of the measurement to correctly identify those in the brain damage classification versus the normal subjects.
This classification process is not universally dependent upon the position of the score on the normal curve. There is controversy among neuropsychologists over the clinical practice and general method of ruling in or ruling out the presence of brain impairment. Some neuropsychologists maintain that one to two standard deviations below the mean is indicative of impairment while other neuropsychologists routinely rule in brain impairment only when the score falls within two to three standard deviations below the normal mean. Clinically, neuropsychologists do not always consider the variable of the examinee’s pre-morbid intellectual potential and there is disagreement in the degree of consideration of the standard deviation discrepancy. An example would be the finding of average scoring in a previously gifted person, which would point to the presence of a considerable loss of ability. Similarly, performance within well below average limits may not confirm the presence of impairment subsequent to head injury for an individual who was pre-morbidly classified as retarded.
Spreen and Strauss (1998) warn against the common use of a particular cutoff, indicating that it may result in misclassification. Overall, it is suggested that multiple comparisons be made and that the process of clinical interpretation become dependent upon a combination of several test results, in addition to clinical observations and the use of theoretical constructs to address the specific disorder and/or reason for referral.
Ruling in Brain Impairment.
Percentile scoring is being reported more often in reports; however, there is a problem with the accuracy of such reporting given the dependence upon normal distribution and the fact that many test scores are not normally distributed. Further, rankings provide data that can become misleading. For example, one-half of a standard deviation above the mean translates into a difference from the 50th to the 69th percentile. Further, normal is considered to be a range comprised of a minimum of two standard deviations, from the 16th to the 84th percentile. This issue becomes further delineated when the percentile ranks are calculated out using standard deviations from the mean as noted below.
99th+ percentile rank = 3.0 standard deviations above the mean.
99th percentile rank = 2.5 standard deviations above the mean.
98th percentile rank = 2.0 standard deviations above the mean.
93rd percentile rank = 1.5 standard deviations above the mean.
84th percentile rank = 1.0 standard deviation above the mean.
69th percentile rank = 0.5 of a standard deviation above the mean.
50th percentile rank = 0 standard deviations above the mean.
31st percentile rank = 0.5 of a standard deviation below the mean.
16th percentile rank = 1.0 standard deviation below the mean.
7th percentile rank = 1.5 standard deviations below the mean.
If scoring is considered to be abnormal only when it falls within two to three standard deviations below the mean, then, using the percentile ranks listed above, this would mean that brain damage is seen as being present in only two percent of the population, which current statistics for head injury negate.
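The percentile ranks listed above follow from the cumulative distribution of a normal curve. The sketch below reproduces them from the standard deviation offset, assuming the scores really are normally distributed (which, as noted, many test scores are not). The function name `percentile_rank` is introduced here for illustration only.

```python
import math


def percentile_rank(z: float) -> float:
    """Percentile rank of a score lying `z` standard deviations from the
    mean, assuming a normal distribution (standard normal CDF times 100)."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Offsets from the list above, extended to two and three
    # standard deviations below the mean.
    for z in (3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0,
              -0.5, -1.0, -1.5, -2.0, -3.0):
        print(f"{z:+.1f} SD -> percentile rank {percentile_rank(z):.1f}")
```

Extending the list downward in this way shows why a cutoff of two standard deviations below the mean corresponds to roughly the lowest two percent of a normally distributed population.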
The Use of Normative Data
Ziskin (1995) indicates the following: “The selection of one set of norms over another often becomes a matter of something other than science, such as speculation or bias, and yet the interpretation of testing results may differ markedly depending upon the clinician’s normative selection.” “Not infrequently a clinician has reached conclusions based on questionable norms when one or more of these other normative studies suggest opposing interpretations.”
Reitan and Wolfson (1995) strongly argue against the necessity of age and education correction for the diagnosis of brain damage, citing research that indicates a lack of impact of these variables. Other researchers argue that the non-inclusion of such corrections is inappropriate. Reitan and associates maintain that the application of corrections for age and education introduces confounding variables, due to the complex set of rules that cannot be applied uniformly, and tends to lead to erroneous conclusions. It is suggested that the clinician report the various norms and that conclusions be based upon comparisons between the normative findings, findings of the entire evaluation, patterns evidenced by the individual, historical data and the relationship of the test data to the underlying theoretical construct and identified condition.
Age-corrected norms have historically been an important issue and contributed to the misuse of the WAIS-R as noted by Ziskin (1995). This becomes of particular concern when addressing any population outside of the criterion-referenced group of 25 to 34 year olds, and scoring can be quite different when comparing scaled scores used for this general population versus scaled scores for the age of 50 or 16 years of age.
There are generally accepted criteria for test measures and standards for educational and psychological testing. Six specific criteria have been identified that should be followed for any information or symptom or sign used in formulating a diagnosis or prediction, not only for tests but for any type of information used in the assessment process.
Six Tests for Tests or Clinical Information:
- Test 1: Is the test Reliable?
Standards for education and psychological tests according to APA indicate that there are those rare cases whereby changes in the form of standard procedures would become necessary. As a rule, it is suggested that the examiner should refrain from any deviation from the indicated standard procedures. A psychologist who does not report departures from standard procedures and who nevertheless uses normative data from the test manual for purposes of interpretation would be in violation of proper practices.
- Test 2: Is there a true relationship between the test score or sign and the Criterion of Interest?
The conventional interpretation is that the range from one standard deviation below to one standard deviation above the stated mean encompasses 68% of the distribution of scores (34% on either side of the mean). This leaves 16% in each tail, meaning that statistically there is a one in six (to seven) chance of being incorrect when determining that scoring represents a true score (as opposed to measurement error) and that findings are due to the presence of the deficit/ability in question.
Typically test manuals indicate that one to two standard deviations below the normal mean is ruled in as mild impairment while two standard deviations is seen as moderate impairment and three standard deviations is seen as severe impairment. This formula comprises the basis for the Halstead Reitan NDS scoring system. When neuropsychologists report impairment as mild only when the score is two standard deviations below the mean, the score would actually represent five percent or less of the population.
Each test score is made up of a true score and an error score. The error score comprises those factors or variables that account for examiner error, test error, and erroneous and uncontrolled subject variability. The validity of each measure needs to be such that only a small portion of the score is represented as an error score to be assured that the measure is measuring what it is purporting to measure.
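The standard deviation bands described above can be written out as a small classification rule. This is a sketch of the general convention only, not the NDS scoring system itself: the exact handling of scores falling on a boundary, the label for scores above the mild band, and the illustrative scale (mean of 100, standard deviation of 15) are all assumptions.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the normative mean, in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


def classify_impairment(z: float) -> str:
    """Map an SD offset to the bands described in the text.

    Boundary handling (<=) is an assumption made for this sketch.
    """
    if z <= -3.0:
        return "severe"
    if z <= -2.0:
        return "moderate"
    if z <= -1.0:
        return "mild"
    return "not classified as impaired"


if __name__ == "__main__":
    # Hypothetical standard scores on a scale with mean 100 and SD 15.
    for raw in (100, 85, 70, 55):
        z = z_score(raw, mean=100, sd=15)
        print(raw, classify_impairment(z))
```

A clinician who reserves the "mild" label for scores two standard deviations below the mean is, in effect, shifting every band in this rule downward by one full standard deviation.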
- Test 3: Do positive identifications withstand base rate analysis?
Tests employed to measure damage to the brain must have an established base rate to determine whether the use of a particular test or sign increases or decreases diagnostic or predictive accuracy. For example, research has shown that the WAIS-R has only a 66% correct classification rate to identify brain damage. This would mean that 34% of the population diagnosed with damage based upon test findings on this particular measure would have been inaccurately diagnosed. Using base rates the lowest level of accuracy one can achieve is 50%. Base rates allow specific tests (such as those classified as neuropsychological measures) to be more relevant to measure damage to the brain as opposed to other test measurements.
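One way to make the base rate comparison concrete is to compute how often a test classifies correctly at a given base rate and to compare that figure with the accuracy obtained by always predicting the more frequent outcome. The sketch below does this under stated assumptions: the source reports only a single 66% classification rate, so treating 0.66 as both sensitivity and specificity is a hypothetical simplification, and the class and function names are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseRateAnalysis:
    overall_accuracy: float           # proportion of all cases classified correctly
    positive_predictive_value: float  # proportion of positive calls that are correct
    base_rate_only_accuracy: float    # accuracy from always guessing the likelier class

    @property
    def beats_base_rate(self) -> bool:
        return self.overall_accuracy > self.base_rate_only_accuracy


def analyze(sensitivity: float, specificity: float,
            base_rate: float) -> BaseRateAnalysis:
    """Compare a test's accuracy with prediction from the base rate alone."""
    for value in (sensitivity, specificity, base_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    positives = true_pos + false_pos
    ppv = true_pos / positives if positives > 0 else float("nan")
    # Guessing the more frequent class can never fall below 50% accuracy.
    baseline = max(base_rate, 1.0 - base_rate)
    return BaseRateAnalysis(true_pos + true_neg, ppv, baseline)


if __name__ == "__main__":
    # Hypothetical inputs: 0.66 for both sensitivity and specificity.
    for rate in (0.50, 0.10):
        print(rate, analyze(0.66, 0.66, rate))
```

Under these assumed figures the test improves on the base rate when brain damage is present in half of the cases examined, but not when it is present in only one case in ten, which is the sense in which a positive identification must withstand base rate analysis.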
- Test 4: Does the test produce incremental validity? Does one achieve more accurate results if one uses the test score or signs or discards it?
Tests need to demonstrate incremental validity, that is, there is a gain in the level/degree of accuracy of diagnosis with the addition of a particular test. Tests are added to attain the increased goal of accuracy of diagnosis.
- Test 5: Does the Test or Sign pass tests 1 to 4 noted above for both the persons and questions to which it is applied?
In other words, does the test or sign apply in an accurate manner to the population and questions for which it is used? This is the idea of generalization of the findings: to what extent do the results generalize to other individuals or other questions? The test can be stated as valid for only the specific purposes for which its validity has been determined.
- Test 6: Does the test or sign diagnose or predict with reasonable certainty?
This asks the question of convergent and divergent validity. Can the test rule out those that do not have the sign in question, while ruling in those that do? The neuropsychologist should be aware of the convergent and divergent validity represented by statistical analysis in the form of a coefficient, for every test measure administered in their neuropsychological assessment. Evidence for a test or sign should not be admitted until Tests 1-5 have been passed.
It is supported by the research that the clinician cannot use the results of one measure to predict performance on another measure, even if the two measures are reportedly measuring the same thing. Convergent and divergent validity has not been shown in the research to support the ability to administer one measure and assume that the results are generalizable. Recent evaluation of the correlation of two popular memory measures, the MAS and the WMS-R, actually yielded a rather low correlation coefficient, confirming the problem of generalization and the need to administer several measures to assess a particular functional category such as memory.
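The relationship between two measures of the same function is usually summarized with a correlation coefficient, and squaring that coefficient gives the proportion of variance the two measures share. The sketch below shows the calculation; the score lists are hypothetical placeholders and are not data from the MAS, the WMS-R, or any published study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common (r squared)."""
    return r * r


if __name__ == "__main__":
    # Hypothetical paired scores for the same examinees on two memory measures.
    measure_a = [92, 105, 88, 110, 97, 101]
    measure_b = [99, 94, 102, 108, 85, 103]
    r = pearson_r(measure_a, measure_b)
    print(f"r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

A coefficient of 0.50, for instance, means the two measures share only a quarter of their variance, which is why performance on one cannot be assumed to predict performance on the other.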
In summation, Daubert is a standard that can be imposed by the court and used to accept or deny expert witness testimony. Every neuropsychologist is fearful of being called to task on the issues noted here; however, only when scientific principles are applied to the testing and evaluation process and strictly adhered to can the field of neuropsychology rightfully take its place as a part of the medical process. Neuropsychology offers to the medical field a form of evaluation that has a diagnostic capacity not seen to date in any type of radiological assessment. With their brain-behavior, paper and pencil testing, neuropsychologists can determine deficit areas that may take years to be seen on more gross types of evaluation. When neurologists accept the findings of the neuropsychologist and act accordingly, even when such findings are not seen on radiological measures, they will add to their practice a diagnostic capacity that is truly preventative. Hopefully, the more scientific the field of neuropsychology and the more standardized the operation of its members, the more seriously their findings will be utilized, and only then can this field become the science it was meant to be, maximizing the potential and usefulness that it rightfully has.
Barbara C. Fisher, Ph.D.
The views expressed are those of the author, Dr. Fisher. However, she is always open to new ideas, new research, and new information. Tomorrow’s research about the brain augments or replaces the concepts we know today. Consequently, these viewpoints will naturally change and will be updated by Dr. Fisher on a periodic basis.