Psychometric Properties of the PADDS

Overview of PADDS Reliability and Validity

Reliability
Reliability, broadly defined, is the repeatability of test scores and/or research findings (Kaplan & Saccuzzo, 1997). Several forms of reliability exist and can be estimated using a variety of measures. The choice of reliability estimation technique is often constrained by the nature of the data collection and/or the nature of the available data. We report internal consistency reliability as well as test-retest reliability and decision consistency/stability coefficients as estimates of the reliability of the PADDS system. While establishing the reliability of a test is always an ongoing process, a generally accepted minimal standard of reliability is .70 (Spector, 1992). When clinical decisions are being made, coefficients of at least .85 are recommended (Rosenthal & Rosnow, 1991).

Internal Consistency
Treating the separate subtests of the PADDS as individual variables in a Cronbach's alpha analysis revealed a highly acceptable degree of internal consistency reliability, α = .86, for the overall PADDS measure within a larger sample of n = 611 cases. In this larger sample the mean age was 8.66 (SD = 1.71); 60% were male, 70% Caucasian, 25% African American, and 5% Hispanic. This reliability estimate was largely substantiated by alpha values of .86 and .80 in repeated testing of a sample of n = 38 and .80 and .92 in repeated testing of another sample of n = 27; these two independent samples were collected to assess test-retest reliability (reported below). These reliability estimates thereby support the recommended use and comparison of the three subtests for their level of overall agreement in order to generate the maximum diagnostic utility of the test.
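
For readers who wish to see how such a coefficient is obtained, the following is a minimal sketch of the standard Cronbach's alpha formula with each subtest treated as one item. The function name and the scores are hypothetical and are not PADDS data; the sketch is illustrative only and is not the analysis used to produce the values above.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per child, one score per subtest)."""
    k = len(scores[0])  # number of subtests treated as items
    # Variance of each subtest across children
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each child's summed score
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for five children on three subtests (not PADDS data)
example_scores = [
    [10, 12, 11],
    [14, 15, 13],
    [8, 9, 10],
    [16, 18, 17],
    [12, 11, 12],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Alpha rises toward 1 as the subtests rank children more consistently with one another, which is why it is read here as support for comparing the three subtests.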

Test-Retest - Stability Coefficient
Criterion-referenced test-retest reliability was examined by calculating stability coefficients, along with both the Kappa and Phi coefficients, for the degree of diagnostic agreement across two administrations in each of two separate samples of participants. Statisticians do not agree on whether the Phi or Kappa coefficient is the more appropriate measure of criterion-referenced test-retest reliability (Reid & Roberts, 1978), so both are reported here. The clinical use of the Target Tests combines all three subtest results when determining classification of ADHD or Typical performances. Given this use, the stability of the diagnostic classification was assessed over two separate retest intervals.

The first sample included 65% males, with a mean age of 8.36 and SD of 1.76. The participants' ethnic make-up was 70% Caucasian, 23% African American, and 7% Hispanic. Test and retest performances for the 38 participants were collected within 6 months. Results show that 36 of the 38 participants remained appropriately classified, resulting in a stability coefficient of .94. Phi and Kappa coefficients were also calculated for this sample, with results of .70 and .69 respectively, all ps < .001.
The second sample included 66% males, with a mean age of 8.44 and SD of 1.90. Seventy percent of the participants were Caucasian and 30% were African American. The interval between test and retest was one to two years. Overall, 23 of the 27 participants remained appropriately classified across the test-retest procedures, producing a stability coefficient of .85 and indicating a high degree of stability for diagnostic classification over time. Phi and Kappa coefficients were also calculated for this sample of 27 participants, with results of .70 and .73 respectively, all ps < .001.
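
The three indices reported above can all be computed from a 2 x 2 table that crosses the classification at test with the classification at retest. The sketch below shows the standard formulas. The cell counts are hypothetical: they were chosen only so that 36 of 38 cases agree, and they are not the cell counts from either sample, which are not reported here.

```python
import math


def agreement_stats(a, b, c, d):
    """Percent agreement, Phi, and Cohen's Kappa for a 2 x 2 test-retest table.

    a = ADHD at test and retest      b = ADHD at test, Typical at retest
    c = Typical at test, ADHD at retest      d = Typical at test and retest
    """
    n = a + b + c + d
    observed = (a + d) / n  # proportion classified the same way twice
    # Agreement expected by chance from the row and column totals
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return observed, phi, kappa


# Hypothetical cell counts (assumption): 36 of 38 cases agree
observed, phi, kappa = agreement_stats(a=33, b=1, c=1, d=3)
print(f"agreement = {observed:.3f}, phi = {phi:.3f}, kappa = {kappa:.3f}")
```

Because Phi and Kappa both adjust for the row and column totals, they fall well below the raw proportion of agreement whenever one classification is much more common than the other.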

It is important to note that the psychometric review of the PADDS Target Test performances produced a main effect for age: older children tended to score better on the individual subtests. This result would be expected given that the Target Tests were designed as measures of executive functions. Since children typically show greater executive control as they age, it stands to reason that they would improve on the subtests as they get older. Additionally, effects of time, practice, and physical maturation could also serve to increase the difference between test and retest performances. Thus, differences between the stability coefficients and the Phi and Kappa coefficients would be expected. Overall, results show that individual subtest scores may vary as a function of age, but classification decisions made using age-specific cut scores remain highly consistent even across the span of 6 months to 2 years.

Validity
Validity is defined as the degree to which a test measures what it purports to measure. Ultimately, validity is represented by the degree of support for construct validity; that is, does the measure adequately "tap into" the underlying construct it purports to measure? Evidence of construct validity is generated through the presentation of convergent and discriminant validities, primarily, and then more superficially through demonstration of criterion-related validity. Our primary focus at this stage of test development has been the demonstration of concurrent, convergent, and discriminant validities. Future work will continue to focus on establishing additional evidence of criterion-related validity for the measure as more findings emerge.

Concurrent Validity
All participants in the separate validity studies were drawn from the Savannah Child Study Center, Savannah, Georgia. The first sample included 121 children aged 6 to 12 (M = 8.77; SD = 2.0) who were administered the PADDS and TOVA. Approximately 73% were males and the sample consisted of 61% Caucasian, 33% African American, and 6% Hispanic. Results from this study are presented in Table 5.1 and show that each of the three PADDS subtests is correlated in the predicted direction with the TOVA scales. The exception is response time, for which we expected no correlation, given that the speed of response measured by the TOVA has no theoretical basis for exhibiting a relationship to any of the three PADDS subtests.



Table 5.1 Correlations between PADDS & TOVA Scale Scores (n = 121)
                    PADDS Scale
TOVA Scale          Target Recognition    Target Sequencing    Target Tracking
Omission            .36***                .43***               .43***
Commission          .34***                .30***               .36***
Response Time       .05                   .08                  .02
Variability         .33***                .29**                .30***
Multiple Response   -.39***               -.36***              -.27**
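
As a rough guide to reading the significance markers, the sketch below approximates the two-sided p-value of a correlation from r and n using the Fisher z transformation. Two assumptions are made that are not stated in the text: the markers are taken to follow the common convention (* p < .05, ** p < .01, *** p < .001), and the normal approximation is used in place of whatever exact test produced the reported values, so results close to a threshold may differ slightly.

```python
import math


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p):
    """Assumed conventional markers; the thresholds are not given in the source."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Correlations taken from the Target Recognition column, n = 121
for r in (0.36, 0.05, -0.39):
    p = fisher_z_p_value(r, 121)
    print(f"r = {r:+.2f}  p = {p:.4f}  {stars(p)}")
```

With n = 121, even correlations near .30 are statistically significant, so the size of each coefficient, rather than its marker alone, is what indicates how closely the two measures track one another.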

The second sample included 38 children aged 6 to 12 (M = 8.37; SD = 1.76) who were administered the PADDS and CPT-II. Approximately 72% were males and the sample was 54% Caucasian, 41% African American, and 5% Hispanic. Results from this study are displayed in Table 5.2 and show the predicted relationships between the PADDS and the CPT-II. As anticipated, the PADDS subtests were negatively correlated with a number of the CPT-II subtests. These low to moderate negative correlations occurred because the PADDS and CPT-II are scored in opposite directions for severity: lower scores on the PADDS are associated with more performance errors, and thus higher scores, on the CPT-II. The relatively low to moderate values of the observed correlations indicate that the CPT-II and the PADDS measure both similar and different components of attention and executive function, and that they could therefore justifiably be used in conjunction with each other to better capture a holistic view of the attention/concentration construct.



Table 5.2 Correlations between PADDS & CPT-II Scale Scores (n = 38)
                    PADDS Scale
CPT-II Scale        Target Recognition    Target Sequencing    Target Tracking
Omission            -.38**                -.34*                -.46**
Commission          -.13                  -.20                 -.14
Mean Hit RT         -.40**                -.29*                -.50**
Mean Hit RT (SE)    -.52***               -.33*                -.46**
D                   -.21                  -.39**               .00
B                   -.39**                -.37*                -.47**

The third sample included 59 children aged 6 to 12 (M = 8.48; SD = 1.86) who were administered the PADDS and the BRIEF. Approximately 75% were males and the sample was 56% Caucasian, 34% African American, and 10% Hispanic. Results, as shown in Table 5.3, reveal that the PADDS and the BRIEF are largely unrelated measures. These results are perhaps best interpreted in light of how distinct the two types of measure are: the PADDS is a measure of executive function performance, whereas the BRIEF is a rating of an individual's executive function abilities. A strong correlation between them would require highly accurate ratings, yet raters may not have the necessary insight into a child's executive functioning abilities, and the possibility for bias in the ratings is considerable (especially on the part of a parent who does not wish to have his/her child unfairly labeled). The quite low correlations between the PADDS and the BRIEF provide further evidence for the necessity of multiple sources of evidence in the diagnosis of ADHD, as advocated by the evidence-based approach utilized in the PADDS system, rather than (over)reliance on any single diagnostic tool and its own particular limitations.



Table 5.3 Correlations between PADDS & BRIEF Scale Scores (n = 58)
                    PADDS Scale
BRIEF Scale         Target Recognition    Target Sequencing    Target Tracking
INHIBP              -.23*                 -.19                 -.18
WMP                 .12                   -.01                 .07
EMCP                -.23*                 -.09                 -.13
GP                  -.10                  -.10                 -.08
INHIBT              -.12                  -.08                 -.10
WMT                 -.02                  .02                  -.04
EMCT                -.30*                 -.17                 -.24*
GT                  -.17                  -.10                 -.13

Note. INHIBP = BRIEF Inhibit Parent Form; WMP = BRIEF Working Memory Parent Form; EMCP = BRIEF Emotional Control Parent Form; GP = BRIEF Global Executive Composite Parent Form; INHIBT = BRIEF Inhibit Teacher Form; WMT = BRIEF Working Memory Teacher Form; EMCT = BRIEF Emotional Control Teacher Form; GT = BRIEF Global Executive Composite Teacher Form.


Convergent Validity
Convergent validity describes the extent to which two measures of the same construct correlate. In terms of overall diagnostic classification, there was low-to-moderate agreement between the PADDS classification and the Test of Variables of Attention (TOVA; Greenberg, 1991) measure of ADHD, r(121) = .38, p < .001.

In a separate sample of 38 children diagnosed with ADHD who improved on medication, classifications from the PADDS Target Tests of Executive Function, the Brown Parent or Teacher ratings (Brown, 1996), and the Conners' Continuous Performance Test II (Conners, 1997) were compared to determine the diagnostic utility of each measure and the percentage of agreement among the measures. For the purpose of this analysis, at least two of the three Target Tests of Executive Function had to be at or below established cut scores for classification of ADHD. Brown ADD Scales were reported in T-score format, and either the parent or the teacher rating meeting the accepted standard of 1.5 standard deviations (T score = 65) was considered indicative of classification of ADHD. For the Conners' Continuous Performance Test II, also reported in T-score format, an overall confidence index of 65 or greater was considered indicative of clinical classification. This sample included 67% males, 75% Caucasian and 25% African American (mean age = 8.70, SD = 1.9).
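
The three decision rules described above can be written out as a short sketch. The function names, the example scores, and the cut scores are hypothetical, since the actual age-specific PADDS cut scores are not listed in this section. The threshold of 65 for the Brown and CPT II rules is taken from the text.

```python
def padds_positive(subtest_scores, cut_scores):
    """ADHD classification when at least two of the three Target Tests
    fall at or below their cut scores."""
    flagged = sum(score <= cut for score, cut in zip(subtest_scores, cut_scores))
    return flagged >= 2


def brown_positive(parent_t, teacher_t, threshold=65):
    """Either the parent or the teacher T score meets the threshold."""
    return parent_t >= threshold or teacher_t >= threshold


def cpt_positive(confidence_index, threshold=65):
    """Overall confidence index of 65 or greater."""
    return confidence_index >= threshold


def hit_rate(classifications):
    """Proportion of cases classified as ADHD in a sample of diagnosed children."""
    return sum(classifications) / len(classifications)


# Hypothetical child: scores and cut scores are placeholders, not PADDS norms
child_padds = padds_positive(subtest_scores=[4, 9, 3], cut_scores=[5, 6, 5])
child_brown = brown_positive(parent_t=58, teacher_t=67)
child_cpt = cpt_positive(confidence_index=61)
print(child_padds, child_brown, child_cpt)
```

Applying each rule to every child in a sample and passing the resulting list to hit_rate gives the kind of proportion reported for each measure.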



Table 5.4 presents a comparison of the diagnostic utility of the PADDS Target Tests of Executive Function, the Brown ADD Scales, and the Conners' Continuous Performance Test II.
              PADDS Target Tests    Brown ADD Scales    CPT II
Hit Rate/N    36/38                 25/38               26/38
Percentage    94%                   66%                 68%

As can be seen, the Target subtests produced the highest hit rate of the three measures at 94% followed by the Conners' CPT at 68% and the Brown ADD Scales at 66%.

The improved hit rate produced by the Target Tests of Executive Function was in keeping with the overall data analysis of the larger data set (n = 725) and is likely due to the fact that three independent measures are used in conjunction with one another. Used in conjunction and incrementally, these measures have demonstrated an outstanding ability to distinguish ADHD subjects from their non-clinical counterparts.



Table 5.5 A comparison of diagnostic utility of PADDS Target Tests of Executive Function, Brown ADD Scales, and the Conners' Continuous Performance Test II
                    PADDS/Brown    PADDS/CPT II
Hit Rate/N          25/38          24/38
Percentage Agree    66%            63%

Discriminant Validity
Discriminant validity describes the extent to which a test is not correlated with constructs to which it is purportedly unrelated. Discriminant validity was assessed with 137 participants (mean age = 8.05, SD = 1.50), of whom 64% were male, 67% were Caucasian, 30% were African American, and 3% were Hispanic. As expected, we found that the overall diagnostic classification using the PADDS system was unrelated to Full Scale IQ, Verbal IQ, and Performance IQ, as measured by the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999). Also unrelated were visual or verbal memory, along with indices of attention and concentration, as measured by the Children's Memory Scale (Cohen, 1997) and the Wide Range Assessment of Memory and Learning II (Sheslow & Adams, 2003); all ps > .05. It should be noted that neither memory test's index for Attention and Concentration correlated significantly with diagnostic classification.

Construct Validity
Executive functions are defined as controls that allow one to perform complex behaviors that require, among other things, planning, attending, organizing input, storing and retrieving information, modulating emotions, and sustaining effort. Given that the Target Tests were designed to more fully tap these executive areas, a separate sample of 35 participants (67% males, 75% Caucasian and 25% African American; mean age = 8.50, SD = 1.7) was rated on the Brown ADD Scales (both teacher and parent versions) and tested with the PADDS Target Tests of Executive Function to determine the degree to which these measures correlate. The Brown ADD Scales are a set of commercially available behavior ratings based on the work of Dr. Thomas E. Brown and are viewed as sensitive to the core domains of executive operations (Brown, 1996). Table 5.6 presents the results of these analyses. As expected, two of the three Target Tests of Executive Function correlated with the Brown Teacher rating for the primarily inattentive type and for the combined type. Parent and Teacher ratings did not significantly correlate with one another. Likewise, Parent ratings did not significantly correlate with any of the Target subtests. The lack of correlation may reflect a degree of bias on the part of parent respondents, who may have erred on the side of conservative estimates as they completed the Parent scale. This is the most plausible explanation for these findings, given that the children in the sample were previously diagnosed with ADHD and had shown significant improvement on medication.



Table 5.6 Intercorrelations of percentile ranks of PADDS Target Tests of Executive Function and Brown ADD Scales (Teacher and Parent versions)
              TR%ile        TS%ile        TT%ile        BP-Inatt%ile  BP-Comb%ile   BT-Inatt%ile  BT-comb%ile
TR%ile
TS%ile        .68**
TT%ile        .43**         .63**
BP-Inatt%ile  -.18          .00           .10
BP-Comb%ile   -.18          .01           .12           .97**
BT-Inatt%ile  -.35*         -.39*         -.25          .21           .19
BT-comb%ile   -.36*         -.41*         -.26          .26           .22           .86**

TR%ile = PADDS - Target Recognition percentile
TS%ile = Target Sequencing percentile
TT%ile = Target Tracking percentile
BP-Inatt%ile = Brown Parent Rating of Inattentive Type
BP-Comb%ile = Brown Parent Rating of Combined Type
BT-Inatt%ile = Brown Teacher Rating of Inattentive Type
BT-comb%ile = Brown Teacher Rating of Combined Type