Talk:OToPS/ABACAB (NIH R01 MH066647)

Paste of closing report
Improving the Assessment of Juvenile Bipolar Disorder

5R01 MH066647

Closing Report

Principal Investigator: Eric A. Youngstrom, Ph.D.

Co-Investigators: Jennifer Kogos Youngstrom, Ph.D., Norah C. Feeny, Ph.D., Oren Meyers, Ph.D., Robert L. Findling, M.D.

Research Coordinator: Heather Marcinick, M.A.

Data Managers: Christine Demeter, M.A., Oren Meyers, Ph.D., & Andrew Freeman, M.A.

Table of Contents

Introduction

Aim 1: Develop and Cross-Validate Screening Instruments

Hypothesis 1: Measures Will Generalize and Remain Valid

Hypothesis 2: An Abbreviated Instrument Will Cross-Validate Successfully

Aim 2: Clarify the Features of Subsyndromal Bipolar Disorder

Hypothesis 3: Manic Symptoms Will Statistically Identify a Category of Bipolar Disorder

Hypothesis 4: Bipolar Disorder Will Be the Condition with Greatest Average Impairment

Aim 3: Examine Developmental Changes in Mood Presentation from Ages 5 to 17 Years

Hypothesis 5: Continuity of Symptom Presentation Across Age Ranges

Reliability and Credibility of Information Sources

Associations Between Credibility and Validity

Hypothesis 6: Bipolar Disorder Will Show Progressively Worse Burden, More Service Use

Secondary Aims

Base Rate for Juvenile BPSD in a Community Mental Health Setting

Diagnostic Agreement

Promoting Translation of Assessment Procedures from Research to Clinical Settings

Other Factors Potentially Changing Performance of Diagnostic Instruments

Demographics of Participants

Participant Enrollment Report

References Cited

Tables

Appendix I: Publications, Grants, and Training Activities Deriving from Grant

Improving the Assessment of Juvenile Bipolar Disorder

In the time since this grant was initially funded, the debate around the validity of diagnosing bipolar disorder in children and adolescents has remained heated. There have been considerable advances in knowledge about phenomenology, assessment, physiological correlates, and even some data about longitudinal course (Birmaher et al., 2009; Geller, Tillman, Bolhofner, & Zimerman, 2008); but at the same time there continues to be vigorous discussion about the appropriateness of the diagnostic label (Healy, 2006; Leibenluft, Charney, Towbin, Bhangoo, & Pine, 2003), and the implications in terms of treatment (Blader & Carlson, 2007; Moreno et al., 2007), extending from the popular press (Kluger & Song, 2002) to the deliberations about revision of the Diagnostic and Statistical Manual of the American Psychiatric Association (www.dsm5.org). In short, the objectives of the project are timely and informative.

The presentation of findings in this closing report will closely follow the structure of the specific and secondary aims from the funded version of the project. The format will be a recapitulation of the aim, followed by a brief narrative and then extensive tables or figures as appendix material, with the numbering reinforcing the link to each aim.

The overall goal of this project was to create a foundation for better defining and assessing bipolar disorders in pediatric populations. This submission consisted of two tightly interwoven studies. Together they were designed to (a) establish a base rate for bipolar disorders in community mental health, and document the phenomenology of symptom presentation and associated characteristics, (b) evaluate the concordance between research and conventional diagnoses, (c) test the performance of instruments that have shown promise in a research mood disorders clinic when these are exported into a naturalistic community setting, (d) develop an abbreviated screening protocol optimized for use in community samples, and (e) prospectively cross-validate the use of the abbreviated screening protocol.

A. Specific Aims:

Aim #1: To develop and prospectively cross-validate a juvenile bipolar spectrum disorder (BPSD) screening protocol optimized for use in community settings with representative, ethnically diverse populations.

Hypothesis 1: Measures that performed well in pilot data from a research setting will demonstrate statistically significant diagnostic efficiency when exported to a community setting.

The R01 used a LEAD consensus conference (Spitzer, 1983) as a way of assigning final diagnoses, after completing a KSADS interview (Geller et al., 2001; Kaufman et al., 1997) with the youth and primary caregiver. The same rater teams completed diagnostic interviews at both the community mental health center (CMHC) and academic medical center (AMC) sites, eliminating differences in rater experience or training as a potential confound in generating diagnoses.

The criterion diagnosis for the primary analyses of diagnostic efficiency was the presence or absence of a bipolar spectrum diagnosis. The bipolar criterion included bipolar I, bipolar II, cyclothymic disorder, and bipolar not otherwise specified (bipolar NOS), regardless of any other comorbid conditions. The inclusion of “spectrum” conditions such as cyclothymic disorder makes it harder for diagnostic efficiency to remain high, because these cases will have less clear-cut symptom presentations, and they cannot have distinct manic episodes or else their diagnosis would “upgrade” to bipolar I, by definition. As a result, diagnostic sensitivity would be expected to be lower, as more “bipolar criterion” cases would be likely to earn only moderately elevated scores on instruments attempting to detect them.

Similarly, the comparison group contains all other cases presenting to the clinic. There were no exclusion criteria at the community mental health site, other than an inability to complete the interview in spoken English. This stands in marked contrast to the sampling strategy in prior phenomenologically oriented studies, where the comparison groups were selected to include healthy controls, or cases with attention-deficit/hyperactivity disorder without a comorbid mood disorder. Although these more homogeneous and distilled groups provide valuable comparators for phenomenology or for neurocognitive studies, they are quite different in composition and functioning than clinically representative samples. The high rates of stress, mood, anxiety, attention problems, and aggressive behavior in a clinically representative sample were expected to increase scores on measures attempting to detect mania, creating more “false positives” and reducing the diagnostic specificity.

Both expectations proved correct. Bipolar spectrum presentations were far more common than bipolar I cases in community mental health, and the average scores on all bipolar screening measures tended to be slightly lower in the bipolar group in the CMHC setting, with large individual differences (and large standard deviations). Similarly, the average scores were higher in nonbipolar cases in the CMHC, reflecting the high degree of distress and complexity in these cases. These combined to lower the diagnostic efficiency of all instruments examined (see Table 1). Secondary analyses of the preliminary data submitted with this R01 proposal demonstrated that the effects of sample composition on the diagnostic efficiency of instruments could be marked (Youngstrom, Meyers, Youngstrom, Calabrese, & Findling, 2006). The performance observed in the present sample was lower than reported in the prior validation studies. The decrement ranged from small (e.g., area under the ROC curve, or AUROC, for the PGBI Hypomanic/Biphasic 28 item score changing from .78 in Youngstrom et al., 2004 to .77 in the present data) to moderate (AUROC of .86 for the 10 item PGBI mania scale in its development sample (Youngstrom, Frazier, Demeter, Calabrese, & Findling, 2008), down to .79 in the present sample) and, in some cases, substantial, such as the P-YMRS dropping to an AUROC of .67. The P-YMRS performance was sufficiently poor that it is clearly statistically inferior to all the other caregiver report measures examined, and we have stopped using it clinically as a result. The “CBCL Bipolar Profile” index (summing or averaging the Aggressive Behavior, Attention Problems, and Anxious/Depressed Scales from the CBCL) (Mick, Biederman, Pandina, & Faraone, 2003) distinguished the bipolar from nonbipolar cases significantly better than chance, but significantly less well than all the other mood scales (except for the P-YMRS), AUROC = .67.
Overall, it was encouraging that all of the caregiver-reported measures continued to demonstrate statistical validity, despite the changes in sample composition and demographics.
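To make the AUROC values above concrete: the area under the ROC curve can be read as the probability that a randomly chosen bipolar case scores higher on the screening measure than a randomly chosen nonbipolar case, with ties counted as half. The following is a minimal sketch of that interpretation only; the function name and the scores are hypothetical and are not data or code from this project.

```python
def auroc(case_scores, comparison_scores):
    """Probability that a random case outscores a random comparison case.

    Ties count as half a win, which matches the Mann-Whitney U formulation
    of the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for other in comparison_scores:
            if case > other:
                wins += 1.0
            elif case == other:
                wins += 0.5
    return wins / (len(case_scores) * len(comparison_scores))


# Hypothetical screening scores, for illustration only.
bipolar = [12, 18, 9, 22, 15]
nonbipolar = [5, 11, 9, 3, 14, 7]
print(round(auroc(bipolar, nonbipolar), 3))
```

A value of .50 corresponds to chance discrimination and 1.00 to perfect separation, which is the scale on which the instruments in Table 1 are compared.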

Caregiver Report. Table 2 reports the percentage of maximum possible (POMP) (Cohen, Cohen, Aiken, & West, 1999) scores on all of the caregiver reported measures that specifically assess the mood symptoms of the youth, comparing scores for youths meeting full DSM-IV criteria for bipolar I (n=28) with youths meeting criteria for another bipolar spectrum disorder (e.g., bipolar II, cyclothymic disorder, or bipolar NOS; n=116), unipolar depression or dysthymic disorder (n=221), ADHD or a disruptive behavior disorder without comorbid mood (n=353), or all other cases presenting to the clinic (n=72). Across all six measures investigated, the two bipolar groups were significantly more elevated than the other three non-bipolar groups, but the bipolar I group was never significantly different from the rest of the bipolar spectrum. Tables 3 and 4 report the same breakdowns for the CBCL broad band, clinical syndrome scales, and DSM-Oriented scales. Examination of the tables indicates that the differences are more moderate on non-mood oriented scales.
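The POMP metric used in Table 2 rescales each raw score to the percentage of the scale's possible range, which puts instruments of different lengths on a common 0 to 100 footing. A minimal sketch of the standard formula follows; the function name and the example scale range are illustrative assumptions, not values taken from the report's tables.

```python
def pomp(raw_score, scale_min, scale_max):
    """Percentage of maximum possible: 100 * (raw - min) / (max - min)."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (raw_score - scale_min) / (scale_max - scale_min)


# Hypothetical example: a 10-item scale with items scored 0 to 3
# has a possible range of 0 to 30, so a raw score of 12 is 40% of maximum.
print(pomp(12, 0, 30))
```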

Youth Self Report. Another way of measuring manic symptoms would be to rely on self report by the youth. This is consistent with how most of adult psychiatry conducts diagnostic screening or assessment. Analyses of the preliminary data for this grant (Youngstrom et al., 2004) suggested that youth report tends to discriminate bipolar from nonbipolar cases at better than chance rates, but significantly less well than parent report. This trend was also observed in an interim analysis of the data gathered under the auspices of this grant (Youngstrom et al., 2005), and it aligns with the effect sizes observed when comparing caregiver versus youth or teacher report about cases with bipolar disorder (Geller, Warner, Williams, & Zimerman, 1998; Hazell, Lewin, & Carr, 1999). The final analyses from this project continue to fit that pattern. As shown in Table 1, youth self report showed substantially lower AUROC values than did parent report. In the case of a self-report version of the YMRS, scores did not do better than chance at separating the bipolar spectrum cases, indicating a statistically “invalid” test (Kraemer, 1992). We hypothesize that there are two factors undermining the validity of self report about manic symptoms: One is that compromised insight is a feature of mania (Dell'Osso et al., 2002), and the other is that many of the symptoms of mania are things that would bother other people before they become subjectively noticeable to the person experiencing them. The adolescent with mania is not going to be the first person to notice when they are talking “too much,” or when their own mood seems irritable. Tables 5, 6, and 7 provide average scores on youth reports compared across diagnostic subgroups.

Teacher Report. A third source of information this project examined was teacher report. Findings with the Achenbach Teacher Report Form (Achenbach & Rescorla, 2001) replicate prior work in three important ways: (a) youths with bipolar disorder tend to show significant elevations on multiple behavior problems in school settings (e.g., T-scores higher than 50); (b) teacher-parent agreement was as good or better for bipolar spectrum cases (r=.38) than for nonbipolar cases (r=.30), with both values comparing favorably to meta-analyses and normative data about teacher-parent agreement; and (c) teacher report operated at near-chance levels in discriminating bipolar from nonbipolar cases (nonsignificant AUROCs in ROC analyses).

A secondary aim of this project was to develop and test “second generation” mania rating scales that could be completed by teachers. Results indicated that it is difficult to measure manic symptoms using teacher report. Many items were difficult to rate, and teachers frequently skipped them. Reliability estimates were lower and factor structures often quite different in teacher report versus the corresponding caregiver or youth report on the same measure. AUROC values on teacher report were consistently low and usually no better than chance at discriminating bipolar disorder from other cases (Youngstrom, Joseph, & Greene, 2008). Tables 8, 9, and 10 compare averages on teacher measures across diagnostic subgroups.

Hypothesis 2: An abbreviated screening instrument can be developed on the basis of community sample data that will retain comparable levels of diagnostic efficiency versus a lengthier screening instrument when cross-validated in an independent, prospective sample.

During the time that the grant was enrolling, there were several new measures developed and advertised for the assessment of bipolar disorder in youths. These included the parent version of the Mood Disorder Questionnaire (Hirschfeld et al., 2000), which performed better as a parent report instead of self report instrument (Wagner, Findling, Emslie, Gracious, & Reed, 2006; Youngstrom, et al., 2005); as well as the Child Mania Rating Scale (Henry, Pavuluri, Youngstrom, & Birmaher, 2008; Pavuluri, Henry, Devineni, Carbray, & Birmaher, 2006) and other instruments. The protocol included the PGBI, the P-MDQ, and added the CMRS when it became available. When the time came to recommend an instrument for validation as a general screener for bipolar disorder, the Mood Disorder Questionnaire had become a commercially distributed instrument. There were more data available and published on the PGBI than any other instrument except the CBCL, which the PGBI had outperformed in all direct comparisons for the purpose of detecting bipolar spectrum disorder. The combination of validity evidence and low cost led to the selection of a 10 item Mania Scale version of the PGBI (PGBI-10M; Youngstrom, Frazier, et al., 2008) as the tool of choice for cross-validation in a community mental health sample. The 10 item mania scale became a standard part of the intake for the last two years of the grant. It demonstrated excellent psychometric properties, with a Cronbach’s alpha of .90, a one month retest reliability of .62, and an Area Under the Curve of .78 for discriminating bipolar versus all other cases based on a diagnostic interview some weeks later, versus an AUROC of .80 for a concurrently administered version of the PGBI.
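For readers less familiar with the internal consistency statistic reported above, Cronbach's alpha compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula with a small hypothetical response matrix; it is an illustration, not the project's analysis code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_items = len(responses[0])
    item_variances = [
        variance([person[i] for person in responses]) for i in range(n_items)
    ]
    total_variance = variance([sum(person) for person in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five caregivers to a four-item scale (0 to 3).
example = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(example), 2))
```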

Aim #2: To clarify the characteristic features of “subsyndromal” bipolar spectrum disorders, currently referred to as “Bipolar NOS” (Nottelmann et al., 2001), in both research and community settings.

Hypothesis 3: Taxometric statistical procedures will identify a group of children and adolescents experiencing a categorically different pattern of manic and depressive symptoms than expressed by the rest of the participants.

The taxometric statistical methods concentrated on the parent report measures, because these had demonstrated the largest effect sizes in the diagnostic efficiency analyses (i.e., largest AUROC values). Youth self report and teacher report often failed to perform better than chance when comparing groups based on DSM definitions. Although it is possible that other categories exist that would be associated with youth or teacher report, it is highly unlikely that existing measures would provide large effect sizes for sorting cases into previously unsuspected, atheoretical categories. The clinician-rated measures were disqualified for a different reason: Other simulation studies and investigations have found that clinician-rated instruments can create taxonic or continuous data, depending on how raters are trained or how expectations are framed. The emergence of a taxon using data drawn from the KSADS interviews thus would require cross-validation by an independent source of information (such as caregiver report on rating scales that were recused from the consensus diagnostic process). Thus caregiver report was the most likely and most independent source of information for taxometric analyses.

Based on two methods, Maximum Eigenvalue (MAXEIG) and Mean Above Minus Below A Cut (MAMBAC), the parent report data about manic, hypomanic, and mixed symptoms in the youths conformed to a dimensional model. Table 11 presents the power analysis for the taxometric procedures, which ranged from “acceptable” (d=1.6) to high (d=2.7). See Figures 1 and 2 for the visual plots of the results, as well as simulations for categorical and dimensional solutions.
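As a rough illustration of the MAMBAC logic, cases are sorted on one indicator, and at a series of cut points the mean of a second indicator above the cut is compared with its mean below the cut. Taxonic data tend to produce a peaked curve, whereas dimensional data tend to produce a dish-shaped one. The sketch below is a simplified, assumption-laden version (one indicator pair, hypothetical toy data, hypothetical parameter names); published taxometric analyses typically average over many indicator pairs and compare the result against simulated taxonic and dimensional data, as in Figures 1 and 2.

```python
from statistics import mean


def mambac_curve(input_scores, output_scores, n_cuts=50, min_tail=25):
    """Mean Above Minus Below A Cut for one input/output indicator pair."""
    n = len(input_scores)
    if n != len(output_scores) or n <= 2 * min_tail or n_cuts < 2:
        raise ValueError("need paired scores, n > 2 * min_tail, and n_cuts >= 2")
    # Order the output indicator by the input indicator.
    order = sorted(range(n), key=lambda i: input_scores[i])
    sorted_output = [output_scores[i] for i in order]
    # Evenly spaced cut points that leave min_tail cases in each tail.
    cuts = [
        min_tail + round(j * (n - 2 * min_tail) / (n_cuts - 1))
        for j in range(n_cuts)
    ]
    return [mean(sorted_output[c:]) - mean(sorted_output[:c]) for c in cuts]


# Hypothetical, perfectly taxonic toy data: 50 low cases and 50 high cases.
x = list(range(100))
y = [0] * 50 + [1] * 50
curve = mambac_curve(x, y, n_cuts=9, min_tail=10)
peak = curve.index(max(curve))
print(peak, curve[peak])
```

With these toy data the curve peaks at the cut that separates the two groups, which is the signature a taxon would be expected to leave.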

We plan to conduct additional analyses using latent class and latent profile models to search for subgroups, but based on the present findings, we expect that a graded class model, in which cases are best distinguished by their underlying level of overall mania (similar to Krueger’s findings for other psychopathology constructs) (Krueger, Markon, Patrick, & Iacono, 2005), is likely to be the best fitting model. The dimensional results are also consistent with recently published analyses of adult epidemiological data on mania (Prisciandaro & Roberts, 2010). A dimensional model is also most consistent with emerging findings about the polygenic nature of bipolar disorder, with a large number of genes that each contribute small amounts of risk (WTCCC, 2007).

Hypothesis 4: Bipolar spectrum disorders will be associated with the highest average level of impairment of any major diagnostic category in a community sample.

Consistent with the hypothesis, both the bipolar I and other bipolar spectrum disorders were associated with the lowest level of average functioning, whether quantified as the Global Assessment of Functioning at the end of the KSADS interview, or using the “most severe past” or “highest level of functioning in the past year” Global Assessment Scale (CGAS; Shaffer et al., 1983) ratings. As was the case with the mood ratings (see Table 12A), the bipolar I cases were not significantly different from the rest of the bipolar spectrum on any of these measures of functioning. Table 12B summarizes means across diagnostic subgroups. More than 90% of cases with bipolar disorder reported impairment in more than one setting. Supplemental analyses found that when bipolar was comorbid with ADHD, it was not associated with an increase in the number of settings where the youth experienced impairment. The same pattern held with ODD compared to bipolar+ODD; see Tables 13 and 14 for details.

Aim #3: To investigate developmental changes in symptom occurrence and presentation across the age-span from ages 5 to 17 years (including an examination of the validity of this diagnosis in terms of functional impairment and longitudinal course).

Hypothesis 5: There will be identifiable continuity in the type and pattern of symptoms presented by youths meeting BPSD criteria across the age span, although this may require the use of different assessment strategies at different ages.

To examine age effects on symptom presentation, we grouped the sample into “pre-pubertal” (5 to 8 years), “peri-pubertal” (9 to 12 years), and “adolescent” (13 to 18 years). When limiting the sample to youths identified as being on the bipolar spectrum per the LEAD diagnoses, there was evidence of significantly higher levels of depression in the peri-pubertal or adolescent groups than the pre-pubertal group, whether measured on the caregiver PGBI Depression scale (see Table 15), the KSADS Depression Rating Scale for past two weeks and worst lifetime, or the CDRS-R (trend) (Table 16). Findings were the opposite for interview-based ratings of mania in the past two weeks, with the adolescent group showing significantly lower ratings than the other two age groups on the YMRS or KSADS Mania Rating Scale for the current episode. The “worst lifetime” mania ratings showed higher average scores in the older two groups, which makes sense given that these scores either matched the current episode or were higher if there was a more severe past episode. Age trends were not evident on parent report of manic symptoms. The general pattern of increasing depressive symptoms with puberty is consistent with a large body of research (Cyranowski, Frank, Young, & Shear, 2000; Kovacs, 1989). The tendency to see decreases in manic symptoms with age matches clinical observations by Kraepelin (1921, Figure 46, p. 169), and also is consistent with recent analyses of two large USA epidemiological data sets (Cicero et al., 2009). Further work is needed to determine the extent to which these patterns might be influenced by developmental offset of ADHD, which can inflate scores on some symptoms and measures of mania more than others.

A secondary set of analyses also investigated differences across age cohorts in the symptoms reported by the primary caregiver, using raw scores on the CBCL. We used raw scores because the standard scores are age-normed, eliminating age trends in behavior by design. Tables 17 and 18 present the averages and eta-squared effect sizes for the clinical syndrome scales and broad bands. The largest effects were lower levels of Aggressive Behavior and Social Problems in adolescence, and increasing levels of Withdrawn behavior in adolescence. Scores on the CBCL “Bipolar Profile” (now referred to as the “Emotional Dysregulation Profile”) also were significantly lower in adolescence. When examining the DSM-Oriented Scales, there were significantly lower averages on the Attention Problems and the Somatic Problems scales in adolescence (although the corresponding clinical syndrome scales, with broader item content, did not achieve statistical significance). No other differences achieved statistical significance.

A set of secondary analyses is examining the possibility that responses to specific symptoms change developmentally. This more fine-grained investigation is using “Differential Item Functioning” (DIF) techniques developed in the psychometric literature. Analyses conducted so far have focused on parent reported scores, because they provide consistent methodology across the largest age range. DIF could take two forms, sometimes described as “uniform” and “nonuniform” DIF. Uniform DIF describes the situation when there are average differences in the item score between groups (in this case, younger versus older youths) even after controlling for levels of mania. Put another way, there would be differences in average scores on the item in question due to extraneous factors besides mania. Approximately 30% of mania items across four instruments showed evidence of uniform DIF associated with age. Interestingly, roughly half of the items had higher scores in the young group, and half in the older group, resulting in biases that tended to cancel out at the level of total scores. Items asking about distractibility, energy, and motor activity tended to have higher scores in younger cases, independent of level of mania. Conversely, items assessing sleep disturbance, increased sociability, hypersexuality, mood swings, irritability, and suspiciousness tended to be higher in the adolescents even after controlling for level of mania.

Non-uniform DIF refers to when the correlation between the item and the mania factor changes between groups. A similar number of items show evidence of non-uniform DIF related to age. Again, the tendency was for each instrument to include some items showing DIF in the opposite direction, with a net result being fairly consistent estimation of total scores within the same scale. On a global level, developmental changes in the rate of symptoms appear not to change the overall performance of rating scales appreciably, although there is evidence of more fine-grained developmental patterns that seem face valid and appear to replicate across instruments.
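One common way to formalize the two kinds of DIF, offered here only as an illustration and not necessarily the model used in these analyses, is a logistic regression of a dichotomized item response on the mania level, the age group, and their interaction. The symbols below are assumptions introduced for this sketch: Y is the item response, \theta the level of mania (for example, the rest score on the scale), and G an indicator for the older age group.

```latex
\operatorname{logit} P(Y = 1 \mid \theta, G)
  = \beta_0 + \beta_1 \theta + \beta_2 G + \beta_3 (\theta \times G)
```

Under this formulation, uniform DIF corresponds to \beta_2 \neq 0 with \beta_3 = 0 (the item is shifted up or down in one group at every level of mania), and non-uniform DIF corresponds to \beta_3 \neq 0 (the item relates to mania more or less strongly in one group). Items rated on more than two response levels would call for an ordinal version of the same model.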

Reliability and Credibility of Information Sources Across Age Cohorts

As expected, caregiver report was the main source of information that could be used consistently across the ages from 5 to 18 years. Youth self-report on written questionnaires proved impractical before the age of 11 on most instruments. An innovation in this project was to keep track of the degree of assistance needed for the youth or caregiver to complete each instrument. Teacher report proved to be the least informative source of information about mood symptoms, despite our efforts to adapt instruments with good content coverage for use by teachers. Detailed evaluation of the psychometrics of four mania scales adapted for use by teachers has already been published (Youngstrom, Joseph, & Greene, 2008).

Interview-based information also could be used across the age span, but the interviews were substantially different for young children versus adolescents. Often with young children the interview changed into a less structured interaction that provided an opportunity to conduct behavioral observations and probe key symptoms or events, whereas with adolescents it was usually possible to conduct an interview asking about all the symptoms in a standard order. Based on this observation, we amended the protocol to start systematically gathering observational data using the Guide to the Assessment of Test Session Behavior (GATSB; Glutting & Oakland, 1993) for the last 100 cases, providing intriguing pilot data about the value of systematic observational ratings (consistent with recent findings from other NIMH-sponsored work, e.g., the DB-DOS; Keenan & Wakschlag, 2002). We also had the interviewer rate the perceived credibility of the caregiver and the youth as a source of clinical information during the KSADS interview.

At the end of the KSADS, interviewers rated 55% of caregivers “good” credibility, 37% “fair,” and 8% “poor;” versus 24% of youths “good,” 47% “fair,” and 30% “poor.” Credible youths tended to have credible caregivers, chi-squared (4 df) = 48.86, p<.00005; but there were still frequent mismatches, kappa=.16. Interviewers perceived caregivers as much more credible on average for the young children (ages 5 to 10 years): Cohen’s d=1.15, p<.00005; whereas caregivers were only slightly more credible for the older youths, d=.27, p=.001.
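The contrast between a significant chi-squared test and a low kappa can be seen from how kappa is computed: it measures exact agreement between the two ratings beyond what the marginal rates would produce by chance. A 3 x 3 table of caregiver by youth credibility ratings also has (3 - 1)(3 - 1) = 4 degrees of freedom, matching the test reported above. The sketch below uses a small hypothetical 2 x 2 table, since the report gives only the marginal percentages; the function name and counts are illustrative assumptions.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(table[i]) for i in range(size)]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, not the project's data.
toy_table = [
    [20, 5],
    [10, 15],
]
print(round(cohens_kappa(toy_table), 2))
```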

Significant correlates of caregiver credibility included: younger age youth, better family functioning, better youth functioning, higher caregiver income or education, fewer children, and lower concerns about youth depression or manic symptoms. Regression analyses indicated that a combination of factors could explain 17% of the variance in caregiver credibility (p<.00005), with family functioning (rpart=.26), youth age (rpart=-.21), credibility of the youth (rpart=.18), and caregiver education (rpart=.09) making significant unique contributions. Caregiver credibility was unrelated to caregiver mood symptoms, youth cognitive ability, or independent observations of youth behavior problems (cf. Youngstrom, Loeber, & Stouthamer-Loeber, 2000).

Significant correlates of youth credibility were mostly different from the correlates of caregiver credibility, and included: older age youth, female youth, not having a diagnosis of ADHD or bipolar disorder, lower CBCL externalizing or attention problems, higher caregiver education, having a male primary caregiver, lower caregiver report of manic symptoms, and higher self-report of manic or depressive symptoms. A subset of cases also completed a brief intelligence test and had observational ratings of behavior available. Greater youth credibility was strongly associated with higher cognitive ability (r=.31) and fewer behavior problems during the KSADS interview (r=.43) or when watched by a different person while the caregiver was completing the KSADS (r=.31). Regression analyses indicated that these factors could account for 22% of the variance in youth credibility, with age being the strongest determinant. Controlling for youth age eliminated all other correlates except for caregiver credibility (rpart=.18), with age remaining a powerful predictor (rpart=.39). For the subset with cognitive ability and behavioral observations available, the regression explained a similar amount of variance, with age, cognitive ability, and observational ratings of behavior each making unique contributions, but caregiver credibility was no longer significant.

Associations Between Credibility and Validity: Ratings of caregiver credibility were strongly related to the validity of caregiver report on mood and behavior checklists. Validity coefficients for caregiver reported manic symptoms changed from .28 for poor credibility to .49 for good credibility when comparing GBI to YMRS ratings, and .51 to .63 for GBI to CDRS ratings. Areas Under the Curve in ROC analyses changed from .61 (ns) to .80 (p<.0005) for poor versus good credibility when comparing GBI scores to bipolar diagnoses. Caregiver-youth and caregiver-teacher correlations all significantly increased when comparing good credibility to poor credibility caregivers, and this pattern was found across ratings of externalizing, internalizing, and attention problems as well as for manic and depressive symptoms (all p<.05) (cf. Achenbach, McConaughy, & Howell, 1987). Similar patterns were observed with youth credibility. Interestingly, these patterns were not due to changes in the internal consistency of checklist scores: Cronbach’s alpha was often significantly higher (p<.05 based on Feldt’s test) for the low credibility informants (Feldt, 1969). This suggests that many of the “low credibility” informants may have been sticking with a response set, such as minimizing or exaggerating problems, rather than providing nuanced information.

Hypothesis 6: Bipolar spectrum disorders will be associated with progressively greater functional impairment, increased service utilization, increased burden, and decreased quality of life over time.

The hypothesis that bipolar disorder is associated with greater burden and service utilization was partially supported. Bipolar disorder showed the greatest impact of any mental health condition on quality of life, matched only by unipolar depression, and exceeding almost all physical illnesses and disabilities for which benchmarking data are available (Freeman et al., 2009). Tables 19, 20, and 21 examine means by diagnostic subgroup. Table 22 presents impairment as a function of age cohort. Bipolar disorders were also associated with the highest levels of impairment on the Sheehan Disability Scales.

However, there were no significant differences by age group on the level of impairment. Future analyses will investigate more fine-grained dissection of impact on specific domains by particular versions of bipolar disorder or distinct symptom clusters.

Secondary aims:

Base rate for juvenile BPSD in a community mental health setting

A total of 620 families completed the KSADS evaluation at the community mental health center. The average youth met criteria for 3.7 Axis I diagnoses (median = 4, SD = 1.7). The most common category was externalizing disorders (67% of cases), with 65% meeting DSM-IV criteria for ADHD, 37% for ODD, and 13% for conduct disorder. Mood disorders were the second most common category, affecting 41% of youths, with anxiety disorders present in 26% of cases. Of the mood disorders, unipolar depression was the most common, affecting 11% of the sample, followed by minor depression (8%), bipolar NOS (5%), dysthymic disorder (4%), cyclothymic disorder (4%), bipolar I (3%), bipolar II (1%), and the remaining 5% consisting of a mixture of other mood issues not on the bipolar spectrum. Bipolar spectrum illnesses affected 13% of the sample (n=79).

The bipolar NOS cases (n=29) were primarily considered NOS due to inadequate duration of hypomanic episodes (83%), versus having inadequate severity for mania (43%) or insufficient numbers of symptoms at threshold (30%). The majority of these NOS cases (n=18, 62%) would have met criteria for bipolar II disorder if the duration threshold for the hypomania were lowered to 2 days, as is currently being discussed in DSM-5 and was recommended in the ISBD Diagnostic Workgroup Papers (Ghaemi et al., 2008; Youngstrom, Birmaher, & Findling, 2008). The NOS label, as applied in this study, did not include chronically irritable youths without evidence of episodes or changes in mood or energy: 92% of NOS cases showed clear evidence of a cyclical mood course, with at least spontaneous shifts in mood and energy.

Among the bipolar spectrum cases, 78% showed evidence of episodic elated mood, 69% of episodic irritable mood, and 44% showed grandiosity. This pattern was consistent across all four bipolar spectrum subgroups. There were no sex differences in rates of each type of bipolar disorder, nor were there race/ethnicity differences in rates. There were trends for bipolar I and II to be more common in the older youths, and for cyclothymic disorder or bipolar NOS to be more common in the younger youths, but none of these trends achieved statistical significance. All of these patterns appear consistent with general trends emerging in the literature (Goodwin & Jamison, 2007; Kowatch, Youngstrom, Danielyan, & Findling, 2005).

Diagnostic Agreement

Goal: To examine the diagnostic agreement between diagnoses achieved via widely used methods: diagnosis as usual (DAU, reflecting the clinical standard of care), the KSADS-PL (the most widely used research instrument in projects evaluating mood disorder), and a Longitudinal Expert evaluation of All available Data (LEAD; reflecting the consensus diagnosis of clinical experts and raters after reviewing all available information).

There were three different sets of diagnoses available for the cases interviewed at the CMHC: Intake diagnoses, KSADS diagnoses, and LEAD diagnoses. Intake diagnoses were “diagnosis as usual,” conducted following the normal procedures for the agency. More than 80% of the intakes were conducted by a masters level licensed clinical counselor or social worker with more than 10 years of experience, and almost all of the remainder were conducted by predoctoral psychology interns. The DAU intake interview consisted of an appointment of 90 minutes, of which roughly 30 minutes was spent completing consent forms and billing or insurance paperwork, and the remainder of which comprised an unstructured clinical interview assessing the presenting problem, prior developmental and treatment history, risk factors, strengths, and generating a five axis diagnostic formulation and initial treatment recommendations.

The KSADS diagnoses were conducted by highly trained raters, either key personnel or predoctoral psychology interns electing to participate in the training as a formal year-long rotation offered during the internship year. Training focused heavily on the conceptualization of bipolar disorder (including 12 hours of didactic training based on the same content as the 7 hour continuing education offering presented by E. Youngstrom at the American Psychological Association, National Association of School Psychologists, Canadian Psychological Association, and Association for Behavioral and Cognitive Therapy conventions, but with expanded time for question and answer). The training emphasized use of DSM-IV diagnostic criteria, including the consideration of cyclothymic disorder and bipolar Not Otherwise Specified as well as bipolar II and bipolar I. Trainees learned about the “narrow phenotype” definition in 2003 with the publication of the Leibenluft et al. nosology (Leibenluft, et al., 2003), along with the varying definitions of the “broad phenotype” and “severe mood dysregulation.” However, the diagnostic definitions remained the same for the entire course of the grant, focusing on strict DSM criteria. Drift was avoided by using training videos from the early years of the grant as part of the training and certification process for subsequent years of the grant, as well as having the same two lead diagnosticians (EAY and JKY) leading the training for all years of the grant. Also important to note, the conceptualization of bipolar disorder always included the requirement that mood be episodic, involving a clear change from prior functioning. If the same symptom, such as poor concentration or motor agitation, followed a chronic course, then it was coded in a different section of the KSADS and not counted as well in the corresponding mood section. This is a neo-Kraepelinian formulation of bipolar disorder, emphasizing an episodic presentation. 
It is different from other formulations that allow a more insidious onset and chronic course. It also was expected that the emphasis on “finding each symptom the best home” in the KSADS would reduce the observed rates of comorbidity between bipolar and anxiety disorders or ADHD compared to rates found when using the KSADS without the same emphasis on episodicity.

All raters completed the same training process, which required scoring along with at least five interviews (watched live or from video recording) conducted by a reliable rater and achieving an item-level kappa of .85 or greater. Once the trainee demonstrated the ability to match reliable ratings on five interviews, the trainee led an interview with a reliable rater scoring along in real time. This next stage was crucial because of the semi-structured nature of the KSADS, which gives the interviewer flexibility to rephrase questions and involves a fair degree of clinical judgment in deciding when sufficient information has been elicited to score the symptom or module. Kappa evaluated agreement at the item level, and all scoring discrepancies were discussed. The reliable rater’s scores were the source document for the main dataset. Once the trainee had led five interviews where their scores and the reliable rater’s scores converged at kappa >= .85, the trainee “graduated” to lead interviews independently. This training process was extremely time intensive, taking nine months to complete the training of the first cadre of raters, and never taking less than four months to complete the training. Once raters were reliable, they co-rated a live interview or re-rated from a videotape on a quarterly basis to avoid drift.
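
The item-level agreement check described above can be sketched as follows. This is a minimal illustration of unweighted Cohen's kappa, not the project's actual scoring software; the function name and the ten item scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical item scores for one interview (trainee vs. reliable rater).
trainee = [0, 1, 2, 0, 1, 0, 0, 2, 1, 0]
reliable = [0, 1, 2, 0, 1, 0, 1, 2, 1, 0]
kappa = cohens_kappa(trainee, reliable)
print(round(kappa, 3))
```

In this made-up example the raters agree on 9 of 10 items, yet kappa falls just short of .85 once chance agreement is removed, which shows why the criterion is stricter than raw percent agreement.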

More than thirty interns completed the training process, and only one withdrew from the rotation. Interns typically became reliable in 12 interviews (6+6). Feedback from the interns was consistently positive, with comments indicating that they found the training in diagnostic evaluation and conceptualization to be the most rigorous yet clinically useful that they had received in their training sequence. The CMHC also viewed the training as a valuable benefit of the collaboration between the agency and the university: The grant created a new training rotation for the internship which almost all subsequent interns selected, and it created new training materials and experiences that have continued to benefit the agency.

KSADS scoring involved the same person interviewing both the caregiver and the youth. If there were significant discrepancies between the item scores based on the separate interviews, the rater used clinical judgment to resolve the discrepancy. They also were encouraged to re-interview either informant, or to discuss discrepancies with both simultaneously. After generating the summary scores for all symptoms and impairment criteria, DSM-IV algorithms were applied to arrive at the KSADS diagnoses. If the rater’s clinical impression was different than the diagnosis established by the algorithm, they wrote down their impression in addition to keeping the diagnosis from the KSADS, and both pieces of information were brought to the consensus conference. The reliable rater recorded the KSADS diagnoses at the end of each interview, while still blind to other information such as the family mental health history, prior treatment history from the medical record, and all scores on the rating scales.

The LEAD consensus meeting process used a team format. The reliable rater met with a licensed clinical psychologist (JKY, EAY, OIM, or NCF), along with a second person who had gathered the family history (either a research assistant or one of the predoctoral interns completing a fully-structured interview including the MINI and the Family History RDC) and had photocopied the relevant prior diagnostic and treatment history from the medical record. The rating scales were withheld from the diagnostic process, to ensure that the criterion diagnoses were “blind” to the instruments that would be evaluated as diagnostic predictors (Bossuyt et al., 2003). Final diagnoses were based on the KSADS, but could be modified on the basis of the interviewer’s clinical impressions, or additional information about family history or treatment history. The situations where a modification was most likely to occur involved diagnoses that are not formally assessed on the KSADS, such as pervasive developmental disorders (although most cases with clear pervasive developmental disorders were referred to other agencies prior to intake and so were unlikely to participate in the project).

These three forms of diagnosis represent markedly different processes, ranging from a routine unstructured interview billable under CPT code 90801, to a semi-structured KSADS that required most of a day for the rater and family (and involved a minimum of 16-20 days of training and supervision spread over a period of months), to a consensus diagnosis that included all of the time, training, and expense of the KSADS, along with an added person’s time gathering the family history and chart, and an hour of a licensed psychologist’s time to review and synthesize the information.

A recently published meta-analysis compared the agreement between unstructured clinical diagnoses and semi-structured or structured diagnostic interview results (Rettew, Lynch, Achenbach, Dumenci, & Ivanova, 2009). Typical agreement in the meta-analysis was kappa = .29 for externalizing disorders, .28 for internalizing disorders, but only .14 for affective disorders specifically (including bipolar disorder). Compared to these benchmarks, DAU fared slightly better than expected in terms of agreement with the KSADS or LEAD diagnoses for ADHD (kappa = .43 with LEAD, p < .00005) or disruptive behavior disorders (kappa = .35, p < .00005). However, DAU fared poorly with bipolar disorder: Kappa = .02, p < .01. Contrary to the popular concern about overdiagnosis of bipolar disorder, DAU was extremely conservative, only identifying bipolar in one case. This resulted in a diagnostic specificity of 100%, but a sensitivity of only 1%. Discussion with the clinicians clarified that they often had concerns about mood issues with a youth, but were highly reluctant to label a case as “bipolar” on the basis of a standard unstructured diagnostic interview. Instead, they were likely to diagnose “mood disorder Not Otherwise Specified,” and then often refer for psychiatric evaluation. When comparing this broader definition, combining bipolar disorder with mood NOS to form a “possible bipolar” group, then kappa with the LEAD bipolar spectrum diagnosis rose to .32, but with sensitivity only rising to 33% (indicating that the majority of bipolar cases remained undetected). Specificity appeared to stay high at 91%, but due to the low base rate of bipolar disorder at the CMHC, that still meant that the majority of cases (52%) identified as “possible bipolar” would actually be false positives.
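
The point that high specificity can still produce mostly false positives at a low base rate follows from Bayes' theorem, and can be sketched as below. The function name is hypothetical, and the inputs are the rounded figures quoted in this report (sensitivity 33%, specificity 91%, and the 13% bipolar spectrum base rate). The exact percentage reported above depends on the actual cell counts in the subsample with DAU data, which are not reproduced here, so the output is illustrative rather than a recomputation.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Positive and negative predictive value from test accuracy and base rate."""
    for value in (sensitivity, specificity, base_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded figures from the text; treat the result as approximate.
ppv, npv = predictive_values(sensitivity=0.33, specificity=0.91, base_rate=0.13)
print(f"PPV = {ppv:.2f}, share of positives that are false = {1 - ppv:.2f}")
```

With these rounded inputs, fewer than half of the "possible bipolar" labels would be correct, consistent with the qualitative conclusion that most such cases are false positives.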

Secondary Aim: To promote the translation of assessment procedures from research tools to clinically useful instruments that are realistic to use in community mental health settings.

Agreement between research diagnoses and clinical diagnosis was better than chance, but low, consistent with meta-analytic findings. DAU demonstrated extremely poor sensitivity to bipolar disorder, combined with a high false positive rate. On the basis of these findings, along with the promising results from the development of the screening instruments, the agency actually modified its intake procedure in several ways. The PGBI-10M became a standard part of the intake paperwork, and high scores on it, or else a clinical impression that suggests possible bipolar disorder, trigger a referral to a new diagnostic evaluation clinic where a specialist conducts a KSADS interview and gathers developmental and family history. Because the screening and clinical intake are sufficient to demonstrate medical necessity, the KSADS interview becomes a reimbursable service for Medicaid and other third party payors, billable under CPT Code 90801. This has navigated the transition from a grant sponsored activity to a self-sustaining clinical service that improves the diagnostic accuracy for the agency.

Other Factors Potentially Changing Performance of Diagnostic Instruments

As described above, there were cohort effects where depression severity tended to increase with age, versus mania severity tending to stay consistent or perhaps decrease slightly in adolescence. Investigation of differential item functioning in the manic symptoms indicated that some symptoms tend to increase in adolescence even after controlling for levels of underlying mania, and other symptoms decrease with age – with the net effect often cancelling out within the same scale.

Similar analyses revealed no differences in total scores for mania comparing males versus females or White versus Black youths. DIF analyses at the item level revealed few examples of uniform or non-uniform bias in manic symptoms comparing across gender or ethnicity.

Initial investigation of the role of parent mental health history suggests that the caregiver’s own history of mood disorder does not significantly change the validity of their report of the child’s functioning. The adult’s current mood status showed weak (r<.2) correlations with discrepancies between their perception and the youth’s own perception, consistent with prior investigations (Youngstrom, Ackerman, & Izard, 1999; Youngstrom, et al., 2000). Although more detailed analyses are needed to describe potential moderators of accuracy, the overarching pattern is that caregiver report is significantly more valid than youth or teacher report of manic symptoms, and this advantage persists even when the caregivers have a history of mood disorder themselves.

Demographics and Participants

The proposed research involves coordinated data collection at two sites: the research clinic at Case Western Reserve University/University Hospitals of Cleveland (CWRU/UHC), and the Cuyahoga County offices of Applewood Centers, Inc. (ACI). The integration of these two sites into one project creates a unique opportunity to translate clinical research into a more representative and applied setting. These two sites are complementary in crucial ways. CWRU/UHC has a referral system that is heavily enriched for BPSD (49% of 749 pilot cases meeting criteria for BPSD, and 18% for bipolar I), making it possible to investigate a relatively infrequent disorder with smaller samples. Recruitment is enhanced by having a Stanley Foundation CCRC, having an adult Stanley-funded mood clinic (which refers children of identified bipolar parents for evaluation), having a variety of clinical trials available to provide free treatment to youths with bipolar disorder, having a national reputation for research in bipolar disorders, and having an extensive referral network throughout the region. The CWRU/UHC catchment area for mood disorders extends from Erie, PA to Toledo, OH, with many families driving three to four hours for treatment. However, ACI is a community mental health center (CMHC) that is both demographically more diverse and more representative of an urban community: 75% of the clients served by ACI are of diverse backgrounds, and 90% of families receive Medicaid. CWRU/UHC has the research infrastructure to complete detailed evaluations, and we propose adding a longitudinal component to this evaluation. ACI serves a dramatically larger volume of patients, with one county’s offices processing 10 to 20 times as many cases as CWRU/UHC per year. The research plan uses CWRU/UHC as a laboratory to develop and test measurement strategies, and ACI as the “field testing” to determine what is practical in real life community settings.
The proposed design delivers on the promise of cross-validating measures in independent samples. This research will produce tools that will replicate better across research groups, as well as generalize to applied settings.

= Inclusion Enrollment Report Table =

References

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/Adolescent behavioral and emotional problems: Implication of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms & Profiles. Burlington, VT: University of Vermont.

Birmaher, B., Axelson, D., Goldstein, B., Strober, M., Gill, M. K., Hunt, J., et al. (2009). Four-Year Longitudinal Course of Children and Adolescents With Bipolar Spectrum Disorders: The Course and Outcome of Bipolar Youth (COBY) Study. American Journal of Psychiatry, appi.ajp.2009.08101569.

Blader, J. C., & Carlson, G. A. (2007). Increased rates of bipolar disorder diagnoses among U.S. child, adolescent, and adult inpatients, 1996-2004. Biological Psychiatry, 62(2), 107-114.

Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L. M., et al. (2003). Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. British Medical Journal, 326(7379), 41-44.

Cohen, P., Cohen, J., Aiken, L. S., & West, S. G. (1999). The problem of units and the circumstances for POMP. Multivariate Behavioral Research, 34, 315-346.

Cyranowski, J. M., Frank, E., Young, E., & Shear, K. (2000). Adolescent onset of the gender difference in lifetime rates of major depression. Archives of General Psychiatry, 57(1), 21-27.

Dell'Osso, L., Pini, S., Cassano, G. B., Mastrocinque, C., Seckinger, R. A., Saettoni, M., et al. (2002). Insight into illness in patients with mania, mixed mania, bipolar depression and major depression with psychotic features. Bipolar Disorders, 4, 315-322.

Feldt, L. S. (1969). A test of the hypothesis that Cronbach's alpha or Kuder-Richardson coefficient twenty is the same for two tests. Psychometrika, 34(3), 363-373.

Freeman, A. J., Youngstrom, E. A., Michalak, E., Siegel, R., Meyers, O. I., & Findling, R. L. (2009). Quality of life in pediatric bipolar disorder. Pediatrics, 123(3), e446-452.

Geller, B., Tillman, R., Bolhofner, K., & Zimerman, B. (2008). Child bipolar I disorder: prospective continuity with adult bipolar I disorder; characteristics of second and third episodes; predictors of 8-year outcome. Archives of General Psychiatry, 65(10), 1125-1133.

Geller, B., Warner, K., Williams, M., & Zimerman, B. (1998). Prepubertal and young adolescent bipolarity versus ADHD: Assessment and validity using the WASH-U-KSADS, CBCL and TRF. Journal of Affective Disorders, 51(2), 93-100.

Geller, B., Zimerman, B., Williams, M., Bolhofner, K., Craney, J. L., DelBello, M. P., et al. (2001). Reliability of the Washington University in St. Louis Kiddie Schedule for Affective Disorders and Schizophrenia (WASH-U-KSADS) mania and rapid cycling sections. Journal of the American Academy of Child & Adolescent Psychiatry, 40(4), 450-455.

Ghaemi, S. N., Bauer, M., Cassidy, F., Malhi, G. S., Mitchell, P., Phelps, J., et al. (2008). Diagnostic guidelines for bipolar disorder: a summary of the International Society for Bipolar Disorders Diagnostic Guidelines Task Force Report. Bipolar Disorders, 10(1 Pt 2), 117-128.

Glutting, J., & Oakland, T. (1993). Guide to the Assessment of Test Session Behavior for the WISC-III and WIAT. San Antonio: The Psychological Corporation.

Goodwin, F. K., & Jamison, K. R. (2007). Manic-depressive illness (2nd ed.). New York: Oxford University Press.

Hazell, P. L., Lewin, T. J., & Carr, V. J. (1999). Confirmation that Child Behavior Checklist clinical scales discriminate juvenile mania from attention deficit hyperactivity disorder. Journal of Paediatrics and Child Health, 35(Apr), 199-203.

Healy, D. (2006). The latest mania: selling bipolar disorder. PLoS Medicine, 3(4), e185.

Henry, D. B., Pavuluri, M. N., Youngstrom, E., & Birmaher, B. (2008). Accuracy of brief and full forms of the Child Mania Rating Scale. Journal of Clinical Psychology, 64(4), 368-381.

Hirschfeld, R. M., Williams, J. B. W., Spitzer, R. L., Calabrese, J. R., Flynn, L., Keck, P. E. J., et al. (2000). Development and validation of a screening instrument for bipolar spectrum disorder: The mood disorder questionnaire. The American Journal of Psychiatry, 157(11), 1873-1875.

Kaufman, J., Birmaher, B., Brent, D., Rao, U., Flynn, C., Moreci, P., et al. (1997). Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version (K-SADS-PL): Initial reliability and validity data. Journal of the American Academy of Child & Adolescent Psychiatry, 36(7), 980-988.

Keenan, K., & Wakschlag, L. S. (2002). Can a valid diagnosis of disruptive behavior disorder be made in preschool children? The American Journal of Psychiatry, 159(3), 351-358.

Kluger, J., & Song, S. (2002). Young and bipolar. Time (August 19), 39-47, 51.

Kovacs, M. (1989). Affective disorders in children and adolescents. Special Issue: Children and their development: Knowledge base, research agenda, and social policy application. American Psychologist, 44(2), 209-215.

Kowatch, R. A., Youngstrom, E. A., Danielyan, A., & Findling, R. L. (2005). Review and meta-analysis of the phenomenology and clinical characteristics of mania in children and adolescents. Bipolar Disorders, 7(6), 483-496.

Kraemer, H. C. (1992). Evaluating medical tests: Objective and quantitative guidelines. Newbury Park, CA: Sage Publications.

Kraepelin, E. (1921). Manic-depressive insanity and paranoia. Edinburgh: Livingstone.

Krueger, R. F., Markon, K. E., Patrick, C. J., & Iacono, W. G. (2005). Externalizing psychopathology in adulthood: a dimensional-spectrum conceptualization and its implications for DSM-V. Journal of Abnormal Psychology, 114(4), 537-550.

Leibenluft, E., Charney, D. S., Towbin, K. E., Bhangoo, R. K., & Pine, D. S. (2003). Defining clinical phenotypes of juvenile mania. The American Journal of Psychiatry, 160, 430-437.

Mick, E., Biederman, J., Pandina, G., & Faraone, S. V. (2003). A preliminary meta-analysis of the child behavior checklist in pediatric bipolar disorder. Biological Psychiatry, 53(11), 1021-1027.

Moreno, C., Laje, G., Blanco, C., Jiang, H., Schmidt, A. B., & Olfson, M. (2007). National Trends in the Outpatient Diagnosis and Treatment of Bipolar Disorder in Youth. Archives of General Psychiatry, 64(9), 1032-1039.

Nottelmann, E., Biederman, J., Birmaher, B., Carlson, G. A., Chang, K. D., Fenton, W. S., et al. (2001). National Institute of Mental Health research roundtable on prepubertal bipolar disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 40(8), 871-878.

Pavuluri, M. N., Henry, D. B., Devineni, B., Carbray, J. A., & Birmaher, B. (2006). Child mania rating scale: Development, reliability, and validity. Journal of the American Academy of Child and Adolescent Psychiatry, 45(5), 550-560.

Prisciandaro, J. J., & Roberts, J. E. (2010). Evidence for the continuous latent structure of mania in the Epidemiologic Catchment Area from multiple latent structure and construct validation methodologies. Psychological Medicine, 1-14.

Rettew, D. C., Lynch, A. D., Achenbach, T. M., Dumenci, L., & Ivanova, M. Y. (2009). Meta-analyses of agreement between diagnoses made from clinical evaluations and standardized diagnostic interviews. International Journal of Methods in Psychiatric Research, 18(3), 169-184.

Shaffer, D., Gould, M. S., Brasic, J., Ambrosini, P., Fisher, P., Bird, H., et al. (1983). A children's global assessment scale (CGAS). Archives of General Psychiatry, 40(11), 1228-1231.

Spitzer, R. L. (1983). Psychiatric diagnosis: Are clinicians still necessary? Comprehensive Psychiatry, 24(5), 399-411.

Wagner, K. D., Findling, R. L., Emslie, G. J., Gracious, B., & Reed, M. (2006). Validation of the Mood Disorder Questionnaire for Bipolar Disorders in Adolescents. Journal of Clinical Psychiatry, 67, 827-830.

WTCCC. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661-678.

Youngstrom, E. A., Ackerman, B. P., & Izard, C. E. (1999). Dysphoria-related bias in maternal ratings of children. Journal of Consulting and Clinical Psychology, 67, 905-916.

Youngstrom, E. A., Birmaher, B., & Findling, R. L. (2008). Pediatric bipolar disorder: Validity, phenomenology, and recommendations for diagnosis. Bipolar Disorders, 10(Supplement 1), 194-214.

Youngstrom, E. A., Findling, R. L., Calabrese, J. R., Gracious, B. L., Demeter, C., DelPorto Bedoya, D., et al. (2004). Comparing the diagnostic accuracy of six potential screening instruments for bipolar disorder in youths aged 5 to 17 years. Journal of the American Academy of Child & Adolescent Psychiatry, 43, 847-858.

Youngstrom, E. A., Frazier, T. W., Demeter, C., Calabrese, J. R., & Findling, R. L. (2008). Developing a 10-item mania scale from the Parent General Behavior Inventory for children and adolescents. Journal of Clinical Psychiatry, 69(5), 831-839.

Youngstrom, E. A., Joseph, M. F., & Greene, J. (2008). Comparing the psychometric properties of multiple teacher report instruments as predictors of bipolar disorder in children and adolescents. Journal of Clinical Psychology, 64(4), 382-401.

Youngstrom, E. A., Loeber, R., & Stouthamer-Loeber, M. (2000). Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of Consulting and Clinical Psychology, 68, 1038-1050.

Youngstrom, E. A., Meyers, O. I., Demeter, C., Kogos Youngstrom, J., Morello, L., Piiparinen, R., et al. (2005). Comparing diagnostic checklists for pediatric bipolar disorder in academic and community mental health settings. Bipolar Disorders, 7(Special Issue: Pediatric Bipolar Disorder), 507-517.

Youngstrom, E. A., Meyers, O. I., Youngstrom, J. K., Calabrese, J. R., & Findling, R. L. (2006). Comparing the effects of sampling designs on the diagnostic accuracy of eight promising screening algorithms for pediatric bipolar disorder. Biological Psychiatry, 60, 1013-1019.

Table 1. Diagnostic efficiency of caregiver, youth, and teacher report of manic symptoms (Aim 1, Hypothesis 1)

a Sensitivity = .90: the first column provides the score threshold that yields a 90% sensitivity (i.e., percentage of true BPSD cases meeting or exceeding that threshold), and the second column provides the corresponding specificity (i.e., the percentage of cases without BPSD scoring below the threshold).

b Specificity = .90: the first column provides the score threshold that yields a 90% specificity, and the second column provides the corresponding sensitivity for the same threshold.

* p< .05, ** p<.01, ***p<.005 two-tailed.
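
The threshold selection described in note a can be sketched as follows. This is a minimal illustration under assumed data, not the analysis code used for Table 1; the function name and the scores are hypothetical. It scans candidate cut scores from high to low and returns the highest one at which at least 90% of cases score at or above the threshold, along with the specificity at that same threshold.

```python
def threshold_for_sensitivity(case_scores, noncase_scores, target=0.90):
    """Highest cut score with sensitivity >= target, plus its specificity."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must contain at least one score")
    candidates = sorted(set(case_scores) | set(noncase_scores), reverse=True)
    for cut in candidates:
        # Sensitivity: share of true cases scoring at or above the cut.
        sensitivity = sum(s >= cut for s in case_scores) / len(case_scores)
        if sensitivity >= target:
            # Specificity: share of non-cases scoring below the cut.
            specificity = sum(s < cut for s in noncase_scores) / len(noncase_scores)
            return cut, sensitivity, specificity
    raise ValueError("no threshold reaches the target sensitivity")


# Hypothetical checklist totals for youths with and without BPSD.
bpsd = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
no_bpsd = [2, 4, 5, 7, 9, 10, 13, 14, 16, 21]
cut, sens, spec = threshold_for_sensitivity(bpsd, no_bpsd)
print(cut, sens, spec)
```

The threshold in note b can be found the same way by scanning from low to high and stopping at the lowest cut score whose specificity reaches .90.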

Table 2. Percentage of Maximum Possible (POMP) Scores on caregiver report of manic symptoms compared across diagnostic subgroups (N=790) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Note that the CMRS was added after its publication several years after the start of the grant. Subgroup sizes for the CMRS were n=16, 89, 157, 235, and 42, respectively.
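
The POMP metric used in these tables rescales each raw score to the percentage of the scale's possible range (Cohen, Cohen, Aiken, & West, 1999), which lets instruments with different numbers of items be compared on a common 0 to 100 metric. A minimal sketch follows. The function name is hypothetical, and the 0 to 30 range in the example assumes a 10-item scale scored 0 to 3 per item.

```python
def pomp(raw_score, scale_min, scale_max):
    """Percentage of Maximum Possible score: 0 at the scale floor, 100 at the ceiling."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= raw_score <= scale_max:
        raise ValueError("raw_score lies outside the scale range")
    return 100.0 * (raw_score - scale_min) / (scale_max - scale_min)


# Assumed example: a raw total of 15 on a scale ranging from 0 to 30.
print(pomp(15, 0, 30))
# A scale whose floor is not zero: raw 20 on a range of 10 to 50.
print(pomp(20, 10, 50))
```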

Table 3 CBCL T Scores compared across diagnostic subgroups (N=750) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 4

CBCL T DSM Oriented Scales compared across diagnostic subgroup (N=750) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 5 Percentage of Maximum Possible (POMP) Scores (N=455) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 6 Achenbach Youth Self Report (YSR) T Scores (N=440, ages 11 to 18 years) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 7

YSR DSM Oriented Scale T-Scores (N=440) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 8

Teacher Report on Mania Scales -- Percentage of Maximum Possible (POMP) Scores (N=166) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 9

TRF T Scores (N=292) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 10

TRF DSM Oriented Scale T-Scores (N=292) (Aim 1, Hypothesis 1)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 11

A Priori Taxometric Power Analysis (Aim 2, Hypothesis 3)

Note: h = P-GBI hypomanic item content, b = P-GBI biphasic item content, and d = P-GBI depressive item content.

Figure 1

MAMBAC results for analysis of caregiver-reported mood symptoms on the Parent General Behavior Inventory (Aim 2, Hypothesis 3)

Figure XXX – MAXEIG Results

Table 12A

Interview-rated Mood Severity -- Percentage of Maximum Possible (POMP) Scores (N=716) (Aim 2, Hypothesis 4)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 12B

Global Functioning (Measured as CGAS Score) compared across diagnostic subgroups (N=797) (Aim 2, Hypothesis 4)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 13

Impairment in ADHD compared to comorbid BPSD + ADHD (N=445) (Aim 2, Hypothesis 4)

* p< .05, ** p<.01, ***p<.005 two-tailed.

Table 14

Impairment in ODD compared to ODD with comorbid BPSD (N=290) (Aim 2, Hypothesis 4)

* p< .05, ** p<.01, ***p<.005 two-tailed.

Table 15

Mood symptoms as a function of age in bipolar sample -- POMP (N=144) (Aim 3, Hypothesis 5)

Note: a Prepubertal group significantly different than All remaining groups; b Adolescent groups significantly different than Peri-pubertal group

* p< .05, ** p<.01, ***p<.005 two-tailed.

Table 16

Age in Bipolar Sample POMP values (N=133) (Aim 3, Hypothesis 5)

Note: a Prepubertal group significantly different than All remaining groups; b Adolescent groups significantly different than Peri-pubertal group; c Adolescent groups significantly different than Prepubertal group

* p< .05, ** p<.01, ***p<.005 two-tailed

Table 17

CBCL Raw Scores compared across age cohorts (N=132) (Aim 3, Hypothesis 5)

Note: a Prepubertal group significantly different than All remaining groups; b Adolescent groups significantly different than Peri-pubertal group; c Adolescent groups significantly different than Prepubertal group

* p< .05, ** p<.01, ***p<.005 two-tailed.

Table 18

CBCL DSM Proxy Raw Scores compared across age cohorts within bipolar spectrum (N=132) (Aim 3, Hypothesis 5)

Note: a Prepubertal group significantly different than All remaining groups; b Adolescent groups significantly different than Peri-pubertal group; c Adolescent groups significantly different than Prepubertal group; d Adolescent group significantly different than All remaining groups

* p< .05, ** p<.01, ***p<.005 two-tailed.

Table 19

Sheehan Scale Scores compared between diagnostic subgroups (N=537) (Aim 3, Hypothesis x)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 20

KINDL Parent POMP Scores sample above 8 years (N=610)

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 21

KINDL Parent POMP Scores sample below 8 years (N=159)                       

Note: a Bipolar groups significantly different than All remaining groups; b Bipolar groups significantly different than Unipolar Mood; c Bipolar groups significantly different than Disruptive Beh; d Bipolar groups significantly different than Residual Diagnoses; * p< .05, ** p<.01, ***p<.005 two-tailed.

Table 22

Sheehan Scale Scores in BPSD stratified by age groups (N=105) (Aim 3, Hypothesis 6)

Note: a Prepubertal group significantly different than All remaining groups; b Adolescent groups significantly different than Peri-pubertal group; c Adolescent groups significantly different than Prepubertal group; d Adolescent group significantly different than All remaining groups

* p< .05, ** p<.01, ***p<.005 two-tailed.

Appendix I. Publications, Grants, and Training Activities

Peer Reviewed Publications (* indicates mentee)

Youngstrom, E.A., Arnold, L.E., & Frazier, T.W. (in press). Bipolar and ADHD comorbidity: Both artifact and outgrowth of shared mechanisms. Clinical Psychology: Science and Practice. Special issue: Comorbidity (Editors: Phillip Kendall and Deborah Drabick).

Youngstrom, E.A. (2009). Definitional issues in bipolar disorder across the life cycle. Clinical Psychology: Science and Practice, 16, 140-160. Special Issue: Bipolar Disorder (Edited by E. Youngstrom & P. Kendall).

Youngstrom, E.A., *Freeman, A.J., & *Jenkins, M.M. (2009). The assessment of bipolar disorder in children and adolescents. Child and Adolescent Psychiatric Clinics of North America, 18, 353-390.

*Freeman, A. J., Youngstrom, E. A., Michalak, E., *Siegel, R., *Meyers, O. I., & Findling, R. L. (2009). Quality of life in pediatric bipolar disorder. Pediatrics, 123, 446-452.

Youngstrom, E.A., *Frazier, T.W., Demeter, C., Calabrese, J.R., & Findling, R.L. (2008). Developing a ten-item mania scale from the Parent General Behavior Inventory for children and adolescents. Journal of Clinical Psychiatry, 69, 831-839.

Youngstrom, E.A., Birmaher, B., & Findling, R.L. (2008). Pediatric bipolar disorder: Validity, phenomenology, and recommendations for diagnosis. Bipolar Disorders, 10, 194-214. Special Issue: International Society for Bipolar Disorders advisory papers for DSM-V Revisions.

Youngstrom, E.A., *Greene, J., & *Joseph, M. (2008). Comparing the psychometric properties of multiple teacher report instruments as predictors of bipolar disorder in children and adolescents. Journal of Clinical Psychology, 64, 382-401. Special Issue: Bipolar Disorder (Guest Editors: Sheri Johnson and Mary Fristad).

*Meyers, O.I., & Youngstrom, E.A. (2008). A Parent General Behavior Inventory subscale to measure sleep disturbance in pediatric bipolar disorder. Journal of Clinical Psychiatry, 69, 840-843.

Youngstrom, E.A., *Meyers, O.I., Youngstrom, J.K., Calabrese, J.R., & Findling, R.L. (2006). Comparing the effects of sampling designs on the diagnostic accuracy of eight promising screening algorithms for pediatric bipolar disorder. Biological Psychiatry, 60, 1013-1019.

Youngstrom, E.A., *Meyers, O.I., Kogos Youngstrom, J., Calabrese, J.R., & Findling, R.L. (2006). Diagnostic and measurement issues in the assessment of pediatric bipolar disorder: Implications for understanding mood disorder across the life cycle. Development and Psychopathology, 18, 989-1021. Special issue: Bipolar Disorder (Guest Editors: David Miklowitz & Dante Cicchetti).

Youngstrom, E.A., Youngstrom, J.K., & Calabrese, J.R. (2006). Screening for bipolarity: A brief review of available measures and recommendations for future research. Aspects of Affect, 2, 1-6.

Youngstrom, E.A. (2006). Reasons to consider assessing for bipolar disorder in children and adolescents, and practical steps to take. Report on Emotional and Behavioral Disorders in Youth, 5, 13-17.

Youngstrom, E. A., *Meyers, O. I., Demeter, C., Kogos Youngstrom, J., Morello, L., Piiparinen, R., Feeny, N. C., Findling, R. L., & Calabrese, J. R. (2005). Comparing diagnostic checklists for pediatric bipolar disorder in academic and community mental health settings. Bipolar Disorders, 7, 507-517. Special Issue: Pediatric Bipolar Disorder.

Youngstrom, E. A., Kogos Youngstrom, J., & *Starr, M. (2005). Bipolar diagnoses in community mental health: Achenbach CBCL profiles and patterns of comorbidity. Biological Psychiatry, 58, 569-575.

Youngstrom, E. A., Findling, R. L., Youngstrom, J. K., & Calabrese, J. R. (2005). Towards an evidence-based assessment of pediatric bipolar disorder. Journal of Clinical Child and Adolescent Psychology, 34, 433-448. Special Issue: Evidence-Based Assessment.

Youngstrom, E. A., & *Duax, J. (2005). Evidence Based Assessment of Pediatric Bipolar Disorder, Part 1: Base Rate and Family History. Journal of the American Academy of Child and Adolescent Psychiatry, 44(7), 712-717.

Youngstrom, E. A., & Kogos Youngstrom, J. (2005). Evidence Based Assessment of Pediatric Bipolar Disorder, Part 2: Incorporating Information from Behavior Checklists. Journal of the American Academy of Child and Adolescent Psychiatry, 44(8), 823-828.

Youngstrom, E. A., Findling, R. L., Calabrese, J. R., Gracious, B. L., Demeter, C., DelPorto Bedoya, D., & *Price, M.E. (2004). Comparing the diagnostic accuracy of six potential screening instruments for bipolar disorder in youths aged 5 to 17 years. Journal of the American Academy of Child & Adolescent Psychiatry, 43, 847-858.

Youngstrom, E.A., Findling, R.L., & Calabrese, J.R. (2004). Effects of adolescent manic symptoms on agreement between youth, parent, and teacher ratings of behavior problems. Journal of Affective Disorders, 82S, S5-S16.

Subsequent Grants

Evaluating the Impact of Evidence Based Assessment Strategies on the Accuracy and Outcome of Diagnoses of Pediatric Bipolar Disorder.

Agency: UNC University Research Council

Mechanism: University grant

PI: Eric Youngstrom, Ph.D.

Funded. Start date: July 2008.

Two years. Total costs: $5000.

Longitudinal Assessment of Manic Symptoms.

Agency: NIMH

Mechanism: Collaborative R01; CWRU as coordinating site, MH073967

PI: Robert L. Findling, M.D.; Co-I and PI for UNC subcontract: Eric Youngstrom, Ph.D.

Renewed until 2015. Start date: August 2005.

Five years. Total costs: $5,411,274 for CWRU (coordinating) site during initial award; NOGA pending for renewal.

NIMH Developing Center for Study of Bipolar Disorder.

Agency: NIMH

Mechanism: P20, MH066054

PI: Joseph Calabrese, M.D. Project #2 PI: Eric Youngstrom, Ph.D. (40% effort); also Director of Research Methods Core and Director of Data Management and Statistical Analysis Unit. Switched to consultant status with move to UNC Chapel Hill in July 2006.

Funded. Start date: July 2003.

Five years. Total costs $2,477,644.

Effectiveness of Manualized Cognitive Behavioral Therapy for Pediatric Bipolar Disorder in Community Mental Health

Agency: Ohio Department of Mental Health

Mechanism: Investigator Initiated Proposal

PI: Jennifer Kogos Youngstrom, Ph.D. (8% effort).

Co-I: Norah Feeny, Ph.D. (8% academic year) & Eric Youngstrom, Ph.D.

Funded. Start date: July 2005. Total costs: $120,000.

Subjective Experience of Attention Deficit Hyperactivity Disorder (ADHD) and Bipolar Spectrum Disorders (BPSD) in Youth and Families

Agency: Ohio Department of Mental Health

Mechanism: Investigator Initiated Proposal

PI: Eric Youngstrom, Ph.D. (2% academic year).

Co-I: Janis Jenkins, Ph.D. (2% effort) & Elizabeth Carpenter Song, M.A.

Funded. Start date: May 2005.

Two years. Total costs: $59,458.

Research Supplements for Underrepresented Minorities:

Improving Assessment of Juvenile Bipolar Disorders.

Agency: NIMH

Mechanism: Research Supplements for Underrepresented Minorities to MH066647

PI: Eric Youngstrom, Ph.D., to support Kelly Constant, M.A.

Funded. Start date: August 2004.

Four years. Total costs: $158,565.

Experiential Learning Fellowship: Improving the Assessment of Mental Health Problems in Underserved Families.

Agency: Case Western Reserve University

Mechanism: Undergraduate Experiential Learning Fellowship

PI: Eric Youngstrom, Ph.D., mentoring Maya Brown and Katherine Bobak

Funded. Fall 2006 to Spring 2007. Total costs: $4,600.

Research Supplements for Underrepresented Minorities:

Improving Assessment of Juvenile Bipolar Disorders.

Agency: NIMH

Mechanism: Research Supplements for Underrepresented Minorities to MH066647

PI: Eric Youngstrom, Ph.D., to support Foluso Williams, M.A.

Funded. Start date: February 2005.

One year. Total costs: $10,632.

Student and Young Investigator Development and Training

Kirschstein NRSA F31 Fellowship awarded to Melissa Noya, M.A.

Research Assistantships for nine graduate students

NIH Underrepresented Minority Supplements for two fellows

Doctoral dissertations: Data for Elizabeth Carpenter Song, Ph.D., Guillermo Perez Algorta, Ph.D., Mary Ann McDonnell, Ph.D.

Masters theses: Data for Andrew Freeman, M.A., Anna Van Meter, M.A., Maria Martinez, M.A., Melissa Jenkins, M.A.; financial support for Kelly Bhatnagar, M.A.

Undergraduate Honors Theses: Julia Seay, Katy Hawks, Kristen White, Katherine Bobak, Maya Brown

Undergraduate Independent Studies: More than 40 at Case Western Reserve University and University of North Carolina at Chapel Hill

Establishment of Writer’s Workshop for Center for Excellence in Research and Treatment of Bipolar Disorder (CERT-BD) at the University of North Carolina

Clinical Continuing Education

Data from the project were presented at more than 50 continuing education events internationally, including workshops and institutes sponsored by the American Psychological Association, the Canadian Psychological Association, the American Academy of Child and Adolescent Psychiatry, the Association for Behavioral and Cognitive Therapies, the International Review of Bipolar Disorders, and the National Association of School Psychologists, as well as state (Ohio Psychological Association, North Carolina Psychological Association) and regional entities.

Eyoungstrom (discuss • contribs) 20:29, 16 November 2022 (UTC)

Helpful Links
Here's a sheet with the list of variables, labels, value labels, missing value codes, etc. (but not the actual data -- just the codebook!).

Eyoungstrom (discuss • contribs) 00:29, 9 December 2022 (UTC)