Comment on “Timing of Increased Autistic Disorder Cumulative Incidence”
Posted Jan 19 2012 1:40pm
In 2010 a paper was published called “Timing of increased autistic disorder cumulative incidence.” The paper has made very little, if any, impact on the scientific community. But it has become part of the stable of poor-quality papers used by those claiming that vaccines caused an autism epidemic.
The paper took data from other published papers and applied a “hockey-stick” analysis to try to identify change points in the administrative prevalence of autism in California, Japan and Denmark. Here’s the main figure from that paper:
The idea of a hockey-stick analysis is to fit the data to two lines of different slopes which meet at a change point. Those two lines look like a hockey stick, hence the name. For multiple reasons, I believe this analysis was not appropriate for these data.
In my view, much like Ms. Ratajczak’s review, the major impact of “Timing of Increased Autistic Disorder Cumulative Incidence” has not been in the scientific literature. An internet search quickly shows that both papers have been quite well received by those promoting vaccines as a cause of autism, both within part of the autism/parent community and within the anti-abortion community. The “Timing” paper was immediately promoted by Mr. Kirby, who was a major promoter of the idea that mercury caused an autism epidemic. The paper has since been picked up by many, including Andrew Wakefield, who attempts to give his interpretation of a “hockey stick” analysis in his talks:
The “Timing” paper is, quite frankly, weak at best. Weak enough that I am unsure why the authors’ superiors at the EPA chose to approve it even with the disclaimer, “Approval does not signify that the contents reflect the views of the Agency” (a disclaimer which Mr. Kirby ignored as he made comments like “according to the EPA” in his piece). With much better analyses of the California data by Peter Bearman’s group at Columbia and Irva Hertz-Picciotto’s group at U.C. Davis, the time for such simple analyses as in the MacDonald and Paul paper is past, especially in a highly charged area such as autism.
If I had room, given the word count restrictions on a reply, I would have included some of these points. Instead, in “Comment on Timing of Increased Autistic Disorder Cumulative Incidence” I focused on three points. First, the source that MacDonald and Paul used for their California data has a very clear and explicit disclaimer stating that those data are not of high enough quality for scientific research. Second, the data are exponential. One can fit a “hockey-stick” to exponential data but the results are meaningless. There is no change point in an exponential curve. Third, plotting the data shows that there are change points, but at 1960 and 1974, not 1988 as MacDonald and Paul claimed from fitting one of the exponential regions of data.
In their original paper, MacDonald and Paul point out: “All data were taken from the publications with no attempt to access the original data.” This, as I pointed out in my comment, was unfortunate because the CDDS makes its data available to the public. This would allow one to double check hypotheses, such as whether a “hockey-stick” analysis is appropriate. For many reasons, it is an inappropriate analysis.
First, the California Department of Developmental Services (CDDS) makes it clear that these data are not to be used to draw scientific conclusions. From the report where the EPA authors gathered their data:
The information presented in this report is purely descriptive in nature and standing alone, should not be used to draw scientifically valid conclusions about the incidence or prevalence of ASD in California. The numbers of persons with ASD described in this report reflect point-in-time counts and do not constitute formal epidemiological measures of incidence or prevalence. The information contained in this report is limited by factors such as case finding, accuracy of diagnosis and the recording, on an individual basis, of a large array of information contained in the records of persons comprising California’s Developmental Services System. Finally, it is important to note that entry into the California Developmental Services System is voluntary. This may further alter the data presented herein relative to the actual population of persons with autism in California.
If one ignores this major point (as the EPA authors did), there are still other reasons why their analysis method is inappropriate. One big reason is that looking for a single “change point” year in California isn’t supported by the data. The fact that autism rates vary dramatically by geography within California (as shown by both Prof. Hertz-Picciotto’s group and Prof. Bearman’s group) points away from any universal exposure (such as vaccines). The data I have from the CDDS which break down the counts by region only go back to the early 1990’s, so, for this reason and for reasons of space, I did not include these data. These geographic data make it clear that not only do the autism rates vary by region, the time trends of those rates vary a great deal from one region to another. In other words, what is a change point for one region may not be one for another. Applying a single change point to all of California is not warranted using these data.
Another reason why the hockey-stick analysis is inappropriate is the fact that it forces a functional form onto the data which is plainly a bad fit. A hockey-stick analysis fits the time trend to two lines with a “change point” where the lines intersect. Unfortunately, the data are exponential. That the data follow an exponential so closely is quite remarkable, really, given the geographic variability and the changing social influences on autism rates.
If one takes one of the CDDS datasets (I used one from 2007) and combines it with census type data, one can produce this figure (Figure 1 from the published comment):
Graphing the data on a semi-log graph such as this (logarithmic vertical axis, linear time axis) shows that the data are exponential, going all the way back to birth year 1930. It isn’t a simple exponential, though. There is a region around 1960 to 1974 where the growth stalled. It is remarkable that the same time constant fits the data all the way back to 1930, with the exception of this 1960-1974 region.
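To make the point about the logarithmic axis concrete, here is a minimal sketch using made-up numbers, not the CDDS data. The growth constant of 0.08 per year and the function name `synthetic_series` are assumptions chosen purely for illustration. The series grows exponentially at one fixed rate except for a stall between 1960 and 1974, and the year-to-year difference of the logarithm gives the local slope one would see on a plot with a logarithmic vertical axis.

```python
import numpy as np

def synthetic_series(start=1930, end=2000, rate=0.08, stall=(1960, 1974)):
    """Made-up exponential series with a flat stretch (NOT real CDDS data)."""
    years = np.arange(start, end + 1)
    # Growth per year on the log scale: constant, except zero during the stall.
    in_stall = (years[1:] > stall[0]) & (years[1:] <= stall[1])
    step = np.where(in_stall, 0.0, rate)
    log_values = np.concatenate([[0.0], np.cumsum(step)])
    return years, np.exp(log_values)

years, values = synthetic_series()

# Local slope on a plot with a logarithmic vertical axis.
# slopes[i] is the change between years[i] and years[i + 1].
slopes = np.diff(np.log(values))
```

In this toy series the slope is the same constant before 1960 and after 1974 and is zero in between. On a logarithmic axis that shows up as two parallel straight stretches separated by a flat one, which is the kind of feature that is easy to miss on a linear axis.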
Fitting exponential data to two lines just doesn’t make sense. There is no “change point” in an exponential. One can force a fit onto exponential data, but it isn’t meaningful.
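A small sketch may help show why a forced fit is not meaningful. The code below is a simplified two-line fit written for illustration; it is not MacDonald and Paul's code, and the function name `hockey_stick_changepoint`, the growth constant and the synthetic data are all assumptions. It tries every split of the data, fits a straight line to each side, keeps the split with the smallest total residual sum of squares, and reports where the two lines cross. It is then applied to a pure exponential, which has no change point by construction.

```python
import numpy as np

def hockey_stick_changepoint(x, y):
    """Simplified two-line fit: return the x where the best pair of lines cross."""
    best_rss, best_cross = np.inf, None
    for i in range(3, len(x) - 2):  # keep at least 3 points on each side
        left = np.polyfit(x[:i], y[:i], 1)    # [slope, intercept]
        right = np.polyfit(x[i:], y[i:], 1)
        rss = (np.sum((np.polyval(left, x[:i]) - y[:i]) ** 2)
               + np.sum((np.polyval(right, x[i:]) - y[i:]) ** 2))
        if rss < best_rss and left[0] != right[0]:
            best_rss = rss
            # Intersection of the two fitted lines.
            best_cross = (right[1] - left[1]) / (left[0] - right[0])
    return best_cross

rate = 0.1  # hypothetical growth constant, for illustration only

# The same pure exponential, viewed through three different time windows.
t_a = np.arange(0, 31, dtype=float)    # years 0..30
t_b = np.arange(10, 41, dtype=float)   # years 10..40
t_c = np.arange(0, 41, dtype=float)    # years 0..40

cp_a = hockey_stick_changepoint(t_a, np.exp(rate * t_a))
cp_b = hockey_stick_changepoint(t_b, np.exp(rate * t_b))
cp_c = hockey_stick_changepoint(t_c, np.exp(rate * t_c))
```

Sliding the window ten years later moves the fitted “change point” ten years later as well, and extending the window also moves it. The fit always returns an answer, but on exponential data that answer follows the choice of window rather than any feature of the curve.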
Using the semi-log plot I supplied, one can see that there are change points in the trend. They are obvious to any observer. But they are in 1960 and 1974, not in about 1987/88 as MacDonald and Paul calculated.
As is customary, MacDonald and Paul supplied a reply to my comment. In this they make only a very brief reference to the fact that the very document from which they pulled the California data states it is inappropriate to use it the way they did: “We agree with Carey (3) that analysis of long term epidemiological studies would be desirable and that there are a number of potential confounding issues associated with analysis of administrative databases.”
One mistake I made was in not clearly spelling out that fitting a hockey-stick to exponential data is inappropriate. It is obvious, but rather than address this problem MacDonald and Paul state:
Changepoints were determined by fitting a hockey-stick model (10) to the data for each dataset. This approach uses ordered data and piecewise linear regression to split the response variable into two groups. A linear regression line is generated for each group, and the point of intersection for these regression lines and the residual sum of squares for each line are determined. The intersection point that minimizes the residual sum of squares is the changepoint.
Carey (3) used a log transformation of the cumulative incidence to produce a log-linear relationship for the CDDS data of the form: Log (Cumulative Incidence) = B0 + B1 (Year). Subsequently, he states that he could not observe changes in the log-linear relationship of CDDS cumulative incidence at or around our changepoint year of 1989, but no other analysis was performed. Examining original CDDS data in the inset of Carey’s (3) Figure 1, it certainly seems likely that there is a changepoint in the 1985-1990 range, and being unable to observe such a change in the log-linear plot may be purely an artifact of the scaling of the plot. We conducted a changepoint analysis on transformed CDDS data from 1970 to 1997 (from (7)) and found a changepoint in 1984. The shift to an earlier changepoint using the log transformed data may result from stabilization of the variance associated with the transformation, and the resulting shift in the minimization point for the residual sum of squares for the regression line for the larger cumulative incidence values in later years.
It’s an odd response. The authors are focused on defending their original result of a change point in the 1980’s rather than considering the entire new dataset. They ignore the problems inherent in claiming a change point in exponential data, but I should have stressed that more in my comment. Even if MacDonald and Paul claim it is appropriate to make this fit, they ignore the obvious change points in the semi-log graph. Consider the change point at about 1960. It is abundantly clear in the semi-log graph. In the inset of my figure, the linear graph, that change point is still obvious to the eye.
If the real goal of their work was to identify change points there is no reason to ignore those which were (a) outside of their original time span and (b) obvious in a different presentation of the data. This is not just flawed, it is irresponsible. They are ignoring their own stated goal:
As we point out in the paper, while artifacts associated with observed increases in various studies cannot be ruled out, from a precautionary standpoint, it seems prudent to assume that at least some portion of the observed increases in incidence is real and results from the interaction of environmental factors with genetically susceptible populations. Since exposure to environmental factors is potentially preventable, identification of relevant candidate factors should be a research priority.
Why, I would ask, are potential environmental candidates which might involve change points in 1960 and 1974 not important, but one in the late 1980’s is?