As I mentioned a few days ago, Michelle Dawson et al. published yet another paper on "the level and nature of autistic intelligence". I didn't go into any details about the paper at that point - even though I had read it and had started writing this post - because it had disappeared from the journal's website and I wasn't sure what state it would be in when it came back. Well, the paper is back and since Ms. Dawson was kind enough to call me a "renowned blogger" I thought the least I could do was give the paper a serious look.
I am actually going to be talking about two papers by Dawson et al. - "The Level and Nature of Autistic Intelligence" and "The Level and Nature of Autistic Intelligence II" - because they use some of the same underlying data and they talk about the same idea. The text of both papers is freely available, here and here, so if you are really interested in the subject I suggest that you read them for yourself.
With that said, the basic premise in both papers is that people with autism are actually more intelligent than is commonly thought. Conventional wisdom (and science) holds that people with autism are often intellectually disabled and, even when they aren't, have intellectual challenges that place them at a disadvantage to a "typical" person. These papers try to show that people with autism have a different way of thinking and that it isn't so much that they lack intelligence but rather that the tests used to measure their intelligence are lacking.
Or, in the words of the second paper, "autistic spectrum intelligence is atypical, but also genuine, general, and underestimated".
As I said before, in some ways I completely agree with that statement. Conventional intelligence tests rely on certain abilities, such as the ability to understand verbal communication and a ready understanding of the environment, and are very challenging for people with autism. A person with autism might very well score lower than a typical person because they have problems with certain core skills, have problems focusing, or have sensitivities to the immediate environment, not because they lack intelligence.
But that is the nature of the disorder called autism - it disrupts a person's ability to function in a "typical" manner. It doesn't necessarily mean that they lack intelligence but it makes the application of that intelligence difficult.
Getting back to the papers, in both papers the authors gave two different intelligence tests (not really, but more on this in a minute) to several groups of children and adults who either were "typical", had autism, or had Asperger's. The first paper focused on children and adults with autism while the second focused on children and adults with Asperger's. In each paper, there were four groups - typical children, typical adults, children with autism/Asperger's, and adults with autism/Asperger's.
Some of the "typical" children and adults were in both papers although it is never spelled out exactly how many were in both or whether they were retested for the second paper. That last bit is important because the way the IQ tests were administered differed between the papers. So if the data from the first paper was just reused in the second then it might have skewed the results of the second.
All of the groups were given two different styles of intelligence tests, the Wechsler Intelligence Scales III and Raven's Progressive Matrices. The children were given the Wechsler Intelligence Scale for Children (WISC-III) and the adults were given Wechsler Adult Intelligence Scale (WAIS-III). All of the participants were given the standard form of Raven's Progressive Matrices. There are two other forms of the Raven's test, including one that is meant for younger children or children with learning disabilities.
There are two main differences between how the tests were administered between the papers.
In the first paper, the Raven's test was given to all participants with no time limit whereas in the second paper, the standard time limit (40 minutes, I think) was applied. I think the impact on the scores from that difference would be obvious.
The second difference is that the tests in the first paper were scaled according to North American norms while in the second paper Canadian norms were used. This is a little bit obscure, so let me explain.
The basic idea with modern intelligence tests is to give a bunch of questions and then score the number of correct answers. But since this raw score does not really tell you anything meaningful, these scores need to be translated into some more useful form, such as an IQ score or a percentile. To do that, the test scores are "normalized" by giving the test to a large number of people and then using the resulting scores to establish what the typical score is and what range of scores is to be expected. The typical score is set to an IQ of 100 (the 50th percentile) and 1 standard deviation is set equal to 15 IQ points.
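In other words, the whole mapping between IQ points and percentiles is just a normal distribution with a mean of 100 and a standard deviation of 15. As a rough sketch of my own (not code from the papers), Python's standard library can do the conversion either direction:

```python
from statistics import NormalDist

# IQ scores are normed to a mean of 100 and a standard deviation of 15
iq_dist = NormalDist(mu=100, sigma=15)

def iq_to_percentile(iq):
    """Percentage of the norming population scoring below this IQ."""
    return 100 * iq_dist.cdf(iq)

def percentile_to_iq(pct):
    """IQ score corresponding to a given percentile rank."""
    return iq_dist.inv_cdf(pct / 100)

print(round(iq_to_percentile(100)))  # 50 -- the definitional midpoint
print(round(iq_to_percentile(115)))  # 84 -- one SD above the mean
print(round(percentile_to_iq(98)))   # 131
```

The point being that the translation from raw score to IQ to percentile depends entirely on which norming population you plug in, which is why the choice of norms matters below.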
This picture from Wikipedia might make the idea clearer -
So the problem is that the two papers used two different sets of translations from the raw test scores - North American norms and Canadian norms - and that there are differences between the mappings. So you cannot directly compare the final results without first recalculating the result using the proper normal ranges.
So was the "typical" data that was reused in the second paper re-normalized with the Canadian norms or was it kept under the North American norms? And were the comparison charts from the first paper that were included in the second paper (i.e. Figure 1) adjusted as well? After reading both papers several times, I still can't say one way or the other for certain.
But let's set that aside for now and consider the intelligence tests that were used - Raven's Progressive Matrices and the Wechsler Intelligence Scales for children and adults.
The Raven's test is an old and somewhat simple test that presents a series of progressively more difficult visual puzzles. The puzzles take the form of a shape that has a piece missing and a set of possible answers. This site has an example of what one of the questions might look like. The person taking the test has a fixed amount of time to answer as many of these puzzles as possible.
The Raven's tests were initially based on the idea that intelligence was a single, unified general ability. Under this model, you either had "intelligence" or you did not. But like all primitive models, this idea of a single unified intelligence has been gradually replaced by the idea that there are many different types of intelligence and that a person will have a varying level of intelligence depending on exactly what part of their intelligence you are measuring.
Which is where the Wechsler tests come into play. These tests attempt to measure the different types of intelligence through the use of different subtests, each with a specific focus. Under this newer model of intelligence, the Raven's test is no longer thought to measure "intelligence" as a whole but rather one subtype of intelligence called fluid intelligence. Fluid intelligence is the ability to think logically and solve problems in novel situations, independent of acquired knowledge.
So on one hand you have the Raven's test that is measuring the ability to think logically and solve problems and on the other you have the Wechsler tests that are trying to measure actual abilities and the ability to apply what you know to a given situation.
I don't want to go into any more detail about the differences between the tests because that would take a long time and I am nowhere close to an expert (or even that knowledgeable) on the subject. If you are interested in the differences between the tests or the history and current theories of intelligence, I suggest starting with the Wikipedia entry on the subject and working your way outward from there.
But let me just say that if you have spent any time with children who have even moderate autism, you would know that the differences between these two tests highlight one of the core challenges of autism. That is, while it can be challenging to teach a child with autism, it is equally, if not more, challenging to get the child to apply what they know to a given situation. There is a very large gap between being able to learn, actually learning, and being able to generalize that knowledge.
But getting back to the papers, the core data point from both papers is that, while the Wechsler test shows a fragmented and uneven profile of intelligence in people with autism, the Raven's test often shows a significantly higher level of intelligence than the Wechsler in the same group. Furthermore, this significant difference is not present in "typical" children and adults.
So the authors concluded that, since the Raven test is thought to measure a more general form of intelligence, the difference between the two tests represents a problem with how the Wechsler tests measure intelligence with respect to people with autism. They concluded that the Raven test is a more accurate measure of true "atypical" autistic intelligence.
As I said, I agree with this idea up to a point. But (you knew that was coming), there are quite a few problems not only with the idea in general but also with the data in both papers.
As I alluded to above, this interpretation ignores the fact that people with autism (and children in particular) have a hard time with the generalization of knowledge. It is one thing for them to know something when you are teaching it to them and asking highly structured questions, it is quite another for them to be able to take that knowledge or reasoning and apply it in a novel situation.
Another problem is that this interpretation ignores the widely accepted idea that people (and again children in particular) with autism have what are called splinter skills. Splinter skills are what happens when a person has uneven development of skills and is substantially behind in some areas, ahead in others, and at the appropriate level for the rest. So instead of having a fairly even level of skills, they have an extremely uneven level of skills. For example, some children with autism will learn to read before they develop receptive or expressive verbal skills.
You can see evidence of splinter skills in the results from the Wechsler test. You can also see it very clearly if you give a child on the spectrum a developmental test such as the Battelle. So the data from the Wechsler and Raven's tests could easily be yet another example of splinter skills.
In my opinion, if you combine these two ideas, you could say that one of the core traits of autism is an uneven level of skill and difficulties in applying those skills. The other core traits are an extreme difficulty in teaching skills in the first place (at least in some people) and the behaviors of autism.
But let's set all of the above aside. Let's assume that all of the data is in the proper terms and let's assume that the difference between the test values can't be explained by known properties of autism.
The next question is whether what the two tests measure is an equally valid view of intelligence or whether the tests measure different things. Can we really look at one particular test of intelligence and assume that it represents potential intelligence better than another test?
I think the answer is obvious: each test provides a different view of a person's intelligence. But to arrive at a true measure of a person's intelligence you have to consider all of the available evidence.
The next follow-up question is whether the end results of the tests are directly comparable. Does a final score in the 80th percentile on one of the tests mean the same thing as an 80th percentile score on the other? For this to be true, both tests would have to be an equivalent measure of a person's intelligence, i.e. they would both have to measure the exact same thing.
I think it should be obvious by now that they don't, so I think that you would have to be careful in directly comparing the results between the two tests, doubly so if you wanted to do any calculations based on the numbers.
But again, let's set that aside for now and look at the actual data underlying the papers. I normally don't like to criticize the presentation of a paper directly, but if I had to describe the data in these papers I would call it sloppy and disorganized. There are numerous inconsistencies in how the data is presented, a few blatant mistakes, and neither paper gives a clear view of what the data actually is.
Just to give you an idea of what I am talking about:
In the first paper, there is no table that summarizes the data; you have to piece the numbers together from the text. There are figures that are presented without any real description of what the data is, such as Figure 1, which says it presents "mean subtest scores" but then charts percentiles. I have to wonder what the percentiles are of - correct answers or normalized results. And the data in Figure 1 is presented for only one of the four groups in the paper, which raises the question of what the other groups look like.
In the second paper, there is a table (Table 1) that presents some of the data. But then that data is contradicted by the first figure in the results section, and that figure is central to the results being presented. You would think that someone would have checked that. Later in the paper you are directed to non-existent figures. And again, you are never presented a clear view of the data that is being discussed. Some of the data is contained in the table while other parts are presented only in the text, and then you only get to see one small part of the data. And then there is another strange chart, Figure 2, that presents data similar to Figure 1 in the first paper but, instead of means or percentiles, presents scaled scores. And, again, the data for the other groups in the paper is left off.
I could put together a better presentation of the data, and that is really saying something. But after spending several quality hours going over the papers and trying to put all of the pieces together, I have some concerns about how the data was actually analyzed.
The main result in both papers was that the percentile difference between the Wechsler and Raven's tests was significantly larger in most of the autism/Asperger's groups than it was in the "typical" groups. Most of the groups (with the exception of the Asperger children) did better on the Raven's test than they did on the Wechsler. But the Asperger adults and both of the autism groups showed a significantly larger improvement than the others.
Which leads me to my main problem with the data - how the difference was calculated. To put the problem simply, you cannot just subtract two percentiles and treat the result as meaningful because the percentile scale is not linear. I think I can illustrate this best with an example.
If I have two numbers - 5 and 1 - that represent the differences between two sets of percentiles (50 and 55, 98 and 99), which one would you assume represents a greater change in intelligence? The obvious answer is of course 5 - the change from the 50th percentile to the 55th percentile.
You would assume that a change of 5 percentiles always represents a greater change in intelligence than a change of 1 percentile. But in this case you would be wrong, the increase from the 98th percentile to the 99th percentile represents a greater change in intelligence than the 50th percentile to the 55th percentile does.
You can see this if you change the percentiles into IQ points (see above). The 50th percentile represents an IQ of 100, the 55th an IQ of 102, the 98th an IQ of 131, and the 99th an IQ of 134. So the 5 percentile change equates to a change of 2 IQ points while the 1 percentile change represents a change of 3 IQ points.
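To put exact numbers on it, here is a quick sketch of my own using the same mean-100/SD-15 mapping (the unrounded gaps differ slightly from the rounded table values above, but the point is the same):

```python
from statistics import NormalDist

def percentile_to_iq(pct):
    # Standard IQ norming: mean 100, SD 15
    return NormalDist(mu=100, sigma=15).inv_cdf(pct / 100)

gap_mid = percentile_to_iq(55) - percentile_to_iq(50)  # a 5-percentile gap
gap_top = percentile_to_iq(99) - percentile_to_iq(98)  # a 1-percentile gap

print(round(gap_mid, 1))  # 1.9 IQ points
print(round(gap_top, 1))  # 4.1 IQ points
```

The 1-percentile gap out in the tail is worth more than twice the 5-percentile gap in the middle of the distribution.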
The reason for this discrepancy is that percentiles, at least as they are used in this paper, are meant to provide a relative ordering of everyone who takes a particular test. So scoring in the 80th percentile means that you did better than 80% of the people who took the test and worse than 20%. The percentiles do not tell you anything about the magnitude of the difference between the groups.
So, even if you had a set of percentiles that were all from the same test, you could not subtract them and do anything meaningful with the results. You cannot take a set of differences and order them from smallest to largest (which is required for the statistics used in the second paper) because you do not know which change in percentile represents a larger change.
The first paper's main conclusion is in doubt because the statistics not only assume the ability to order the results, but also assume a linear scale and a normal distribution of the data. Even a quick look at the statistics shows that the distribution cannot be normal (e.g. a range of 0 to 100 with a mean of 36 and an SD of 26) and the differences aren't ordinal, let alone linear.
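To see why those summary statistics rule out normality, here is a quick check of my own using the reported mean of 36 and SD of 26:

```python
from statistics import NormalDist

# If the percentile differences really were normally distributed with
# the reported mean (36) and SD (26), a noticeable chunk of the
# distribution would fall outside the observed 0-100 range:
diffs = NormalDist(mu=36, sigma=26)

below_floor = diffs.cdf(0)          # P(X < 0)
above_ceiling = 1 - diffs.cdf(100)  # P(X > 100)

print(f"{below_floor:.1%}")    # 8.3% -- below the lowest observed value
print(f"{above_ceiling:.2%}")  # 0.69% -- above the highest observed value
```

A true normal distribution with those parameters would put roughly 8% of the data below the reported minimum, so the actual data has to be piled up against the floor, i.e. skewed.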
The second paper at least used statistics that do not depend on a normal distribution. But even then, the main statistic depends on the data being ordinal.
So when the second paper says this in the results section -
"The Asperger adults demonstrated an advantage of RPM over Wechsler FSIQ that was significantly greater than that of the non-Asperger adult controls, Mann-Whitney U=366.5, p<.01"
That statement is not supported by the data. In pure numerical terms the difference might seem larger, but in terms of an actual increase in intelligence that statement is very much in doubt.
Another quibble with the results is the use of averages (means) to represent the groups rather than medians. If you have a set of non-linear values such as these percentiles, it really isn't valid to take an average because it is going to misrepresent where the middle of the group is. That goes double when the data is badly skewed, as is the case for the Asperger adults' Raven's test in the second paper. In that case the "average" was the 74th percentile but the standard deviation was 50(!). For the spread to be that large on a scale bounded by 0 and 100, a lot of the scores have to sit far from the mean, which means the mean is a poor summary of where the middle of the group is and the median could be significantly different.
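As an aside of my own (not an argument the papers make): for any data confined to a 0 to 100 scale, the variance is capped by the Bhatia-Davis inequality, so it is worth checking what the largest possible SD even is for a mean of 74:

```python
import math

# Bhatia-Davis inequality: for data confined to [m, M] with mean mu,
# variance <= (M - mu) * (mu - m)
m, M, mu = 0, 100, 74
max_sd = math.sqrt((M - mu) * (mu - m))
print(round(max_sd, 1))  # 43.9
```

If the reported SD of 50 really does describe percentile scores bounded between 0 and 100, it exceeds that cap, which would suggest either a reporting error or that the figure refers to something on a different scale.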
Although, to be fair, there are some valid secondary results. For example, when the paper reports that "the Asperger adults’ Wechsler VIQ was significantly higher than their PIQ (55th vs. 39th percentile), Z =3.43 p<.01", that could be valid because the data is in the same terms and the statistics were (apparently) used properly. What it means without the main result though is an entirely different question.
Who knows, maybe I am missing something fundamental about the data here or am completely wrong about the percentile thing. But from what I can see in the papers and what I know about statistics, it looks like the conclusions are based on a faulty analysis. If anyone sees something obvious that I missed, please point it out in the comments.
I really could go on to point out quite a few other problems with the data, such as the fact that the percentile differences aren't even based on the same test, or that the number of participants in the papers is rather small, or that confounding social/demographic factors weren't adjusted for. But since the main result is likely invalid, I don't really see the point in beating a dead horse.
Whew. Anyone still reading this?
Now that I have rambled about these two papers far longer than I had wanted, let me just say that while I think these two papers are mostly worthless, the idea that people with autism can be intelligent isn't. There is nothing inherent in autism that says that everybody who has autism is automatically intellectually disabled, although there does appear to be a large group that is.
What I think is obvious is that autism disrupts a person's ability to apply their intelligence. Even if you threw out every problem that I pointed out with these papers and took their data at face value, the data would fully support the idea that there is a break between what a person can do and what autism allows them to do.
1. Dawson M, Soulières I, Gernsbacher MA, Mottron L. The level and nature of autistic intelligence. Psychol Sci. 2007 Aug;18(8):657-62. PubMed PMID: 17680932.
2. Soulières I, Dawson M, Gernsbacher MA, Mottron L. The Level and Nature of Autistic Intelligence II: What about Asperger Syndrome? PLoS One. 2011;6(9):e25372. Epub 2011 Sep 28. PubMed PMID: 21991394.