Scorolli, C., & Borghi, A.M. (2007). Sentence comprehension and action: Effector specific modulation of the motor system... unfortunately, they fell at the first hurdle and carried on tumbling. I thought this was a nice illustration of how not to conduct and write up an experiment, and that it would provide a good template of things to avoid.
The paper tries to provide evidence for the weak form of embodied cognition, which basically argues that concepts are not transduced into amodal symbols but re-enacted in the same or similar modality-specific systems used in perception and action. This is often demonstrated as bodily states influencing mental states.
The authors hypothesise that "reading sentences related to actions to be performed with different effectors (mouth and foot) [would] activate the same neural systems activated during the effective execution of these actions". In other words, the same neurons should be activated while understanding an action sentence as while performing an action with that specific effector. They therefore predict that participants will respond more quickly when the action in the sentence is related to the response modality. Mouth sentences such as 'sucking the sweet' will be recognised as meaningful more quickly when responding with a mic than hand or foot sentences, and likewise for foot sentences when responding with a pedal.
There is actually a huge conceptual issue with this weaker form of embodied cognition, which still relies on representations and classical computation but I shall leave this in the capable hands of Andrew and Sabrina. I shall focus on the more standard, bread and butter problems of the paper, which are bad methodology, bad analyses and a bad conclusion.
Think carefully about the design and implications for the necessary statistics to analyse it
On the word of my lecturer, the design the researchers chose was a '2x2x3 partially nested design'. The participants had 2 response modalities - mic vs pedal, there were two between-participant blocks, and each compared either 'mouth vs hand sentences' or 'feet vs hand sentences' for each of the two response modalities.
If this sounds complicated - it is - it's a joke. I was going to attempt to explain the methodology further, but the risk of boring myself and you is just not worth it, and the paper is free to marvel at here. Besides, the fact that the experiment is such a bloody pain to explain and understand serves to illustrate just how overly and unnecessarily complicated it is.
The design of an experiment is crucial, and if you get it wrong, no amount of complex post hoc analysis will cover it up. As a general rule, the simpler and more elegant the better - a ridiculously complicated design leads to ridiculously uninterpretable results.
Know your stats - p values, effect sizes, interactions and the difference between the differences
Faced with some pretty complicated results, the authors were forced to conduct two separate mixed-factor ANOVAs, one for each block. Straight away, this meant they couldn't report any interaction across blocks and certainly couldn't compare results from one ANOVA with the other.
The results, however, weren't too convincing. As expected, they found that mouth sentences were quicker than hand sentences when responding using the mic, but they also found that feet sentences were quicker than hand sentences for the mic - this does not support their hypothesis. In order to justify their findings the authors argued "the marked difference between the effect sizes (p < 0.009 vs p < 0.05) confirms that the simulation is effector specific".
There is so much wrong with this statement it is almost comical.
The authors do not report the effect sizes so we don't know what the difference between them is.
Instead they quote the difference in the p values which aren't defined (p less than...well how much less than!? We need the exact values.)
But don't worry, because even if we did have the exact values, they would tell us nothing about the difference in effect sizes. The p value tells us how surprising our data would be if there were no real effect - it tells us nothing about how large the effect is. In other words, we could have a really good p value, say p < 0.0001, with a really small effect size; conversely, we could have a larger p value, say 0.05, with a large effect size.
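The p/effect-size dissociation is easy to demonstrate with made-up numbers. The sketch below (hypothetical data, and a normal approximation to the t distribution for simplicity) compares a tiny effect measured on a huge sample against a large effect measured on a small one - the tiny effect ends up with the far more impressive p value:

```python
from math import sqrt, erfc

def normal_p(mean_diff, sd, n):
    """Two-sided p for an equal-n, equal-sd two-group comparison,
    using the normal approximation to the t distribution."""
    z = mean_diff / (sd * sqrt(2.0 / n))
    return erfc(abs(z) / sqrt(2.0))

# Scenario A: tiny effect (Cohen's d = 0.05) but a huge sample
d_a = 0.05 / 1.0                  # d = mean difference / pooled sd
p_a = normal_p(0.05, 1.0, 10_000)

# Scenario B: large effect (d = 1.0) but a small sample
d_b = 1.0 / 1.0
p_b = normal_p(1.0, 1.0, 10)

print(f"A: d = {d_a:.2f}, p = {p_a:.5f}")  # tiny effect, p < .001
print(f"B: d = {d_b:.2f}, p = {p_b:.3f}")  # large effect, p ~ .03
```

Scenario A's p value is roughly a hundred times smaller, yet its effect is twenty times weaker - which is exactly why you cannot read effect sizes off p values.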
Finally, and most importantly, even if the authors had reported the effect sizes and they did seem quite different, they could not conclude that there is a significant or 'marked difference'. This is because the results came from two separate ANOVAs. The authors have correctly found a significant difference between hand and mouth (for mic) and between hand and feet (for mic), but it is incorrect to say there is a significant 'difference between the differences'. Any difference or correlation requires its own analysis to show whether it is significant or simply occurred by chance, and this example is no...um...different!
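To see why two separately significant effects don't add up to a significant difference between them, here's a toy sketch (entirely hypothetical reaction-time numbers, normal approximation for simplicity) of the kind of interaction test the design should have allowed:

```python
from math import sqrt, erfc

def two_sided_p(z):
    """Two-sided p value from a z statistic (normal approximation)."""
    return erfc(abs(z) / sqrt(2.0))

# Hypothetical RT advantages (ms) over hand sentences, with standard errors
diff_mouth, se_mouth = 30.0, 10.0   # mouth sentences, mic responses
diff_foot,  se_foot  = 20.0, 10.0   # foot sentences, mic responses

p_mouth = two_sided_p(diff_mouth / se_mouth)  # ~ .003: significant
p_foot  = two_sided_p(diff_foot / se_foot)    # ~ .046: also significant

# The 'difference between the differences' needs its own test:
dd = diff_mouth - diff_foot
se_dd = sqrt(se_mouth**2 + se_foot**2)        # standard errors add in quadrature
p_dd = two_sided_p(dd / se_dd)                # ~ .48: nowhere near significant
print(p_mouth, p_foot, p_dd)
```

Both simple effects clear the p < .05 bar, yet the test of their difference comes nowhere near significance - so eyeballing two p values and declaring a 'marked difference' is exactly backwards.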
Be critical and tentative when drawing conclusions from your results
Even in the most straightforward experiment, with large effect sizes or whatever, it is important to consider what else might be happening. It is commonplace to see "our results support our hypothesis but [enter potential confounds + limitations]". Results almost never prove a hypothesis.
Having found that half their results don't support their hypothesis, and following a botched attempt to salvage them, Scorolli and Borghi conclude that their results "support the view that the act of comprehending sentences leads to the creation of internal simulation of the action read".
No they don't. At a basic level, half the results do not support the hypothesis, and there is no way to investigate further interactions, since the poor design forced the use of two separate ANOVAs.
There's so much more wrong with this paper, but I want to talk a bit about the implications of bad research. If anyone is interested, I welcome comments on further problems!
Back to Science and Faith
The main point of this blog was to give a few important pointers with regards to a scientific approach in psychology. But I do think there is another message here. I wrote a previous blog on the importance of faith in science. That is, whilst (arguably?) scientists can be confident of the scientific method, in order to move forward we must have faith that others have successfully implemented this method. This may sound obvious or even trivial, but I think it has stark implications. If this kind of research - which makes basic errors that can be trashed by a bunch of 3rd-year psych students - can get published, how much other dodgy research is out there?
Of course, it isn't just the job of the Journal (ahem, sorry, reviewers) to critique the paper, and any scientist should be reading through skeptical lenses. But this just doesn't seem to happen. This particular paper has been cited 51 times and, after a brief gander, not by papers pointing out how crap it is. It's easy to see how a completely unscientific and unsupported model or theory could become popular to the point of becoming a given.
A quick caveat
I think it's important to note the aim of a review is not to metaphorically (or literally) obliterate the article in front of you. As Robert Sternberg points out in his book, Reviewing scientific works in psychology, whilst it is tempting, especially for young reviewers, to measure their own competence in terms of how many faults they can find, a good reviewer finds a balance between the errors in a paper and possible reasons why the paper would be beneficial or insightful. This is termed the 'gatekeeper vs generative' mindset.
In other words, there is nothing impressive about pointing out the many faults in a paper if you fail to spot that, with minor or major revision, the paper actually has something very important, insightful, surprising or even paradigm shifting to contribute. I also think it is much easier to spot the flaws in someone else's experiment than to critically design your own. This blog is not a quick-fix ego boost in the form of slating someone else's hard work.
Scorolli and Borghi make some classic school-boy (or school-girl, rather) errors that even students should not make, let alone professional psychologists. What's worse is that the paper isn't even particularly insightful or conceptually well grounded. I suppose at least they have served here to provide us with a few dos and don'ts in experimental design.
Oh and in case you're wondering the recommendation was a unanimous REJECT.