Thursday 25 July 2013

Further thoughts on pre-registration

As far as I can tell, pre-registration of scientific studies has been proposed as a solution to three important problems with the current model of scientific publication: replication, negative results and p-hacking. I personally think it is great that this format will allow more replication studies to be undertaken and, more importantly, will enable the publication of replication studies even if the results are negative. This will be of clear value to the field.

All my reservations about pre-registration concern p-hacking, more specifically how this has been 'sold' to the community. Pre-registration is a solution to the problem of p-hacking. However, those promoting pre-registration have often argued that pre-registered articles should somehow be considered "more truthful" than those that have been published by the traditional route. I strongly disagree with this kind of statement. It is true that pre-registered studies will not be p-hacked; however, not all studies that are not pre-registered are p-hacked. The danger of promoting pre-registration as more 'truthful' is that the community will stop believing results from someone who has decided not to pre-register for whatever reason - maybe they wanted the freedom to publish their results wherever they chose, maybe they did not want to deal with reviewers who might disagree with the experimental design, or maybe they just wanted to start the study rather than wait months for approval.

P-hacking is clearly a problem, particularly as it can occur subconsciously. However, at the current time I cannot see how pre-registration will work in practice to prevent p-hacking, and I worry that it has been promoted in a way that potentially denigrates equally good science that has been published via a different route.

14 comments:

  1. Hi James,
    Thanks for this. I find your position in this debate interesting. Two questions:

    1) if, as you argue, pre-registration is bad because it threatens to denigrate equally good science that hasn't been pre-registered, why isn't this also a problem for replication studies? Don't you think registered replications would threaten to denigrate unregistered replications?

    2) it seems to me that your argument isn't so much that pre-registration won't "work in practice to prevent p-hacking" (since it manifestly does, as you state earlier), but rather that by doing so it threatens the face validity of approaches that *might* have been p-hacked, but equally might not have been. Have I interpreted you correctly here?

    thanks, c.

    ReplyDelete
  2. Hi Chris,

    I think it is the same for replication studies; however, I like the fact that the results will be published irrespective of the outcome.
    To answer your second question - yes, you have interpreted my thoughts correctly.

    ReplyDelete
    Replies
    1. Ok - so if we just focus on the question of p-hacking.

      Do you agree with the following premises:

      1) p-hacking reduces reliability of study conclusions

      2) pre-reg prevents p-hacking

      3) status quo does not prevent p-hacking, so some studies will be p-hacked and others won't be

      This means pre-reg studies are not susceptible to the loss of reliability caused by p-hacking in a subset of status quo studies. So, if you were to choose which set of results you believe more - the pre-reg or the status quo (and, for this exercise to do so solely on the basis of the likelihood of p-hacking) - wouldn't it be entirely rational to put more faith in the pre-reg?

      Delete
    2. I think we need to differentiate between two different things. Theoretically, pre-registration could prevent p-hacking. However, for this to occur ALL possible degrees of freedom in the analysis pipeline need to be pre-specified, e.g. number of subjects, number of trials, how you eliminate outliers, all filters, all normalisation etc. For the most complicated analyses where p-hacking might be the most common, e.g. fMRI, then for pre-registration to work all papers would have to be reviewed by an fMRI analysis expert that understands all possible analysis pathways. I will be interested to see how this works. Any wiggle room in the analysis pipeline will not eliminate p-hacking.

      The second problem is whether we judge papers 'on average' or 'paper by paper'. Clearly, if pre-registration eliminates p-hacking, then on average pre-registered papers will have a lower frequency of p-hacking than papers that are not pre-registered. However, if you randomly select one pre-registered paper and one paper that has not been pre-registered, there is no way of knowing the relative validity of the two analyses. In my opinion it would be wrong and unfair to state that the non-pre-registered paper is less "truthful".

      Delete
    3. "Any wiggle room in the analysis pipeline will not eliminate p-hacking."

      I agree, and this is why we've tried to make the Cortex RR criteria as strict as we can in terms of outlining details of the experimental procedures and analysis pipeline. It's true that pre-registration will probably not be perfect at preventing p-hacking, but I'd also wager it will be a lot more successful than the mechanisms we currently have in place (i.e. nothing at all).

      "For the most complicated analyses where p-hacking might be the most common, e.g. fMRI, then for pre-registration to work all papers would have to be reviewed by an fMRI analysis expert that understands all possible analysis pathways."

      Agreed again, and I don't see why this should be any more difficult when reviewing a paper before data collection vs. after. Journals will still seek the comments of specialist experts. I don't see how or why an fMRI expert would be any less expert in assessing a proposed analysis pipeline compared to a completed one (as per status quo papers).

      "The second problem is whether we judge papers 'on average' or 'paper by paper'."

      This is an interesting point. We should clearly be judging papers in groups via meta-analysis rather than on a one-by-one basis. Actually I would argue that this is another prevailing problem in cognitive neuroscience - that we expect single studies to be overly conclusive, which pressures authors into constructing overly deterministic narratives.

      Even judging papers one-by-one, I contend that a rational observer can still judge the *likely* reliability of single papers. Let's suppose the following:

      1) in accordance with the survey of John et al. (2012), that 72% of a given literature is currently p-hacked.

      2) pre-registration prevents (or largely prevents) p-hacking. Let's suppose there is a 5% rate of p-hacked studies in pre-reg papers.

      3) given two single papers: one from the population of status quo articles and the other from the population of pre-reg papers. The probability of the pre-reg paper *not* being p-hacked is 95%. The probability of the status quo paper *not* being p-hacked is 28%. As you say, there is no *certainty* that one paper is more reliable than the other. But on the balance of probabilities, wouldn't it be entirely rational to conclude that the findings of the pre-reg paper are likely to be more reliable?
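      For concreteness, the balance-of-probabilities arithmetic in point 3 can be sketched in a few lines of Python (the 72% and 5% figures are the hypothetical rates assumed above, not measured values):

```python
# Illustrative sketch of the balance-of-probabilities argument above.
# The 72% and 5% figures are the hypothetical rates assumed in this
# comment, not measured values.

p_hacked_status_quo = 0.72  # assumed p-hacking rate under the status quo
p_hacked_pre_reg = 0.05     # assumed residual rate under pre-registration

def prob_not_hacked(p_hacked):
    """Probability that one randomly drawn paper was NOT p-hacked."""
    return 1.0 - p_hacked

print(f"P(not p-hacked | status quo) = {prob_not_hacked(p_hacked_status_quo):.2f}")  # 0.28
print(f"P(not p-hacked | pre-reg)    = {prob_not_hacked(p_hacked_pre_reg):.2f}")     # 0.95
```

      These are population-level rates: they describe the odds for a randomly drawn paper, not a verdict on any particular one.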

      Delete
    4. Thanks Chris,

      One of the biggest differences between the current role of a reviewer and the role I imagine for a reviewer of a pre-registered article is that, when reviewing a registered paper, the reviewer has to assess whether all the details and parameters describing the analysis pipeline are present. So for example, in fMRI will pre-registered articles require all parameters of the unwarping and realignment steps? I would imagine it would. In which case the reviewer has to assess whether these are correct or not and whether they are comprehensively described.

      I agree that 'on average' pre-registered articles have a lower probability of p-hacking. My concern is the individual paper. The logic of 'on average' also applies to judging the impact of individual papers by the impact factor of the Journal they are published in. That is also not something I agree with, even though 'on average' it is also true.

      In my opinion pre-registration is an excellent method for publication of replication studies, as it will enable publication of both positive and negative results. For other studies it will reduce, but not eliminate, p-hacking, and importantly pre-registration cannot be used to argue that any one pre-registered article is 'more truthful' than an article published by the traditional route.

      Delete
    5. "So for example, in fMRI will pre-registered articles require all parameters of the unwarping and realignment steps? I would imagine it would. In which case the reviewer has to assess whether these are correct or not and whether they are comprehensively described."

      Yes. For a status quo manuscript that includes a full description of the methodology, the reviewer's task is also to judge whether the methods reported are correct. The only difference between this and a pre-reg paper is that the reviewer does the same thing before the authors run the experiment.

      "The logic of 'on average' also applies to judging the impact of individual papers by the impact factor of the Journal they are published in."

      I don't think this is a comparable analogy. The 'on average' argument fails with JIF because of the heavily skewed distribution of citations at most journals: the JIF is biased by a small number of highly cited papers, which in turn is why the citation rates of individual articles don't correlate with JIF. It is this lack of correlation between JIF and individual citation rates that undermines the 'on average' argument.

      The situation with p-hacking is demonstrably different. For the purposes of this discussion, p-hacking is a dichotomous variable (it either happens or it doesn't), and we have evidence that it is engaged in by the majority of published papers. This is unlike JIF, which is determined by a small minority of the most extreme data, so the average is unrepresentative of any individual data point.

      "importantly pre-registration cannot be used to argue that any one pre-registered article is 'more truthful' than an article published by the traditional route."

      So - to be clear - do you agree or disagree with point 3 in my previous comment?

      I suspect you disagree, but I'd like to be absolutely clear why. Do you see a basic logical problem with my argument (point 3 above) that the findings of a single pre-reg paper are *likely* to be more reliable than a single status quo paper? Or do you agree with that logic but have a more specific objection to the use of the term 'truthful'?

      Delete
    6. "the reviewer's task is also to judge whether the methods reported are correct. The only difference between this and a pre-reg paper is that the reviewer does the same thing before the authors run the experiment" I am not sure this is correct. In pre-registration the reviewer has to make sure not only that the methods are correct but also that they are totally complete and fully described, with all parameters pre-specified. This is hardly ever the case in typical fMRI papers, which state something like "XXXXX was used for spatial preprocessing and subsequent analyses" where XXXXX is one of the major software packages. The details are rarely given, and so these would need to be included in pre-registration papers and assessed by an expert reviewer.

      I agree with the statement in your point 3 as written. It is less likely that a pre-registered paper has been p-hacked than one that has not been pre-registered. However, the two papers could be as reliable as each other; there is no way of knowing. I feel strongly that it would be wrong to claim that pre-registration will enable anyone to know the relative reliability of any individual papers, and they should not be judged any differently.

      Delete
  3. "The danger of promoting pre-registration as more 'truthful' is that the community will stop believing results from someone who has decided not to pre-register for whatever reason"

    Couldn't exactly the same thing be said about peer review? Or about using p<0.05 rather than p<0.1 as one's significance criterion?

    That is to say, given that pre-registration would prevent p-hacking, i.e. it would add 'rigor', then yes, people would consider non-registered studies to be less rigorous.

    But that is not a problem with pre-registration. It's a problem (or a feature) of rigor itself.

    ReplyDelete
  4. I think the two examples are very different. It is critical to distinguish between 'on average' inference and 'paper by paper' inference (see the discussion with Chris above). Pre-registration will 'on average' reduce the likelihood of p-hacking; however, it will be impossible to tell the relative merits of individual papers.

    ReplyDelete
  5. Besides p-hacking, pre-registration of studies will help prevent other forms of undesirable (if not questionable) research practices, such as HARKing (Hypothesizing After the Results are Known; see Kerr, 1998; http://bit.ly/17aMytG), and might shift the focus to intellectually sound hypotheses, models or theories instead of counter-intuitive, appealing-but-not-reliable effects that are treated as if they were totally expected on the basis of patchwork theories.

    ReplyDelete
  6. Call me a cynic, but I'm afraid that these days I just assume that most papers based on studies with many variables and no obvious a priori hypothesis are p-hacked.
    Indeed, the art of getting published in a top journal seems to consist of being creative enough to invent a post hoc hypothesis that looks as if it could have been a priori.
    It's all pretty depressing, as I no longer know what to believe. So it would be nice to have a reliable means of distinguishing studies that aren't p-hacked.
    If you don't like pre-registration, then can you suggest another way of doing this?
    Replication is one obvious possibility, but it's not often done, and for some studies not very feasible.

    ReplyDelete
    Replies
    1. I agree with all you have said. Just to be clear, I have little doubt that pre-registration will decrease p-hacking and other 'dodgy' practices in those papers. It is not this that I object to. My concern is that pre-registered papers might be considered more 'truthful' or more 'reliable'. On average it will likely be the case that pre-registered papers have a lower likelihood of false positives, but it will tell us nothing about the merits of individual papers, whether pre-registered or not. It is this sort of language, present in some of the discussions, that concerns me.

      Delete
  7. All in all I think prereg options are good to have. I do find myself very skeptical that things will automatically work out the way prereg proponents seem to think. The most well-intentioned ideas have unintended consequences: the Impact Factor was not originally intended for the way many use it now. I can see it already brewing with prereg, these "merit badges" for studies that are basically pre-publication metrics. You can say they aren't meant to be judgmental, but that is how other people will interpret them. If prereg papers are seen as better to have on your CV, then people will try to get them, through both ethical and non-ethical means. Science is done by humans, and you know what we're like.

    ReplyDelete