It’s been about a month since Science, the journal of the American Association for the Advancement of Science, ran the authors’ retraction of a high-profile paper on the genetics of living to an advanced age. The paper had been accepted for publication about a year earlier. Part of its high profile is attributable to a publicity campaign mounted by the Association on behalf of… well, itself, but also the paper’s authors and the institutions where they work.
Many people have written about the incident. Science is a visible journal. The work reported in the paper is technically sophisticated and immediately interesting to many in both the lay and professional communities. There was a lot of publicity specifically about this paper.
Now really is the wrong time to discuss the technical merits of the work, since the paper is still being reviewed by other scientists for publication (just not at Science), and the authors can’t discuss the matter, show the paper to reporters, or answer questions until the work is published. However, the incident itself, what has already happened, raises some questions about how elite science is practiced.
Some commentators think that the paper’s original acceptance entails some failure of the process of pre-acceptance “peer review.” For example,
In this piece, I argue that peer review functioned adequately. The desirable objective in scientific publication is quality control before a report enters the archival or permanent record of scientific achievement. In our century, quality control includes the distribution of a work before its final publication. It did in this case, and it was in that penultimate stage before final publication that the undisputed technical problems were caught. To all appearances, the difficulties revealed have been much mitigated, and for all anyone knows, fully remedied.
There is a success story here that is being overlooked. The jury is still out on the final disposition of the underlying scientific work, but so far, the process has worked.
So what did happen?
As is typical of scientific journals in the Internet Age, the paper was published electronically very soon after it was accepted for publication. Had all gone well, then some weeks or months later, the paper would have appeared in print and on the journal’s archival website. Until then, it was available in a special web-only section, Science Express, where papers come with a warning that they are subject to editorial revision until they appear in final form.
In addition to the paper being available to the scientific community, Science’s publisher arranged a press conference and podcast featuring the paper’s two senior authors, Paola Sebastiani and Thomas Perls. These promotional efforts increased awareness of the paper among both specialists and the public at large.
Almost immediately, a problem surfaced. One of the gene chips the authors had used on some of the people they studied had a flaw. News of this flaw had not yet been published when their paper was accepted, but the appearance of their work got the news out fast. Within a few weeks, Science Express included a notice to readers of the paper, explaining what was known of the nature of the methodological difficulty.
The authors undertook a considerable months-long program of reanalysis, as was described in an expression of editorial concern a few months after the paper had been accepted. For all practical purposes, the original paper was a dead letter at that point. The only paper of any enduring scholarly interest would be a new version whose findings weren’t affected by the faulty chip problem, and which also incorporated additional methodological safeguards to ensure the reliability of the results.
The only suspense, really, was whether Science would accept this new paper. That was the editor’s and publisher’s prerogative, subject to the authors’ prerogative to retract. After months of re-review, the parties failed to reach an accord. Last month, several coordinated events resolved the matter. The authors did retract, they announced their intention to pursue publication elsewhere, and Science’s editor and publisher issued a statement that strongly implied, but hardly explained, that they wouldn’t have run the revised paper anyway. The new version of the paper is currently under review at another, undisclosed, journal.
What should have happened?
Peer review, in the form that is now ubiquitous for scientific archival publication, became the default front line of quality control in the middle of the last century. An earlier system had relied on specific, often public, recommendations for publication by leaders in the field. An aspiring author would send Albert Einstein, say, a copy of the paper. If the professor liked it, then he would tell a journal editor that it was good stuff.
There was a prominent bottleneck in the recommendation system: Professor Einstein was busy and could attend to only so many aspirants’ papers. So, something more cost-effective and easier to operate on an industrial scale developed. In modern peer review, an editor solicits critiques of a paper from some people knowledgeable about its subject, perhaps three to six. The journal makes a publication decision based on the advice provided by the group, often after the authors have made some revisions in response to the reviewers’ comments. Reviewers are usually anonymous, and each one bears only a limited responsibility for the final decision.
The system also allows people with different expertise and viewpoints to participate in the decision. The Sebastiani et al. paper, for instance, ideally called for reviewers familiar with Bayesian statistics and its application to machine learning, with genome-wide association studies, with the technology of genetic microarrays, and with the distinctive characteristics of extremely long-lived people. It is unlikely that this body of diverse expertise could be assembled on short notice for a one-shot project except by a team approach.
What can this team accomplish? One commentator, the Boston Globe’s Carolyn Y. Johnson, cited earlier, offers an aspirational and abstract description of the goals of peer review,
[O]utside scientists are supposed to ensure that evidence presented by researchers support a paper’s findings and that it is worthy of publication.
Fair enough for its length, but can reviewers literally ensure something as broad as publication worthiness? No. More modest, more concrete, and narrower goals are actually served. The team can be expected to catch obvious methodological problems and lapses in careful reasoning, such as overlooked alternative interpretations of the results or, indeed, findings inadequately supported by the actual evidence.
The reviewers also check that the paper is interesting, novel, useful to a reasonable portion of the journal’s readers, that it complies with the journal’s editorial policies, and that the submission is in a form ready for publication. “Readiness” covers a wide swath, from the completeness of the underlying work being discussed, through the adequate consideration of what has already been reported in the literature on the subject, all the way down to plain vanilla good English grammar and diction.
What does “pre-publication publication” add to the peer reviewers’ work?
The specific individuals who reviewed the original version of the paper were plainly not the individuals who knew of the fault in the chip used with some of the subjects. Without the pre-publication availability of the paper, with its warning from the very beginning that the text was subject to revision (and with progressively more detailed warnings as the situation developed), the people with the right information would probably not have intercepted the paper before it became archival.
These people, with genuinely relevant information, were not the only people to participate in the “last chance” review of the paper. Many people volunteered their take.
Some people thought the paper promoted an unjustifiably large role for genetics in explaining extreme longevity, in comparison with environmental and lifestyle factors. The actual design emphasized identifying what the genetic contribution comprised, including the possible variety of distinct ways in which genetic endowment might contribute to an individual’s long life, perhaps different ways for different people. To the extent possible, the “ordinary” controls were matched to the centenarians in all attributes except genetics. Controlling for something is not how to assess the relative importance of what is controlled versus what is studied.
Some of the criticism was based on a preference for frequentist statistics rather than the Bayesian approach for which Sebastiani is well-known. Sometimes this took on a comic aspect. One of the paper’s statistical exhibits, which was actually drawn using a common Bayesian measure of importance, was sometimes mistaken for a similar graph based upon a common frequentist measure.
There is a potential importance to the difference between the two measures. The frequentist statistic often imparts a qualitative feature to the graph which, to some extent, “confirms” the “significance” of local differences in the genome. That’s important, because the “significance,” even within the frequentist theory, is a shaky guide to the truth. That the paper’s graph generally lacked this qualitative confirming feature was one basis on which some people claimed to know immediately that something was wrong.
There was in fact something wrong, of course. But was the absence of this frequentist feature in a Bayesian statistic really a diagnostic indicator? Maybe; the two measures are conceptually related, even if very different in quantitative properties. Then again, maybe not. Only when the paper is published, with what we know are now heavily audited, squeaky clean data, will we have a good Bayesian example from which to learn whether graphs based on the Bayesian statistic actually display the qualitative feature so admired by frequentists, and with what similarity.
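The conceptual kinship between the two kinds of measures, and their quantitative divergence, can be illustrated with a small sketch. It uses Wakefield-style approximate Bayes factors, a common choice in genome-wide association work; that choice, and the prior scale in it, are illustrative assumptions on my part, not necessarily the Bayesian measure used in the Sebastiani et al. paper. Two hypothetical SNPs with identical frequentist z scores, and hence identical p-values, can receive quite different Bayes factors:

```python
import math

def approx_bayes_factor(beta_hat, se, prior_sd=0.2):
    """Wakefield's approximate Bayes factor, alternative over null.

    beta_hat: estimated log-odds ratio for the SNP
    se: its standard error
    prior_sd: assumed N(0, prior_sd**2) prior on the true effect
    """
    V = se ** 2          # sampling variance of the estimate
    W = prior_sd ** 2    # prior variance of the effect under H1
    z = beta_hat / se
    # log BF(H1/H0) = log sqrt(V/(V+W)) + (z^2/2) * W/(V+W)
    log_bf = 0.5 * math.log(V / (V + W)) + (z ** 2 / 2) * (W / (V + W))
    return math.exp(log_bf)

def z_to_p(z):
    """Two-sided frequentist p-value for the same z score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Two hypothetical SNPs, both with z = 5: the frequentist p-value is
# identical, but the Bayes factor depends on the standard error too,
# so the two "Manhattan skylines" need not agree.
for se in (0.05, 0.30):
    beta_hat = 5 * se
    print(f"se={se}: p={z_to_p(5):.2e}, "
          f"BF={approx_bayes_factor(beta_hat, se):.1f}")
```

The point of the sketch is only that the Bayes factor is a monotone function of z² for a fixed standard error, which is the conceptual kinship, while the dependence on the standard error (roughly, on sample size) is one way the two graphs can diverge qualitatively.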
Taken as a whole, the public comments based on the pre-publication version of the paper resemble what happens in peer review, especially in the first round. Some reviews point out crucial problems, others show understandable misunderstandings of key points, and still others are more about other general issues (like Bayes versus frequentist statistics) than about the specific work being evaluated.
The differences, of course, are that the pool of commentators is huge, that participants nominate themselves, and that they typically sign their names to their “reviews.” It requires very little imagination to see that this is a wonderful, technologically enabled complement to the first-line quality control efforts of venerable peer review.
The bottom line in the current case is that the system worked. Let us rejoice. And if there is a lesson to be learned, maybe it is for readers to take that boilerplate warning at Science Express seriously. The papers there are subject to change. Maybe, too, the “warning” should be expanded to include an invitation, “Please take your best shot at helping us to improve what you read here.”
See the Unlinks page for a collection of public statements surrounding the acceptance, revision and eventual retraction of the paper.