Talk:PLOS/Approximate Bayesian computation

Comments of Darren Logan on things to do before moving the article to Wikipedia

 * The title should probably be "Approximate Bayesian computation" (small "c"). The "...in computational biology" is probably redundant for a WP article. You can always create a redirect from the long title to the short one.
 * Moved. --Spencer Bliven 15:35, 18 May 2012 (PDT)

The following comments apply specifically to the Wikipedia version of this article --Cdessimoz 03:41, 22 May 2012 (PDT)


 * In the summary and elsewhere, you use terms like "over the last years" and "recently". These should be avoided, as WP articles are not dated and thus non-specific time-frames are not meaningful. If you need to refer to time, be specific (e.g. "Since 1999..." or "In 2010...")


 * Example. In general, Wikipedia articles should not contain worked examples. That type of content is better suited to Wikiversity or Wikibooks. There are exceptions, however. The guidance on this can be seen at WP:NOT, specifically: "An article should not read like a "how-to" style... the purpose of Wikipedia is to present facts, not to teach subject matter. It is not appropriate to create or edit articles that read as textbooks, with leading questions and systematic problem solutions as examples... Some kinds of examples, specifically those intended to inform rather than to instruct, may be appropriate for inclusion in a Wikipedia article." I think your example might be ok, but you should be careful of the tone to ensure it doesn't seem like a "how-to" guide.


 * Wikipedia articles do not have conclusion sections.


 * Throughout the article you should try to avoid using a narrative voice and remove all self-references. For example:
 * "As the previous section suggests..."
 * "This section attempts to review important recent developments..."
 * "...should be considered with sober caution, as discussed below."
 * "Interestingly..."
 * "This section discusses these potential risks and reviews possible ways to address them.."
 * "As the above makes clear..."
 * "This section reviews risks..."
 * "This section attempts to review important recent developments."

To follow up on these remarks, the history section in such Wikipedia articles is typically the first after the lead section, as it puts the topic into its historical context. I have thus moved it there. However, this may break some of the narrative flow, which should therefore be checked again during the revision. --Daniel Mietchen 19:05, 27 June 2012 (PDT)
 * Response: We have verified the coherence of the narrative flow. --Cdessimoz 07:52, 5 July 2012 (PDT)

Comments of Christian P. Robert on the entry
A few comments on the specific entry on ABC written by Mikael Sunnåker et al....
 * The entry starts with the representation of the posterior probability of an hypothesis, rather than with the posterior density of a model parameter, which seems likely to lead the novice reader astray. After all, (a) ABC was not introduced for conducting model choice and (b) interchanging hypothesis and model means that the probability of an hypothesis H as used in the entry is actually the evidence in favour of the corresponding model.
 * Response: We now first talk only about parameter estimation. We have also rewritten the section about model selection for better coherence of the text.
 * (There are a few typos and grammar mistakes, but I assume either PLoS or later contributors will correct those.)
 * Response: We have corrected the typos and grammatical mistakes found during the revision.
 * When the authors state that the "outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution", I think they are leading some of the readers astray as they forget the "approximative" aspect of this distribution.
 * Response: This has been changed.
 * Further below, I would have used the title "Insufficient summary statistics" rather than "Sufficient summary statistics", as it spells out more clearly the fundamental issue with the potential difficulty in using ABC.
 * Response: The title has been changed to “Summary statistics” (see also Dennis Prangle's comment below)
 * (And I am not sure the subsequent paragraph on "Choice and sufficiency of summary statistics" should bother with the sufficiency aspects... It seems to me much more relevant to assess the impact on predictive performances.)
 * Response: We have toned down the issue of sufficiency. For reasons of clarity, we prefer to defer the discussion on predictive performance to the "Pitfalls and remedies" section.
 * Although this is most minor, I would not have made mention of the (rather artificial) "table for interpretation of the strength in values of the Bayes factor (...) originally published by Harold Jeffreys". I obviously appreciate very much that the authors advertise our warning about the potential lack of validity of an ABC based Bayes factor!
 * Response: The section on model selection has been rewritten. In the process, the reference to Jeffreys's table has been removed.
 * I also like the notion of "quality control", even though it should only appear once.
 * Response: We have merged the two sections about quality control.
 * And the pseudo-example is quite fine as an introduction, while it could be supplemented with the outcome resulting from a large n, to be compared with the true posterior distribution.
 * Response: We have included a new figure (Fig. 3), which shows ABC with large n for full data, and summary statistics ($$\epsilon = 0$$ and $$\epsilon = 2$$). As suggested, it also compares the ABC results with the theoretical posterior.
 * The section "Pitfalls and remedies" is remarkable in that it details the necessary steps for validating an ABC implementation: the only entry I would remove is the one about "Prior distribution and parameter ranges", in that this is not a problem inherent to ABC... (Granted, the authors present this as a "general risks in statistical inference exacerbated in ABC", which makes more sense!)
 * Response: We would like to keep the discussion on prior distribution and parameter ranges. However, a sentence was added under “Pitfalls and remedies” to emphasize that the problem related to “Prior distribution and parameter ranges” is not specific to ABC.
 * It may be that the section on the non-zero tolerance should emphasize more clearly the fact that ε should not be zero, as discussed in the recent Read Paper by Fearnhead and Prangle when envisioning ABC as a non-parametric method of inference.
 * Response: This has been changed accordingly.
 * Finally, it is always possible to criticise the coverage of the historical part, since this is such a recent field that it is constantly evolving. But the authors correctly point to (Don) Rubin on the one hand and to Diggle and Gratton on the other. I would suggest adding in this section links to the relevant software like our own DIY-ABC ...
 * Response: A section listing ABC software has been added, including a new table with references to the corresponding papers (Table 3).
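
The points above about the approximate nature of the ABC posterior, the role of the tolerance, and the comparison with the true posterior can be illustrated with a minimal sketch. It is not the example from the article: the binomial toy model, the uniform prior, the function names `simulate` and `abc_rejection`, and the values of `n`, `observed` and the tolerances are all assumptions, chosen so that the exact posterior, Beta(k+1, n-k+1), is available for comparison.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Number of successes in n Bernoulli(theta) trials (hypothetical toy model)."""
    return sum(rng.random() < theta for _ in range(n))

def abc_rejection(observed, n, eps, num_draws, rng):
    """Keep prior draws of theta whose simulated count is within eps of the observed count."""
    accepted = []
    for _ in range(num_draws):
        theta = rng.random()  # assumed prior: Uniform(0, 1)
        if abs(simulate(theta, n, rng) - observed) <= eps:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
n, observed = 50, 35

# Exact posterior under the uniform prior is Beta(a, b) with a = k + 1, b = n - k + 1.
a, b = observed + 1, n - observed + 1
exact_mean = a / (a + b)
exact_sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

results = {}
for eps in (0, 2, 10):
    sample = abc_rejection(observed, n, eps, 20000, rng)
    results[eps] = (len(sample), statistics.mean(sample), statistics.stdev(sample))
    print(f"eps={eps}: accepted={results[eps][0]}, "
          f"mean={results[eps][1]:.3f}, sd={results[eps][2]:.3f}")
print(f"exact: mean={exact_mean:.3f}, sd={exact_sd:.3f}")
```

In this toy setting the count is a sufficient statistic, so a tolerance of 0 should reproduce the exact posterior up to Monte Carlo error, while larger tolerances accept more draws and widen the accepted sample. That widening is the sense in which the output is only approximately distributed according to the posterior.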

Review after revision
Christian Robert wrote: "I have nothing to add to my earlier review, I am completely happy with the current version!" --Daniel Mietchen 18:02, 21 September 2012 (PDT)

Review by Dennis Prangle
This is a well written and accessible introductory article. I particularly like the balance struck between describing the simplicity of implementing ABC and the potential drawbacks.

Major comments
(nb I've included full references only for papers not in the original article.)

 * Much of the material in the "recent methodological developments" section is well established and no longer recent relative to the age of the field (e.g. the Marjoram et al paper was published in 2003). I'd suggest at least renaming this section. Alternatively, much of this material could be incorporated into the "approximation of the posterior" section, as regression correction ideas and MCMC / SMC algorithms are tools commonly used to improve the approximation.
 * Response: The section has been removed and most of the material has been incorporated into the “approximation of the posterior” section.
 * A little more coverage of applications would be nice. One way to do this without increasing the length of the article would be to explicitly reference recent review papers (Beaumont 2010, Bertorelle et al 2010, Csillery et al 2010, Marin et al 2011) for further details.
 * Response: We have added a sentence about applications of ABC, with references to these review papers, at the end of the “Example” section.
 * The model comparison section should explain how the ABC rejection sampling algorithm can be adapted to perform inference between models (or give a reference). A reference to more advanced algorithms (e.g. Didelot et al, Toni and Stumpf 2009) would also be helpful.
 * Response: We have added a reference to the Toni & Stumpf SMC-ABC method for model selection.
 * I agree with Christian Robert's comments that the discussion of a hypothesis H in the motivation section is somewhat confusing, and that links to code could be helpful. Some additional suggestions are the "abc" R package and ABC-SysBio.
 * Response: See our response to Christian Robert’s comment above.
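
One way the rejection sampler might be adapted to compare models, sketched here under stated assumptions, is to draw a model index from the model prior before drawing parameters, and to estimate each posterior model probability by the fraction of accepted draws that came from that model. The two coin models, the equal model prior, the function name `abc_model_choice` and all numerical values are hypothetical; this is not the algorithm of Didelot et al. or of Toni and Stumpf.

```python
import math
import random

def simulate(theta, n, rng):
    """Number of successes in n Bernoulli(theta) trials (hypothetical toy model)."""
    return sum(rng.random() < theta for _ in range(n))

def abc_model_choice(observed, n, eps, num_draws, rng):
    """Return the model index of every accepted (model, theta) draw."""
    accepted_models = []
    for _ in range(num_draws):
        model = rng.randrange(2)  # assumed prior: both models equally likely
        # Model 0: fair coin, theta fixed at 0.5. Model 1: theta ~ Uniform(0, 1).
        theta = 0.5 if model == 0 else rng.random()
        if abs(simulate(theta, n, rng) - observed) <= eps:
            accepted_models.append(model)
    return accepted_models

rng = random.Random(1)
n, observed = 20, 15
accepted = abc_model_choice(observed, n, 0, 40000, rng)

# Fraction of accepted draws from model 1 estimates its posterior probability.
abc_prob_m1 = sum(accepted) / len(accepted)

# Exact marginal likelihoods of the observed count, available for this toy case only.
m0 = math.comb(n, observed) * 0.5 ** n
m1 = 1 / (n + 1)
exact_prob_m1 = m1 / (m0 + m1)

print(f"ABC estimate of P(model 1 | data): {abc_prob_m1:.3f}")
print(f"Exact P(model 1 | data):           {exact_prob_m1:.3f}")
```

Here the success count is sufficient for both models, so the comparison is not affected by the loss of information that can invalidate an ABC-based Bayes factor when insufficient summary statistics are used.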

Minor comments
 * The acceptance criterion should be $$\rho (\hat{D},D) \le \epsilon$$ not $$\rho (\hat{D},D)<\epsilon$$ if $$\epsilon=0$$ is to correspond to acceptance of exact matches only.
 * Response: This has been changed.
 * "Sufficient summary statistics": As Christian writes, it would seem more natural to discuss general summary statistics first, then the special and less practically useful case of sufficient statistics.
 * Response: This has been changed.
 * "Example": I'd point out that this is an example application only, and more accurate inference is possible here by particle filtering methods. If there were some missing data this would be a more natural ABC application e.g. if only the summary statistic was observed.
 * Response: We have also added a sentence to point out that it is only an example application, and that the posterior can be computed exactly.
 * "Approximation of the posterior": "...has been justified theoretically under some limiting conditions". The word "limiting" doesn't seem (to me) to describe the measurement error case.
 * Response: We agree and have reformulated this sentence.
 * "Choice and sufficiency of summary statistics": "Sufficient statistics are optimal..." I'd change to "Low dimensional sufficient statistics". For some models (e.g. iid Cauchy) the only sufficient statistics are the full data set, which would be a poor choice.
 * Response: This has been changed.
 * "Choice and sufficiency of summary statistics": "...which is approximated with a pilot run of simulations". Something like "...which is approximated by linear regression based on simulated data" would be more accurate.
 * Response: This has been changed.
 * "Choice and sufficiency of summary statistics": It might be useful to reference a recent comparison (disclaimer: which I contributed to) between methods of choosing summary statistics.
 * Response: A sentence was added with a reference to the paper.
 * "Bayes factor with ABC and summary statistics": "...can also be used to..." it might be more accurate to say "...is sufficient to..."
 * Response: This has been changed.
 * "Bayes factor with ABC and summary statistics": "meaningless" seems too strong as the next sentence suggests a potentially useful alternative way of doing inference.
 * Response: The formulation was changed to “may therefore be misinformative”.
 * "Prior distribution and parameter ranges": "...based on the principle of maximum entropy". A link to the general topic of objective priors might be helpful here.
 * Response: A link has been added.
 * "Large data sets": "which may be a tractable approach for ABC based methods". Note it is already easy to parallelise many of the steps in ABC algorithms based on rejection sampling and SMC.
 * Response: This has been changed.
 * "Curse of dimensionality": Some theoretical results have been proved here.
 * Response: We have added references to these papers.
 * "Conclusion": "With faster evaluation of the likelihood function..." I'm not sure what this is getting at; in ABC applications the likelihood function typically cannot be evaluated!
 * Response: This formulation has been changed.
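
The first minor comment, on the acceptance criterion, can be checked with a few lines of code. This is a minimal sketch: the function names and the list of distances are made up for illustration, and the only assumption is that the distance is non-negative, as any distance measure is.

```python
def accept_strict(distance, eps):
    """Criterion as originally written: rho(D_hat, D) < eps."""
    return distance < eps

def accept_inclusive(distance, eps):
    """Criterion as suggested in the review: rho(D_hat, D) <= eps."""
    return distance <= eps

# Hypothetical distances between simulated and observed data; 0 marks an exact match.
distances = [0, 0, 1, 3, 0, 2]
eps = 0

strict = [d for d in distances if accept_strict(d, eps)]
inclusive = [d for d in distances if accept_inclusive(d, eps)]

print("accepted with <  :", strict)
print("accepted with <= :", inclusive)
```

Because no non-negative distance is strictly smaller than zero, the strict criterion accepts nothing when the tolerance is zero, whereas the inclusive criterion accepts exactly the exact matches.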

Review of updated article
I have read the revised article and discussion of the amendments, and am happy to accept it for publication.