Talk:PLOS/Flow cytometry bioinformatics

Wikification
The article looks good from a wiki perspective, though it would be good (as commented here) to expand the lead section a bit to have it serve as a summary of the Wikipedia article and also as the abstract at PMC. I have not checked details of formatting (e.g. references) yet. --Daniel Mietchen (talk) 17:04, 5 May 2013 (PDT)
 * I agree -- the lead is a little brief, and could benefit from expansion to something more resembling an abstract. I'll work on that once the comments from the reviewers come in and we've addressed those. -Kierano (talk) 11:22, 17 May 2013 (PDT)


 * I have gone through the article again and fixed most remaining issues in terms of wikification. What I could not do is check all of the dated statements ("as of February 2013"), and in two cases, these statements (updated or not) should be supported by a reference (see "citation needed"). Finally, please provide figures in SVG if possible (these are only for the wiki - the journal still wants TIFF/PNG), so that adaptations (e.g. translations) are possible. --Daniel Mietchen (talk) 05:59, 11 July 2013 (PDT)
 * Fixed. I'll be able to sort out the SVG before we push the content to Wikipedia, but not until the end of the month. -Kierano (talk) 17:35, 12 July 2013 (PDT)

Reviewer 1: Holden Maecker
I find this to be a good and wide-ranging summary of topics associated with flow cytometry analysis and bioinformatics. It spans the territory from basic flow cytometry concepts and gating, to newer bioinformatics approaches like SPADE and PCA, and routines for data processing such as those in Bioconductor. Few people's expertise spans all of these areas, but this page provides a good synthesis for folks who work in one or more of these areas, and want to learn more.

I would suggest expanding the section on Gating, to make some basic but missing or merely implied points, e.g.:
- Gating is hierarchical, usually focusing in on specific subsets by sequential selection of populations, usually in two dimensions at a time (e.g., Lymphocytes->T cells->CD4+ T cells->naive CD4+ T cells).
- This approach suffers from the inability to visualize all other relevant dimensions when gating on only two dimensions at a time; it may even make it difficult to distinguish closely spaced populations that could be better separated in >2-dimensional space. And it suffers from "tunnel vision", in that an overview of the entire dataset is virtually impossible.
- Boolean gates can be created (to some extent, automatically in software such as FlowJo) that divide a population of cells into all logical combinations of markers. This is a complementary approach to automated gating algorithms that find "where the cell clusters are"; in a Boolean approach, one asks "what are all the possible cell phenotypes" and then monitors those compartments to see which ones are populated and to what extent. It is, however, a deterministic approach, assuming that cells are either positive or negative for a given marker, and the user decides the positive/negative boundary. The number of compartments can also become staggering with increasing dimensions. Clustering algorithms are, by contrast, unsupervised, in that they do not require any user input about what is positive or negative; they simply find regions of cell density, inflection points, etc.
-Holden Maecker
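
The Boolean gating idea described above can be illustrated with a minimal Python sketch. It is not how FlowJo or any other package implements the feature; the function name `boolean_gate`, the marker names, the intensity values and the thresholds are all hypothetical and chosen only for illustration. The user-supplied positive/negative boundary per marker is the deterministic step the reviewer mentions, and the number of compartments is 2**n for n markers.

```python
from itertools import product


def boolean_gate(events, thresholds):
    """Count events in every positive/negative combination of markers.

    events:     list of dicts mapping marker name -> intensity (hypothetical data)
    thresholds: dict mapping marker name -> user-chosen positive/negative boundary
    """
    markers = sorted(thresholds)
    # Create all 2**n compartments up front so that empty ones stay visible.
    counts = {combo: 0 for combo in product((False, True), repeat=len(markers))}
    for event in events:
        # An event is "positive" for a marker if it is at or above the boundary.
        combo = tuple(event[m] >= thresholds[m] for m in markers)
        counts[combo] += 1
    return markers, counts


# Hypothetical intensities for four cells and three markers.
events = [
    {"CD3": 900.0, "CD4": 850.0, "CD8": 20.0},
    {"CD3": 950.0, "CD4": 30.0, "CD8": 700.0},
    {"CD3": 15.0, "CD4": 10.0, "CD8": 5.0},
    {"CD3": 880.0, "CD4": 910.0, "CD8": 12.0},
]
thresholds = {"CD3": 100.0, "CD4": 100.0, "CD8": 100.0}

markers, counts = boolean_gate(events, thresholds)
for combo, n in counts.items():
    label = " ".join(m + ("+" if pos else "-") for m, pos in zip(markers, combo))
    print(label, n)
```

With three markers there are already eight compartments, most of them empty in this toy data; each added marker doubles the count, which is the growth the reviewer describes as staggering.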


 * These are some excellent suggestions. As there is some overlap between these comments on gating and the comments of reviewer 3, we have addressed both reviewers' comments in our response there. -Kierano (talk) 11:48, 27 June 2013 (PDT)

Reviewer 2: Nolwenn Le Meur
This topic page gives a good review of the field of flow cytometry bioinformatics. It covers the fundamentals of data handling and analysis for flow cytometry. It also highlights new approaches and ongoing developments, notably for cell population identification, where room for improvement remains.

My main comment is on the lead paragraph. The sentence "Flow cytometry bioinformatics is the application of bioinformatics, computational statistics and machine learning to analyze flow cytometry data" is confusing. As mentioned in the Wikipedia page for Bioinformatics, this interdisciplinary field uses many areas of computer science, mathematics and engineering and therefore includes the concept of data analysis, notably with machine learning techniques. I would rather say: "Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry, which involves storing, retrieving, organizing and analyzing flow cytometry data using extensive computational resources and tools." Maybe it could be added that flow cytometry bioinformatics requires and contributes to the development of computational statistics and machine learning methods. In addition, the introduction could be developed with examples of application fields. Indeed, flow cytometry is used in a wide range of domains, from medicine and environment for human health to the analysis of the microbiome in seawater (e.g. Wang, Y et al. (2010). Past, present and future applications of flow cytometry in aquatic microbiology. Trends in Biotechnology, 28(8), 416–424. doi:10.1016/j.tibtech.2010.04.006.)

A minor comment is on the description of the different steps in computational flow cytometry analysis. This description is well done, although the concept of workflow could be emphasized. Some software allows storing analysis workflows, which are notably useful for qualitative and reproducible research. For instance, for gating, which is a hierarchical process, it is especially important to keep track of the process used for population selection. It is also essential for automating population selection when flow cytometry is used as a diagnostic tool. Finally, workflows saved in a standard file format such as XML can be replayed by different software, which can be useful in terms of reproducible research.

Nolwenn Le Meur


 * The comments on the lead section were extremely helpful, and have been taken into account in the expansion of that section.


 * We have added a listing of some of the applications of flow cytometry to the introduction section.


 * We have added a paragraph to the section overviewing the steps in flow cytometry analysis to emphasise the importance of workflows and their interchange for reproducibility. -Kierano (talk) 11:48, 27 June 2013 (PDT)

Reviewer 3: Jorge Pardo
This page provides an informative overview of the type of multidimensional data generated by flow cytometry and the role of bioinformatics in analyzing increasingly complex data sets. The introductory section on the basics of fluorescence-based flow cytometry is missing a description of spectral cross-over compensation. This would seem an oversight, as compensation and compensation matrices are mentioned in other sections of the page.

Manual gating in the analysis of flow data should be described earlier in the page, certainly before describing Gating-ML, and with a bit more detail. The authors describe the process as "error prone" and "non-reproducible". Given the same data set, two investigators may use different hierarchical manual gating strategies to define a cell population, but this does not imply intrinsic non-reproducibility in the process. Indeed, clinical flow cytometry laboratories are certified based on their ability to reproduce results while testing a defined sample, and this testing involves manual gating. As for "error prone", the inference is that there is a correct way to gate flow data, and that when this process is done manually, it is likely to be done incorrectly. This statement is then ignored in the discussion of combinatorial gating approaches, like flowType/RchyOptimyx, that use manual gating. On the other hand, the discussion of automated gating using clustering algorithms fails to mention that repeated analysis of data sets with a large number of clusters may report different cluster partitions (http://www.biomedcentral.com/1471-2105/14/S1/S8). I would invite the authors to present a balanced characterization of manual gating that recognizes its limitations in the analysis of increasingly complex flow data; it is a time-consuming hierarchical approach that is limited to two-dimensional analysis at each step.


 * Re. manual gating:
 * We have made several changes to address this comment:


 * We have re-organized the content to discuss manual gating earlier, as requested. We have also clarified that manual analysis can indeed be reproducible, especially in controlled clinical settings, and have better described the cases in which it can cause inaccuracies. We have also clarified that, despite the recent advances in computational analysis, manual gating is still the main solution for identification of specific rare cell populations (e.g., for gating rare populations for the combinatorial gating algorithms). Finally, we have explained that the computational gating algorithms we have discussed here can automatically select the number of cell populations using different methods and that this choice can affect the sensitivity and specificity of the results. -Nnimaa (talk) 15:21, 28 June 2013 (PDT)
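
The hierarchical, two-dimensions-at-a-time character of manual gating discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: the helpers `rect_gate` and `apply_hierarchy`, the channel names, the gate boundaries and the event values are hypothetical, and real gates are usually polygons or ellipses drawn by an analyst rather than fixed rectangles.

```python
def rect_gate(x, y, x_range, y_range):
    """Return a predicate selecting events inside a 2-D rectangle on channels x and y."""
    def inside(event):
        return (x_range[0] <= event[x] <= x_range[1]
                and y_range[0] <= event[y] <= y_range[1])
    return inside


def apply_hierarchy(events, gates):
    """Apply gates in order; each gate only sees the survivors of the previous one."""
    population = list(events)
    counts = []
    for name, gate in gates:
        population = [e for e in population if gate(e)]
        counts.append((name, len(population)))
    return population, counts


# Hypothetical events: scatter channels (FSC, SSC) and two fluorescence markers.
events = [
    {"FSC": 300, "SSC": 100, "CD3": 900, "CD4": 800},
    {"FSC": 320, "SSC": 120, "CD3": 850, "CD4": 20},
    {"FSC": 310, "SSC": 110, "CD3": 10, "CD4": 15},
    {"FSC": 800, "SSC": 700, "CD3": 900, "CD4": 900},
]

# Each step looks at exactly two dimensions, as in manual gating.
gates = [
    ("Lymphocytes", rect_gate("FSC", "SSC", (200, 400), (50, 200))),
    ("CD3+ CD4+ T cells", rect_gate("CD3", "CD4", (500, 1000), (500, 1000))),
]

selected, counts = apply_hierarchy(events, gates)
print(counts)
```

The last event would pass the CD3/CD4 gate on its own but is removed by the scatter gate first, which shows why the order and boundaries of the gates need to be recorded for the analysis to be reproduced.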

Lastly, I'd emphasize the need for informative representation of cell populations identified through automated gating of complex multidimensional flow data. It is not informative to show all cell populations defined through multidimensional analysis on two-dimensional dot plots. The SPADE software does a great job, as it organizes defined cell populations in hierarchies of related phenotypes and also allows for the comparison of individual markers across all the cell populations. This facilitates the identification of cell lineages, identification of rare cell types and comparison of different samples. -Jorge Pardo


 * Re. visualization:


 * First, we would like to clarify that the SPADE algorithm is not always suitable for identification of lineages (as spanning trees do not necessarily represent lineages) or rare cell populations (due to the downsampling). Several approaches are being considered for addressing these limitations. That said, we agree with the reviewer that SPADE is a fantastic algorithm for visualization of an entire sample to identify major cell populations, and we have discussed it in the "gating guided by dimension reduction" section. -Nnimaa (talk) 15:16, 28 June 2013 (PDT)


 * Re. compensation:


 * Initially we had thought to exclude compensation, as the methods for performing it, while computational and automated, are standard and have not advanced significantly since the development of multicolour flow. However, on re-reading, it did indeed feel missing, and we have consequently included a section discussing the computational aspects of compensation. -Kierano (talk) 11:48, 27 June 2013 (PDT)
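
For readers following this thread, the computational core of compensation can be sketched in a few lines. The usual approach is to multiply each observed event by the inverse of the spillover matrix; the two-detector example below assumes a row-vector convention in which `spillover[i][j]` is the fraction of fluorochrome i's signal seen in detector j. The matrix entries and signal values are hypothetical, and the helper names `invert_2x2` and `compensate` are illustrative rather than taken from any package.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(event, spillover):
    """Recover per-fluorochrome signal from per-detector readings.

    Assumes observed = true_signal x spillover (row vector times matrix),
    so compensated = observed x inverse(spillover).
    """
    inv = invert_2x2(spillover)
    return [sum(event[i] * inv[i][j] for i in range(2)) for j in range(2)]


# Hypothetical spillover: 20% of dye 1 leaks into detector 2, 10% of dye 2 into detector 1.
spillover = [[1.0, 0.2],
             [0.1, 1.0]]
true_signal = [1000.0, 500.0]

# Simulate what the detectors would record for this cell.
observed = [sum(true_signal[i] * spillover[i][j] for i in range(2)) for j in range(2)]

recovered = compensate(observed, spillover)
print(observed, recovered)
```

Real panels have many more detectors, so the inverse would normally come from a linear algebra library rather than a hand-written 2x2 formula.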

Response to reviewers
We have added our responses to each reviewer below their review. All of the comments were extremely helpful and we feel have strengthened the paper; we thank all of the reviewers for their input.

Full details of the changes can be seen at this diff. -Kierano (talk) 11:53, 27 June 2013 (PDT)