Talk:Preprint/Chemical Graph Theory

Review by Bono Lučić
The manuscript provides an interesting overview and, in my opinion, deserves to be published after the necessary corrections have been made. My comments on it can be found below.

In some parts of the manuscript the English needs to be corrected or the wording improved. As an example, I suggest the following correction of the Abstract.

“Chemical graph theory is a subfield of mathematical chemistry that applies classical graph theory to chemical entities and phenomena. In chemical graph theory, molecular structures are represented as chemical graphs. In such a chemical graph, the nodes and edges represent atoms and bonds. Chemical graphs are the most important data structures for representing chemical structures in cheminformatics. Computable properties of graphs form the basis for (quantitative) structure-activity and structure-property predictions, a core discipline of cheminformatics. These graphs can then be reduced to graph-theoretical descriptors or indices that reflect the physical properties of molecules. One of the best-known examples of a graph-based molecular descriptor is the Wiener index, which corresponds to the sum of the lengths of all the shortest paths in a molecule and correlates with its boiling point. In addition to chemical indices, the application of graph theory in chemistry includes many other topics such as isomer enumeration, searching for molecular substructures in chemical databases and generating molecular structures.”
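To make the Wiener index mentioned in the suggested abstract concrete, the following is a minimal sketch, not code from the manuscript. It assumes a hydrogen-suppressed molecular graph given as an adjacency list and takes the index as the sum of shortest-path distances over all unordered pairs of atoms. The names `bfs_distances`, `wiener_index`, `n_butane` and `isobutane` are hypothetical and chosen only for this illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return shortest-path distances (in bonds) from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        total += sum(bfs_distances(adjacency, source).values())
    # Each pair (u, v) was counted twice: once from u and once from v.
    return total // 2


# Carbon skeletons only (hydrogens suppressed); atom labels are arbitrary integers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(n_butane))   # linear chain
print(wiener_index(isobutane))  # branched isomer
```

The branched isomer yields the smaller value, which is the kind of structural difference the index is meant to capture.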

Specific comments:

1. All figures are captioned as Figure 1, although the text refers to them as Figures 2-7. I count a total of 8 figures in the text; the numbering needs to be corrected.

2. The sentence 'Molecular descriptors are defined by Todeschini and Consonni as: “The molecular descriptor is the final result of a logical and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.”[54][55]' should be corrected to 'Molecular descriptors are defined by Todeschini and Consonni as: “the final result of a logical and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.”[54][55]'

3. For the phrase "a way to describe and quantify a chemical structure with", it might be more precise to write "a way to describe and quantify an attribute of a chemical structure with".

4. “This study was extended by Randic’s topological index in 1975 [65].” Strictly speaking, the M1 and M2 indices, which are its predecessors, were introduced before this index (in the paper I. Gutman, N. Trinajstić, Chem. Phys. Lett. 1972, 17, 535–538) and were later named the first and second Zagreb indices. Also, in the formula for the Randić index, the exponent is -1/2 (instead of 1/2, as given in the manuscript). The same applies to the Balaban index.
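To illustrate the point about the exponent, here is a minimal sketch (not taken from the manuscript) of the Randić connectivity index, computed as the sum over all bonds of (d_u · d_v)^(-1/2), where d_u and d_v are the vertex degrees of the two bonded atoms in the hydrogen-suppressed graph. The function name `randic_index`, its `exponent` parameter and the example edge lists are hypothetical and serve only this illustration.

```python
import math


def randic_index(edges, exponent=-0.5):
    """Sum of (deg(u) * deg(v)) ** exponent over all edges (bonds)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum((degree[u] * degree[v]) ** exponent for u, v in edges)


# Carbon skeletons only (hydrogens suppressed).
n_butane_edges = [(0, 1), (1, 2), (2, 3)]
isobutane_edges = [(0, 1), (1, 2), (1, 3)]

print(randic_index(n_butane_edges))   # 1/sqrt(2) + 1/2 + 1/sqrt(2)
print(randic_index(isobutane_edges))  # 3 * 1/sqrt(3) = sqrt(3)
```

With `exponent=1` the same sum gives the second Zagreb index M2 mentioned above, which shows why the sign of the exponent matters: with -1/2, bonds between highly branched atoms contribute less, whereas with a positive exponent they contribute more.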

5. This sentence should be improved: "In the literature, there are more than 100 different chemical indices, often described as topological indices, which they are generally similar to." Namely, there are more than 100 descriptors classified as topological descriptors (one example is here: http://michem.disat.unimib.it/chm/Help/edragon/ListTopolDesc.html). However, there are many more molecular descriptors of other kinds. For example, the collection of molecular descriptors in the program alvaDesc contains more than 5500 molecular descriptors (https://www.alvascience.com/alvadesc-descriptors/; see also Mauri, A. (2020). alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. In K. Roy (Ed.), Ecotoxicological QSARs (pp. 801–820). Humana Press Inc. https://doi.org/10.1007/978-1-0716-0150-1_32).

6. In "Thus, the first usage of chemical graphs was for representing the hypothetical forces between the molecules and atoms [4]." the part "between the molecules and atoms", in my opinion, should be corrected to "between atoms in molecules".

In addition to ref. 4, the authors should also consider other references, such as Boscovich, R. J. Philosophiae Naturalis Theoria; Kaliwoda: Vienna, 1758 (English translation of the Venetian ed. of 1763: Child, J. M. A Theory of Natural Philosophy, Put Forward and Explained by Roger Joseph Boscovich; Open Court Publishing Co.: Chicago, 1922), which is given as ref. 10 in Israelachvili, J.; Ruths, M. Brief History of Intermolecular and Intersurface Forces in Complex Fluid Systems. Langmuir 2013, 29, 9605–9619, doi:10.1021/la401002b.

Namely, in that book Boscovich (in Croatian: Josip Ruđer Bošković, in English: Joseph Roger Boscovich; Dubrovnik, Croatia, 1711 – Milan, Italy, 1787) introduces the concept of interatomic forces, which is closest to the concept of chemical bonds, and treats atoms themselves as entities (points) without dimension. As such, atoms and the bonds between them, as defined by Boscovich, are closest to the concept of a chemical bond as it is depicted in the structure of a molecule represented by a mathematical graph.

This is also confirmed by the paper (see Figure 1 in it): Spencer, J. Brookes. "Boscovich's Theory and Its Relation to Faraday's Researches: An Analytic Approach." Archive for History of Exact Sciences, vol. 4, no. 3, 1967, pp. 184–202. JSTOR, http://www.jstor.org/stable/41133268. Accessed 23 Oct. 2022. In that paper, Boscovich's ideas are connected with the research of Michael Faraday, who, from his earliest productive years, continuously used Boscovich's hypothesis of point atoms in his research on electromagnetism.

PLOS Computational Biology journal (discuss • contribs) 11:55, 1 November 2022 (UTC)

Review by Denise Slenter
I enjoyed reading this page on chemical graph theory, which I believe is essential for the chem(o)informatics community. This text could be fine-tuned for the intended audience (chemists/cheminformaticians), whereas the focus now seems to be more on mathematicians. I would propose including the following updates/changes before publication:

Comments per section:
 * 1) Abstract:
 * 2) the last sentence seems to include a typo: "such as isomer enumeration, molecular substructures searching in chemical databasesmolecular structure generation." --> "such as isomer enumeration, molecular substructures searching in chemical databases, and molecular structure generation."
 * 3) Graph theory background:
 * 4) Typo in name: "was first used by Slyvester " --> "was first used by Sylvester " (also found in the applications section, isomer enumeration).
 * 5) This section cannot be understood by people unfamiliar with mathematical graph theory, e.g. "Vertex degrees and edge multiplicities correspond to atom valences and bond multiplicities." --> What is a degree, and what is meant by multiplicities?
 * 6) Try to avoid including mathematical graph theory descriptions of key elements in your text without further explanation, e.g. "The distance between two vertices is the number of edges in the shortest path." I would check whether the concepts introduced in this section are useful for the following sections and, if so, describe them with your target audience in mind.
 * 7) History:
 * 8) The cross-refs to key scientists in this area seem to be added sparsely (e.g. Isaac Newton and John Dalton do not have a link, while Alfred Werner does). Try to make your text more complete, by adding these cross-refs.
 * 9) Figure 1 is not referenced in the text; I assume it belongs to this text: "However, August Kékule showed both physical positions and orientations of atoms in a molecule. In his “Tetrahedral Carbon Atom” model (Figure 2)" --> Figure "Tetrahedral carbon atom model." seems to be missing one bond (from the left-hand side H to the C).
 * 10) I am missing a section on drug discovery here (either after or before the metabolomics section).
 * 11) The last section on metabolomics introduces quite a few new concepts without explaining them; for example, MS and NMR are mentioned, but only MS is used for the following examples. Using the information obtained from mass-to-charge ratios is often not enough for structure elucidation, which is where chemical graph theory could come in.
 * 12) Abbreviation is not explained: "Many CASE suites require" --> "Many Computer Assisted Structure Elucidation (CASE) suites require". An additional reference for this section could be: https://doi.org/10.1002/mrc.5115.
 * 13) Applications
 * 14) Figure "Stereoisomers of C2H2Br2." and "Alkyl radicals." use a different font, which looks slightly pixelated in a small version; for consistency, please use the same font throughout your figures.
 * 15) Add the references for each specific chemical group, not only at the end of the sentence, for "In the literature, isomer enumeration studies were mostly for special compound classes such as alkanes, aromatic hydrocarbons and polycyclic aromatic compounds [27][28][29]."
 * 16) typo: " BFS or BFS based " --> " BFS or DFS based" ; what is the relevance of these algorithms (they are used in shortest-path problems, but that's not apparent from the text directly).
 * 17) The following section is missing some details, for example on the accuracy of these methods: "In the field, MOLGEN was the fastest and an efficient structure generator for decades. As an alternative to MOLGEN, MAYGEN [42] was developed. It is an open-source chemical graph generator and approximately 3 times slower than MOLGEN. Following MAYGEN, the same team developed another open-source generator, SURGE [43] which is the fastest chemical graph generator in the field." --> Perhaps a table giving an overview of several relevant parameters can help your readers find the right tool/method for them, instead of reading an elaborate text.
 * 18) The 'molecular fragmentation' section lists LC and NMR as relevant for the identification of unknown structures; LC (or GC for that matter) is only useful in combination with another technique in metabolomics, while NMR does not directly lead to fragmentation comparable to MS. I would only list techniques here that are relevant (MS and MS/MS), and explain these in a bit more detail, so your readers will understand the issues at hand (you could elaborate here on plant metabolomics, for example, containing many more (unknown) chemical structures as compared to human samples).
 * 19) Toolkits
 * 20) Interesting overview, but, if I were new to the field, I wouldn't know where to start. Extending this Table with relevant parameters (availability of software/license (Open Source?), original publication, programming language, etc.) would be very helpful here.

General comments:
 * 1) The terms 'vertices' and 'nodes' are used interchangeably; these terms are related but have a slightly different meaning: "vertices describe where specific points are, while nodes describe the topological structure of a feature." Please select which term better fits within your field, and use that throughout the text.
 * 2) Figures are not correctly numbered in the caption; most captions are very short and do not provide the reader with additional information.

DeSl (discuss • contribs) 14:51, 18 November 2022 (UTC)