Books and Book Editorships:
T. Gärtner, A. Haywood, J. Redshaw, A. Taylor, A. Mason, J. Hirst (ed.):
"Machine Learning for Chemical Synthesis";
The Royal Society of Chemistry,
Synthetic chemistry is at the heart of medicinal and materials chemistry. Anticipating how molecules will react (forward synthetic planning) and how a molecule can be synthesised (retrosynthesis) both rely on the expert knowledge and experience of a synthetic chemist. Repositories such as Reaxys and SciFinder contain millions of compounds, reactions, references, and bioactivities. These repositories allow chemists to search through vast regions of chemical space to fill gaps in their knowledge. For molecules with no reported synthesis routes, reaction precedents on analogous structures and/or reaction types can be retrieved. Although integrated tools enable the search results to be filtered, analysing the information to make informed decisions is a manual process that requires the intellect and time of an expert synthetic chemist.
Automating retrosynthetic analysis, the recursive breakdown of a target molecule until commercially available reactant precursors are reached, would significantly reduce the time taken to plan chemical syntheses. The concept of using a computer to plan synthetic routes was first proposed around 50 years ago. The history of computer-aided synthesis design (CASD) tools has been reviewed in detail, from the pioneering work of E. J. Corey in the development of Logic and Heuristics Applied to Synthetic Analysis (LHASA) to contemporary commercially available programs. Thus far, chemists have yet to adopt these tools when planning synthetic routes to new molecules. A survey was completed in 2017 (13 chemists in two companies) [3] to determine what chemists expect from CASD tools. The following were identified as the most important requirements: a tool should (i) be user-friendly, (ii) provide supporting literature examples, (iii) define possible bonds to be broken, (iv) lead to commercially available reactant precursors, (v) recognise conflicting reactivity and suggest protecting groups, and (vi) prioritise results. While aspects (i) to (v) can be implemented regardless of the chemist's area of expertise, prioritising results is a challenge. Chemists working in different parts of the chemical industry have different priorities when designing reactions. Their criteria for a suitable reaction could be based on cost, greenness, reaction conditions (such as temperature or catalysts), and so on. Therefore, CASD tools need to be flexible and rank pathways based on the criteria provided by the chemist.
Forward synthetic planning is the prediction of products given the reactants, reagents, and a set of reaction conditions. Experiments, which are predominantly used to identify reaction outcomes, are expensive, time-consuming, and require experienced chemists. It would therefore be beneficial for computational tools to identify the major product and any side-products, and to validate retrosynthetic predictions.
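The recursive breakdown of a target molecule into commercially available precursors, followed by ranking of the resulting pathways against criteria supplied by the chemist, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the molecule names, the `DISCONNECTIONS` lookup table, the `PURCHASABLE` set, and the two ranking criteria (number of steps and number of distinct starting materials) are hypothetical placeholders and are not taken from the book or from any CASD program.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each molecule maps to alternative sets of precursors.
# The names are placeholders, not real reaction data.
DISCONNECTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "target": [("intermediate_a", "reagent_b"), ("intermediate_c",)],
    "intermediate_a": [("reagent_d", "reagent_e")],
    "intermediate_c": [("reagent_e",)],
}
# Hypothetical set of commercially available compounds.
PURCHASABLE: Set[str] = {"reagent_b", "reagent_d", "reagent_e"}

# A route is a list of (product, precursors) steps.
Route = List[Tuple[str, Tuple[str, ...]]]


def find_routes(molecule: str, max_depth: int = 5) -> List[Route]:
    """Recursively break a molecule down until all precursors are purchasable."""
    if molecule in PURCHASABLE:
        return [[]]  # nothing to synthesise: one route with zero steps
    if max_depth == 0:
        return []  # depth limit reached (also guards against cycles)
    routes: List[Route] = []
    for precursors in DISCONNECTIONS.get(molecule, []):
        partial: List[Route] = [[(molecule, precursors)]]
        for precursor in precursors:
            sub_routes = find_routes(precursor, max_depth - 1)
            # Every precursor must be reachable; otherwise partial becomes empty.
            partial = [r + s for r in partial for s in sub_routes]
        routes.extend(partial)
    return routes


def starting_materials(route: Route) -> Set[str]:
    """Purchasable compounds consumed anywhere in the route."""
    return {p for _, precursors in route for p in precursors if p in PURCHASABLE}


def rank_routes(routes: List[Route], weights: Dict[str, float]) -> List[Route]:
    """Sort routes by a weighted penalty; a lower score ranks first."""
    def score(route: Route) -> float:
        return (weights.get("steps", 0.0) * len(route)
                + weights.get("materials", 0.0) * len(starting_materials(route)))
    return sorted(routes, key=score)


if __name__ == "__main__":
    all_routes = find_routes("target")
    for route in rank_routes(all_routes, {"steps": 1.0, "materials": 1.0}):
        print(route)
```

Changing the weights passed to `rank_routes` changes the ordering without touching the search itself, which is one simple way to keep the ranking criteria under the chemist's control. A realistic tool would replace the lookup table with reaction templates or a learned model and would use richer criteria such as cost or reaction conditions.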
Optimising reaction conditions, such as catalysts and solvents, is also an important part of synthetic planning. Changing a set of reaction conditions, even slightly, could result in the formation of a different major product or a failed reaction. Integrating CASD with high-throughput screening and robotic equipment holds much promise for the future of reaction optimisation. Advances in artificial intelligence and machine learning methods, together with the availability of big data, have provoked renewed interest in CASD. In this chapter, we will outline sources and representations of reaction data, give a brief description of machine learning methods focused on synthetic chemistry, and provide detailed approaches to CASD with contemporary examples and comparisons.
Machine Learning, Chemical Synthesis, Chemical Data, Molecular Descriptions, Machine Learning Methods, Synthetic Route Design
Created from the Publication Database of the Vienna University of Technology.