wikidata-chemistry-curation

Adding additional information

This chapter describes past efforts to add content to Wikidata from various (peer-reviewed) sources. They serve as examples.

Adding chemical compounds

The first 200 thousand chemical compounds were added in the work by Waagmeester, Stupp, Burgstaller-Muehlbacher, and others [1]. Willighagen wrote a CDK- and Bacting-based script to add chemical structures, now available as Wikidata/createWDitemsFromSMILES.groovy. The script has been used to add many chemical compounds. By default, it only adds compounds with full stereochemistry defined. It adds the SMILES, InChI, InChIKey, and mass. If the InChIKey matches an entry in PubChem, the PubChem CID is added too.
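As a rough illustration of the kind of output such a script produces, the following Python sketch builds QuickStatements (version 1, tab-separated) commands for one new item. It is not the Groovy script itself: the function name is hypothetical, the identifiers are passed in precomputed rather than derived with the CDK, and the property and unit IDs (P233, P234, P235, P2067, P662, and Q483261 for dalton) are assumptions that should be checked against Wikidata before use. Ethanol is used only as a familiar example.

```python
def new_compound_statements(smiles, inchi, inchikey, mass, pubchem_cid=None):
    """Return QuickStatements V1 lines that create one compound item.

    Hypothetical helper; property IDs are assumptions to verify on Wikidata.
    """
    lines = [
        "CREATE",                          # start a new item
        f'LAST\tP233\t"{smiles}"',         # canonical SMILES
        f'LAST\tP234\t"{inchi}"',          # InChI
        f'LAST\tP235\t"{inchikey}"',       # InChIKey
        f"LAST\tP2067\t{mass}U483261",     # mass, with dalton as the unit
    ]
    # The PubChem CID is only added when a match was found.
    if pubchem_cid is not None:
        lines.append(f'LAST\tP662\t"{pubchem_cid}"')
    return lines


ethanol = new_compound_statements(
    smiles="CCO",
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    inchikey="LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    mass=46.041865,
    pubchem_cid=702,
)
print("\n".join(ethanol))
```

Omitting `pubchem_cid` leaves out the last line, which mirrors the behaviour described above for compounds without a PubChem match.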

Melting points

Adding properties follows a similar process. If a SMILES is given, an InChIKey can be calculated, which can then be used to find the Wikidata item to which a property value belongs. This approach has been used to add melting points from the Jean-Claude Bradley Open Melting Point Dataset [2] using another Groovy script, MeltingPoints/createQuickStatements.groovy.
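The lookup step can be sketched as a SPARQL query that maps InChIKeys to Wikidata items. The Python below only builds the query text and a request URL for the public Wikidata Query Service; it does not send the request. The function names are hypothetical, and the Groovy script may well perform the lookup differently.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def inchikey_query(inchikeys):
    """Build a SPARQL query returning the items that carry these InChIKeys."""
    for key in inchikeys:
        if not INCHIKEY_PATTERN.match(key):
            raise ValueError(f"not a valid InChIKey: {key!r}")
    values = " ".join(f'"{key}"' for key in inchikeys)
    return (
        "SELECT ?item ?inchikey WHERE {\n"
        f"  VALUES ?inchikey {{ {values} }}\n"
        "  ?item wdt:P235 ?inchikey .\n"   # P235: InChIKey
        "}"
    )


def query_url(inchikeys):
    """Return a GET URL for the query, asking for JSON results."""
    params = {"query": inchikey_query(inchikeys), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


print(inchikey_query(["LFQSCWFLJHTTHZ-UHFFFAOYSA-N"]))
```

Validating each key before it is placed in the query keeps malformed input out of the SPARQL text; InChIKeys without a matching item simply produce no row in the result.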

Boiling points

Earlier this year, another set of boiling points was added, sourced from a 2004 article [3]. Yet another Groovy script, BoilingPoints/createQuickStatements.groovy, uses this gist as input to create QuickStatements.
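A minimal Python sketch of turning such tabular input into QuickStatements with a reference is given below. The column names (`inchikey`, `bp_celsius`), the use of degrees Celsius, the sample row, and the source item `Q00000000` are all hypothetical placeholders, since the layout of the actual input is not described here. The property and unit IDs (P2102 for boiling point, Q25267 for degree Celsius, S248 for "stated in" as a reference) are assumptions to verify on Wikidata; Q4115189 is the Wikidata sandbox item.

```python
import csv
import io


def boiling_point_statements(tsv_text, qid_by_inchikey, source_qid):
    """Yield one QuickStatements V1 line per row whose InChIKey is known.

    Hypothetical helper; column names and IDs are illustrative assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        qid = qid_by_inchikey.get(row["inchikey"])
        if qid is None:
            continue  # no matching Wikidata item: skip rather than guess
        value = row["bp_celsius"]
        # item, property, quantity with unit, then the reference (S248)
        yield f"{qid}\tP2102\t{value}U25267\tS248\t{source_qid}"


# Placeholder input, not real measurements.
SAMPLE = "inchikey\tbp_celsius\nAAAAAAAAAAAAAA-BBBBBBBBBB-N\t12.3\nCCCCCCCCCCCCCC-DDDDDDDDDD-N\t45.6\n"
LOOKUP = {"AAAAAAAAAAAAAA-BBBBBBBBBB-N": "Q4115189"}

statements = list(boiling_point_statements(SAMPLE, LOOKUP, "Q00000000"))
print("\n".join(statements))
```

Rows without a known item are skipped, so the second sample row produces no statement.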

References

  1. Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith O, et al. Wikidata as a knowledge graph for the life sciences. eLife [Internet]. 2020 Mar 17;9. Available from: https://elifesciences.org/articles/52614 doi:10.7554/ELIFE.52614 (Scholia)
  2. Bradley JC, Williams AJ, Lang ASID. Jean-Claude Bradley Open Melting Point Dataset [Internet]. Figshare. 2014. Available from: https://figshare.com/articles/Jean_Claude_Bradley_Open_Melting_Point_Datset/1031637/2 doi:10.6084/M9.FIGSHARE.1031637.V2 (Scholia)
  3. Rücker C, Meringer M, Kerber A. QSPR using MOLGEN-QSPR: the example of haloalkane boiling points. JCICS. 2004 Nov 1;44(6):2070–6. doi:10.1021/CI049802U (Scholia)