Wikipedia has a separate chemistry community. While some Wikidata chemistry content is visible on Wikipedia, it also happens regularly that Wikipedia has a SMILES for a chemical compound while Wikidata does not. DBpedia helps here [1].
The following SPARQL query finds ten thousand (the default limit in DBpedia) Wikipedia pages with
a ChemBox and keeps those whose Wikidata item has neither a SMILES (P233 or P2017) nor a CXSMILES (P10718):
SPARQL sparql/missingSMILES.rq
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?s ?article ?item ?itemLabel WITH {
  SELECT DISTINCT ?s ?article WHERE {
    SERVICE <https://dbpedia.org/sparql> {
      ?s dbpedia2:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Chembox>.
      ?article_db foaf:primaryTopic ?s.
    }
    BIND (IRI(REPLACE(STR(?article_db), "http://", "https://", "i")) AS ?article)
  }
} AS %DBPEDIA WITH {
  SELECT DISTINCT ?s ?article ?item WHERE {
    INCLUDE %DBPEDIA
    ?article schema:about ?item .
    MINUS { ?item wdt:P233 [] }
    MINUS { ?item wdt:P2017 [] }
    MINUS { ?item wdt:P10718 [] }
  }
} AS %CHEMICALS WHERE {
  INCLUDE %CHEMICALS
  VALUES ?chemicals { wd:Q113145171 wd:Q59199015 }
  ?item wdt:P31 ?chemicals.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
The results look like this:
| s | article | item |
|---|---|---|
| http://dbpedia.org/resource/Vanadium(V)_chloride_chlorimide | https://en.wikipedia.org/wiki/Vanadium(V)_chloride_chlorimide | vanadium(V) chloride chlorimide |
| http://dbpedia.org/resource/Potassium_tetracarbonyliron_hydride | https://en.wikipedia.org/wiki/Potassium_tetracarbonyliron_hydride | potassium tetracarbonyliron hydride |
| http://dbpedia.org/resource/(Triphenylphosphine)iron_tetracarbonyl | https://en.wikipedia.org/wiki/(Triphenylphosphine)iron_tetracarbonyl | (triphenylphosphine)iron tetracarbonyl |
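Queries like this can also be run from a script, for example to turn the results into a curation worklist. The following is a minimal Python sketch, not part of the original material: it assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The function names `run_query`, `flatten`, and `qid` and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint. The User-Agent value is a
# placeholder (assumption): replace it with something that identifies you.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
USER_AGENT = "missing-smiles-example/0.1 (replace with your contact details)"


def run_query(query):
    """Send a SPARQL query to the endpoint and return the parsed JSON response."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params, headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


def flatten(results):
    """Turn SPARQL JSON bindings into plain dicts; unbound variables become None."""
    variables = results["head"]["vars"]
    return [
        {name: binding.get(name, {}).get("value") for name in variables}
        for binding in results["results"]["bindings"]
    ]


def qid(iri):
    """Return the last path segment of an entity IRI, i.e. the Q identifier."""
    return iri.rsplit("/", 1)[-1]
```

With a local copy of the query file, `flatten(run_query(open("sparql/missingSMILES.rq").read()))` would give one dict per result row, and `qid(row["item"])` the identifier of the item that still needs a SMILES. Because the query federates to DBpedia, it may take a while to complete.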
Many polymers can be described with a CXSMILES. The following query lists the polymers that do not yet have this property:
SPARQL sparql/polymersWithoutCXSMILES.rq
SELECT ?cmp ?cmpLabel WHERE {
  ?cmp wdt:P31 wd:Q81163 .
  MINUS { ?cmp wdt:P10718 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
This returns results like these:
| cmp |
|---|
| polyisoprene |
| Styrene-acrylonitrile resin |
| chondroitin sulfate |
We can do the same thing for functional groups:
SPARQL sparql/functionalGroupsWithoutCXSMILES.rq
SELECT ?fg ?fgLabel ?cxsmiles WHERE {
  ?fg wdt:P31/wdt:P279* wd:Q170409 .
  MINUS { ?fg wdt:P10718 ?cxsmiles }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". }
}
Here too, the results are a list of curation opportunities:
| fg | cxsmiles |
|---|---|
| acetamido group | |
| peroxyacetyl group | |
| benzhydryl | |
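The polymer and functional group queries share one pattern: select the instances of a class, then use MINUS to drop the ones that already have the property. As a rough illustration, the Python sketch below generates such queries from a class and a property identifier. The helper name `missing_property_query` and its parameters are hypothetical, and the generated variable names (`?item`, `?itemLabel`) differ from the ones used in the queries above.

```python
def missing_property_query(class_qid, property_pid, subclasses=False):
    """Build a query for instances of a class that lack a given property.

    With subclasses=True the class is matched through wdt:P31/wdt:P279*,
    as in the functional group query; otherwise only direct instances match.
    """
    path = "wdt:P31/wdt:P279*" if subclasses else "wdt:P31"
    return "\n".join([
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item {path} wd:{class_qid} .",
        f"  MINUS {{ ?item wdt:{property_pid} [] }}",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }',
        "}",
    ])


# Polymers (Q81163) and functional groups (Q170409) without a CXSMILES (P10718).
polymers = missing_property_query("Q81163", "P10718")
functional_groups = missing_property_query("Q170409", "P10718", subclasses=True)
print(polymers)
```

The printed text can be pasted into the query service as is. Generating the queries this way keeps the class and property identifiers in one place when the same check is repeated for other classes.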