Knowledge graphs are often used to represent and visualize the semantic relationships within an interconnected dataset. They are also helpful for querying and exploring those relationships between different data types, as well as for imputing knowledge through machine learning-based methods.
The CFDE Data Distillery Partnership is building a Data Distillery Knowledge Graph (DDKG) that integrates data from each DCC into a unified knowledge graph. Each DCC provides standardized, machine-readable assertions generated from its datasets -- the "distilled" data -- which follow a common schema. Specifically, the DDKG schema is based on the Unified Biomedical Knowledge Graph (UBKG) schema, which originates from the Unified Medical Language System (UMLS) and supports over 180 ontologies for representing Common Fund data. The goal of the project is to provide a single resource for querying and visualizing cross-DCC data relationships, furthering knowledge discovery and integration across the CFDE.
PUBCHEM 60795 indication SNOMED 13746004
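Read as a subject, relationship, object triple, the assertion above can be broken down as in the following minimal Python sketch. The `split_assertion` helper is hypothetical, and it assumes that both identifiers follow the {SAB}<space>{Code} format and that the relationship is a single token with no whitespace; assertions that break either assumption would need a different parser.

```python
def split_assertion(line):
    """Split '<SAB> <Code> <relationship> <SAB> <Code>' into its three parts.

    Assumption: each identifier is exactly two whitespace-separated tokens
    and the relationship is exactly one token.
    """
    tokens = line.split()
    if len(tokens) != 5:
        raise ValueError(f"expected 5 whitespace-separated tokens, got {len(tokens)}")
    return {
        "subject_id": " ".join(tokens[0:2]),
        "relationship": tokens[2],
        "object_id": " ".join(tokens[3:5]),
    }


assertion = split_assertion("PUBCHEM 60795 indication SNOMED 13746004")
print(assertion)
```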
nodes.tsv: A table containing metadata on entity nodes
node_id (Required): The unique identifier for the node; preferably a term in the format {SAB}<space>{Code}, e.g. PUBCHEM 60795
node_namespace: The source abbreviation (SAB) for the term, e.g. PUBCHEM
node_label (Required): The preferred term or human-readable label for the node, e.g. aripiprazole
node_definition: A definition for the node
node_synonyms: Text synonyms for the node, if any; these should be separated by vertical bars |
node_dbxrefs: External database references for the node, preferably in {SAB}<space>{Code} format
value: A numeric decimal value, usually reserved for nodes representing some quantitative measurement
lowerbound: A numeric decimal value representing the minimum allowed value for the node, if applicable
upperbound: A numeric decimal value representing the maximum allowed value for the node, if applicable
units: Units for the value, lowerbound, and upperbound fields

edges.tsv: A table describing relationships between entities defined in nodes.tsv
subject_id (Required): An existing node_id that is the subject of the assertion, e.g. PUBCHEM 60795
relationship (Required): A custom string or an IRI referencing a relational ontology relationship type, e.g. indication
object_id (Required): An existing node_id that is the object of the assertion, e.g. SNOMED 13746004
evidence_class: Any evidence or value specific to the SAB and relevant to the relationship

Build edges.tsv first, then fill in nodes.tsv.
a. First identify all relationships captured within a dataset, identify the ontologies/SABs involved, and build the edges.tsv table with just the triples.
b. Then extract all unique nodes represented within edges.tsv, identify the relevant SAB terms and metadata, and fill in nodes.tsv.
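
Steps a and b can be sketched in Python roughly as follows. This is a minimal illustration, not the Data Distillery tooling: the function names and the `labels` lookup are hypothetical, and the tab-delimited layout with a header row and columns in the order the fields are listed above is an assumption. Only the aripiprazole label comes from the text; the object node is deliberately left without a label so the check for the required node_label field has something to report.

```python
import csv
import io

# Column order is an assumption: it follows the order the fields are listed in.
EDGE_FIELDS = ["subject_id", "relationship", "object_id", "evidence_class"]
NODE_FIELDS = [
    "node_id", "node_namespace", "node_label", "node_definition",
    "node_synonyms", "node_dbxrefs", "value", "lowerbound", "upperbound", "units",
]

# Step a input: triples identified in a dataset (the example from the text).
triples = [("PUBCHEM 60795", "indication", "SNOMED 13746004")]

# Hypothetical label lookup; real labels would come from the source ontologies.
labels = {"PUBCHEM 60795": "aripiprazole"}


def build_edges(triples):
    """Step a: build edge rows containing just the triples."""
    return [
        {"subject_id": s, "relationship": r, "object_id": o, "evidence_class": ""}
        for s, r, o in triples
    ]


def build_nodes(edges, labels):
    """Step b: collect every unique node referenced by the edges."""
    node_ids = sorted(
        {e["subject_id"] for e in edges} | {e["object_id"] for e in edges}
    )
    nodes = []
    for node_id in node_ids:
        row = dict.fromkeys(NODE_FIELDS, "")
        row["node_id"] = node_id
        # Assumes the {SAB}<space>{Code} format, so the SAB is the first token.
        row["node_namespace"] = node_id.split(" ", 1)[0]
        row["node_label"] = labels.get(node_id, "")
        nodes.append(row)
    return nodes


def missing_labels(nodes):
    """Return node_ids that still lack the required node_label."""
    return [n["node_id"] for n in nodes if not n["node_label"]]


def to_tsv(rows, fields):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


edges = build_edges(triples)
nodes = build_nodes(edges, labels)
print(to_tsv(edges, EDGE_FIELDS))
print(to_tsv(nodes, NODE_FIELDS))
print("Nodes still missing a label:", missing_labels(nodes))
```

Deriving the node list from the edges, rather than maintaining it separately, means every subject_id and object_id refers to an existing node_id by construction.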