CORD-19 vs. PubMed: HOW TO GET THE MOST FROM COVID-19 KNOWLEDGE SOURCES
May 8, 2020
One of the silver linings of the current SARS-CoV-2 global pandemic is the enormous collaborative effort among scientists, who are joining forces to quickly develop a more complete understanding of COVID-19 and possible treatments. At PercayAI, we have spent the past several weeks using our augmented intelligence knowledge mapping tools to quickly ingest information surrounding COVID-19. Here, we analyze and compare two COVID-19 data sources: CORD-19 and PubMed.
The COVID-19 Open Research Dataset (CORD-19) is a COVID-19-specific knowledge source generated by the Semantic Scholar team at the Allen Institute for AI to help researchers access all relevant research findings on the topic, including pre-print articles. In response, several groups have developed tools to explore this dataset (recently discussed in this GeekWire article).
We sought to contribute to this effort by incorporating the CORD-19 knowledge source into PercayAI’s AI knowledge mapping tool. Our contextual language processing capabilities allow us to quickly incorporate many different types of text-based data sources into our memory model. While CORD-19 is a powerful and specific knowledge source, we wondered whether the ability to form novel connections would be lost in the absence of other non-COVID-19 articles. Here, we present the results from three projects related to COVID-19, comparing the findings from using either CORD-19 as the knowledge source or all of PubMed as the knowledge source.
We focused on these three queries to compare the connections each knowledge source yielded:
Understanding the virus behind the outbreak: What is known about novel coronavirus SARS-CoV-2?
Comparing proposed treatment of COVID-19: Of the drugs in clinical trials, what mechanisms are they targeting? What else is known about these drugs that might affect their effectiveness against COVID-19 (known side effects, etc)?
Digging into mechanism: What is the role of ACE2 (the receptor CoV-2 uses for entry into host cells) in the disease?
Below are knowledge maps created for each of the above projects using either PubMed or CORD-19 as the knowledge source. We will break down these knowledge maps by common theme, and then by findings within that theme.
Understanding the virus:
What is known about novel coronavirus SARS-CoV-2?
Insights gleaned from PubMed and CORD-19 knowledge sources about coronavirus:
CoV-2 genetics (purple): CoV-2 has a simple RNA genome with few genes. Genetic sequence analysis shows that it is highly similar to other betacoronaviruses such as SARS-CoV (known for the 2002-2003 SARS outbreak) and MERS-CoV (known for the outbreak that began in 2012).
Host cellular effects (red): CoV-2 interacts with ACE2 and TMPRSS2 to enter the host cell. Through CORD-19 analysis, we see more focus on the mechanism of cell entry and viral replication. We found that CoV-2 main protease, Mpro, is key to viral replication within the host as it is required for proteolytic maturation of the virus.
Symptoms (orange): A number of COVID-19 symptoms have been described, ranging from those typical of viral respiratory infection (cough and fever) to severe complications such as ARDS and lymphopenia.
Transmission (green): The scientific community is weighing many considerations relating to how the virus is transmitted and diagnosed, as well as public health concerns about how to prevent the number of cases from overwhelming medical facilities. Compared to analysis through PubMed, CORD-19 has a higher enrichment of public health discussion articles (higher enrichment score; illustrated visually through the larger sphere size of the Transmission themes).
We ran the search term “2019 novel coronavirus” in our AI tool using the PubMed knowledge source and then using the CORD-19 knowledge source.
Coronavirus map using PubMed
Coronavirus map using CORD-19
Next, we look into the drugs currently in clinical trials for the treatment of COVID-19. To do this, we input the clinical trial data for drugs that had entered clinical trials for COVID-19 as of March 25.
Comparing proposed treatment of COVID-19:
Of the drugs in clinical trials, what mechanisms are they targeting? What else is known about these drugs that might affect their effectiveness against COVID-19 (known side effects, etc)?
Cellular targets (blue): Both knowledge sources identify autophagy/endolysosomal systems and regulators of blood pressure (the renin-angiotensin system, RAS, and cyclic GMP) as targets for the drugs currently in clinical trials.
COVID-19 related clinical outcomes (purple): Themes covering viral infection, respiratory complications, and other clinical outcomes were found. The drugs under consideration for COVID-19 treatment target both the mechanisms of viral infection and the complications it leads to.
Immune effects (red): Important immune processes, notably JAK/STAT signaling, are key targets of the drugs currently in clinical trials for COVID-19.
Other indications and adverse effects (green): The PubMed knowledge source, but not CORD-19, surfaces the previous indications of the drugs being considered for repurposing and currently in clinical trials, as well as their known adverse effects. Since CORD-19 does not include literature surrounding the original uses of potential COVID-19 drugs, it was not able to provide this information.
Map of COVID-19 clinical trial drugs using PubMed
Map of COVID-19 clinical trial drugs using CORD-19
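The comparison above amounts to asking which themes the two maps share and which appear in only one of them. Here is a minimal sketch in Python, assuming each map can be exported as a set of theme labels. The sets below are hand-copied from the descriptions above, and `compare_themes` is a hypothetical helper for illustration, not a feature of the knowledge mapping tool:

```python
def compare_themes(pubmed_themes, cord19_themes):
    """Split theme labels into shared and source-specific groups."""
    return {
        "shared": sorted(pubmed_themes & cord19_themes),
        "pubmed_only": sorted(pubmed_themes - cord19_themes),
        "cord19_only": sorted(cord19_themes - pubmed_themes),
    }


# Theme labels hand-copied from the text above; illustrative only,
# not an export from either knowledge map.
pubmed_themes = {
    "cellular targets",
    "COVID-19 related clinical outcomes",
    "immune effects",
    "other indications and adverse effects",
}
cord19_themes = {
    "cellular targets",
    "COVID-19 related clinical outcomes",
    "immune effects",
}

result = compare_themes(pubmed_themes, cord19_themes)
print("Only in PubMed:", result["pubmed_only"])
print("Only in CORD-19:", result["cord19_only"])
```

With these inputs, the only source-specific theme is the PubMed-only "other indications and adverse effects" group, matching the observation above.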
Finally, we wanted to compare the mechanistic information available in each knowledge source. To accomplish this, we ran the search term “ACE2” in our AI tool using the PubMed knowledge source and then using the CORD-19 knowledge source.
Digging into mechanism:
What is the role of ACE2 (the receptor CoV-2 uses for entry into host cells) in the disease?
Physiological role (green): ACE2 is an enzyme in the renin–angiotensin system (RAS) on the outer surface of the cell membrane. ACE2 is active in many tissues such as the lungs, heart, arteries, kidneys, brain and intestines. ACE2 is the counterbalance to ACE and works to elicit many physiological effects including reducing blood pressure, inflammation and fibrosis.
Viral infection (red): CoV-2 enters the host cell when the viral spike protein binds to the enzymatic domain of ACE2 on the cell surface. This leads to endocytosis of the virus. In addition, TMPRSS2 primes the virus for entry. The CORD-19 analysis gives more detail on the mechanism of CoV-2 infection, including which cell and tissue types the virus attacks (epithelial and mucosal cells of the lungs and GI tract are particularly vulnerable).
Pathophysiology role (blue): Outside of viral infection, ACE2 is associated with many health complications when RAS becomes dysregulated. This can lead to a number of pathological states, including cardiovascular diseases, metabolic disorders, sympathetic nervous system disruptions and pulmonary complications.
Map of ACE2 using PubMed
Map of ACE2 using CORD-19
To sum up, let’s look at a high-level comparison of the two knowledge sources.
Key Takeaways from Data Source Comparison:
Subject-specific literature databases, such as CORD-19, offer greater depth.
Rich content surrounding COVID-19 allows scientists to synthesize the vast amount of work being done on this topic.
E.g., the knowledge map of ACE2 generated from CORD-19 holds more details of the mechanism of action around SARS-CoV-2 infection.
Large, non-specific scientific literature databases, such as PubMed, offer greater breadth.
Novel ideas not yet known to associate with COVID-19 are more likely to be displayed, better enabling hypothesis generation.
E.g., for drugs that are currently in clinical trials to be repurposed for COVID-19, we find that PubMed holds much-needed information from previous clinical trials, such as the efficacy and safety outcomes.
The recent explosion of publications surrounding COVID-19 contains a greater proportion of commentaries and public health-related articles than is typically seen in PubMed.
The CORD-19 knowledge base thus has a greater focus on public health concerns, while PubMed’s broad knowledge base focuses more on basic research.
We have found benefits in each knowledge source for scientific discovery and analysis. We shared a few examples that we have been working on to illustrate some of the differences between using PubMed and CORD-19 for analysis.
Scientists can address a number of key questions related to the current pandemic using these sources’ frequently updated content, but when deciding which knowledge source or combination of knowledge sources will be most helpful, it’s important to look at the type of insight you need. PubMed is a much broader knowledge source that contributes valuable background information to the data fabric, whereas CORD-19 provides a deeper understanding of ideas specifically around COVID-19, including information about the specific viral mechanisms involved.
PubMed is a broad knowledge source that covers all areas of biology, while CORD-19 provides a deeper understanding of ideas specifically regarding COVID-19.
In conclusion, our AI tool allowed us to quickly convert available research findings into an easy-to-understand visualization, which helped our computational biologists make connections of their own. The ability of researchers to become well-versed in the current literature and identify scientifically grounded, testable hypotheses in a timely manner will accelerate the timeline to effective drug treatments.
Leave a comment below with your impressions, or send us a message here. If you have COVID-19 data you'd like us to run, free of charge, reach out here, so our team can contact you with potential next steps.
Drugs entered into clinical trials for COVID-19 as of March 25:
(1) ribavirin, (2) methylprednisolone, (3) oseltamivir, (4) chloroquine, (5) tocilizumab, (6) remdesivir, (7) favipiravir, (8) thymosin, (9) hydroxychloroquine, (10) ritonavir, (11) azithromycin, (12) camostat, (13) baricitinib, (14) losartan, (15) sarilumab, (16) eculizumab, (17) bromhexine, (18) cobicistat, (19) darunavir, (20) nitric oxide, (21) bevacizumab, (22) sildenafil
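To reuse this list as input for a search against either knowledge source, it first needs to be split into individual drug names. Here is a minimal sketch in Python, assuming the list is available as a single string in the numbered format shown above. `parse_drug_list` is a hypothetical helper name, and this is not necessarily how the list was prepared for the analysis in this post:

```python
import re

# The drug list exactly as given above, as one string.
DRUG_LIST = (
    "(1) ribavirin, (2) methylprednisolone, (3) oseltamivir, (4) chloroquine, "
    "(5) tocilizumab, (6) remdesivir, (7) favipiravir, (8) thymosin, "
    "(9) hydroxychloroquine, (10) ritonavir, (11) azithromycin, (12) camostat, "
    "(13) baricitinib, (14) losartan, (15) sarilumab, (16) eculizumab, "
    "(17) bromhexine, (18) cobicistat, (19) darunavir, (20) nitric oxide, "
    "(21) bevacizumab, (22) sildenafil"
)


def parse_drug_list(text):
    """Return the drug names that follow each "(n)" marker, in order."""
    # Capture everything after "(n)" up to the next comma or opening parenthesis,
    # so multi-word names such as "nitric oxide" stay intact.
    names = re.findall(r"\(\d+\)\s*([^,(]+)", text)
    return [name.strip() for name in names]


drugs = parse_drug_list(DRUG_LIST)
print(len(drugs), "drugs parsed")
```

Each name in `drugs` could then be submitted as its own search term, or the whole list could be passed together, depending on what the downstream tool accepts.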