Methods Description for the PercayAI Software Platform
To reference the PercayAI platform in a publication, please use the following text:
PercayAI Software Platform, PercayAI, LLC, 4220 Duncan Ave Suite 201, St. Louis, MO 63110 USA
The PercayAI Software Platform performs a literature analysis to identify relevant biological processes and pathways represented by the differentially expressed entities (genes, proteins, miRNAs, or metabolites). The PercayAI Software extracts all PubMed abstracts that reference the entities of interest (or their synonyms), using contextual language processing and a biological language dictionary that is not restricted to fixed pathway and ontology knowledge bases. Conditional probability analysis is used to compute the statistical enrichment of biological concepts (processes/pathways) relative to their occurrence under random sampling. Related concepts built from the list of differentially expressed entities are further clustered into higher-level themes (e.g., biological pathways/processes, cell types and structures).
Within the PercayAI Software Platform, scoring of gene, concept, and overall theme enrichment is accomplished using a multi-component function referred to as the Normalized Enrichment Score (NES). The first component uses an empirical p-value, derived from several thousand random entity lists of size comparable to the user's input entity list, to define the rarity of a given entity-concept event. The second component, which effectively represents fold enrichment, is based on the ratio of the concept's enrichment score to the mean of that concept's enrichment score across the set of randomized entity lists.
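The two NES components described above can be illustrated with the following Python sketch. It is a minimal, hypothetical example and not the PercayAI implementation. The toy co-mention table, the `concept_score` function (a simple count of input entities co-mentioned with a concept), uniform sampling of random lists from a background entity universe, the add-one correction on the empirical p-value, and the default of 5,000 random lists are all assumptions. Because the description does not state how the two components are combined into the final NES, the sketch stops at computing each component separately.

```python
import random
from statistics import mean

# Hypothetical background universe of 20 entities (genes G0..G19).
UNIVERSE = [f"G{i}" for i in range(20)]

# Hypothetical literature co-mention table: the concept terms each entity is
# mentioned with. In this toy data only G0-G3 are linked to "apoptosis".
CO_MENTIONS = {g: {"apoptosis"} for g in ["G0", "G1", "G2", "G3"]}


def concept_score(entities, concept, co_mentions):
    """Hypothetical enrichment score: number of entities co-mentioned with the concept."""
    return sum(1 for e in entities if concept in co_mentions.get(e, set()))


def null_scores(universe, list_size, concept, co_mentions, n_lists=5000, seed=0):
    """Score the concept on n_lists random entity lists of the same size as the input."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        concept_score(rng.sample(universe, list_size), concept, co_mentions)
        for _ in range(n_lists)
    ]


def empirical_p_value(observed, nulls):
    """Component 1 (assumed form): share of random lists scoring >= observed, add-one corrected."""
    exceed = sum(1 for s in nulls if s >= observed)
    return (exceed + 1) / (len(nulls) + 1)


def fold_enrichment(observed, nulls):
    """Component 2: observed score divided by the mean score across the random lists."""
    baseline = mean(nulls)
    return observed / baseline if baseline > 0 else float("inf")


user_list = ["G0", "G1", "G2", "G5"]  # hypothetical user input list
concept = "apoptosis"
observed = concept_score(user_list, concept, CO_MENTIONS)
nulls = null_scores(UNIVERSE, len(user_list), concept, CO_MENTIONS)
p_value = empirical_p_value(observed, nulls)
fold = fold_enrichment(observed, nulls)
print(f"observed={observed}, p={p_value:.4f}, fold={fold:.2f}")
```

In this toy case three of the four input genes hit the concept, while random four-gene lists average roughly 0.8 hits, so the fold enrichment comes out near 3.75 and the empirical p-value is small. The add-one correction is a common convention that keeps the empirical p-value from being exactly zero when no random list matches the observed score.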