EXTRI: EXtracted Transcription Regulation Interactions

Previously, extended Transcription Factor – Target Gene (TF-TG) resources have been generated by manual curation after mining the literature with PubMed queries (HTRIdb) or text mining (TRRUST). For each of these resources, the initial information extraction served mainly to generate sources for manual curation, during which relevant abstracts and information were selected. Here, we present the results of a machine-learning-assisted text mining approach that automatically extracts TF-TG information with high precision and recall, and we report on the more than 40,000 unique TF-TG interactions retrieved from abstracts across the entire Medline. Transcription Factors (TFs) were defined strictly as evidenced by TFClass (Wingender et al. 2015). All interactions have been converted to RDF statements specifying Transcription Factor to Target Gene interactions; these constitute the Prot2Gene graph of the BioGateway triple store and are available for network building through the BioGateway Cytoscape App. We are currently refining the text mining pipeline to increase the robustness of the results, and we hope to update BioGateway with the improved results starting July 2019.
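
To illustrate the kind of conversion described above, the following is a minimal sketch of how extracted TF-TG pairs could be de-duplicated and serialized as RDF in N-Triples syntax. It is not the EXTRI or BioGateway implementation: the namespaces, the predicate name, the function `to_ntriples`, and the example pairs are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical namespaces and predicate, for illustration only;
# these are NOT the URIs used by BioGateway.
PROTEIN_NS = "http://example.org/protein/"
GENE_NS = "http://example.org/gene/"
REGULATES = "http://example.org/vocab/regulatesTranscriptionOf"


def to_ntriples(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (TF, target gene) pairs into sorted, de-duplicated N-Triples lines."""
    # Strip stray whitespace so the same pair extracted twice collapses to one.
    unique = {(tf.strip(), tg.strip()) for tf, tg in pairs}
    return [
        f"<{PROTEIN_NS}{tf}> <{REGULATES}> <{GENE_NS}{tg}> ."
        for tf, tg in sorted(unique)
    ]


if __name__ == "__main__":
    # Illustrative input, not taken from the EXTRI results.
    extracted = [("TP53", "CDKN1A"), ("TP53 ", "CDKN1A"), ("MYC", "CCND2")]
    for line in to_ntriples(extracted):
        print(line)
```

Each output line is one subject-predicate-object statement, so a set of such lines can be loaded into a triple store as a single graph. A real pipeline would also need to map extracted names to stable protein and gene identifiers, which this sketch leaves out.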

The combined Prot2Gene resource represents a valuable contribution to the systems-oriented Gene Regulation Knowledge Commons and serves as an important platform for generating and retrieving information and knowledge about parts of the Gene Regulation domain that still need to be characterized.