PHARMAKON: a platform to explore the chemodiversity of living organisms

Head: Guillaume CABANAC

Pharmakon results from integrating open data (10 sources, such as the Universal Natural Product Database, the Human Metabolome Database, KNApSAcK and ChEBI) and toll-access data (the Dictionary of Natural Products). In total, 626,970 natural products are listed and described with metadata useful for biochemistry research: molecular descriptors, the biological origin of each compound, its classification in the taxonomy of living organisms (kingdom, family, genus, species), its chemical ontology (alkaloids, terpenes, fatty acids, etc.), cross-reference identifiers linking to open databases, and bibliographic resources. To our knowledge, Pharmakon is currently the most comprehensive relational database in terms of coverage of the chemical diversity of living organisms.

Pharmakon is particularly well suited to metabolomics research, which aims to identify the molecules present in complex mixtures from plants, microorganisms (fungi, bacteria) and biofluids (plasma, urine, etc.) analyzed by liquid chromatography coupled to mass spectrometry (LC/MS). LC/MS chromatographic data are annotated using the resources listed in the Pharmakon database. This approach is at the forefront of clinical research (patient stratification, search for pathophysiological biomarkers) and agronomic research (symbiosis, rational biocontrol), and is part of holistic approaches such as "One Health". Pharmakon reduces the time needed to prepare data for querying LC/MS spectra from an engineer's morning to a few minutes (one input and one download). The MS-CleanR algorithm developed by the consortium behind this project (Fraisier-Vannier et al., 2020) improves the relevance of the results while considerably reducing analysis time.
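As a rough illustration of what annotating LC/MS peaks against a compound list involves, the sketch below matches observed m/z values to [M+H]+ ions computed from monoisotopic masses, within a tolerance expressed in parts per million (ppm). This is not Pharmakon's or MS-CleanR's actual procedure: the function names `ppm_error` and `annotate`, the 5 ppm tolerance, the single-adduct assumption and the two-compound table are hypothetical choices made for illustration only.

```python
"""Hypothetical sketch: annotate LC/MS peaks by accurate-mass matching."""

# Mass of a proton in daltons, used to compute [M+H]+ ions.
PROTON_MASS = 1.007276

# Toy compound table (name, monoisotopic mass in Da); NOT Pharmakon data.
COMPOUNDS = [
    ("caffeine", 194.080376),   # C8H10N4O2
    ("quercetin", 302.042653),  # C15H10O7
]


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error between observed and theoretical m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(peaks, compounds, tol_ppm=5.0):
    """Map each observed m/z to the names whose [M+H]+ ion lies within tol_ppm.

    Assumes positive ionization and a single adduct type ([M+H]+).
    """
    results = {}
    for mz in peaks:
        results[mz] = [
            name
            for name, mono_mass in compounds
            if abs(ppm_error(mz, mono_mass + PROTON_MASS)) <= tol_ppm
        ]
    return results


if __name__ == "__main__":
    observed = [195.0877, 303.0499, 250.1000]
    for mz, hits in annotate(observed, COMPOUNDS).items():
        print(f"{mz:.4f} -> {hits or 'no candidate'}")
```

In a realistic workflow, several adducts (for example [M+Na]+ or [M-H]-) and additional evidence such as retention time or MS/MS spectra would typically be considered, and the linear scan would be replaced by a sorted search to handle a table of several hundred thousand compounds.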

This resource is also used to investigate the mechanisms underlying the chemodiversity of living organisms and their interactions. Combining several layers of orthogonal information, such as the taxonomic origin of compounds and their chemical ontologies together with their biological activities (anticancer, antibiotic, etc.), makes it possible to infer and prioritize the search for new active compounds in taxa of interest. This approach was the subject of a first publication in 2019 (Chassagne et al., 2019).

Pharmakon is used by the PharmaDEV (20 researchers and teacher-researchers) and LRSV (80 researchers and teacher-researchers) laboratories of Paul Sabatier University. Some of our developments are available to the members of Metatoul (40 researchers and teacher-researchers), the Toulouse platform (INRAE, INSA, INSERM, CNRS, UT3) of the national metabolomics infrastructure MetaboHUB. Partners abroad also use Pharmakon (Fabio Espichan Jauregui, PhD student, Universidad Peruana Cayetano Heredia, Peru; Chiobuaphong Pakavoalay, teacher-researcher, faculty of Laos University of Pharmacy).

Positioning of the platform in relation to existing platforms (local and national) 

The Pharmakon platform is a unique tool, both in the number of compounds referenced and in the diversity of metadata available. To our knowledge, no platform of this type exists at the national level. International open databases are generally limited in coverage and scope (HMDB for clinical information on human health, PlantCyc for plants, NP Atlas for microorganisms, etc.), which prevents organizing the information holistically as Pharmakon does. Another major pitfall is data harmonization and interoperability: while importing data from several open databases, we detected many referencing errors, which have been corrected within Pharmakon.

Technical and organizational description, utilization rate 

The data are stored on the laboratory's Oracle server. An Oracle APEX web interface allows researchers to explore Pharmakon data and to download the data needed either to identify peaks in LC/MS outputs or to exploit metadata in data inference projects. Access to Pharmakon is restricted to close collaborators working on joint projects, so its usage remains relatively limited because of the "request-download" approach we have favored. A broader opening via TTT has been considered but not yet achieved, notably because of the time needed either to reconfigure the platform for open access or to develop it commercially with an industrial partner.