Pharma – PubMed authorship analysis
Freelance Data Scientist Thomas Wood. Helping organisations extract value from unstructured data.
For one client in the pharma industry, I developed a cross-platform desktop tool that lets a user import PubMed search results for a particular term and process them into knowledge graphs.
The tool uses natural language processing and PubMed’s MeSH tags to identify the most prominent researchers in each sub-field, quantify how active they are, and determine where they are located geographically. Although medical literature is generally tagged with MeSH terms, a lot of the relevant information is found only in a paper’s abstract or full text, so a sophisticated bespoke natural language processing algorithm was necessary to extract the relevant data.
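To give a flavour of the MeSH-tag side of this, here is a minimal sketch in Python. It is not the client tool's code, and it does not attempt the bespoke NLP on abstracts or full text. It assumes the search results were exported from PubMed in the MEDLINE text format, where each field starts with a four-character tag such as AU (author), MH (MeSH heading) or AD (affiliation). The function names and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict


def parse_medline(text):
    """Parse MEDLINE-format text into a list of {tag: [values]} records."""
    records = []
    current = None
    last_tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates one record from the next.
            current = None
            last_tag = None
            continue
        if line.startswith("      ") and current is not None and last_tag:
            # Continuation line: append to the last value of the previous field.
            current[last_tag][-1] += " " + line.strip()
            continue
        tag = line[:4].strip()
        if line[4:6] != "- " or not tag:
            continue  # not a recognisable field line, so skip it
        if current is None:
            current = defaultdict(list)
            records.append(current)
        current[tag].append(line[6:].strip())
        last_tag = tag
    return records


def authors_by_mesh(records):
    """Count how many papers each author has under each MeSH heading."""
    counts = defaultdict(Counter)
    for rec in records:
        for mh in rec.get("MH", []):
            # "*Neoplasms/drug therapy" -> "Neoplasms" (drop subheading and major-topic star)
            heading = mh.split("/")[0].lstrip("*")
            for author in rec.get("AU", []):
                counts[heading][author] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE = """PMID- 1
AU  - Smith J
AU  - Jones A
MH  - *Neoplasms/drug therapy
MH  - Humans

PMID- 2
AU  - Smith J
MH  - Neoplasms/*genetics
AD  - Dept of Oncology, Example University,
      London, UK.
"""

records = parse_medline(SAMPLE)
counts = authors_by_mesh(records)
print(counts["Neoplasms"].most_common())
```

Counting papers per author under each heading is only the simplest possible activity measure; the affiliation (AD) field is free text, so working out a researcher's location from it would need further processing.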
Using this data, the client was able to generate graphs of researchers’ collaborations and rank researchers by various metrics, allowing the company to target researchers effectively for potential collaborations.
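As a rough illustration of the collaboration graphs and rankings, the sketch below builds a co-authorship graph from per-paper author lists and ranks researchers by two simple metrics: number of papers and number of distinct collaborators. The metrics, function names and author lists are assumptions chosen for the example, not necessarily the ones used in the client project.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return {author: Counter({coauthor: number of shared papers})}."""
    graph = defaultdict(Counter)
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # make sure solo authors still appear as nodes
        for a, b in combinations(unique, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def rank_researchers(papers):
    """Rank authors by paper count, then by number of distinct collaborators."""
    graph = build_coauthor_graph(papers)
    paper_counts = Counter(a for authors in papers for a in set(authors))
    rows = [(a, paper_counts[a], len(graph[a])) for a in graph]
    # Sort descending on both metrics, with the name as a stable tie-breaker.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# Invented author lists, one per paper, for illustration only.
papers = [
    ["Smith J", "Jones A"],
    ["Smith J", "Lee K"],
    ["Jones A"],
]

graph = build_coauthor_graph(papers)
ranking = rank_researchers(papers)
for name, n_papers, n_collaborators in ranking:
    print(name, n_papers, n_collaborators)
```

The edge weights (shared paper counts) can be passed to a graph library for layout or for centrality measures if a richer ranking is wanted.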