<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Track Awesome Bioie Updates Daily</title>
  <id>https://www.trackawesomelist.com/caufieldjh/awesome-bioie/feed.xml</id>
  <updated>2024-07-07T08:50:58.734Z</updated>
  <link rel="self" type="application/atom+xml" href="https://www.trackawesomelist.com/caufieldjh/awesome-bioie/feed.xml"/>
  <link rel="alternate" type="application/json" href="https://www.trackawesomelist.com/caufieldjh/awesome-bioie/feed.json"/>
  <link rel="alternate" type="text/html" href="https://www.trackawesomelist.com/caufieldjh/awesome-bioie/"/>
  <generator uri="https://github.com/bcomnes/jsonfeed-to-atom#readme" version="1.2.2">jsonfeed-to-atom</generator>
  <icon>https://www.trackawesomelist.com/favicon.ico</icon>
  <logo>https://www.trackawesomelist.com/icon.png</logo>
  <subtitle>🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)</subtitle>
  <entry>
    <id>https://www.trackawesomelist.com/2024/07/07/</id>
    <title>Awesome Bioie Updates on Jul 07, 2024</title>
    <updated>2024-07-07T08:50:58.734Z</updated>
    <published>2024-07-07T08:50:58.519Z</published>
    <content type="html"><![CDATA[<h3><p>Research Overviews / LLMs in Biomedical IE</p>
</h3>
<ul>
<li><a href="http://dx.doi.org/10.1101/2024.04.24.24306315" rel="noopener noreferrer">Large language models in healthcare: A comprehensive benchmark</a> - a statistical and human evaluation of sixteen different LLMs applied to medical language tasks.</li>
</ul>

<ul>
<li><a href="https://doi.org/10.1186/s12911-024-02459-6" rel="noopener noreferrer">Assessing the research landscape and clinical utility of large language models: a scoping review</a> - a high-level review of LLM applications in medicine as of March 2024.</li>
</ul>

<ul>
<li><a href="https://doi.org/10.1016/s2589-7500(24)00061-x" rel="noopener noreferrer">Ethical and regulatory challenges of large language models in medicine</a> - a review of ethical issues arising from applications of LLMs in biomedicine.</li>
</ul>

<ul>
<li><a href="http://dx.doi.org/10.1145/3442188.3445922" rel="noopener noreferrer">On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜</a> - a frequently cited 2021 work that remains relevant to discussions of the roles, applications, and risks of language models.</li>
</ul>
<h3><p>Techniques and Models / Text Embeddings</p>
</h3>
<ul>
<li><a href="https://www.sciencedirect.com/science/article/pii/S1532046418301825" rel="noopener noreferrer">This paper from Hongfang Liu's group at Mayo Clinic</a> demonstrates that text embeddings trained on biomedical or clinical text can, but don't always, outperform general-domain embeddings on biomedical natural language processing tasks. General-domain pre-trained embeddings may therefore be sufficient for your needs, especially since training domain-specific embeddings can be computationally intensive.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2024/07/07/"/>
    <summary>5 awesome projects updated on Jul 07, 2024</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2023/04/01/</id>
    <title>Awesome Bioie Updates on Apr 01, 2023</title>
    <updated>2023-04-01T01:41:15.460Z</updated>
    <published>2023-04-01T01:41:15.460Z</published>
    <content type="html"><![CDATA[<h3><p>Techniques and Models / GPT-2 models</p>
</h3>
<ul>
<li><a href="https://github.com/microsoft/BioGPT" rel="noopener noreferrer">BioGPT (⭐4.3k)</a> - <a href="https://doi.org/10.1093/bib/bbac409" rel="noopener noreferrer">paper</a> - A GPT-2 model pre-trained on 15 million PubMed abstracts, along with fine-tuned versions for several biomedical tasks.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2023/04/01/"/>
    <summary>1 awesome project updated on Apr 01, 2023</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2022/09/14/</id>
    <title>Awesome Bioie Updates on Sep 14, 2022</title>
    <updated>2022-09-14T20:23:45.000Z</updated>
    <published>2022-09-14T20:23:45.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Biomedical Text Sources</p>
</h3>
<ul>
<li><a href="https://github.com/allenai/cord19" rel="noopener noreferrer">CORD-19 (⭐154)</a> - A corpus of scholarly manuscripts concerning COVID-19. Articles are primarily from PubMed Central and preprint servers, though the set also includes metadata on papers without full-text availability.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2022/09/14/"/>
    <summary>1 awesome project updated on Sep 14, 2022</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2022/07/12/</id>
    <title>Awesome Bioie Updates on Jul 12, 2022</title>
    <updated>2022-07-12T20:20:42.000Z</updated>
    <published>2022-07-12T18:55:50.000Z</published>
    <content type="html"><![CDATA[<h3><p>Tools, Platforms, and Services / Annotation Tools</p>
</h3>
<ul>
<li><a href="https://brat.nlplab.org/" rel="noopener noreferrer">brat</a> - <a href="https://www.aclweb.org/anthology/E12-2021/" rel="noopener noreferrer">paper</a> - <a href="https://github.com/nlplab/brat" rel="noopener noreferrer">code (⭐1.8k)</a> - The brat rapid annotation tool. Supports creating text annotations visually in a web browser. Not subject-specific; appropriate for many annotation projects. Its visualization is based on that of the <a href="https://github.com/nlplab/stav/" rel="noopener noreferrer"><em>stav</em> tool</a>.</li>
</ul>

<ul>
<li><a href="https://ohnlp.github.io/MedTator/" rel="noopener noreferrer">MedTator</a> - <a href="https://academic.oup.com/bioinformatics/article-abstract/38/6/1776/6496915" rel="noopener noreferrer">paper</a> - <a href="https://github.com/OHNLP/MedTator" rel="noopener noreferrer">code (⭐47)</a> - An annotation tool designed to have minimal dependencies.</li>
</ul>
<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/" rel="noopener noreferrer">BioRED</a> - <a href="https://arxiv.org/abs/2204.04263" rel="noopener noreferrer">paper</a> - a set of &gt;6.5K biomedical relation annotations, plus labels for novel findings.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2022/07/12/"/>
    <summary>3 awesome projects updated on Jul 12, 2022</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2022/02/25/</id>
    <title>Awesome Bioie Updates on Feb 25, 2022</title>
    <updated>2022-02-25T15:37:12.000Z</updated>
    <published>2022-02-25T15:37:12.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://github.com/UCDenver-ccp/CRAFT" rel="noopener noreferrer">CRAFT (⭐69)</a> - <a href="https://link.springer.com/chapter/10.1007/978-94-024-0881-2_53" rel="noopener noreferrer">paper</a> - 67 full-text biomedical articles annotated in a variety of ways, including for concepts and coreferences. Now on version 5, including annotations linking concepts to the MONDO disease ontology.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2022/02/25/"/>
    <summary>1 awesome project updated on Feb 25, 2022</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/04/07/</id>
    <title>Awesome Bioie Updates on Apr 07, 2021</title>
    <updated>2021-04-07T17:26:47.000Z</updated>
    <published>2021-04-07T17:26:47.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Other Datasets</p>
</h3>
<ul>
<li><a href="https://eicu-crd.mit.edu/" rel="noopener noreferrer">eICU Collaborative Research Database</a> - <a href="https://www.nature.com/articles/sdata2018178" rel="noopener noreferrer">paper</a> - a database of observations from more than 200 thousand intensive care unit admissions, with consistent structure. Requires registration, training course completion, and data use agreement.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/04/07/"/>
    <summary>1 awesome project updated on Apr 07, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/09/16/</id>
    <title>Awesome Bioie Updates on Sep 16, 2020</title>
    <updated>2020-09-16T22:20:12.000Z</updated>
    <published>2020-09-16T22:20:12.000Z</published>
    <content type="html"><![CDATA[<h3><p>Techniques and Models / BERT models</p>
</h3>
<ul>
<li><a href="https://microsoft.github.io/BLURB/models.html" rel="noopener noreferrer">PubMedBERT</a> - <a href="https://arxiv.org/abs/2007.15779" rel="noopener noreferrer">paper</a> - A BERT model trained from scratch on PubMed, with versions trained on abstracts+full texts and on abstracts alone.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/09/16/"/>
    <summary>1 awesome project updated on Sep 16, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/09/08/</id>
    <title>Awesome Bioie Updates on Sep 08, 2020</title>
    <updated>2020-09-08T21:26:58.000Z</updated>
    <published>2020-09-08T21:26:58.000Z</published>
    <content type="html"><![CDATA[<h3><p>Techniques and Models / BERT models</p>
</h3>
<ul>
<li><a href="https://github.com/ncbi-nlp/bluebert" rel="noopener noreferrer">BlueBERT (⭐541)</a> - <a href="https://arxiv.org/abs/1906.05474" rel="noopener noreferrer">paper</a> - A BERT model pre-trained on PubMed text and MIMIC-III notes.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/09/08/"/>
    <summary>1 awesome project updated on Sep 08, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/07/27/</id>
    <title>Awesome Bioie Updates on Jul 27, 2020</title>
    <updated>2020-07-27T22:09:53.000Z</updated>
    <published>2020-07-27T22:09:53.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Other Datasets</p>
</h3>
<ul>
<li><a href="https://mimic-iv.mit.edu/" rel="noopener noreferrer">MIMIC-IV</a> - An update to MIMIC-III's multimodal patient data, now covering more recent years of admissions, plus a new data structure, emergency department records, and links to MIMIC-CXR images.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/07/27/"/>
    <summary>1 awesome project updated on Jul 27, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/04/23/</id>
    <title>Awesome Bioie Updates on Apr 23, 2020</title>
    <updated>2020-04-23T20:29:06.000Z</updated>
    <published>2020-04-23T20:29:06.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://rgai.inf.u-szeged.hu/node/105" rel="noopener noreferrer">BioScope</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586758/" rel="noopener noreferrer">paper</a> - a corpus of sentences from medical and biological documents, annotated for negation, speculation, and linguistic scope.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/04/23/"/>
    <summary>1 awesome project updated on Apr 23, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/04/10/</id>
    <title>Awesome Bioie Updates on Apr 10, 2020</title>
    <updated>2020-04-10T20:13:52.000Z</updated>
    <published>2020-04-10T20:13:52.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://www.nlm.nih.gov/databases/download/CQC.html" rel="noopener noreferrer">Clinical Questions Collection</a> - also known as CQC or the Iowa collection; several thousand questions posed by physicians during office visits, along with the associated answers.</li>
</ul>

<ul>
<li><a href="http://2013.bionlp-st.org/" rel="noopener noreferrer">BioNLP ST 2013 datasets</a> - data from six shared tasks, though some may not be easily accessible; try the CG task set (BioNLP2013CG) for extensive entity and event annotations.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/04/10/"/>
    <summary>2 awesome projects updated on Apr 10, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/04/02/</id>
    <title>Awesome Bioie Updates on Apr 02, 2020</title>
    <updated>2020-04-02T18:00:34.000Z</updated>
    <published>2020-04-02T18:00:34.000Z</published>
    <content type="html"><![CDATA[<h3><p>Code Libraries / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://medium.com/@kormilitzin/med7-clinical-information-extraction-system-in-python-and-spacy-5e6f68ab1c68" rel="noopener noreferrer">Med7</a> - <a href="https://arxiv.org/abs/2003.01271" rel="noopener noreferrer">paper</a> - <a href="https://github.com/kormilitzin/med7" rel="noopener noreferrer">code (⭐199)</a> - a Python package and model (for use with spaCy) for doing NER with medication-related concepts.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/04/02/"/>
    <summary>1 awesome project updated on Apr 02, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/03/09/</id>
    <title>Awesome Bioie Updates on Mar 09, 2020</title>
    <updated>2020-03-09T23:14:29.000Z</updated>
    <published>2020-03-09T23:14:29.000Z</published>
    <content type="html"><![CDATA[<h3><p>Techniques and Models / BERT models</p>
</h3>
<ul>
<li><a href="https://github.com/allenai/scibert" rel="noopener noreferrer">SciBERT (⭐1.5k)</a> - <a href="https://arxiv.org/abs/1903.10676" rel="noopener noreferrer">paper</a> - A BERT model trained on &gt;1M papers from the Semantic Scholar database.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/03/09/"/>
    <summary>1 awesome project updated on Mar 09, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/01/07/</id>
    <title>Awesome Bioie Updates on Jan 07, 2020</title>
    <updated>2020-01-07T22:55:05.000Z</updated>
    <published>2020-01-07T22:55:05.000Z</published>
    <content type="html"><![CDATA[<h3><p>Data Models / Other Datasets</p>
</h3>
<ul>
<li><a href="https://github.com/OHDSI/CommonDataModel" rel="noopener noreferrer">OMOP Common Data Model (⭐854)</a> - a standard for observational healthcare data.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/01/07/"/>
    <summary>1 awesome project updated on Jan 07, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/12/04/</id>
    <title>Awesome Bioie Updates on Dec 04, 2019</title>
    <updated>2019-12-04T00:14:30.000Z</updated>
    <published>2019-12-04T00:14:30.000Z</published>
    <content type="html"><![CDATA[<h3><p>Tools, Platforms, and Services / Repos for Specific Datasets</p>
</h3>
<ul>
<li><a href="https://github.com/nikolamilosevic86/TabInOut" rel="noopener noreferrer">TabInOut (⭐41)</a> - <a href="https://link.springer.com/article/10.1007/s10032-019-00317-0" rel="noopener noreferrer">paper</a> - a framework for IE from tables in the literature.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/12/04/"/>
    <summary>1 awesome project updated on Dec 04, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/11/22/</id>
    <title>Awesome Bioie Updates on Nov 22, 2019</title>
    <updated>2019-11-22T21:07:53.000Z</updated>
    <published>2019-11-22T20:55:40.000Z</published>
    <content type="html"><![CDATA[<h3><p>Research Overviews / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6250990/" rel="noopener noreferrer">Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text</a> - A 2017 review of natural language processing methods applied to information extraction in health records and social media text. An important note from this review: "One of the main challenges in the field is the availability of data that can be shared and which can be used by the community to push the development of methods based on comparable and reproducible studies".</li>
</ul>
<h3><p>Groups Active in the Field / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="https://www.brown.edu/academics/medical/about-us/research/centers-institutes-and-programs/biomedical-informatics/" rel="noopener noreferrer">Brown Center for Biomedical Informatics</a> - Based at Brown University and directed by Dr. Neil Sarkar, whose research group works on topics in clinical NLP and IE.</li>
</ul>

<ul>
<li><a href="http://compbio.ucdenver.edu/Hunter_lab/CCP_website/index.html" rel="noopener noreferrer">Center for Computational Pharmacology NLP Group</a> - Based at the University of Colorado, Denver, and led by Larry Hunter - <a href="https://github.com/UCDenver-ccp" rel="noopener noreferrer">see their GitHub repos here.</a></li>
</ul>

<ul>
<li>Groups at U.S. National Institutes of Health (NIH) / National Library of Medicine (NLM):<ul>
<li><a href="https://www.lhncbc.nlm.nih.gov/personnel/dina-demner-fushman" rel="noopener noreferrer">Demner-Fushman group at NLM</a></li>
<li><a href="https://www.ncbi.nlm.nih.gov/research/bionlp/" rel="noopener noreferrer">BioNLP group at NCBI</a> - Develops improvements to biomedical literature search and curation (e.g., through PubMed), led by Dr. Zhiyong Lu.</li>
</ul>
</li>
</ul>

<ul>
<li><a href="https://jensenlab.org/" rel="noopener noreferrer">JensenLab</a> - Based at the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen, Denmark.</li>
</ul>

<ul>
<li><a href="http://www.nactem.ac.uk/" rel="noopener noreferrer">National Centre for Text Mining (NaCTeM)</a> - Based at the University of Manchester and led by Prof. Sophia Ananiadou, NaCTeM is concerned with text mining in general but has a particular focus on biomedical applications.</li>
</ul>

<ul>
<li><a href="https://www.mayo.edu/research/departments-divisions/department-health-sciences-research/medical-informatics/projects" rel="noopener noreferrer">Mayo Clinic's clinical natural language processing program</a> - Several groups at Mayo Clinic have made major contributions to BioIE (for example, the Apache cTAKES platform) over the past 20 years.</li>
</ul>

<ul>
<li><a href="https://monarchinitiative.org/" rel="noopener noreferrer">Monarch Initiative</a> - A joint effort between groups at Oregon State University, Oregon Health &amp; Science University, Lawrence Berkeley National Lab, The Jackson Laboratory, and several others, seeking to "integrate biological information using semantics, and present it in a novel way, leveraging phenotypes to bridge the knowledge gap".</li>
</ul>

<ul>
<li><a href="https://turkunlp.org/" rel="noopener noreferrer">TurkuNLP</a> - Based at the University of Turku, Finland, and concerned with NLP in general, with a focus on BioNLP and clinical applications.</li>
</ul>

<ul>
<li><a href="https://sbmi.uth.edu/nlp/" rel="noopener noreferrer">UTHealth Houston Biomedical Natural Language Processing Lab</a> - Based at the School of Biomedical Informatics of the University of Texas Health Science Center at Houston and led by Dr. Hua Xu.</li>
</ul>

<ul>
<li><a href="https://nlp.cs.vcu.edu/" rel="noopener noreferrer">VCU Natural Language Processing Lab</a> - Based at Virginia Commonwealth University and led by Dr. Bridget McInnes.</li>
</ul>

<ul>
<li><a href="http://zaklab.org" rel="noopener noreferrer">Zaklab</a> - Group led by Dr. Isaac Kohane at Harvard Medical School's Department of Biomedical Informatics. Dr. Kohane is also a steward of the n2c2 (formerly i2b2) datasets; see <a href="#datasets">Datasets</a> below.</li>
</ul>

<ul>
<li><a href="https://www.dbmi.columbia.edu/" rel="noopener noreferrer">Columbia University Department of Biomedical Informatics</a> - Led by Drs. George Hripcsak and Noémie Elhadad.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/11/22/"/>
    <summary>13 awesome projects updated on Nov 22, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/09/25/</id>
    <title>Awesome Bioie Updates on Sep 25, 2019</title>
    <updated>2019-09-25T21:20:31.000Z</updated>
    <published>2019-09-25T21:13:25.000Z</published>
    <content type="html"><![CDATA[<h3><p>Datasets / Other Datasets</p>
</h3>
<ul>
<li><a href="http://cohd.io" rel="noopener noreferrer">Columbia Open Health Data</a> - <a href="https://www.nature.com/articles/sdata2018273" rel="noopener noreferrer">paper</a> - A database of prevalence and co-occurrence frequencies of conditions, drugs, procedures, and patient demographics extracted from electronic health records. Does not include original record text.</li>
</ul>

<ul>
<li><a href="https://ctdbase.org/" rel="noopener noreferrer">Comparative Toxicogenomics Database</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323936/" rel="noopener noreferrer">paper</a> - A database of manually curated associations between chemicals, gene products, phenotypes, diseases, and environmental exposures. Useful for assembling ontologies of the related concepts, such as types of chemicals.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/09/25/"/>
    <summary>2 awesome projects updated on Sep 25, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/09/24/</id>
    <title>Awesome Bioie Updates on Sep 24, 2019</title>
    <updated>2019-09-24T18:06:31.000Z</updated>
    <published>2019-09-24T18:03:45.000Z</published>
    <content type="html"><![CDATA[<h3><p>Research Overviews / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="https://www.ahajournals.org/doi/full/10.1161/CIRCRESAHA.117.310967" rel="noopener noreferrer">Biomedical Informatics on the Cloud: A Treasure Hunt for Advancing Cardiovascular Medicine</a> - An overview of how BioIE and bioinformatics workflows can be applied to questions in cardiovascular health and medicine research.</li>
</ul>

<ul>
<li><a href="https://www.sciencedirect.com/science/article/pii/S1532046417302563" rel="noopener noreferrer">Clinical information extraction applications: A literature review</a> - A review of clinical IE papers published as of September 2016, from the Mayo Clinic group (see below).</li>
</ul>

<ul>
<li><a href="https://arxiv.org/abs/1702.03222" rel="noopener noreferrer">Mining Electronic Health Records (EHRs): A Survey</a> - A review of the methods and philosophy behind mining electronic health records, including using them for adverse event detection. See Table 2 for a list of relevant papers as of mid-2017.</li>
</ul>
<h3><p>Organizations / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="https://www.amia.org/" rel="noopener noreferrer">AMIA</a> - Many, but certainly not all, individuals studying biomedical informatics are members of the American Medical Informatics Association. AMIA publishes a journal, JAMIA (see below).</li>
</ul>

<ul>
<li><a href="https://imia-medinfo.org/" rel="noopener noreferrer">IMIA</a> - The International Medical Informatics Association. Publishes the IMIA Yearbook of Medical Informatics.</li>
</ul>
<h3><p>Journals and Events / Journals</p>
</h3>
<ul>
<li><a href="https://academic.oup.com/database" rel="noopener noreferrer">Database</a> - Its subtitle is "The Journal of Biological Databases and Curation". Open access.</li>
</ul>

<ul>
<li><a href="https://academic.oup.com/jamia" rel="noopener noreferrer">JAMIA</a> - The Journal of the American Medical Informatics Association. Publishes "articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy".</li>
</ul>

<ul>
<li><a href="https://www.sciencedirect.com/journal/journal-of-biomedical-informatics" rel="noopener noreferrer">JBI</a> - The Journal of Biomedical Informatics. Not open access by default, though it does have an open-access "X" version.</li>
</ul>

<ul>
<li><a href="https://www.nature.com/sdata/" rel="noopener noreferrer">Scientific Data</a> - An open-access Springer Nature journal publishing "descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data".</li>
</ul>
<h3><p>Journals and Events / Conferences and Other Events</p>
</h3>
<ul>
<li><a href="http://acm-bcb.org/" rel="noopener noreferrer">ACM-BCB</a> - The ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Held annually since 2010.</li>
</ul>

<ul>
<li><a href="http://ieeebibm.org/BIBM2019/" rel="noopener noreferrer">BIBM</a> - The IEEE International Conference on Bioinformatics and Biomedicine.</li>
</ul>

<ul>
<li><a href="https://www.iscb.org/about-ismb" rel="noopener noreferrer">ISMB</a> - The International Conference on Intelligent Systems for Molecular Biology, an annual conference held since 1993 and now hosted by the International Society for Computational Biology. It has historically emphasized bioinformatics and computational biology without an explicit clinical focus, though it has included an increasing amount of text mining content (e.g., the 2019 meeting included a <a href="http://cosi.iscb.org/wiki/TextMining:Home" rel="noopener noreferrer">full-day special session on Text Mining for Biology and Healthcare</a>). In odd-numbered years, the meeting is held jointly with the European Conference on Computational Biology (ECCB).</li>
</ul>

<ul>
<li><a href="https://psb.stanford.edu/" rel="noopener noreferrer">PSB</a> - The Pacific Symposium on Biocomputing.</li>
</ul>
<h3><p>Tutorials / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0040020" rel="noopener noreferrer">Getting Started in Text Mining</a> - A brief introduction to bio-text mining from Cohen and Hunter. Published in 2008 but still quite relevant. See also an <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1702322/" rel="noopener noreferrer">earlier paper by the same authors</a>.</li>
</ul>

<ul>
<li><a href="https://link.springer.com/book/10.1007/978-1-4939-0709-0" rel="noopener noreferrer">Biomedical Literature Mining</a> - A (non-free) 2014 volume of Methods in Molecular Biology. Chapters cover introductory principles of text mining, applications in the biological sciences, and potential uses in clinical and medical safety scenarios.</li>
</ul>

<ul>
<li><a href="https://www.bits.vib.be/training-list/111-bits/training/previous-trainings/183-text-mining" rel="noopener noreferrer">VIB text mining and curation training</a> - This training workshop took place in 2013, but the slides are still online.</li>
</ul>
<h3><p>Code Libraries / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://github.com/kilicogluh/Bio-SCoRes" rel="noopener noreferrer">Bio-SCoRes (⭐9)</a> - <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0148538" rel="noopener noreferrer">paper</a> - A framework for biomedical coreference resolution.</li>
</ul>

<ul>
<li><a href="https://github.com/NLPatVCU/medaCy" rel="noopener noreferrer">medaCy (⭐424)</a> - A system for building predictive medical natural language processing models. Built on the <a href="https://spacy.io/" rel="noopener noreferrer">spaCy</a> framework.</li>
</ul>
<h3><p>Tools, Platforms, and Services / Repos for Specific Datasets</p>
</h3>
<ul>
<li><a href="https://ctakes.apache.org/" rel="noopener noreferrer">cTAKES</a> - <a href="https://academic.oup.com/jamia/article/17/5/507/830823" rel="noopener noreferrer">paper</a> - <a href="https://github.com/apache/ctakes" rel="noopener noreferrer">code (⭐39)</a> - A system for processing the text in electronic medical records. Widely used and open source.</li>
</ul>

<ul>
<li><a href="https://clamp.uth.edu/" rel="noopener noreferrer">CLAMP</a> - <a href="https://academic.oup.com/jamia/article/25/3/331/4657212" rel="noopener noreferrer">paper</a> - A natural language processing toolkit intended for use with the text in clinical reports. Check out their <a href="https://clamp.uth.edu/clampdemo.php" rel="noopener noreferrer">live demo</a> first to see what it does. Usable at no cost for academic research.</li>
</ul>

<ul>
<li><a href="https://github.com/DeepPhe/DeepPhe-Release" rel="noopener noreferrer">DeepPhe (⭐29)</a> - A system for processing documents describing cancer presentations. Based on cTAKES (see above).</li>
</ul>

<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/dnorm/" rel="noopener noreferrer">DNorm</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810844/" rel="noopener noreferrer">paper</a> - A method for disease normalization, i.e., linking mentions of disease names and acronyms to unique concept identifiers. Downloadable version includes the NCBI Disease Corpus and BC5CDR (see Annotated Text Data below).</li>
</ul>

<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/research/pubtator/" rel="noopener noreferrer">PubTator Central</a> - <a href="https://academic.oup.com/nar/article/47/W1/W587/5494727" rel="noopener noreferrer">paper</a> - A web platform that identifies five different types of biomedical concepts in PubMed articles and PubMed Central full texts. The full annotation sets are downloadable (see <a href="#annotated-text-data">Annotated Text Data</a> below).</li>
</ul>

<ul>
<li><a href="https://github.com/jakelever/pubrunner" rel="noopener noreferrer">Pubrunner (⭐41)</a> - A framework for running text mining tools on the newest set(s) of documents from PubMed.</li>
</ul>

<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/taggerone/" rel="noopener noreferrer">TaggerOne</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018376/" rel="noopener noreferrer">paper</a> - Performs concept normalization (see also DNorm above). Can be trained for specific concept types and can perform NER independently of its normalization functions.</li>
</ul>
<h3><p>Tools, Platforms, and Services / Annotation Tools</p>
</h3>
<ul>
<li><a href="https://github.com/weitechen/anafora" rel="noopener noreferrer">Anafora (⭐238)</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657237/" rel="noopener noreferrer">paper</a> - An annotation tool with adjudication and progress tracking features.</li>
</ul>
<h3><p>Techniques and Models / BERT models</p>
</h3>
<ul>
<li><a href="https://github.com/naver/biobert-pretrained" rel="noopener noreferrer">BioBERT (⭐643)</a> - <a href="https://arxiv.org/abs/1901.08746" rel="noopener noreferrer">paper</a> - <a href="https://github.com/dmis-lab/biobert" rel="noopener noreferrer">code (⭐1.9k)</a> - A PubMed and PubMed Central-trained version of the <a href="https://arxiv.org/abs/1810.04805" rel="noopener noreferrer">BERT language model</a>.</li>
</ul>
<h3><p>Techniques and Models / Other models</p>
</h3>
<ul>
<li><a href="https://github.com/zalandoresearch/flair/pull/519" rel="noopener noreferrer">Flair embeddings from PubMed (⭐14k)</a> - A language model available through the Flair framework and embedding method. Trained on a 5% sample of PubMed abstracts published up to 2015 (&gt;1.2 million abstracts in total).</li>
</ul>
<h3><p>Techniques and Models / Text Embeddings</p>
</h3>
<ul>
<li><a href="http://bioasq.org/news/bioasq-releases-continuous-space-word-vectors-obtained-applying-word2vec-pubmed-abstracts" rel="noopener noreferrer">BioASQword2vec</a> - <a href="http://bioasq.lip6.fr/info/BioASQword2vec/" rel="noopener noreferrer">paper</a> - Word embeddings derived from biomedical text (&gt;10 million PubMed abstracts) using the popular <a href="https://code.google.com/archive/p/word2vec/" rel="noopener noreferrer">word2vec</a> tool.</li>
</ul>

<ul>
<li><a href="https://figshare.com/articles/Improving_Biomedical_Word_Embeddings_with_Subword_Information_and_MeSH_Ontology/6882647" rel="noopener noreferrer">BioWordVec</a> - <a href="https://www.nature.com/articles/s41597-019-0055-0" rel="noopener noreferrer">paper</a> - <a href="https://github.com/ncbi-nlp/BioWordVec" rel="noopener noreferrer">code (⭐141)</a> - Word embeddings derived from biomedical text (&gt;27 million PubMed titles and abstracts), including a subword embedding model that incorporates MeSH.</li>
</ul>
<h3><p>Datasets / Biomedical Text Sources</p>
</h3>
<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/" rel="noopener noreferrer">PubMed Central Open Access Subset</a> - A set of PubMed Central articles usable under licenses other than traditional copyright, though the exact licenses vary by publication and source. Articles are available as PDF and XML.</li>
</ul>
<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://bionlp.nlm.nih.gov/tac2017adversereactions/" rel="noopener noreferrer">SPL-ADR-200db</a> - <a href="https://www.nature.com/articles/sdata20181" rel="noopener noreferrer">paper</a> - A pilot dataset containing standardised information about ~5,000 known adverse reactions for 200 FDA-approved drugs, along with annotations of their occurrence in text.</li>
</ul>

<ul>
<li><a href="https://sourceforge.net/projects/biocreative/files/" rel="noopener noreferrer">BioCreAtIvE 1</a> - <a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S1" rel="noopener noreferrer">paper</a> - 15,000 sentences (10,000 training and 5,000 test) annotated for protein and gene names. 1,000 full-text biomedical research articles annotated with protein names and Gene Ontology terms.</li>
</ul>

<ul>
<li><a href="https://sourceforge.net/projects/biocreative/files/" rel="noopener noreferrer">BioCreAtIvE 2</a> - <a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s1" rel="noopener noreferrer">paper</a> - 15,000 sentences (10,000 training and 5,000 test, different from the first corpus) annotated for protein and gene names. 542 abstracts linked to EntrezGene identifiers. A variety of research articles annotated for features of protein–protein interactions.</li>
</ul>

<ul>
<li><a href="https://biocreative.bioinformatics.udel.edu/accounts/login/?next=/resources/corpora/biocreative-v-cdr-corpus/" rel="noopener noreferrer">BioCreAtIvE V CDR Task Corpus (BC5CDR)</a> - <a href="https://academic.oup.com/database/article/doi/10.1093/database/baw068/2630414" rel="noopener noreferrer">paper</a> - 1,500 articles (title and abstract) published in 2014 or later, annotated for 4,409 chemicals, 5,818 diseases, and 3,116 chemical–disease interactions. Requires registration.</li>
</ul>

<ul>
<li><a href="https://biocreative.bioinformatics.udel.edu/resources/corpora/chemprot-corpus-biocreative-vi/#chemprot-corpus-biocreative-vi:downloads" rel="noopener noreferrer">BioCreative VI CHEMPROT Corpus</a> - <a href="https://pdfs.semanticscholar.org/eed7/81f498b563df5a9e8a241c67d63dd1d92ad5.pdf" rel="noopener noreferrer">paper</a> - &gt;2,400 articles annotated with chemical-protein interactions of a variety of relation types. Requires registration.</li>
</ul>

<ul>
<li><a href="https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/" rel="noopener noreferrer">n2c2 (formerly i2b2) Data</a> - The Department of Biomedical Informatics (DBMI) at Harvard Medical School manages data for the National NLP Clinical Challenges (n2c2) and the earlier Informatics for Integrating Biology and the Bedside (i2b2) challenges, which have run since 2006. Registration is required before access and use. The datasets cover a variety of topics. See the <a href="https://portal.dbmi.hms.harvard.edu/data-challenges/" rel="noopener noreferrer">list of data challenges</a> for individual descriptions.</li>
</ul>

<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/" rel="noopener noreferrer">NCBI Disease Corpus</a> - <a href="https://www.sciencedirect.com/science/article/pii/S1532046413001974" rel="noopener noreferrer">paper</a> - A corpus of 793 biomedical abstracts annotated with names of diseases and related concepts from MeSH and <a href="https://omim.org/" rel="noopener noreferrer">OMIM</a>.</li>
</ul>

<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/research/pubtator/" rel="noopener noreferrer">PubTator Central datasets</a> - <a href="https://academic.oup.com/nar/article/47/W1/W587/5494727" rel="noopener noreferrer">paper</a> - Accessible through a RESTful API or FTP download. Includes annotations for &gt;29 million abstracts and ∼3 million full text documents.</li>
</ul>
<h3><p>Datasets / Protein-protein Interaction Annotated Corpora</p>
</h3>
<ul>
<li><a href="http://corpora.informatik.hu-berlin.de/corpora/brat2bioc/hprd50_bioc.xml.zip" rel="noopener noreferrer">HPRD50</a> - <a href="https://academic.oup.com/bioinformatics/article/23/3/365/236564" rel="noopener noreferrer">paper</a> - 50 scientific abstracts referenced by the Human Protein Reference Database, annotated for PPI.</li>
</ul>
<h3><p>Datasets / Other Datasets</p>
</h3>
<ul>
<li><a href="https://mimic.physionet.org/" rel="noopener noreferrer">MIMIC-III</a> - <a href="https://www.nature.com/articles/sdata201635" rel="noopener noreferrer">paper</a> - Deidentified health data from ~60,000 intensive care unit admissions. Requires completion of an online training course (CITI training) and acceptance of a data use agreement prior to use.</li>
</ul>

<ul>
<li><a href="https://physionet.org/content/mimic-cxr/2.0.0/" rel="noopener noreferrer">MIMIC-CXR</a> - The MIMIC Chest X-Ray database. Contains more than 377,000 radiographic images and accompanying free-text radiology reports. As with MIMIC-III, requires acceptance of a data use agreement.</li>
</ul>

<ul>
<li><a href="https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html" rel="noopener noreferrer">UMLS Knowledge Sources</a> - <a href="https://www.ncbi.nlm.nih.gov/books/NBK9676/" rel="noopener noreferrer">reference manual</a> - A large and comprehensive collection of biomedical terminology and identifiers, as well as accompanying tools and scripts. Depending on your purposes, the single file MRCONSO.RRF may be sufficient, as this file contains unique identifiers and names for all concepts in the UMLS Metathesaurus. See also the Ontologies and Controlled Vocabularies section below.</li>
</ul>
<h3><p>Ontologies and Controlled Vocabularies / Other Datasets</p>
</h3>
<ul>
<li><a href="http://www.disease-ontology.org/" rel="noopener noreferrer">Disease Ontology</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383880/" rel="noopener noreferrer">paper</a> - An ontology of human diseases. Has cross-links to MeSH, ICD, NCI Thesaurus, SNOMED, and OMIM. Public domain. Available on <a href="https://github.com/DiseaseOntology/HumanDiseaseOntology" rel="noopener noreferrer">GitHub (⭐329)</a> and on the <a href="http://www.obofoundry.org/ontology/doid.html" rel="noopener noreferrer">OBO Foundry</a>.</li>
</ul>

<ul>
<li><a href="https://lexsrv3.nlm.nih.gov/Specialist/Summary/lexicon.html" rel="noopener noreferrer">SPECIALIST Lexicon</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2247735/" rel="noopener noreferrer">paper</a> - A general English lexicon that includes many biomedical terms. Updated yearly since 1994 and still maintained as of 2019. Part of the UMLS, but does not require a UTS account to download.</li>
</ul>
<h3><p>Data Models / Other Datasets</p>
</h3>
<ul>
<li><a href="https://biolink.github.io/biolink-model/" rel="noopener noreferrer">Biolink</a> - <a href="https://github.com/biolink/biolink-model" rel="noopener noreferrer">code (⭐169)</a> - A data model of biological entities. Provided as a <a href="https://yaml.org/" rel="noopener noreferrer">YAML</a> file.</li>
</ul>

<ul>
<li><a href="http://wiki.biouml.org/index.php/BioUML" rel="noopener noreferrer">BioUML</a> - <a href="https://academic.oup.com/nar/article/47/W1/W225/5498754" rel="noopener noreferrer">paper</a> - An architecture for biomedical data analysis, integration, and visualization. Conceptually based on the visual modeling language <a href="https://www.uml.org/what-is-uml.htm" rel="noopener noreferrer">UML</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/09/24/"/>
    <summary>47 awesome projects updated on Sep 24, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/09/17/</id>
    <title>Awesome Bioie Updates on Sep 17, 2019</title>
    <updated>2019-09-17T18:02:41.000Z</updated>
    <published>2019-09-17T18:02:41.000Z</published>
    <content type="html"><![CDATA[<h3><p>Tools, Platforms, and Services / Repos for Specific Datasets</p>
</h3>
<ul>
<li><a href="https://github.com/CogStack/CogStack-SemEHR" rel="noopener noreferrer">SemEHR (⭐87)</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019046/" rel="noopener noreferrer">paper</a> - An IE infrastructure for electronic health records (EHR). Built on the <a href="https://github.com/CogStack" rel="noopener noreferrer">CogStack project</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/09/17/"/>
    <summary>1 awesome project updated on Sep 17, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/09/04/</id>
    <title>Awesome Bioie Updates on Sep 04, 2019</title>
    <updated>2019-09-04T20:35:36.000Z</updated>
    <published>2019-09-04T20:35:36.000Z</published>
    <content type="html"><![CDATA[<h3><p>Code Libraries / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://github.com/ropensci/rentrez" rel="noopener noreferrer">rentrez (⭐193)</a> - R utilities for accessing NCBI resources, including PubMed.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/09/04/"/>
    <summary>1 awesome project updated on Sep 04, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/09/03/</id>
    <title>Awesome Bioie Updates on Sep 03, 2019</title>
    <updated>2019-09-03T21:47:05.000Z</updated>
    <published>2019-09-03T21:47:05.000Z</published>
    <content type="html"><![CDATA[<h3><p>Code Libraries / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://biopython.org/" rel="noopener noreferrer">Biopython</a> - <a href="http://dx.doi.org/10.1093/bioinformatics/btp163" rel="noopener noreferrer">paper</a> - <a href="https://github.com/biopython/biopython" rel="noopener noreferrer">code (⭐4.2k)</a> - Python tools primarily intended for bioinformatics and computational molecular biology purposes, but also a convenient way to obtain data, including documents/abstracts from PubMed (see Chapter 9 of the documentation).</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/09/03/"/>
    <summary>1 awesome project updated on Sep 03, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/30/</id>
    <title>Awesome Bioie Updates on Aug 30, 2019</title>
    <updated>2019-08-30T21:39:39.000Z</updated>
    <published>2019-08-30T21:39:39.000Z</published>
    <content type="html"><![CDATA[<h3><p>Tutorials / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://jensenlab.org/training/textmining/" rel="noopener noreferrer">JensenLab text mining exercises</a></li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/30/"/>
    <summary>1 awesome project updated on Aug 30, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/28/</id>
    <title>Awesome Bioie Updates on Aug 28, 2019</title>
    <updated>2019-08-28T18:56:05.000Z</updated>
    <published>2019-08-28T18:56:05.000Z</published>
    <content type="html"><![CDATA[<h3><p>Techniques and Models / BERT models</p>
</h3>
<ul>
<li>ClinicalBERT - Two language models trained on clinical text have similar names. Both are BERT models trained on the text of clinical notes from the MIMIC-III dataset.<ul>
<li><a href="https://github.com/EmilyAlsentzer/clinicalBERT" rel="noopener noreferrer">Alsentzer et al. Clinical BERT (⭐634)</a> - <a href="https://www.aclweb.org/anthology/W19-1909/" rel="noopener noreferrer">paper</a></li>
<li><a href="https://github.com/kexinhuang12345/clinicalBERT" rel="noopener noreferrer">Huang et al. ClinicalBERT (⭐361)</a> - <a href="https://arxiv.org/abs/1904.05342" rel="noopener noreferrer">paper</a></li>
</ul>
</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/28/"/>
    <summary>1 awesome project updated on Aug 28, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/26/</id>
    <title>Awesome Bioie Updates on Aug 26, 2019</title>
    <updated>2019-08-26T17:59:31.000Z</updated>
    <published>2019-08-26T17:59:31.000Z</published>
    <content type="html"><![CDATA[<h3><p>Journals and Events / Journals</p>
</h3>
<ul>
<li><a href="https://academic.oup.com/nar" rel="noopener noreferrer">NAR</a> - Nucleic Acids Research. Has a broad biomolecular focus but is particularly notable for its annual database issue.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/26/"/>
    <summary>1 awesome project updated on Aug 26, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/23/</id>
    <title>Awesome Bioie Updates on Aug 23, 2019</title>
    <updated>2019-08-23T21:55:46.000Z</updated>
    <published>2019-08-23T17:59:35.000Z</published>
    <content type="html"><![CDATA[<h3><p>Groups Active in the Field / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="http://www.childrenshospital.org/research/labs/natural-language-processing-laboratory" rel="noopener noreferrer">Boston Children's Hospital Natural Language Processing Laboratory</a> - Led by Dr. Guergana Savova, formerly of Mayo Clinic and the Apache cTAKES project.</li>
</ul>
<h3><p>Code Libraries / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://github.com/allenai/SciSpaCy" rel="noopener noreferrer">ScispaCy (⭐1.6k)</a> - <a href="https://arxiv.org/abs/1902.07669" rel="noopener noreferrer">paper</a> - A version of the <a href="https://spacy.io/" rel="noopener noreferrer">spaCy</a> framework for scientific and biomedical documents.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/23/"/>
    <summary>2 awesome projects updated on Aug 23, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/22/</id>
    <title>Awesome Bioie Updates on Aug 22, 2019</title>
    <updated>2019-08-22T23:54:25.000Z</updated>
    <published>2019-08-22T22:56:40.000Z</published>
    <content type="html"><![CDATA[<h3><p>Research Overviews / Pre-LLM Overviews</p>
</h3>
<ul>
<li><a href="https://www.sciencedirect.com/science/article/pii/S1532046417301909" rel="noopener noreferrer">Literature Based Discovery: Models, methods, and trends</a> - A review of Literature Based Discovery (LBD), the approach of finding meaningful connections between seemingly unrelated bodies of scientific literature.<ul>
<li>For some historical context on LBD, see papers by Don Swanson of the University of Chicago and Neil Smalheiser, including <a href="https://www.jstor.org/stable/4307965" rel="noopener noreferrer"><em>Undiscovered Public Knowledge</em></a> (paywalled) and <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5771422/" rel="noopener noreferrer"><em>Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery</em></a>.</li>
</ul>
</li>
</ul>
<h3><p>Tutorials / Pre-LLM Guides, Lectures, and Courses</p>
</h3>
<ul>
<li><a href="https://www.coursera.org/learn/mining-medical-data" rel="noopener noreferrer">Coursera - Foundations of mining non-structured medical data</a> - About three hours' worth of video lectures on working with medical data of various types and structures, including text and image data. Appears fairly high-level and intended for beginners.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/22/"/>
    <summary>2 awesome projects updated on Aug 22, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/08/21/</id>
    <title>Awesome Bioie Updates on Aug 21, 2019</title>
    <updated>2019-08-21T23:53:02.000Z</updated>
    <published>2019-08-21T20:41:10.000Z</published>
    <content type="html"><![CDATA[<h3><p>Code Libraries / Repos for Specific Datasets</p>
</h3>
<ul>
<li><a href="https://github.com/MIT-LCP/mimic-code" rel="noopener noreferrer">mimic-code (⭐2.4k)</a> - Code associated with the MIMIC-III dataset (see below). Includes some helpful <a href="https://github.com/MIT-LCP/mimic-code/tree/master/tutorials" rel="noopener noreferrer">tutorials (⭐2.4k)</a>.</li>
</ul>
<h3><p>Datasets / Biomedical Text Sources</p>
</h3>
<ul>
<li><a href="http://davis.wpi.edu/xmdv/datasets/ohsumed.html" rel="noopener noreferrer">OHSUMED</a> - <a href="https://dl.acm.org/citation.cfm?id=188557" rel="noopener noreferrer">paper</a> - 348,566 MEDLINE entries (title and sometimes abstract) from 1987 to 1991. Includes MeSH labels. Primarily of historical significance.</li>
</ul>
<h3><p>Datasets / Annotated Text Data</p>
</h3>
<ul>
<li><a href="https://wsd.nlm.nih.gov/" rel="noopener noreferrer">Word Sense Disambiguation (WSD)</a> - <a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-223" rel="noopener noreferrer">paper</a> - 203 ambiguous words and 37,888 automatically extracted instances of their use in biomedical research publications. Requires a UTS account.</li>
</ul>
<h3><p>Datasets / Protein-protein Interaction Annotated Corpora</p>
</h3>
<ul>
<li><a href="http://corpora.informatik.hu-berlin.de/corpora/brat2bioc/aimed_bioc.xml.zip" rel="noopener noreferrer">AIMed</a> - <a href="https://www.ncbi.nlm.nih.gov/pubmed/15811782" rel="noopener noreferrer">paper</a> - 225 MEDLINE abstracts annotated for PPI.</li>
</ul>

<ul>
<li><a href="http://bioc.sourceforge.net/BioC-BioGRID.html" rel="noopener noreferrer">BioC-BioGRID</a> - <a href="https://academic.oup.com/database/article/doi/10.1093/database/baw147/2884890" rel="noopener noreferrer">paper</a> - 120 full text articles annotated for PPI and genetic interactions. Used in the BioCreative V BioC task.</li>
</ul>

<ul>
<li><a href="http://corpora.informatik.hu-berlin.de/corpora/brat2bioc/bioinfer_bioc.xml.zip" rel="noopener noreferrer">BioInfer</a> - <a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-50" rel="noopener noreferrer">paper</a> - 1,100 sentences from biomedical research abstracts annotated for relationships (including PPI), named entities, and syntactic dependencies. <a href="http://mars.cs.utu.fi/BioInfer/" rel="noopener noreferrer">Additional information and download links are here.</a></li>
</ul>

<ul>
<li><a href="http://corpora.informatik.hu-berlin.de/corpora/brat2bioc/iepa_bioc.xml.zip" rel="noopener noreferrer">IEPA</a> - <a href="http://psb.stanford.edu/psb-online/proceedings/psb02/abstracts/p326.html" rel="noopener noreferrer">paper</a> - 486 sentences from biomedical research abstracts annotated for pairs of co-occurring chemicals, including proteins (hence, PPI annotations).</li>
</ul>

<ul>
<li><a href="http://corpora.informatik.hu-berlin.de/corpora/brat2bioc/lll_bioc.xml.zip" rel="noopener noreferrer">LLL</a> - <a href="https://www.semanticscholar.org/paper/Learning-Language-in-Logic-Genic-Interaction-Nedellec/0863a9d71955341b7e1a6a6877d44d4f0bb22671" rel="noopener noreferrer">paper</a> - 77 sentences from research articles about the bacterium <em>Bacillus subtilis</em>, annotated for protein–gene interactions (so, fairly close to PPI annotations). <a href="http://genome.jouy.inra.fr/texte/LLLchallenge/#task1" rel="noopener noreferrer">Additional information is here.</a></li>
</ul>
<h3><p>Ontologies and Controlled Vocabularies / Other Datasets</p>
</h3>
<ul>
<li><a href="https://www.nlm.nih.gov/research/umls/rxnorm/index.html" rel="noopener noreferrer">RxNorm</a> - <a href="https://academic.oup.com/jamia/article/18/4/441/734170" rel="noopener noreferrer">paper</a> - Normalized names for clinical drugs and drug packs, with combined ingredients, strengths, and form, and assigned types from the Semantic Network (see below). Released monthly.</li>
</ul>

<ul>
<li><a href="https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html" rel="noopener noreferrer">UMLS Metathesaurus</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308795/" rel="noopener noreferrer">paper</a> - Mappings between &gt;3.8 million concepts, 14 million concept names, and &gt;200 sources of biomedical vocabulary and identifiers. It's big. Preparing a subset of the Metathesaurus with the <a href="https://www.nlm.nih.gov/research/umls/implementation_resources/metamorphosys/help.html" rel="noopener noreferrer">MetamorphoSys installation tool</a> may help, but the 2019 release still requires ~30 GB of disk space. <a href="https://www.ncbi.nlm.nih.gov/books/NBK9684/" rel="noopener noreferrer">See the manual here</a>. Requires a UTS account.</li>
</ul>

<ul>
<li><a href="https://semanticnetwork.nlm.nih.gov/" rel="noopener noreferrer">UMLS Semantic Network</a> - <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447396/" rel="noopener noreferrer">paper</a> - Lists of 133 semantic types and 54 semantic relationships covering biomedical concepts and vocabulary. Is the Metathesaurus too complex for your needs? Try this. Does not require a UTS account to download.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/08/21/"/>
    <summary>11 awesome projects updated on Aug 21, 2019</summary>
  </entry>
</feed>