Medicine, Drugs, and Vaccines

Electronic Health Records

Electronic health records (EHRs) are the digital form of patients' health information, recorded either during a hospitalization or within a national or regional welfare system. Integrated use of EHRs and NGS data is a key step in developing precision medicine. In fact, direct use of NGS data in clinical practice is unfeasible, but integrating EHRs into pipelines able to extract relevant information from analyzed NGS data would greatly improve clinical outcomes. Clinical data infrastructures already exist and have been used for purposes such as investigating inherited causes of common diseases or using existing genomic and clinical data to identify genes related to phenotype and environment. More recently, the US Precision Medicine Initiative (PMI) began planning to integrate genomic and clinical data from a million individuals to unveil environmental influences on disease treatments, and other initiatives promoted by hospital networks began exploiting shared assets such as electronic records and genomic data to implement genomics-based medicine.

EHRs are mainly designed for clinical (patient care) and billing/insurance purposes, and are not usually designed with science and research in mind. For this reason, EHR-based research poses major challenges regarding bias and standardization. In this respect, EHRs are similar to any other large biological dataset suffering from integration weaknesses. As previously mentioned, well-organized data, or even data organized ab initio for research, are much easier to process with computer-based methods such as machine learning. With wide and standardized adoption of EHRs, millions of clinical data points from thousands of individuals become potentially available: these data are the subject of computational phenotyping and of the construction of organized phenome databases. The first issue to solve is therefore the adoption of an internal, logical and common infrastructure to implement standards, common annotations, and interchangeable identification numbers. Precise and widely accepted standards on a smaller number of records are to be preferred over bigger datasets with incomplete or non-canonical annotations. The richness of the clinical information stored in EHRs lies in its usability and interoperability; thus the quality and shape of the data in EHRs have a direct impact on research.
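As an illustration of what common annotations and interchangeable identifiers could mean in practice, the following minimal sketch maps site-specific diagnosis labels onto a shared vocabulary and sets aside records that cannot be annotated canonically. The site names, local labels, record fields and the `harmonize` function are all hypothetical and chosen only for this example; the target codes E11 and I10 are the ICD-10 categories for type 2 diabetes mellitus and essential hypertension.

```python
# Hypothetical mapping from (site, local label) to a shared ICD-10 category.
LOCAL_TO_STANDARD = {
    ("site_a", "DM2"): "E11",      # type 2 diabetes mellitus
    ("site_a", "HTN"): "I10",      # essential (primary) hypertension
    ("site_b", "T2D-001"): "E11",
}


def harmonize(records, mapping):
    """Split records into canonically annotated ones and rejected ones.

    A record is kept only if its local code has a known standard
    equivalent and it carries a patient identifier. Kept records get a
    site-qualified identifier so that IDs from different sites cannot
    collide.
    """
    clean, rejected = [], []
    for rec in records:
        standard_code = mapping.get((rec["site"], rec["local_code"]))
        if standard_code is None or not rec.get("patient_id"):
            rejected.append(rec)
            continue
        clean.append({
            "patient_id": f'{rec["site"]}:{rec["patient_id"]}',
            "code": standard_code,
        })
    return clean, rejected


records = [
    {"site": "site_a", "patient_id": "17", "local_code": "DM2"},
    {"site": "site_b", "patient_id": "17", "local_code": "T2D-001"},
    {"site": "site_b", "patient_id": "42", "local_code": "unknown-dx"},
]
clean, rejected = harmonize(records, LOCAL_TO_STANDARD)
```

In this toy run the two diabetes records from different sites end up with the same code but distinct identifiers, while the record with a non-canonical annotation is excluded rather than guessed at, in line with the preference for fewer but well-annotated records.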

The Electronic Medical Records and Genomics (eMERGE) Network is a clear example of a long-running project whose first efforts and deliverables focused on building the logic and standards needed to organize the data and the institutions collecting and sharing them. The primary goal of eMERGE is to combine biorepositories with EHR systems for genomic discovery and for the implementation of genomics in medical practice. As a network of hospitals and research institutions, eMERGE first had to ensure a correct and usable data infrastructure, then began to integrate and analyse data (clinical, phenotypic, genomic), and subsequently delivered integrated results. Besides the many disease-specific research papers published, two of the most notable outcomes impacting research as a whole are the Phenotype Knowledge Base (PheKB), which can be used to mine and discover phenotypes from electronic medical records, and the catalog of phenome-wide association scans (PheWAS), which gathers disease/phenotype-to-gene associations obtained by coupling genetic data and EHR data.
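The idea behind a phenome-wide association scan can be sketched in a few lines: for one genetic variant, each phenotype code found in the EHRs is tested for association with carrier status. The example below is a toy illustration under stated assumptions, not the eMERGE pipeline: the patient records, field names and the `phewas` function are hypothetical, and a two-sided Fisher exact test on a 2x2 table stands in for whatever statistical model a real study would use.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


def phewas(patients):
    """Test every phenotype code against carrier status; sort by p-value."""
    phenotypes = sorted({code for pt in patients for code in pt["phenotypes"]})
    results = []
    for code in phenotypes:
        a = sum(1 for pt in patients if pt["carrier"] and code in pt["phenotypes"])
        b = sum(1 for pt in patients if pt["carrier"] and code not in pt["phenotypes"])
        c = sum(1 for pt in patients if not pt["carrier"] and code in pt["phenotypes"])
        d = sum(1 for pt in patients if not pt["carrier"] and code not in pt["phenotypes"])
        results.append((code, (a, b, c, d), fisher_two_sided(a, b, c, d)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy cohort: carrier status for one variant plus EHR-derived codes.
patients = [
    {"carrier": True, "phenotypes": {"E11", "I10"}},
    {"carrier": True, "phenotypes": {"E11"}},
    {"carrier": True, "phenotypes": {"E11"}},
    {"carrier": False, "phenotypes": {"I10"}},
    {"carrier": False, "phenotypes": set()},
    {"carrier": False, "phenotypes": set()},
]
results = phewas(patients)
```

Because one test is run per phenotype, a real scan would also need a correction for multiple testing and adjustment for covariates such as age, sex and ancestry, none of which is attempted here.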

Storage and manipulation of clinical data from millions of patients will become a challenge, as is already happening with sequence data. Besides efficient mining and summarizing methods for better and quicker characterization and phenotyping, issues of privacy and security should also be addressed. Better reproducibility, secure data sharing between collaborating researchers or patient communities, enforced privacy of EHRs, and trusted de-identified access to EHRs are highly desirable goals that can be met with different technologies. Considering the many issues that data sharing and privacy pose and the technical approaches available to address them, we would like to point out that, like any data-intensive discipline, biomedical research is now being considered a subject of choice for emerging informatics technologies such as blockchain. The blockchain, better known as the technology underlying Bitcoin, is based on distributed ledgers: it is a public, secure and decentralized database of ordered events or records, called blocks, each of which is time-stamped and linked to the previous block. The public and anonymized transactions are the foundation for both privacy and traceability, and this logic can be well adapted to the requirements for privacy, traceability and trusted sharing imposed by clinical trials and personal EHRs.
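The time-stamped, linked structure of the blocks can be illustrated with a minimal sketch. It covers only the hash linking that makes past records tamper-evident; distribution across nodes, consensus and access control, which a real blockchain requires, are left out. The function names, block fields and the example access events are assumptions made for this illustration.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """SHA-256 over the block contents, excluding the stored hash itself."""
    fields = {k: block[k] for k in ("index", "timestamp", "record", "prev_hash")}
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, record, timestamp=None):
    """Append a time-stamped block that points at the hash of the previous one."""
    block = {
        "index": len(chain),
        "timestamp": time.time() if timestamp is None else timestamp,
        "record": record,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_PREV_HASH,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block is unmodified and correctly linked."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
    return True


# Hypothetical audit trail of de-identified access events.
chain = []
append_block(chain, {"event": "consent_granted", "subject": "pseudonym-01"})
append_block(chain, {"event": "record_accessed", "subject": "pseudonym-01"})
valid_before = is_valid(chain)

# Altering an earlier record breaks its hash, so the tampering is detectable.
chain[0]["record"]["event"] = "consent_revoked"
valid_after = is_valid(chain)
```

Here only pseudonymous event descriptions are stored in the chain, which reflects one possible design choice: the ledger provides traceability of who accessed what and when, while the clinical data themselves stay elsewhere.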