BioTagger-GM: A Gene/Protein Name Recognition System

March 2009
Journal of the American Medical Informatics Association;Mar/Apr2009, Vol. 16 Issue 2, p247
Academic Journal
Objectives: Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the entity types in the domain, gene/protein names are arguably the most studied in BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits rich information in terminology sources using powerful machine learning frameworks and system combination.

Design: BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine recognition results from multiple systems.

Measurements: The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. System performance was evaluated through cross-validation tests and measured using precision, recall, and F-Measure.

Results: BioTagger-GM achieved an F-Measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus.

Conclusion: The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
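The system combination and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, the tag set, or the matching criterion, so everything below is an assumption: per-token majority voting over B/I/O tags, ties broken in favor of the earliest system, and exact-match scoring on token spans. The function names (majority_vote, extract_spans, precision_recall_f1) and the example sentence are hypothetical and are not taken from BioTagger-GM.

```python
from collections import Counter


def majority_vote(tag_sequences):
    """Combine per-token tags from several systems by majority vote.

    Assumption: ties go to the tag proposed by the earliest system.
    """
    combined = []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        best = max(counts.values())
        winner = next(t for t in token_tags if counts[t] == best)
        combined.append(winner)
    return combined


def extract_spans(tags):
    """Return the set of (start, end) token spans in a B/I/O sequence (end exclusive)."""
    spans = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new mention begins; close any mention that is still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def precision_recall_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall, and balanced F-measure over span sets."""
    true_positives = len(gold_spans & predicted_spans)
    precision = true_positives / len(predicted_spans) if predicted_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "The p53 protein binds BRCA1 ."
gold = ["O", "B", "I", "O", "B", "O"]
system_a = ["O", "B", "I", "O", "O", "O"]  # misses the second mention
system_b = ["O", "B", "O", "O", "B", "O"]  # truncates the first mention
system_c = ["O", "B", "I", "I", "B", "O"]  # over-extends the first mention

combined = majority_vote([system_a, system_b, system_c])
scores_a = precision_recall_f1(extract_spans(gold), extract_spans(system_a))
scores_combined = precision_recall_f1(extract_spans(gold), extract_spans(combined))
```

In this toy case each system makes a different error, so the vote recovers both mentions even though no single system does. Shared-task scoring may apply different matching rules than the exact token-span match assumed here.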


Related Articles

  • Clinical Data Mining -- An Approach for Identification of Refractive Errors. Shekar, D. V. Chandra; Srinivas, V. Sesha // International MultiConference of Engineers & Computer Scientists;2008, p551

    Information technology offers the health care industry significant potential to improve productivity and quality of patient care. The area of data mining in health care is growing rapidly because of the strong need to analyze the vast amount of clinical data stored in hospital databases. Hospitals are...

  • Design and Implementation of School Hospital Information Analysis and Mining System. Wang Xuesong; Guo Qiang; Li Shanshan; Cao Rongfei // Applied Mechanics & Materials;2014, Issue 513-517, p498

    Hospital information analysis is a very important way to enhance the medical service. In this paper, the hospital information analysis system of university is implemented for revealing hidden information. First, the processing model of medical information is studied. Then, the multi-layer...

  • A cold, hard look at the paperless medical practice.  // Contemporary OB/GYN;May2005, Vol. 50 Issue 5, p15 

    Presents findings of studies on the impact of computerized physician order entry (CPOE) systems and clinical decision support systems (CDSS) on clinical care in the U.S., published in the 2005 issue of the "Journal of the American Medical Association." Situations in which a CPOE system was used...

  • How many physician practices are adopting EMRs?  // Contemporary OB/GYN;Dec2005, Vol. 50 Issue 12, p13 

    The article reports that only 12.5 percent of the 3,000 medical groups with few full-time physicians have adopted electronic medical record systems (EMR) according to the results of a survey conducted by the Medical Group Management Association in Colorado. 78 percent of practices still use...

  • Figuring out the answers. Kilpatrick, Claire // Nursing Management - UK;Mar2006, Vol. 12 Issue 10, p10 

    The article describes the need for a robust surveillance system to control infection. To support the study of epidemiology, infection surveillance systems were used to achieve three overall objectives: to make sound judgments about infections, to take appropriate action and to identify gaps in...

  • Effects of Computerized Guidelines for Managing Heart Disease in Primary Care: A Randomized, Controlled Trial. Tierney, William M.; Overhage, J. Marc; Murray, Michael D.; Harris, Lisa E.; Xiao-Hua Zhou; Eckert, George J.; Smith, Faye E.; Nienaber, Nancy; McDonald, Clement J.; Wolinsky, Fredric D. // JGIM: Journal of General Internal Medicine;Dec2003, Vol. 18 Issue 12, p967

    Electronic information systems have been proposed as one means to reduce medical errors of commission (doing the wrong thing) and omission (not providing indicated care). To assess the effects of computer-based cardiac care suggestions. A randomized, controlled trial targeting primary care...

  • EIQ, Inc.  // Pharmaceutical Technology;Aug2010, Vol. 34 Issue 8, p86 

    The article evaluates several software products for pharmaceutical operations including Change Management, Electronic Signature and Record, and Complaint Handling Management Software from EtQ Inc.

  • Exploiting Temporal Relations in Mining Hepatitis Data. Tu-Bao Ho; Canh-Hao Nguyen; Kawasaki, Saori; Si-Quang Le; Takabayashi, Katsuhiko // New Generation Computing;2007, Vol. 25 Issue 3, p247 

    Various data mining methods have been developed in the last few years for hepatitis study using a large temporal and relational database given to the research community. In this work we introduce a novel temporal abstraction method to this study by detecting and exploiting temporal patterns and...

  • Data mining.  // Nature Biotechnology;Oct2000 Supplement 1, Vol. 18, p35 

    The article presents a reprint of the article "Data mining" which appeared in the 2000 issue, volume 18 of "Nature Biotechnology." It focuses on the contribution of data mining in the healthcare industry by providing companies a vast array of software products and services to clients that...

