Sun Kim

Research Scientist
Computational Biology Branch
National Center for Biotechnology Information (NCBI)
National Institutes of Health (NIH)
Bethesda, MD 20894, USA

Phone: +1-301-496-2484


Research Interests
  • Biomedical text mining (semantics, event extraction)
  • Machine learning (Bayesian approaches, kernel methods, neural networks, evolutionary algorithms)
  • Bioinformatics (protein-protein interaction, drug-drug interaction)
Academic Services
  • Reviewer: Bioinformatics, Briefings in Bioinformatics, Database, Journal of the American Medical Informatics Association, PLOS One, PLOS Computational Biology, BMC Bioinformatics, Journal of Biomedical Informatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, IEEE Transactions on Knowledge and Data Engineering, Advances in Bioinformatics, Applied Mathematics and Computation, Journal of Information Science, Algorithms, AMIA 2014-2018, NAACL 2019, ACL 2018, EMNLP 2018, PSB 2010, IJCNLP 2013
  • PC Member: ICMLA 2010-2018, AAAI Fall Symposium 2012, CSBio 2013-2015/2018, BioDM 2015, ACM-BCB 2017
  • OC/LOC Member: BioCreative III, BioCreative 2012, BioCreative V, BioCreative 2016, BioCreative VI
Journal Publications
  • Overview of the BioCreative VI Precision Medicine Track: Mining Protein Interactions and Mutations for Precision Medicine, R. I. Doğan, S. Kim, A. Chatr-aryamontri, C.-H. Wei, D. C. Comeau, R. Antunes, S. Matos, Q. Chen, A. Elangovan, N. C. Panyam, K. Verspoor, H. Liu, Y. Wang, Z. Liu, B. Altınel, Z. M. Hüsünbeyi, A. Özgür, A. Fergadis, C.-K. Wang, H.-J. Dai, T. Tran, R. Kavuluru, L. Luo, A. Steppi, J. Zhang, J. Qu, and Z. Lu, Database, 2019, bay147, 2019. [PDF]
  • Discovering Themes in Biomedical Literature Using a Projection-Based Algorithm, L. Yeganova, S. Kim, G. Balasanov, and W. J. Wilbur, BMC Bioinformatics, 19, 269, 2018. [PDF]
  • ezTag: Tagging Biomedical Concepts via Interactive Learning, D. Kwon*, S. Kim*, C.-H. Wei, R. Leaman, and Z. Lu, Nucleic Acids Research, 46, W523-W529, 2018. [PDF]
  • PubMed Phrases, an Open Set of Coherent Phrases for Searching Biomedical Literature, S. Kim, L. Yeganova, D. C. Comeau, W. J. Wilbur, and Z. Lu, Scientific Data, 5, 180104, 2018. [PDF]
  • Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents, S. Kim, N. Fiorini, W. J. Wilbur, and Z. Lu, Journal of Biomedical Informatics, 75, pp. 122-127, 2017. [PDF] (original version on arXiv)
  • The BioC-BioGRID Corpus: Full Text Articles Annotated for Curation of Protein-Protein and Genetic Interactions, R. I. Doğan*, S. Kim*, A. Chatr-aryamontri*, C. S. Chang, R. Oughtred, J. Rust, W. J. Wilbur, D. C. Comeau, K. Dolinski, and M. Tyers, Database, 2017, baw147, 2017. [PDF]
  • BioCreative V BioC Track Overview: Collaborative Biocurator Assistant Task for BioGRID, S. Kim*, R. I. Doğan*, A. Chatr-aryamontri, C. S. Chang, R. Oughtred, J. Rust, R. Batista-Navarro, J. Carter, S. Ananiadou, S. Matos, A. Santos, D. Campos, J. L. Oliveira, O. Singh, J. Jonnagaddala, H.-J. Dai, E. C. Su, Y.-C. Chang, Y.-C. Su, C.-H. Chu, C. C. Chen, W.-L. Hsu, Y. Peng, C. Arighi, C. H. Wu, K. Vijay-Shanker, F. Aydın, Z. M. Hüsünbeyi, A. Özgür, S.-Y. Shin, D. Kwon, K. Dolinski, M. Tyers, W. J. Wilbur, and D. C. Comeau, Database, 2016, baw121, 2016. [PDF]
  • BioC Viewer: A Web-Based Tool for Displaying and Merging Annotations in BioC, S.-Y. Shin*, S. Kim*, W. J. Wilbur, and D. Kwon, Database, 2016, baw106, 2016. [PDF]
  • Meshable: Searching PubMed Abstracts by Utilizing MeSH and MeSH-Derived Topical Terms, S. Kim, L. Yeganova, and W. J. Wilbur, Bioinformatics, 32(19), pp. 3044-3046, 2016. [PDF]
  • Extracting Drug-Drug Interactions from Literature Using a Rich Feature-Based Linear Kernel Approach, S. Kim, H. Liu, L. Yeganova, and W. J. Wilbur, Journal of Biomedical Informatics, 55, pp. 23-30, 2015. [PDF]
  • Identifying Named Entities from PubMed for Enriching Semantic Categories, S. Kim, Z. Lu, and W. J. Wilbur, BMC Bioinformatics, 16, 57, 2015. [PDF]
  • Retro: Concept Based Clustering of Biomedical Topical Sets, L. Yeganova, W. Kim, S. Kim, and W. J. Wilbur, Bioinformatics, 30(22), pp. 3240-3248, 2014. [PDF]
  • Assisting Manual Literature Curation for Protein-Protein Interactions Using BioQRator, D. Kwon*, S. Kim*, S.-Y. Shin, A. Chatr-aryamontri, and W. J. Wilbur, Database, 2014, bau067, 2014. [PDF]
  • Author Name Disambiguation for PubMed, W. Liu, R. I. Doğan, S. Kim, D. C. Comeau, W. Kim, L. Yeganova, Z. Lu, and W. J. Wilbur, Journal of the Association for Information Science and Technology (JASIST), 65(4), pp. 765-781, 2014. [PDF]
  • Prioritizing PubMed Articles for the Comparative Toxicogenomic Database Utilizing Semantic Information, S. Kim, W. Kim, C.-H. Wei, Z. Lu, and W. J. Wilbur, Database, 2012, bas042, 2012. [PDF]
  • Thematic Clustering of Text Documents Using an EM-based Approach, S. Kim and W. J. Wilbur, Journal of Biomedical Semantics, 3(Suppl 3), S6, 2012. [PDF]
  • PIE the search: Searching PubMed Literature for Protein Interaction Information, S. Kim, D. Kwon, S.-Y. Shin, and W. J. Wilbur, Bioinformatics, 28(4), pp. 597-598, 2012. [PDF]
  • Classifying Protein-Protein Interaction Articles Using Word and Syntactic Features, S. Kim and W. J. Wilbur, BMC Bioinformatics, 12(Suppl 8), S9, 2011. [PDF]
  • The Protein-Protein Interaction Tasks of BioCreative III: Classification/Ranking of Articles and Linking Bio-Ontology Concepts to Full Text, M. Krallinger, M. Vazquez, F. Leitner, D. Salgado, A. Chatr-aryamontri, A. Winter, L. Perfetto, L. Briganti, L. Licata, M. Iannuccelli, L. Castagnoli, G. Cesareni, M. Tyers, G. Schneider, F. Rinaldi, R. Leaman, G. Gonzalez, S. Matos, S. Kim, W. J. Wilbur, L. Rocha, H. Shatkay, A. V. Tendulkar, S. Agarwal, F. Liu, X. Wang, R. Rak, K. Noto, C. Elkan, Z. Lu, R. I. Doğan, J.-F. Fontaine, M. A. Andrade-Navarro, and A. Valencia, BMC Bioinformatics, 12(Suppl 8), S3, 2011. [PDF]
  • Ensembled Support Vector Machines for Human Papillomavirus Risk Type Prediction from Protein Secondary Structures, S. Kim, J. Kim, and B.-T. Zhang, Computers in Biology and Medicine, 39(2), pp. 187-193, 2009. [PDF]
  • Introducing Meta-Services for Biomedical Information Extraction, F. Leitner, M. Krallinger, C. Rodriguez-Penagos, J. Hakenberg, C. Plake, C.-J. Kuo, C.-N. Hsu, R. T. Tsai, H.-C. Hung, W. W. Lau, C. A. Johnson, R. Sætre, K. Yoshida, Y. H. Chen, S. Kim, S.-Y. Shin, B.-T. Zhang, W. A. Baumgartner, Jr., L. Hunter, B. Haddow, M. Matthews, X. Wang, P. Ruch, F. Ehrler, A. Özgür, G. Erkan, D. R. Radev, M. Krauthammer, T. Luong, R. Hoffmann, C. Sander, and A. Valencia, Genome Biology, 9(Suppl 2), S6, 2008. [PDF]
  • PIE: an online prediction system for protein-protein interactions from text, S. Kim*, S.-Y. Shin*, I.-H. Lee, S.-J. Kim, R. Sriram, and B.-T. Zhang, Nucleic Acids Research, 36, W411-W415, 2008. [PDF]
  • Human Papillomavirus Risk Type Classification from Protein Sequences Using Support Vector Machines, S. Kim and B.-T. Zhang, Lecture Notes in Computer Science (EVOBIO 2006), 3907, pp. 57-66, 2006. [PDF]
  • Multi-objective Evolutionary Probe Design Based on Thermodynamic Criteria for HPV Detection, I.-H. Lee, S. Kim, and B.-T. Zhang, Lecture Notes in Artificial Intelligence (PRICAI 2004), 3157, pp. 742-750, 2004. [PDF]
  • Genetic Mining of HTML Structures for Effective Web-Document Retrieval, S. Kim and B.-T. Zhang, Applied Intelligence, 18(3), pp. 243-256, 2003. [PDF]
    corresponding author
    * equally contributed
Conference Publications (Full Papers)
  • Combining Rich Features and Deep Learning for Finding Similar Sentences in Electronic Medical Records, Q. Chen, J. Du, S. Kim, W. J. Wilbur, and Z. Lu, BioCreative/OHNLP Challenge, 2018. [PDF]
  • Efficient Rule-based Approaches for Tagging Named Entities and Relations in Clinical Text, D. Kim, S.-Y. Shin, H.-W. Lim, and S. Kim, BioCreative/OHNLP Challenge, 2018. [PDF]
  • A Fast Deep Learning Model for Textual Relevance in Biomedical Information Retrieval, S. Mohan, N. Fiorini, S. Kim, and Z. Lu, The Web Conference (WWW 2018), pp. 77-86, 2018. [PDF]
  • Overview of the BioCreative VI Precision Medicine Track, R. I. Doğan, S. Kim, A. Chatr-aryamontri, C.-H. Wei, D. C. Comeau, and Z. Lu, Sixth BioCreative Challenge Workshop, pp. 83-87, 2017. [PDF]
  • The BioCreative VI Precision Medicine Track Corpus, R. I. Doğan, A. Chatr-aryamontri, C.-H. Wei, C. S. Chang, R. Oughtred, J. Rust, L. Boucher, S. Kim, D. C. Comeau, Z. Lu, K. Dolinski, and M. Tyers, Sixth BioCreative Challenge Workshop, pp. 88-93, 2017. [PDF]
  • Deep Learning for Biomedical Information Retrieval: Learning Textual Relevance from Click Logs, S. Mohan, N. Fiorini, S. Kim, and Z. Lu, ACL 2017 Workshop on Biomedical Natural Language Processing, pp. 222-231, 2017. [PDF]
  • BioCreative VI Precision Medicine Track: Creating a Training Corpus for Mining Protein-Protein Interactions Affected by Mutations, R. I. Doğan, A. Chatr-aryamontri, S. Kim, C.-H. Wei, Y. Peng, D. C. Comeau, and Z. Lu, ACL 2017 Workshop on Biomedical Natural Language Processing, pp. 171-175, 2017. [PDF]
  • PubTermVariants: Biomedical Term Variants and Their Use for PubMed Search, L. Yeganova, W. Kim, S. Kim, R. I. Doğan, W. Liu, D. C. Comeau, Z. Lu, and W. J. Wilbur, ACL 2016 Workshop on Biomedical Natural Language Processing, pp. 141-145, 2016. [PDF]
  • The DDINCBI Corpus - Towards a Larger Resource for Drug-Drug Interactions in PubMed, L. Yeganova, S. Kim, G. Balasanov, K. Bennett, H. Liu, and W. J. Wilbur, LREC 2016 Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability, pp. 38-41, 2016. [PDF]
  • Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis, S. Kim, L. Yeganova, and W. J. Wilbur, Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), pp. 805-810, 2015. [PDF]
  • Overview of BioCreative V BioC Track, S. Kim, R. I. Doğan, A. Chatr-aryamontri, M. Tyers, W. J. Wilbur, and D. C. Comeau, Fifth BioCreative Challenge Workshop, pp. 1-9, 2015. [PDF]
  • Identifying Genetic Interaction Evidence Passages in Biomedical Literature, R. I. Doğan, S. Kim, A. Chatr-aryamontri, D. C. Comeau, and W. J. Wilbur, Fifth BioCreative Challenge Workshop, pp. 36-41, 2015. [PDF]
  • BioQRator: A Web-Based Interactive Biomedical Literature Curating System, D. Kwon, S. Kim, S.-Y. Shin, and W. J. Wilbur, Fourth BioCreative Challenge Workshop, pp. 241-246, 2013. [PDF]
  • Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers, S. Kim, W. Kim, D. C. Comeau, and W. J. Wilbur, NAACL 2012 Workshop on Biomedical Natural Language Processing, pp. 185-192, 2012. [PDF]
  • System Description for the BioCreative 2012 Triage Task, S. Kim, W. Kim, C.-H. Wei, Z. Lu, and W. J. Wilbur, BioCreative Workshop 2012, pp. 20-24, 2012. [PDF]
  • An EM Clustering Algorithm which Produces a Dual Representation, S. Kim and W. J. Wilbur, International Conference on Machine Learning and Applications (ICMLA 2011), pp. 90-95, 2011. [PDF]
  • Improving Protein-Protein Interaction Article Classification Performance by Utilizing Grammatical Relations, S. Kim and W. J. Wilbur, Third BioCreative Challenge Workshop, pp. 83-88, 2010. [PDF]
  • Evolutionary Hypernetwork Classifiers for Protein-Protein Interaction Sentence Filtering, J. Bootkrajang, S. Kim, and B.-T. Zhang, Genetic and Evolutionary Computation Conference (GECCO 2009), pp. 185-192, 2009. [PDF]
  • Evolving Hypernetwork Models of Binary Time Series for Forecasting Price Movements on Stock Markets, E. Bautu, S. Kim, A. Bautu, H. Luchian, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2009), pp. 166-173, 2009. [PDF]
  • Finding Cancer-Related Gene Combinations Using a Molecular Evolutionary Algorithm, C.-H. Park, S.-J. Kim, S. Kim, D.-Y. Cho, and B.-T. Zhang, IEEE International Symposium on Bioinformatics and Biomedical Engineering (BIBE 2007), pp. 158-163, 2007. [PDF]
  • Evolving Hypernetwork Classifiers for microRNA Expression Profile Analysis, S. Kim*, S.-J. Kim*, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2007), pp. 313-319, 2007. [PDF]
  • Use of Evolutionary Hypernetworks for Mining Prostate Cancer Data, C.-H. Park, S.-J. Kim, S. Kim, D.-Y. Cho, and B.-T. Zhang, International Symposium on Advanced Intelligent Systems, pp. 702-706, 2007.
  • Identifying Protein-Protein Interaction Sentences Using Boosting and Kernel Methods, S.-Y. Shin*, S. Kim*, J.-H. Eom, B.-T. Zhang, and R. Sriram, Second BioCreative Challenge Workshop, pp. 187-192, 2007. [PDF]
  • Text Classifiers Evolved on a Simulated DNA Computer, S. Kim, M.-O. Heo, and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2006), pp. 2646-2652, 2006. [PDF]
  • A Tree Kernel-Based Method for Protein-Protein Interaction Mining from Biomedical Literature, J.-H. Eom, S. Kim, S.-H. Kim, and B.-T. Zhang, Lecture Notes in Bioinformatics (KDLL 2006), 3886, pp. 42-52, 2006. [PDF]
  • Evolutionary Learning of Web-Document Structure for Information Retrieval, S. Kim and B.-T. Zhang, IEEE Congress on Evolutionary Computation (CEC 2001), pp. 1253-1260, 2001. [PDF]
  • SCAI Experiments on TREC-9, Y.-H. Kim, S. Kim, J.-H. Eom, and B.-T. Zhang, Text Retrieval Conference (TREC-9), pp. 392-399, 2000.
  • Web-Document Retrieval by Genetic Learning of Importance Factors for HTML Tags, S. Kim and B.-T. Zhang, PRICAI 2000 Workshop on Text and Web Mining, pp. 13-23, 2000.
  • SCAI TREC-8 Experiments, D.-H. Shin, Y.-H. Kim, S. Kim, J.-H. Eom, H.-J. Shin, and B.-T. Zhang, Text Retrieval Conference (TREC-8), pp. 511-518, 1999.
    corresponding author
    * equally contributed
Conference Publications (Abstracts)
  • Sentence Similarity Measures Revisited: Ranking Sentences in PubMed Documents, Q. Chen*, S. Kim*, W. J. Wilbur, and Z. Lu, ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 531-532, 2018. [PDF]
  • Towards seamless format conversion between BioC and PubAnnotation for sharing PubMed/PubMed Central documents and annotations, S. Kim, D. C. Comeau, R. I. Doğan, and Z. Lu, Biomedical Linked Annotation Hackathon 3, 2017.
  • Building a cost-effective gold standard set for enriching PubAnnotation, D. Kwon, C.-H. Wei, S. Kim, R. Leaman, and Z. Lu, Biomedical Linked Annotation Hackathon 3, 2017.
  • BioCconvert: A Conversion Tool Between BioC and PubAnnotation, D. C. Comeau, R. I. Doğan, S. Kim, C.-H. Wei, W. J. Wilbur, and Z. Lu, International Conference on Biological Ontology & BioCreative, 2016.
  • BioCreative V: A Community-wide Effort for the Evaluation of Text Mining and its Relevance for Biomedical Curation, C. N. Arighi, K. B. Cohen, D. C. Comeau, R. I. Doğan, J. Fluck, L. Hirschman, S. Kim, M. Krallinger, Z. Lu, F. Rinaldi, A. Valencia, T. Wiegers, W. J. Wilbur, and C. H. Wu, International Biocuration Conference, 2016.
  • Biocuration and Text Mining: Lessons Learned from Developing an Interoperable Collaborative Biocurator Assistant Tool for BioGRID, R. I. Doğan, S. Kim, A. Chatr-aryamontri, W. J. Wilbur, and D. C. Comeau, International Biocuration Conference, 2016.
  • BioCreative V: A New Challenge in Text Mining for Biocuration, C. N. Arighi, A. Chatr-aryamontri, K. B. Cohen, D. C. Comeau, J. Fluck, R. I. Doğan, L. Hirschman, S. Kim, M. Krallinger, F. Leitner, Z. Lu, J. Oyarzabal, O. Rabal, F. Rinaldi, C. O. Tudor, A. Valencia, T. Wiegers, W. J. Wilbur, and C. H. Wu, International Biocuration Conference, 2015.
  • Analyzing MEDLINE Topics with a Projection Method, L. Yeganova, S. Kim, and W. J. Wilbur, NIPS 2014 Workshop on Modern Machine Learning and Natural Language Processing, 2014.
  • Extracting Drug-Drug Interactions from Literature Using a Rich Feature-Based Linear Kernel Approach, S. Kim, H. Liu, L. Yeganova, and W. J. Wilbur, AMIA 2014 Annual Symposium, 2014.
Book Chapters
  • Natural Language Processing, Y.-T. Kim et al., 2001. (in Korean)
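Publications in Korean
  • Collaborative Filtering with Sparsity Removed through Cluster-Based Weighting, W.-J. Shin, S. Kim, and B.-T. Zhang, Proceedings of the Korea Computer Congress, 36(1), pp. 451-456, 2009.
  • Text Sentence Classification Using a Hypernetwork Model, J. Bootkrajang, S. Kim, and B.-T. Zhang, Proceedings of the Institute of Electronics Engineers of Korea (IEEK) Fall Conference, 31(2), pp. 987-988, 2008.
  • Hypernetwork-Based Language Modeling for Evaluating the Quality of Machine-Translated Sentences, Y.-G. Ko, H.-Y. Jang, S. Kim, and B.-T. Zhang, Proceedings of the Joint Conference on Communications and Information (JCCI), pp. 277-280, 2008.
  • A Hypernetwork Classification Method for Microarray-Based miRNA Module Analysis, S. Kim, S.-J. Kim, and B.-T. Zhang, Journal of KIISE: Software and Applications, 35(6), pp. 347-356, 2008.
  • DNA Chip Informatics Technology, B.-T. Zhang, K.-B. Hwang, J.-G. Joung, S. Kim, and J.-H. Eom, BioWebzine, 2003.
  • Oligonucleotide Probe Selection for Multiple Target Genes Using Evolutionary Computation, K.-R. Shin, S. Kim, and B.-T. Zhang, Proceedings of the KIISE Spring Conference, 30(1), pp. 455-457, 2003.
  • An Evolutionary Approach to Probe Selection for Oligonucleotide Microarrays, S. Kim and B.-T. Zhang, Proceedings of the Korean Data Mining Society Fall Conference, pp. 140-147, 2002.
  • Probe Selection for DNA Microarrays Using a Genetic Algorithm, S. Kim and B.-T. Zhang, Proceedings of the Korean Fuzzy Logic and Intelligent Systems Society Spring Conference, pp. 183-186, 2002.
  • Learning Characteristics of Web Documents Using Evolutionary Computation, S. Kim and B.-T. Zhang, Proceedings of the Korean Fuzzy Logic and Intelligent Systems Society Spring Conference, pp. 43-46, 2000.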
Datasets
Tools
  • ezTag: a web-based annotation tool that allows curators to perform annotation and provide training data interactively.
  • NCBITextLib: a software library for building a large-scale data infrastructure for text mining.
  • Meshable: a web service for searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.
  • BioC Viewer: a web interface for displaying and merging annotations in BioC.
  • BioQRator: a general-purpose user interface for annotating bio-entities and relationships.
  • PIE the search: a web service for finding PubMed articles that are informative about protein-protein interactions.
  • PIE: a configurable web service to extract protein-protein interaction sentences from biomedical literature.
Publications in Google Scholar


Revised: Feb 1, 2019