MultilingualBIO: Multilingual Biomedical Text Processing

Workshop at LREC 2018, Miyazaki (Japan), 8 May 2018

Call for papers

The goal of this workshop is to promote synergies between the clinical/biomedical language processing communities and the Machine Translation community. This will be pursued by focussing on research topics of common interest to these two communities, in particular the extraction, representation, interoperability and integration of specialised multilingual resources, and the development of novel tools to overcome the serious coverage limitations of non-English lexical sources and components. We expect the outcome of this workshop to accelerate the development of innovative ways to improve access to, and the analysis and integration of, healthcare-relevant information from heterogeneous content types, including electronic health records, medical literature, clinical trials, medical agency reports and social media. This event is a unique opportunity to promote the development of multilingual biomedical text processing infrastructures, exploring the use of machine translation methodologies for tasks such as medical named entity recognition or clinical concept indexing in languages other than English.

This workshop complements previous workshops focused on building (mostly monolingual) resources for biomedical text processing, such as BioTxtM 2014 and 2016 and the Clinical Natural Language Processing Workshop at COLING 2016, as well as international efforts such as the Mantra project, concerned with multilingual terminologies and named entities, and the Khresmoi project, focusing on the development of multilingual search systems.

Motivation and Topics of Interest

Clinical and biomedical text mining are popular tasks in NLP research that have made considerable progress over the past years. However, the main achievements in biomedical text processing are largely restricted to English, with most other languages lagging behind. In particular, one aspect that has not received enough attention so far is the multilingual dimension of biomedical text processing. Nevertheless, automatic translation of English medical terms has produced promising attempts to increase the coverage of non-English terminologies (Afzal et al. 2015, Neveol et al. 2016, van Mulligen et al. 2016); machine translation and multilingual approaches have also been explored for entity recognition in the biomedical domain (Rebholz-Schuhmann et al. 2013); and multilingual ontologies are valuable resources for disease surveillance systems (Collier et al. 2006).

The need to translate biomedical texts arises in many situations. For example, cross-border mobility of people may require the translation of medical records and discharge reports. In addition, the internationalization of the pharmaceutical industry demands that technical specifications and package leaflets of medicines be translated into the languages of customers in different countries, not to mention medical patent translation, which is a specialised area in its own right. Other common examples of biomedical text requiring translation are laboratory reports, clinical trials and scientific publications. The use of Machine Translation in scenarios such as these is desirable, both to reduce costs and to ensure terminological consistency across languages. Machine Translation, one of the most challenging tasks in Natural Language Processing, has seen a large leap in quality in recent years, owing to the application of deep learning techniques and the growing availability of bilingual in-domain corpora. Yet high-quality translation in specialised domains remains a challenge and depends heavily on the availability of good-quality parallel corpora in the domain.

Summary of the call

In this workshop, we will introduce attendees to new approaches to the automatic processing of biomedical text in languages other than English and in multilingual scenarios. The goal of this workshop is to promote synergies between the BioNLP community and the Machine Translation community by addressing issues of common interest, including, but not restricted to, the following:

  • Production of multilingual corpora in the biomedical domain.
  • Creation of multilingual biomedical glossaries and ontologies.
  • Extension of the coverage of the Unified Medical Language System (UMLS) to languages other than English.
  • Building of MT systems adapted to the biomedical domain.
  • Dealing with localisation issues, including adaptation to local varieties of international languages (UK vs. US English; Spanish from Spain, Latin America or the USA; etc.).

Intended Audience

The workshop aims to bring together researchers and developers from academia and industry. In particular, perspectives from the following user groups are welcome:

  • Application developers, from both research and industry
  • Researchers from the NLP and MT communities
  • PhD students interested in or working on MT and/or biomedical language processing