
Results and resources

Multilinguality

EuropeanaConnect facilitates multilingual access to Europeana content by providing query translation and multilingual mapping of controlled vocabularies for browsing and searching within the Europeana Semantic Layer. Reaching this goal will allow users to submit queries in their native language and still retrieve documents in other languages, making the content usable across languages and cultures. Objects within Europeana come from many sources across all European countries. Multilingual access capabilities make all content equally accessible to all Europeana users, regardless of their native language or the language resources available for it.

EuropeanaConnect supports a set of 10 languages: six core languages (English, French, German, Italian, Polish and Spanish) plus Dutch, Hungarian, Swedish and Portuguese.

A summary of multilingual access strategies can be found here:

  • Report on Multilingual Access Strategies to Digital Libraries (24 October 2011)

    Presentations

    Europeana Language Resources Repository

    To offer multilingual access capabilities, it is essential to use resources that help process natural language in machine-readable form. The Europeana Language Resources Repository collects and aggregates open-source and licensed language resources, which Europeana components can use via download or direct APIs. These language resources include:

    • Stop word lists: lists of 'non-content' words, such as articles, conjunctions and prepositions, which can be ignored in specific processing tasks
    • Language identifiers: tools that detect the language of a query whenever it is not explicitly known
    • Morphological analyzers: software modules that perform tokenization and lemmatization, as well as decompounding, multi-word detection and part-of-speech tagging
    • Named entity recognizers: software modules that identify named entities, such as person names, geographic names, organisation names, etc.
    • Translation dictionaries: mappings between terms in different languages
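
    As a rough illustration of how two of these resource types can work together, the Python sketch below guesses the language of a query by counting stop word matches and then filters the stop words out. The word samples, the function names and the matching heuristic are assumptions made for this example; they are not taken from the repository's actual resources or APIs.

```python
# Tiny illustrative stop word samples (assumed); real lists are far longer.
STOP_WORDS = {
    "en": {"the", "of", "and", "a", "to"},
    "de": {"der", "die", "das", "und", "von"},
    "fr": {"le", "la", "les", "et", "du"},
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def identify_language(text):
    """Guess the language whose stop word sample matches the most tokens.

    Returns None when no token matches any sample.
    """
    tokens = tokenize(text)
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def remove_stop_words(text, lang):
    """Drop tokens that appear in the stop word sample for `lang`."""
    return [t for t in tokenize(text) if t not in STOP_WORDS[lang]]


if __name__ == "__main__":
    query = "die Geschichte der Stadt"
    lang = identify_language(query)
    print(lang, remove_stop_words(query, lang))
```

    A production language identifier would normally rely on stronger evidence, such as character n-gram statistics, because short queries often contain no stop words at all.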

    Documents and Presentations

    To see which resources have been selected for use in Europeana, go to the Resource Register:
    http://europeanalabs.eu/wiki/LinguisticResourceRegister

    Open-source language resources aggregated within the community can be found here:
    http://europeanalabs.eu/wiki/WP2LanguageResources


    Tools for multilingual mapping

    Multilingual mapping of controlled vocabularies provides multilingual search and browsing capabilities in the Europeana portal. To merge the heterogeneous and multilingual Europeana resources, this task seeks to relate value vocabularies (thesauri, person authority lists, etc.) that are relevant in the Cultural Heritage domain. The approach is to find alignments between the local vocabularies used to annotate the original data and more general pivot vocabularies.
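
    The pivot idea can be sketched in a few lines of Python: once each local term is aligned to a concept in a pivot vocabulary, terms from different providers and languages become related through the concept they share. The vocabulary names, terms and concept identifiers below are hypothetical and serve only to illustrate the principle; they do not describe the alignment tools used in the project.

```python
from collections import defaultdict

# Hypothetical alignments: (local vocabulary, local term) -> pivot concept ID.
ALIGNMENTS = {
    ("museum_de", "Malerei"): "pivot:painting",
    ("library_fr", "peinture"): "pivot:painting",
    ("archive_en", "paintings"): "pivot:painting",
    ("museum_de", "Landkarte"): "pivot:map",
    ("library_fr", "carte"): "pivot:map",
}


def build_pivot_index(alignments):
    """Invert the alignments: pivot concept -> set of (vocabulary, term)."""
    index = defaultdict(set)
    for local_term, concept in alignments.items():
        index[concept].add(local_term)
    return index


def equivalent_terms(vocabulary, term, alignments):
    """Return the other local terms aligned to the same pivot concept."""
    concept = alignments.get((vocabulary, term))
    if concept is None:
        return set()  # the term has no alignment to the pivot vocabulary
    index = build_pivot_index(alignments)
    return index[concept] - {(vocabulary, term)}


if __name__ == "__main__":
    print(equivalent_terms("museum_de", "Malerei", ALIGNMENTS))
```

    With a pivot, each local vocabulary needs only one set of alignments instead of a separate mapping to every other vocabulary.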

    Documents and Presentations


    Translation services

    A dedicated software module named MultiLingual Information Access (MLIA) has been designed and included in the Europeana Language Resources Repository. The module provides query translation functionality to the Europeana portal in order to support Cross-Language Information Retrieval.
    The MLIA module implements a query translation strategy by exploiting and coordinating the language resources in the Europeana Language Resources Repository. The translation approach is a sequence of three activities: query analysis, translation of the query terms by means of bilingual dictionaries, and disambiguation of the translation candidates retrieved from the dictionaries.
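
    The three activities can be pictured with the following minimal Python sketch. It is not the MLIA implementation: the sample dictionary, the co-occurrence counts and all function names are assumptions, and the disambiguation step uses a simple co-occurrence heuristic chosen only for illustration, since the text above does not specify the method.

```python
# Hypothetical sample resources; not the project's actual data.
STOP_WORDS_EN = {"the", "of", "in", "a"}

# English -> German translation candidates per term (assumed).
DICTIONARY_EN_DE = {
    "bank": ["Bank", "Ufer"],
    "river": ["Fluss"],
}

# Assumed counts of how often two German terms co-occur in a target collection.
COOCCURRENCE_DE = {
    frozenset({"Ufer", "Fluss"}): 12,
    frozenset({"Bank", "Fluss"}): 1,
}


def analyze(query):
    """Step 1, query analysis: lowercase, tokenize and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS_EN]


def lookup(terms):
    """Step 2, dictionary lookup: unknown terms are kept untranslated."""
    return {term: DICTIONARY_EN_DE.get(term, [term]) for term in terms}


def disambiguate(candidates):
    """Step 3: per term, keep the candidate that co-occurs most often with
    the candidates of the other query terms (the first listed wins ties)."""
    chosen = []
    for term, options in candidates.items():
        context = [c for other, cs in candidates.items() if other != term for c in cs]

        def score(option):
            return sum(COOCCURRENCE_DE.get(frozenset({option, c}), 0) for c in context)

        chosen.append(max(options, key=score))
    return chosen


def translate_query(query):
    """Run the three steps in sequence."""
    return disambiguate(lookup(analyze(query)))


if __name__ == "__main__":
    print(translate_query("the bank of the river"))
```

    In this sketch the ambiguous term "bank" resolves to "Ufer" because that candidate co-occurs more often with "Fluss" in the assumed counts. Repeated query terms collapse into one entry, which is acceptable for an illustration but would need handling in real use.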

    Documents and Presentations

    User Studies & Evaluation

    The objective of this task is to give an overview of a number of projects within Europe that have dealt, explicitly or not, with multilingual access to their content, and to present their results. The main focus is on user needs and desired features for multilingual access. These were derived partly from a thorough screening of associated Europeana user studies and results from other projects, and partly from a survey targeted specifically at multilingual access issues within Europeana. The outcome of these studies is a description of user requirements and suggested usage scenarios for three multilingual access features (query translation, result representation, and multilingual subject mapping for document enrichment), which serves as a starting point for a discussion of multilingual access options in Europeana.

    Testing and evaluating the translation services and modules is fundamental to ensuring that the developed components comply with the user requirements.

    Documents and Presentations

    The workshop website with all the presentations is available at: http://www.europeanaconnect.eu/MLIA4DL09Workshop.php

    Language focused Log file Analysis

    Understanding and evaluating user behavior is crucial for designing a system that meets user needs and expectations. Log file analysis was used to study Europeana users and, in particular, their actual usage of language features. The Clickstream Logger gathers extended information on user behavior, with a special focus on language-sensitive aspects.
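
    To give an idea of what a language-focused log analysis might compute, the Python sketch below counts searches per interface language and finds sessions in which the user switched language. The tab-separated log format, the field names and the action labels are invented for this example; the actual output format of the Clickstream Logger is not described here.

```python
from collections import Counter

# Hypothetical log lines: timestamp, session ID, interface language, action, query.
SAMPLE_LOG = """\
2011-03-01T10:00:00\ts1\tde\tsearch\tmozart
2011-03-01T10:00:30\ts1\ten\tlanguage_switch\t
2011-03-01T10:01:00\ts1\ten\tsearch\tmozart letters
2011-03-01T10:02:00\ts2\tfr\tsearch\tcarte paris
"""

FIELDS = ("timestamp", "session", "language", "action", "query")


def parse_log(text):
    """Turn tab-separated log lines into dictionaries keyed by FIELDS."""
    return [dict(zip(FIELDS, line.split("\t"))) for line in text.splitlines() if line]


def searches_per_language(records):
    """Count search actions by interface language."""
    return Counter(r["language"] for r in records if r["action"] == "search")


def sessions_with_language_switch(records):
    """Return the IDs of sessions that changed the interface language."""
    return {r["session"] for r in records if r["action"] == "language_switch"}


if __name__ == "__main__":
    records = parse_log(SAMPLE_LOG)
    print(searches_per_language(records))
    print(sessions_with_language_switch(records))
```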

    Documents and Presentations