Words matter, especially in a bilingual environment where there are political sensitivities. As an impartial resource for Canadian parliamentarians (and others) that produces and collects many documents, the Library of Parliament maintains an internal controlled vocabulary to facilitate access to them. In this article, the author outlines the Library of Parliament Subject Taxonomy and discusses two challenges related to its development: language neutrality and the interlinguistic equivalence of concepts between English and French.
The Library of Parliament's Parliamentary Information and Research Service (PIRS) provides senators and members of the House of Commons, as well as parliamentary committees and associations, with research and analysis on any topic related to public policy. Year in and year out, the Library's analysts produce thousands (1) of background documents that are stored in an electronic document management system. The Library is also the repository for the House of Commons sessional papers and the speeches of members of Cabinet. (2) To facilitate the retrieval of this mass of documents by subject, and to make documents published online more visible to search engines, the Library maintains an internal controlled vocabulary: the Library of Parliament Subject Taxonomy. Controlling language in a bilingual environment, where political sensitivities can be exacerbated, proves to be a more complex task than it may seem at first. First, this article outlines the Library of Parliament Subject Taxonomy. Second, it discusses two challenges related to its development: language neutrality and the interlinguistic equivalence of concepts between English and French.
Description of the taxonomy
All information systems seek to increase precision (the number of relevant documents retrieved as a proportion of all documents, relevant or irrelevant, returned by the system) and recall (the number of relevant documents retrieved as a proportion of the number of relevant documents in the collection). However, two natural language phenomena, synonymy and polysemy, affect precision and recall. First, synonymy, where the same concept is represented by different words or expressions (for example, the expressions "cellular phone", "mobile phone" and "cell phone" represent the same concept), affects recall: users must think of all possible synonyms to ensure that they can find all the documents on that concept. Second, polysemy, the opposite phenomenon, where the same word represents different concepts (for example, in French, "droit" is both "the body of legal rules in force in a society" and "permission to do something under rules recognized in a community"), affects precision: without language control, users may have to filter out a large number of results on concepts that do not interest them but are represented by the same words. (3)
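The two measures can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-document collection, the document identifiers, the set of relevant documents and the whole-word `search` helper are hypothetical and do not describe the Library's actual system. Precision is computed as the share of returned documents that are relevant, and recall as the share of relevant documents that are returned.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the documents returned by the system that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the relevant documents in the collection that were returned."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection (document id -> text), invented for this example.
DOCS = {
    "d1": "regulation of cellular phone contracts",
    "d2": "mobile phone roaming fees",
    "d3": "cell phone use while driving",
    "d4": "biology of the cell",
}

# Assumption: the user is interested in the concept "mobile telephone".
RELEVANT = {"d1", "d2", "d3"}


def search(*terms: str) -> set[str]:
    """Return ids of documents containing any term as a whole word or phrase."""
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if any(f" {term} " in f" {text} " for term in terms)
    }


# Synonymy: a single expression misses documents that use its synonyms.
one_term = search("cell phone")
# Listing every synonym recovers all relevant documents.
all_synonyms = search("cellular phone", "mobile phone", "cell phone")
# Polysemy: "cell" alone also matches the biology document.
ambiguous = search("cell")

for name, result in [
    ("one term", one_term),
    ("all synonyms", all_synonyms),
    ("ambiguous word", ambiguous),
]:
    print(name, sorted(result), precision(result, RELEVANT), recall(result, RELEVANT))
```

In this toy collection, searching on a single expression keeps precision high but lowers recall, while searching on the ambiguous word "cell" lowers precision because an unrelated document is returned.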
The purpose of controlled vocabularies is, first, to control synonymy by ensuring that, in any given information system, a concept is represented by only one label (one word or one expression). Synonyms become keys to access...