CETIS: the centre for educational technology interoperability standards

Standards bodies face call for more language codes

The standard language taxonomies used widely today are coming under increasing criticism from a number of industries. Existing schemes support only a fraction of the languages and dialects in use around the world, which poses a problem for many applications, including learning technology.

The identification of languages is critical in the area of learning technology standards. We need to be able to identify, using metadata, the language used by a resource. There is also the language of the intended end-user to consider (for example, a resource may be designed to teach French to an English-speaking person). We also need to be able to create metadata in various languages, and to label that metadata so that repositories, search engines and readers can identify the language used.

Language identification is also an integral part of markup languages such as XML, which is used to define the majority of learning technology standards.
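As an illustration of language labelling in markup, XML reserves the xml:lang attribute for stating the language of an element's content. The short Python sketch below reads that attribute from a small metadata record. The record itself, including the element names record, title, resourceLanguage and learnerLanguage, is invented for this example and is not taken from any particular learning technology specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical record: the element names are assumptions for illustration only.
RECORD = """\
<record>
  <title xml:lang="en">First steps in French</title>
  <title xml:lang="fr">Premiers pas</title>
  <resourceLanguage>fr</resourceLanguage>
  <learnerLanguage>en</learnerLanguage>
</record>
"""


def titles_by_language(xml_text):
    """Map each xml:lang value to the text of the title carrying it."""
    root = ET.fromstring(xml_text)
    return {title.get(XML_LANG): title.text for title in root.iter("title")}


def language_fields(xml_text):
    """Return the language of the resource and of its intended learner."""
    root = ET.fromstring(xml_text)
    return root.findtext("resourceLanguage"), root.findtext("learnerLanguage")
```

Here the language of the metadata (the two titles), the language of the resource and the language of the intended learner are kept separate, which mirrors the three needs described above.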

The most common methods of denoting languages are the two- and three-letter codes defined in the ISO 639 standard, which provides "codes for the representation of names of languages" (ISO 639, ISO/FDIS 639-1, ISO 639-2). However, the standard provides codes for only between 200 and 400 languages (depending on which version of the standard you are using). There are now draft proposals that call for the adoption of schemes that identify 7,000 or even 70,000 languages and dialects.
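To make the two- and three-letter forms concrete, here is a minimal Python sketch that normalises a code to its three-letter form. The table holds only a handful of hand-picked entries, not the full ISO 639 code lists, and the function name to_alpha3 is simply a label chosen for this example.

```python
# A tiny sample of ISO 639 codes: two-letter code -> three-letter code.
# Where ISO 639-2 defines two three-letter codes for a language, the
# bibliographic one is shown (for example "fre" rather than "fra").
ISO_639_SAMPLE = {
    "en": "eng",  # English
    "fr": "fre",  # French
    "de": "ger",  # German
    "ga": "gle",  # Irish
}


def to_alpha3(code):
    """Return the three-letter code for a known language code, else None."""
    code = code.lower()
    if len(code) == 2:
        return ISO_639_SAMPLE.get(code)
    if len(code) == 3 and code in ISO_639_SAMPLE.values():
        return code
    return None
```

Any language or dialect missing from the table simply yields None, which is the practical effect of the coverage limit described above.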

One proposal calls for codes supporting representation of the language along at least five axes: "geog (geographical specification), script (writing system), temp (temporal specification), socli (sociolinguistic specification), and style (stylistic specification)."

Using a more complex scheme of this kind, you could identify a text as being written in "20th Century Colloquial Irish English", for example. This kind of flexibility could be very useful in the context of archives of historical documents, where there is considerable demand for better language classification, but may be seen as overly complex for many applications.
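One way to picture a multi-axis identifier is as a record with one optional field per axis. The Python sketch below is an assumption made purely for illustration: the proposal's actual syntax is not given here, the class name LanguageDescriptor is invented, and the assignment of "colloquial" to the style axis (rather than socli) is a guess.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageDescriptor:
    """Hypothetical record with one optional field per axis in the proposal."""

    language: str
    geog: Optional[str] = None    # geographical specification
    script: Optional[str] = None  # writing system
    temp: Optional[str] = None    # temporal specification
    socli: Optional[str] = None   # sociolinguistic specification
    style: Optional[str] = None   # stylistic specification

    def specified_axes(self):
        """Return only the axes that have been given a value."""
        return {
            axis: value
            for axis, value in asdict(self).items()
            if axis != "language" and value is not None
        }


# "20th Century Colloquial Irish English", under the assumptions above.
irish_english = LanguageDescriptor(
    language="English",
    geog="Ireland",
    temp="20th century",
    style="colloquial",
)
```

An application that finds the full scheme too complex could leave every axis unset and fall back to the plain language name.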

For a detailed examination of the language identification issue, take a look at "Language Identifiers in the Markup Context" at XML Cover Pages.

This work is licensed under a Creative Commons License.
