Submissions/The Language Commons Wiki


This is an open submission for Wikimania 2010.

Title of the submission

The Language Commons Wiki

Type of submission (workshop, tutorial, panel, presentation)


Author of the submission

Ed Bice (
Steven Bird (University of Melbourne)
Kurt Bollacker (The Long Now Foundation)
Gary Simons (SIL International)
Laura Welcher (The Long Now Foundation)

E-mail address or username (if username, please confirm email address in Special:Preferences)


Country of origin

The Language Commons is an international consortium. Offices are in San Francisco, USA.

Affiliation, if any (organization, company etc.)

The Language Commons

Personal homepage or blog

Abstract (please use no less than 300 words to describe your proposal)

This session will present a prototype of The Language Commons Wiki. This wiki consists of one page for each of the ~6,900 human languages, along with index pages for 3,900 language families and subgroups, and support for taxonomic navigation. Each page integrates structured data from all of the major initiatives that publicly index, categorize, and describe the world's languages, including traditional archives, digital libraries, national language surveys, and ongoing linguistic research. Each page captures user-contributed material, raw material, and expert commentary that supports the further curation of the structured data sources. In this way, language speakers, educators, researchers, and the general public will be able to access the most comprehensive, accurate, and accountable information available for any language, in a single location.

In our prototype, language metadata and taxonomy are populated with the 10,800 language entities in Freebase, a collaborative database and Semantic Web project. The prototype also incorporates material from the 4,000 language pages in the English Wikipedia, and simultaneously addresses the problem that Wikipedia language pages are highly inconsistent in content, structure, and coverage. Linguistic documents, linguistic data (including glossaries, dictionaries, and parallel corpora), and many other resources are populated from numerous sources, including: the Internet Archive; the Rosetta Project Collection, with materials on 2,500 languages; maps of language usage from the LL-Map Project; and 85,000 archived language resources from the Open Language Archives Community (OLAC).

This project relies on a novel collaborative dynamic: highly structured content aggregated from respected sources, coupled with user-supplied material and commentary, where each plays off the other. We seek guidance from the MediaWiki community on the most appropriate design, collaborative model, and editorial processes. We also seek advice on how best to manage the interplay with English Wikipedia.

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Knowledge and Collaboration

Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. bevcorwin
  2. --Ravidreams 07:10, 20 May 2010 (UTC)
  3. El Ágora 04:45, 21 May 2010 (UTC)
  4. Jerzy Celichowski
  5. Kocio 23:37, 3 June 2010 (UTC)[reply]
  6. Amir E. Aharoni 06:37, 22 June 2010 (UTC)[reply]
  7. Jon Harald Søby 18:27, 26 June 2010 (UTC)[reply]
  8. Natbrown 12:59, 29 June 2010 (UTC)[reply]