Orthoepikon: a toolbox to build reading-aloud assistants

What is Orthoepikon

Orthoepikon is a set of open-source tools that turn XML pronunciation dictionaries and rule files into fast finite-state processors. These processors add simple pronunciation annotations to plain-text, HTML or RTF documents so that learners of a language can read them aloud correctly. Here is an example in which annotations showing the Valencian pronunciation of a Catalan text are rendered as an HTML file:

[Figure: Orthoepikon output; HTML generated from Catalan text]

In the example, red annotations on top of letters or groups remind the reader of their correct pronunciation; in some places, just highlighting the letter is enough to remind the reader that it has a special sound.
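The kind of annotation described above can be illustrated with a minimal sketch. It is not Orthoepikon code: Orthoepikon compiles XML dictionaries and rule files into finite-state processors, whereas this sketch swaps in a plain Python dictionary lookup. The entries (English words), the names `TOY_DICTIONARY`, `annotate_word` and `annotate_text`, and the choice of `<ruby>` and `<span class="hl">` markup are all assumptions made for illustration only.

```python
import html
import re

# Hypothetical toy dictionary (NOT Orthoepikon data or its XML format):
# word -> list of (start, length, note). A note of None means "highlight only".
TOY_DICTIONARY = {
    "phone": [(0, 2, "f")],   # "ph" gets the reminder "f" on top
    "knee": [(0, 1, None)],   # the "k" is only highlighted
}


def annotate_word(word):
    """Return an HTML fragment for one word, annotated if it is in the dictionary."""
    spans = TOY_DICTIONARY.get(word.lower())
    if not spans:
        return html.escape(word)
    pieces, pos = [], 0
    for start, length, note in sorted(spans, key=lambda span: span[0]):
        # Copy the unannotated letters before this span.
        pieces.append(html.escape(word[pos:start]))
        target = html.escape(word[start:start + length])
        if note is None:
            pieces.append(f'<span class="hl">{target}</span>')
        else:
            pieces.append(f"<ruby>{target}<rt>{html.escape(note)}</rt></ruby>")
        pos = start + length
    pieces.append(html.escape(word[pos:]))
    return "".join(pieces)


def annotate_text(text):
    """Annotate every word of a plain text; other characters are only escaped."""
    # Splitting on a capturing group keeps words (odd indices) and separators.
    tokens = re.split(r"(\w+)", text)
    return "".join(
        annotate_word(token) if index % 2 == 1 else html.escape(token)
        for index, token in enumerate(tokens)
    )


if __name__ == "__main__":
    print(annotate_text("My phone fell on my knee."))
```

A stylesheet would then colour the `rt` and `.hl` elements red to obtain a rendering similar in spirit to the example. A per-word lookup like this cannot express context-dependent rules, which is one reason a rule-based finite-state approach is a better fit for real data.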

The name Orthoepikon is a neo-Greek coinage inspired by orthoepic, "of or pertaining to orthoepy, or correct pronunciation", orthoepy being "the art of uttering words correctly" or "a correct pronunciation of words".


Orthoepikon sources (code, rather extensive sample data for the Valencian variety of the Catalan language, and a small French example) were re-released on July 15, 2009 through the Orthoepikon Gna! page. You can always follow development at the project's SVN server.


Orthoepikon has so far been tested only on Linux systems. Precursors of Orthoepikon have been successfully compiled for Windows using Cygwin (see http://sao.dlsi.ua.es).

Orthoepikon requires the lttoolbox package (a component of the Apertium machine translation toolbox), version 1.0.2, to be installed on your system (lttoolbox has its own requirements). The package may be downloaded from http://www.sourceforge.net/projects/apertium/.


While we prepare more formal documentation, you may read the current draft of an overview of Orthoepikon.

A reading-aloud assistant using Orthoepikon technology

An example of a reading-aloud assistant for the Valencian variety of the Catalan language, built for the Acadèmia Valenciana de la Llengua using Orthoepikon technology (actually, a precursor to Orthoepikon) may be found at SAÓ.

Developers sought

Orthoepikon is seeking developers for the following tasks:

Those interested should write to Mikel L. Forcada or Carlos Pérez Sancho.


The early development of Orthoepikon was funded by the Acadèmia Valenciana de la Llengua. Orthoepikon is inspired by an idea of Jordi Colomina (fellow of the Acadèmia Valenciana de la Llengua and professor of Catalan Philology at the Universitat d'Alacant) and has been developed at the Transducens research group (Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant) by Carlos Pérez-Sancho, Sergio Ortiz-Rojas, Carmen Arronis, and Mikel L. Forcada. Early help by Alicia Garrido-Alenda is acknowledged.