Omorfi–Open source morphological analyzer of Finnish

Authors: Flammie Pirinen
Software version:
 20101026 (draft only, please send feedback to authors)
Documentation license:
 Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 Unported
SVN Revision:292
SVN Date:2010-08-31

Omorfi

Omorfi is an open source morphological analyser of the Finnish language, licensed under GPLv3. It uses lexical data from Nykysuomen sanalista v1, published by the Research Institute for the Languages of Finland, and from Joukahainen, an open source project for collecting lexical data.

This documentation is intended for end users of the morphology. Chapter 1 is an introduction that everyone may want to read. Chapter 2 describes the implemented morphological analyser and its contents. Chapter 3 describes applications of omorfi and gives references to their use.

This documentation is generated from the doc/ directory of the omorfi source distribution. This document describes version 20101026; more up-to-date versions may be available via the Omorfi website.

Introduction

Omorfi is an open source implementation of automatic morphological analysis of the Finnish language, built with finite-state technology using traditional finite-state models and open source tools. The main development tools of omorfi are the Helsinki Finite-State Technology (HFST) tools. As its lexical data sources omorfi uses Nykysuomen sanalista v1, a GNU LGPL word list by the Research Institute for the Languages of Finland, and Joukahainen, a GNU GPL word list developed by the open source community.

Omorfi was started at the University of Helsinki as a master's thesis project. It has since been continued as the author's side project and used for research purposes in the development of FST tools. Omorfi is also used by some external projects for basic morphological analysis or as a full-form dictionary.

This document describes what the morphological analyser contains, what its analyses mean, and why it has been built this way. It is intended to be long-term background material for omorfi. Basic usage and details that may vary a lot are covered in the Omorfi basic usage guide, which contains the specific commands, software versions and filenames relevant for current versions of the software. Developers may skim through this documentation and then read the Omorfi developer's guide to find out the coding conventions, the contribution guidelines, and the places in the code to modify.

Notational conventions

Throughout the document, hyperlinks are given, such as VISK (referring to the descriptive grammar of the Finnish language). For academic citations, the [VISK10] notation is used. These citations are listed as endnotes of the document.

This monospaced style is used for the input and output of command line tools and for morphological analysis strings that are meant for automatic parsing. An offset monospaced section is used for longer sets of examples. For most examples, an informal parsing formula, such as the following, is provided:

word+tags*

The purpose of this is to give a very short overview of what kind of output you may expect when parsing data relevant to the description. The notation used is a free-form variant of regular expressions, where an asterisk * signifies repetition of the previous structure zero or more times and a plus + once or more; the previous example means one or more words, possibly followed by tags. A question mark ? marks the previous structure as optional, and parentheses () group multiple structures together for asterisks or question marks.

Note

In this document, a note admonition like this one is used for normative references, such as ones to the official Finnish grammar VISK.

Warning

A warning admonition, such as this one, is used to point out something unexpected, or something that deviates from what users of previous systems may be used to.

The possible morphological readings relevant to the section are enumerated with a list of examples accompanied by a table in a standard format. The examples show all the values of the analyses using forms of an example word; the examples are then explained in the table:

example     READINGS...NAME=VALUE1

example     READINGS...NAME=VALUE2
VARIABLE NAME    Explanation                    Examples
VARIABLE VALUE1  interpretation of NAME=VALUE1  example in Finnish (translation or gloss)
VARIABLE VALUE2  interpretation of NAME=VALUE2  example will analyse with NAME=VALUE described
...              ...                            ...

Note that the examples have been obtained by running the analyser and copied here verbatim. The examples may have been cleaned up and rewrapped manually.

The basis for morphological analyses

Omorfi is not intended to create new linguistic descriptions. We merely aim to capture as much of the contemporary linguistic knowledge about the morphology of word-forms as possible in the analyser. The primary source of linguistic knowledge of the Finnish language is Iso suomen kielioppi, from now on referred to as VISK or the official grammar of the Finnish language. Everything implemented in Omorfi is described in VISK or another scholarly resource on the morphology of the Finnish language, and the intent of this document is to list all parts of omorfi along with specific references to the grammar or the relevant scientific source describing the implemented feature.

The main emphasis on morphological analysis in Omorfi means that it has been built to analyse word-forms in isolation, based on the information that is present in the word-form. While Omorfi is and will be used for purposes other than morphological analysis, such as the ones listed in applications, the core morphological analyses will be kept as unchanged as possible. The necessary additions may be made by extending the description.

Morphological software conventions

There are also conventions used in past and contemporary morphological analysers that have affected some of the design decisions of Omorfi. The basic tagging conventions of omorfi do not retain direct compatibility with past systems, but they are designed so that downward conversion can typically be supported. For this reason the analyses of omorfi are occasionally lengthier than needed, since they aim to contain a reasonable superset of the features that have been used in other systems. This has been done to facilitate the use and comparison of Omorfi and other systems in the traditional, basic tasks of a morphological analyser.

The examples in this document are given in omorfi's own notation for analyses. While this notation may change between versions, it should always cover the same features and information, and therefore I believe this verbose notation is the most useful one for the examples of this documentation.

Morphology

The implementation of morphology in Omorfi deals with the inflection, derivation and compounding of word-forms. This creates a morphological analyzer that can map words to their dictionary forms, referred to as lemmas, and to their morphological analyses. This chapter describes the implementation of the morphology in terms of morphological combinatorics and shows how the analyses are formatted by the default analyzer in Omorfi. While the main target of the Omorfi source code is a morphological analyzer, it is used for various purposes, such as spell-checking and correction, hyphenation, morpho-syntactic disambiguation and machine translation, so the morphological analyses provided may vary between end applications.

The morphological analyses are, in the end, encoded as linear strings. The format of these strings varies widely depending on the application, but the omorfi source tree aims to cater for all end applications by providing rules that allow different encodings of the morphological analyses. By default omorfi has its own, rather elaborate tags (also referred to as the omor tagging style), geared more towards machine parsing than human readability. The tags are always written in capitals in the form [NAME=VALUE]. If you intend to parse the output with scripts, you only need to capture each bracketed tag (informally [.*]) and split its contents at = to extract a feature structure map. It is very likely for the same name to occur multiple times because of compounding, so you will also need to decide how to handle this in your application. It is suggested to give the rightmost reading the most weight, since compounding and derivation in Finnish always extend to the right, but in some applications this is of course not the ideal solution.
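
The parsing approach described above can be sketched in Python roughly as follows. This is only a minimal illustration, not part of omorfi: the function names parse_tags and feature_map are made up for this example, and the sketch assumes that tag values never contain a closing bracket.

```python
import re

# Matches one [NAME=VALUE] tag; VALUE may be quoted, as in [LEMMA='lukea'].
# Assumption: a VALUE never contains a closing bracket.
TAG_RE = re.compile(r"\[([A-Z_]+)=([^\]]*)\]")


def parse_tags(analysis):
    """Return the (NAME, VALUE) pairs of an omor-style analysis, in order."""
    return [
        (name, value.strip().strip("'"))
        for name, value in TAG_RE.findall(analysis)
    ]


def feature_map(analysis):
    """Collapse the tags into a dict; the rightmost tag of a name wins."""
    features = {}
    for name, value in parse_tags(analysis):
        features[name] = value
    return features


analysis = (
    "[BOUNDARY=LEXITEM][LEMMA='lukea'][POS=VERB][KTN=58][KAV=D]"
    "[VOICE=ACT][INF=MA][NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]"
)
print(feature_map(analysis))
```

Letting the rightmost tag override earlier ones follows the suggestion above; an application that needs every part of a compound would keep the ordered list from parse_tags instead.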

Other tagging representations are modeled on specific applications, requirements or standards. Currently you may try some of the recoded analyzers emulating the Constraint Grammar, Finnish text collection or apertium styles. For interactive use, the colorterm variant gives the most terse output, but it relies on color coding working on the terminal. The Constraint Grammar style is a string of the form lemma+X+Y+Z, the Finnish text collection style correspondingly lemma X Y Z, and the apertium style lemma<X><Y><Z>. Here is one example passage parsed first in the omor format and then in the format emulating Constraint Grammar:

lukemaan    [BOUNDARY=LEXITEM][LEMMA='lukea'][POS=VERB][KTN=58][KAV=D]
    [VOICE=ACT][INF=MA][NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]
lukemaan    [BOUNDARY=LEXITEM][LEMMA='lukea'][POS=VERB][KTN=58][KAV=D]
    [VOICE=ACT][PCP=MA][CMP=POS][NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]

ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=PARTICLE][BOUNDARY=LEXITEM]
ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=CONJUNCTION][BOUNDARY=LEXITEM]

selittämään [BOUNDARY=LEXITEM][LEMMA='selittää'][POS=VERB][KTN=53]
    [KAV=C][VOICE=ACT][INF=MA][NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]
selittämään [BOUNDARY=LEXITEM][LEMMA='selittää'][POS=VERB][KTN=53]
    [KAV=C][VOICE=ACT][PCP=MA][CMP=POS][NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]

sitä        [BOUNDARY=LEXITEM][LEMMA='se'][POS=PRONOUN][NUM=SG]
    [CASE=PAR][BOUNDARY=LEXITEM]
sitä        [BOUNDARY=LEXITEM][LEMMA='sitä'][POS=PARTICLE][BOUNDARY=LEXITEM]

ennen       [BOUNDARY=LEXITEM][LEMMA='ennen'][POS=ADVERB][BOUNDARY=LEXITEM]
ennen       [BOUNDARY=LEXITEM][LEMMA='ennen'][POS=ADPOSITION]
    [BOUNDARY=LEXITEM]

kaikkea     [BOUNDARY=LEXITEM][LEMMA='kaikki'][POS=PRONOUN]
    [NUM=SG][CASE=PAR][BOUNDARY=LEXITEM]

kouluissa   [BOUNDARY=LEXITEM][LEMMA='koulu'][POS=NOUN][KTN=1]
    [NUM=PL][CASE=INE][BOUNDARY=LEXITEM]

ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=PARTICLE][BOUNDARY=LEXITEM]
ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=CONJUNCTION][BOUNDARY=LEXITEM]

muissa      [BOUNDARY=LEXITEM][LEMMA='muu'][POS=ADJECTIVE][KTN=18]
    [CMP=POS][NUM=PL][CASE=INE][BOUNDARY=LEXITEM]

oppilaitoksissa     [BOUNDARY=LEXITEM][LEMMA='oppilaitos'][POS=NOUN][KTN=39]
    [NUM=PL][CASE=INE][BOUNDARY=LEXITEM]

eri [BOUNDARY=LEXITEM][LEMMA='eri'][POS=PARTICLE][BOUNDARY=LEXITEM]

maiden      [BOUNDARY=LEXITEM][LEMMA='maa'][POS=NOUN][KTN=18]
    [NUM=PL][CASE=GEN][BOUNDARY=LEXITEM]

ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=PARTICLE][BOUNDARY=LEXITEM]
ja  [BOUNDARY=LEXITEM][LEMMA='ja'][POS=CONJUNCTION][BOUNDARY=LEXITEM]

alueiden    [BOUNDARY=LEXITEM][LEMMA='alue'][POS=NOUN][KTN=48]
    [NUM=PL][CASE=GEN][BOUNDARY=LEXITEM]

poliittisista       [BOUNDARY=LEXITEM][LEMMA='poliittinen'][POS=ADJECTIVE]
    [KTN=38][CMP=POS][NUM=PL][CASE=ELA][BOUNDARY=LEXITEM]

oloista     [BOUNDARY=LEXITEM][LEMMA='olo'][POS=NOUN][KTN=1]
    [GUESS=DERIVE][DRV=INEN][CMP=POS][NUM=SG][CASE=PAR][BOUNDARY=LEXITEM]
oloista     [BOUNDARY=LEXITEM][LEMMA='olo'][POS=NOUN][KTN=1]
    [NUM=PL][CASE=ELA][BOUNDARY=LEXITEM]
oloista     [BOUNDARY=LEXITEM][LEMMA='oloinen'][POS=ADJECTIVE]
    [KTN=38][CMP=POS][NUM=SG][CASE=PAR][BOUNDARY=LEXITEM]

CG format:

lukemaan    lukea+V+Act+Inf3+Sg+Ill
lukemaan    lukea+V+Act+AgPcp+Pos+Sg+Ill

ja  ja+Part
ja  ja+Conj

selittämään selittää+V+Act+Inf3+Sg+Ill
selittämään selittää+V+Act+AgPcp+Pos+Sg+Ill

sitä        se+Pron+Sg+Par
sitä        sitä+Part

ennen       ennen+Adv
ennen       ennen+Adp

kaikkea     kaikki+Pron+Sg+Par

kouluissa   koulu+N+Pl+Ine

ja  ja+Part
ja  ja+Conj

muissa      muu+A+Pos+Pl+Ine

oppilaitoksissa     oppilaitos+N+Pl+Ine

eri eri+Part

maiden      maa+N+Pl+Gen

ja  ja+Part
ja  ja+Conj

alueiden    alue+N+Pl+Gen

poliittisista       poliittinen+A+Pos+Pl+Ela

oloista     olo+N+Der/inen+Pos+Sg+Par
oloista     olo+N+Pl+Ela
oloista     oloinen+A+Pos+Sg+Par
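
The three recoded output styles named above differ only in how the lemma and the tag labels are joined. The sketch below illustrates this; the function render and the style keys "cg", "ftc" and "apertium" are hypothetical names chosen for this example, and reusing the CG tag labels in the other two styles is an assumption, since only CG output is shown here.

```python
def render(lemma, tags, style):
    """Join a lemma and its tag labels in one of the three recoded styles."""
    if style == "cg":        # Constraint Grammar style: lemma+X+Y+Z
        return "+".join([lemma] + list(tags))
    if style == "ftc":       # Finnish text collection style: lemma X Y Z
        return " ".join([lemma] + list(tags))
    if style == "apertium":  # apertium style: lemma<X><Y><Z>
        return lemma + "".join("<" + tag + ">" for tag in tags)
    raise ValueError("unknown style: " + style)


for style in ("cg", "ftc", "apertium"):
    print(render("koulu", ["N", "Pl", "Ine"], style))
```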

Names of tags in omorfi tagset

The tags of omorfi analyses are strings of the form [NAME=VALUE]. For example, in the tag [POS=NOUN] the NAME is POS and the VALUE is NOUN. The listing below is a short reference for the tags and their values. The definitions and examples are provided in the following chapters.

  • [BOUNDARY] is the word boundary marker. NAME is BOUNDARY and the possible values are LEXITEM for initial and final boundaries and COMPOUND for compound-medial boundaries.
  • [LEMMA] contains a reference to the dictionary word form. The VALUE can therefore be an arbitrary string from the dictionary resources, spelled exactly as it is in the dictionary.
  • [POS] is the part of speech tag. NAME is POS; the values can be traditional parts of speech, such as VERB, NOUN, ADJECTIVE, etc., as well as omorfi’s own pseudo parts of speech, such as SUFFIX, PREFIX, etc.
  • [SUBCAT] is an optional generic subcategorization for tags that do not fit under any other name. Possible values include PROPER for proper nouns, DEMONSTRATIVE for demonstrative pronouns, etc.
  • The INFLECTION contains the following tags in order:
    • [KTN] is the inflection class from the kotus dictionaries. NAME is KTN and the values are 1 through 49 for nominals or 52 through 78 for verbs.
    • [KAV] is the kotus dictionary gradation class letter. NAME is KAV and VALUE is one of A through M. KAV is an optional tag: it is absent for words that are not tagged as gradating in the dictionaries (regardless of whether the actual morphological process of gradation applies).
    • [INF] applies to verbs and marks infinitive forms. Its possible values are A, MA, E and MAISILLA. The infinitive forms have limited nominal inflection.
    • [PCP] applies to verbs and marks participle forms. Its possible values are NUT, MA and NEG. Participles have full adjectival inflection, including cases, comparison and such.
    • [PRS] applies to finite forms of verbs, with the possible values SG1, SG2, SG3, PL1, PL2, PL3 and PE4.
    • [NUM] applies to nominals that inflect in number; the possible values are SG, PL and SG,PL.
    • [CASE] applies to nominals and nominal forms of verbs, marking the case. The possible values for typical nominals are NOM, GEN, ACC, PAR, INE, ELA, ILL, ADE, ABL, ALL, ESS, ABE, TRA, CMT and INS. Limited cases also have the values LAT, DIS and PRL.
    • [POSS] applies to nominals and marks possessive suffixes. The possible values are SG1, SG2, SG3, PL3, PL1 and PL2.
    • [CLIT] applies to almost any part of speech and word form, marking the discourse particle enclitics. The possible values are KIN, KO, KAAN, PA, HAN and, in a few rare cases, S and KA. The clitics may form chains of arbitrary length and ordering.
  • [DRV] marks derivational suffixes that return to the main lexicon rather than to the word end or compounding. Derivation of course overgenerates heavily, so for some applications it will be preferable to remove derived readings altogether. There is a large number of possible values and they change with each release, so check the sources for details.
  • [GUESS] is used to mark a root form that is not found in the dictionary but was made up by the morphology. The possible values are COMPOUND for new forms made by the productive compounding mechanism and DERIVE for new forms made by the productive derivation mechanism. It is common for compounds and derived forms to appear both as guessed forms and as dictionary entries. The weighting mechanism ensures that readings found via the original dictionary form are always preferred in applications supporting weights. In many end applications, the guessed forms have been pruned from the dictionary.
  • [COMPOUND_FORM] is used for special compound forms of words. Its possible values are S for nominals exhibiting nen-s variation in compound formation and OMIT for cases where a required compound part is omitted with a hyphen.
  • [STYLE] is a tentative tag for marking stylistic variation; the current values are DIALECTAL, NONSTANDARD, RARE and ARCHAIC. Most of these word forms may be unusable for applications expecting modern standard Finnish.
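
For scripts that only see unweighted analysis strings, the preference for dictionary readings over guessed ones can be approximated by filtering on the GUESS tag. The following is only a sketch under that assumption; the function name prefer_dictionary_readings is hypothetical, and omorfi's own weighting mechanism is not modelled here.

```python
def prefer_dictionary_readings(analyses):
    """Drop readings with a [GUESS=...] tag when an unguessed reading exists."""
    unguessed = [a for a in analyses if "[GUESS=" not in a]
    # Fall back to the guessed readings if nothing else is available.
    return unguessed if unguessed else list(analyses)


# The three readings of oloista from the example passage, joined onto one line.
readings = [
    "[BOUNDARY=LEXITEM][LEMMA='olo'][POS=NOUN][KTN=1][GUESS=DERIVE][DRV=INEN]"
    "[CMP=POS][NUM=SG][CASE=PAR][BOUNDARY=LEXITEM]",
    "[BOUNDARY=LEXITEM][LEMMA='olo'][POS=NOUN][KTN=1][NUM=PL][CASE=ELA]"
    "[BOUNDARY=LEXITEM]",
    "[BOUNDARY=LEXITEM][LEMMA='oloinen'][POS=ADJECTIVE][KTN=38][CMP=POS]"
    "[NUM=SG][CASE=PAR][BOUNDARY=LEXITEM]",
]
for reading in prefer_dictionary_readings(readings):
    print(reading)
```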

Boundaries

Boundary tags are of the form [BOUNDARY=VALUE]. The ultimate boundaries of each lexical item are marked explicitly, using the value LEXITEM. If an analysis consists of more than one lemma, the word boundaries inside the lexical item are also marked, using the value COMPOUND. The word boundaries of other multi-word expressions are marked by an orthographical space only.

Some special symbols can delimit sentences or paragraphs, and have an analysis field with the boundary value SENTENCE or PARAGRAPH. This feature is experimental.

Thus all analyses of a lexical unit are formed as:

[BOUNDARY=LEXITEM]...[BOUNDARY]*...[BOUNDARY=LEXITEM]

The following examples demonstrate both types of boundaries:

talo  [BOUNDARY=LEXITEM][LEMMA='talo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kissakoira    [BOUNDARY=LEXITEM][LEMMA='kissa'][POS=NOUN][KTN=9]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='koira'][POS=NOUN][KTN=10][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
BOUNDARY  Meaning                                Examples
LEXITEM   ultimate boundary of lexical item      kissa (cat): two boundaries
COMPOUND  word boundary of a generated compound  kissakoira (cat dog): one boundary

Kissakoira is a made-up but perfectly valid compound that is unlikely to be in a dictionary and is therefore most likely formed by productive compounding. Applications that do not make use of productive compounding will not have these forms.
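
As an illustration of the boundary marking, a script can split an analysis at the compound boundaries and collect the lemma of each part. This is a minimal sketch; the function name compound_parts is hypothetical and not part of omorfi.

```python
import re

# Captures the quoted value of a [LEMMA='...'] tag.
LEMMA_RE = re.compile(r"\[LEMMA='(.*?)'\]")


def compound_parts(analysis):
    """Split an analysis at [BOUNDARY=COMPOUND]; return the lemma of each part."""
    lemmas = []
    for part in analysis.split("[BOUNDARY=COMPOUND]"):
        match = LEMMA_RE.search(part)
        if match:
            lemmas.append(match.group(1))
    return lemmas


kissakoira = (
    "[BOUNDARY=LEXITEM][LEMMA='kissa'][POS=NOUN][KTN=9]"
    "[NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]"
    "[LEMMA='koira'][POS=NOUN][KTN=10][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]"
)
print(compound_parts(kissakoira))
```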

Lemma

In the analyses used with omorfi, the lemma systematically refers to the root form of the word as it is presented in the original lexical data source. For the data of Nykysuomen sanalista this means the word form you can use to look it up in Kielitoimiston sanakirja, i.e. the official dictionary. By default the values are encoded in a tag with the name LEMMA and an arbitrary value in single quotation marks:

[BOUNDARY][LEMMA='.*']...

This also means that when derivational or compounding processes create a new word form, the lemmas will refer to the ones that can be found in the dictionary. End users preferring otherwise should look into adding the compounded or derived form to the lexical data:

kissa [BOUNDARY=LEXITEM][LEMMA='kissa'][POS=NOUN][KTN=9]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kissatar      [BOUNDARY=LEXITEM][LEMMA='kissa'][POS=NOUN][KTN=9]
  [GUESS=DERIVE][DRV=TAR][POS=NOUN][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kissakoira    [BOUNDARY=LEXITEM][LEMMA='kissa'][POS=NOUN][KTN=9]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='koira'][POS=NOUN][KTN=10][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
LEMMA                  Meaning                                Examples
'kissa', 'koira', ...  the dictionary form of words analyzed  kissa (cat): one lemma, kissakoira (cat dog): two lemmas

Warning

By default, generated compounds have analyses of the form [LEMMA][TAGS][LEMMA][TAGS]...; only lexicalised compounds have analyses of the form [LEMMALEMMA][TAGS]. If you need the traditional form of compound analysis, you can add your compounds to the lexicon.

Parts of speech

The parts of speech are indicated in the field named POS. All word form analyses typically start with the part of speech right after the lemma data:

[BOUNDARY][LEMMA][POS]...

The morphological division of Finnish words has three classes: verbs, nominals and others. The verbs are identified by personal, temporal, modal and non-finite inflection. The nominals are identified by number and case inflection. The others are, apart from being the rest, identified by defective or missing inflection.

The classes are further subdivided by syntactic features. The nominals consist of nouns (substantiivi), adjectives, pronouns and numerals. The others are subdivided into adpositions, adverbs and particles. Omorfi also separates conjunctions from the particles as their own class; this subdivision is not present in the grammar, but it is so useful for language technology that it has been deemed necessary.

The POS values in omorfi are based on this finer, morphosyntactic classification.

The list below introduces the default example words for each of the parts of speech:

talo  [BOUNDARY=LEXITEM][LEMMA='talo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kutoa [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][INF=A][NUM=SG][CASE=LAT][BOUNDARY=LEXITEM]

kaunis        [BOUNDARY=LEXITEM][LEMMA='kaunis'][POS=ADJECTIVE]
  [KTN=41][CMP=POS][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

minä  [BOUNDARY=LEXITEM][LEMMA='minä'][POS=PRONOUN]
  [SUBCAT=PERSONAL][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

yksi  [BOUNDARY=LEXITEM][LEMMA='yksi'][POS=NUMERAL][KTN=31]
  [SUBCAT=CARD][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

nopeasti      [BOUNDARY=LEXITEM][LEMMA='nopeasti'][POS=ADVERB]
  [BOUNDARY=LEXITEM]

hei   [BOUNDARY=LEXITEM][LEMMA='hei'][POS=INTERJECTION]
  [BOUNDARY=LEXITEM]

no    [BOUNDARY=LEXITEM][LEMMA='no'][POS=PARTICLE]
  [BOUNDARY=LEXITEM]

että  [BOUNDARY=LEXITEM][LEMMA='että'][POS=CONJUNCTION]
  [SUBCAT=SUBORD][BOUNDARY=LEXITEM]
POS           Meaning                      Example
NOUN          noun (Finnish substantiivi)  talo (house)
VERB          verb                         kutoa (knit)
ADJECTIVE     adjective                    kaunis (beautiful)
PRONOUN       pronoun                      minä (I)
NUMERAL       numeral                      yksi (one)
ADVERB        adverb                       nopeasti (fast)
INTERJECTION  interjection                 hei (hey!)
ADPOSITION    adposition                   päällä (over)
PARTICLE      particle                     no (well)
CONJUNCTION   conjunction                  että (so that)

Note

VISK: definitions > S > sanaluokka; § 438 <http://scripta.kotus.fi/visk/sisallys.php?p=438>; § 63 onwards explains the morphological features of parts of speech <http://scripta.kotus.fi/visk/sisallys.php?p=63>.

Nominal declension

Nominal parts of speech share a common nominal declension consisting of 16 cases in the singular and plural, combined with any possessive suffix, combined with any clitics. The total is some thousands of word forms per word. The nominal parts of speech include nouns, adjectives, numerals and pronouns. The nominalised forms of verbs also take the nominal declension. The format of the noun analysis string is:

...[NUM][CASE][POSS]?[CLIT]*[BOUNDARY]

Examples of nouns in the tables of this section are given with forms of the word valo (light), which does not have any stem variation in inflection. Here are a few examples of valo's inflectional pattern:

valolle       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ALL][BOUNDARY=LEXITEM]

valolleni     [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ALL][POSS=SG1][BOUNDARY=LEXITEM]

valoillenikokaan      [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=PL][CASE=ALL][POSS=SG1][CLIT=KO][CLIT=KAAN][BOUNDARY=LEXITEM]
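
The informal formula ...[NUM][CASE][POSS]?[CLIT]*[BOUNDARY] can be written out as an actual regular expression. The sketch below is a literal translation of the formula and makes no claim about omorfi's internals; the names NOMINAL_TAIL and has_nominal_tail are hypothetical.

```python
import re

# Literal translation of ...[NUM][CASE][POSS]?[CLIT]*[BOUNDARY]
NOMINAL_TAIL = re.compile(
    r"\[NUM=[^\]]+\]"          # [NUM]
    r"\[CASE=[^\]]+\]"         # [CASE]
    r"(?:\[POSS=[^\]]+\])?"    # [POSS]?  at most one possessive suffix
    r"(?:\[CLIT=[^\]]+\])*"    # [CLIT]*  any number of clitics
    r"\[BOUNDARY=[^\]]+\]$"    # [BOUNDARY] closing the analysis
)


def has_nominal_tail(analysis):
    """True if the analysis string ends in the nominal inflection pattern."""
    return NOMINAL_TAIL.search(analysis) is not None


valoillenikokaan = (
    "[BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]"
    "[NUM=PL][CASE=ALL][POSS=SG1][CLIT=KO][CLIT=KAAN][BOUNDARY=LEXITEM]"
)
print(has_nominal_tail(valoillenikokaan))
```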

Number

Nominals inflect in number to mark the plurality of the word. NUM for nouns is either singular or plural, or in some cases underspecified. The number marker comes first after the word stem, but it is often more or less fused with the case ending and usually causes stem variation:

valo  [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=NOM][BOUNDARY=LEXITEM]

valot [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=PL]
  [CASE=NOM][BOUNDARY=LEXITEM]
NUM  Meaning   Example
SG   Singular  valo (light)
PL   Plural    valot (lights)

Case

CASE for nominals has 16 possible values. The cases of Finnish nominals mark syntactic roles (nominative, partitive, accusative-genitive) and semantics (the others, and partially even the syntactic cases). The syntactic designation or semantic gloss is given in the Meaning column. The translations in the Example column are approximate, since there is no 1:1 correspondence between the semantic cases of Finnish and the prepositions of English.

While many of the cases have only one distinct ending, some combinations of number and case endings can exhibit up to 6 distinct case markers:

valo  [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=NOM][BOUNDARY=LEXITEM]

valoa [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=PAR][BOUNDARY=LEXITEM]

valon [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=GEN][BOUNDARY=LEXITEM]

valossa       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=INE][BOUNDARY=LEXITEM]

valosta       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ELA][BOUNDARY=LEXITEM]

valoon        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ILL][BOUNDARY=LEXITEM]

valolla       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ADE][BOUNDARY=LEXITEM]

valolta       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ABL][BOUNDARY=LEXITEM]

valolle       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ALL][BOUNDARY=LEXITEM]

valona        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ESS][BOUNDARY=LEXITEM]

valoksi       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=TRA][BOUNDARY=LEXITEM]

valotta       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1][NUM=SG]
  [CASE=ABE][BOUNDARY=LEXITEM]

valoine       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=9]
  [CASE=CMT][BOUNDARY=LEXITEM]

valoin        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=9]
  [NUM=PL][CASE=INS][BOUNDARY=LEXITEM]
CASE  Meaning                            Example
NOM   Nominative (subject)               valo (light)
PAR   Partitive (partial object)         valoa (some light)
GEN   Genitive (attribute/possessive)    valon (light's)
INE   Inessive (in inside)               valossa (in light)
ELA   Elative (away from inside)         valosta (from (inside of) light)
ILL   Illative (into inside)             valoon (to light)
ADE   Adessive (on surface/vicinity)     valolla (on/nearby light)
ABL   Ablative (from surface/vicinity)   valolta (from (nearby of) light)
ALL   Allative (on to surface/vicinity)  valolle (towards the light)
ESS   Essive (as)                        valona (as light)
TRA   Translative (become as)            valoksi (into light)
ABE   Abessive (without)                 valotta (without light)
CMT   Comitative (with/in company of)    valoine (with lights)
INS   Instructive (with/by using)        valoin (using lights)

Possessive suffixes

A possessive suffix indicates ownership and always attaches after the case ending. POSS can take six possible values, for the singular and plural first, second and third person; the third person form is always ambiguous in number. The third person form also has two allomorphs, the latter of which typically only occurs after long vowels. Here are the example readings of the word valo (light):

valoni        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=SG1][BOUNDARY=LEXITEM]

valosi        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=SG2][BOUNDARY=LEXITEM]

valonsa       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=SG3][BOUNDARY=LEXITEM]
valonsa       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=PL3][BOUNDARY=LEXITEM]

valomme       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=PL1][BOUNDARY=LEXITEM]

valonne       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][POSS=PL2][BOUNDARY=LEXITEM]
POSS      Meaning                          Example
SG1       First person singular            valoni (my light)
SG2       Second person singular           valosi (your light)
SG3, PL3  Third person singular or plural  valonsa (his/her/their light)
PL1       First person plural              valomme (our light)
PL2       Second person plural             valonne (your light)

Noun subcategories

Nouns currently have only one subcategory: proper nouns, or names. Proper nouns are usually written with an initial capital or, more recently, with totally arbitrary capitalisation, as in the brand names nVidia and ATi. Proper nouns have full inflectional morphology exactly as other nouns do, but they work slightly differently in derivation and compounding. Some capitalised nouns may also lose their capitalisation in derivation. Here are examples of the semantic subclasses of proper nouns:

Pekka [BOUNDARY=LEXITEM][LEMMA='Pekka'][POS=NOUN]
  [SUBCAT=PROPER][KTN=9][KAV=A][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

Virtanen      [BOUNDARY=LEXITEM][LEMMA='Virtanen'][POS=NOUN]
  [SUBCAT=PROPER][KTN=38][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

Helsinki      [BOUNDARY=LEXITEM][LEMMA='Helsinki'][POS=NOUN]
  [SUBCAT=PROPER][KTN=5][KAV=G][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
SUBCAT  Meaning      Examples
PROPER  proper noun  Pekka (personal name), Virtanen (surname), Helsinki (geographical name)

Allomorphy

Certain nominal cases have multiple surface forms, which some applications need to tell apart. For these cases the omor tagset provides the ALLO tag. The value of ALLO is the morphophonemic representation of the morpheme, written in capitals, such as A for the partitive ending a or ä.

Adjectives

Adjectives are effectively inflected as nouns, with an additional level of comparison forms before the regular nominal inflection. Adjectives are also very unlikely to have possessive suffixes. The adjective analysis strings are of the form:

[POS=ADJECTIVE][KTN][KAV]?[CMP][NUM][CASE][POSS]?[CLIT]?

The examples in this section are given with nopea (fast). Here is an example of how the comparison forms combine with nominal inflection:

nopea [BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE][KTN=15]
  [CMP=POS][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

nopeampi      [BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE]
  [KTN=15][CMP=CMP][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

nopein        [BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE]
  [KTN=15][CMP=SUP][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

Note

VISK §

Comparison

Comparison has three levels, marked by the CMP tag. In modern grammar, comparison is classified under derivation instead of regular inflection, which also makes sense for Omorfi, since each form of comparison has a full set of nominal inflection. The comparative suffixes precede the nominal inflection.

CMP  Meaning      Example
POS  Positive     nopea
CMP  Comparative  nopeampi
SUP  Superlative  nopein

Numerals

Numerals do not have any specific inflection besides that of nouns. The numerals, however, do have special compounding restrictions and patterns. They are also one of the typical parts of speech in other systems, so they are included here as a separate class. The analysis of numeral compounds is detailed in the compounding section, but otherwise numerals follow the basic nominal pattern. It may also be noteworthy that this means full nominal inflection; Finnish numerals have singular and plural forms. The analysis strings are as with nouns:

[POS=NUMERAL][KTN][KAV]?[SUBCAT][NUM][CASE][POSS]?[CLIT]*

The numerals are, of course, an infinite class of words, even though they are built from a closed set of parts. The implementation of Omorfi aims to recognise all of the numeral words and their compounds, using systematic names for very large numerals. The systematic names consist of a Latin-derived prefix x and a suffix part, forming xillions and xilliards (i.e. like long-scale English numerals). So the scale goes from miljoona (10^6, million) and miljardi (10^9, milliard) through biljoona, biljardi, triljoona, and so on for the prefixes kvadri-, kvinti-, septi-, ..., until sentiljoona (10^600). Here are a few examples:

yksi  [BOUNDARY=LEXITEM][LEMMA='yksi'][POS=NUMERAL][KTN=31]
  [SUBCAT=CARD][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kaksitoista   [BOUNDARY=LEXITEM][LEMMA='kaksi'][POS=NUMERAL]
  [KTN=31][SUBCAT=CARD][NUM=SG][CASE=NOM]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='toista']
  [POS=NUMERAL][BOUNDARY=LEXITEM]

 satakaksikymmentäkolmemiljoonaaneljäsataaviisikymmentä-
  kuusituhattaseitsemänsataakahdeksankymmentäyhdeksän
  [BOUNDARY=LEXITEM][LEMMA='sata'][POS=NUMERAL][KTN=9]
  [KAV=F][SUBCAT=CARD][NUM=SG][CASE=NOM]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='kaksi']
  [POS=NUMERAL][KTN=31][SUBCAT=CARD][NUM=SG][CASE=NOM]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='kymmenen']
  [POS=NUMERAL][KTN=32][SUBCAT=CARD][NUM=SG][CASE=PAR]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='kolme']
  [POS=NUMERAL][KTN=7][SUBCAT=CARD][NUM=SG][CASE=NOM]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='miljoona']
  [POS=NUMERAL][KTN=10][SUBCAT=CARD][NUM=SG][CASE=PAR]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='neljä']
  [POS=NUMERAL][KTN=10][SUBCAT=CARD][NUM=SG][CASE=NOM]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='sata']
  [POS=NUMERAL][KTN=9][KAV=F][SUBCAT=CARD][NUM=SG]
  [CASE=PAR][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='viisi'][POS=NUMERAL][KTN=27][SUBCAT=CARD]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='kymmenen'][POS=NUMERAL][KTN=32][SUBCAT=CARD]
  [NUM=SG][CASE=PAR][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='kuusi'][POS=NUMERAL][KTN=27][SUBCAT=CARD]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='tuhat'][POS=NUMERAL][KTN=46][SUBCAT=CARD]
  [NUM=SG][CASE=PAR][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='seitsemän'][POS=NUMERAL][KTN=10][SUBCAT=CARD]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='sata'][POS=NUMERAL][KTN=9][KAV=F][SUBCAT=CARD]
  [NUM=SG][CASE=PAR][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='kahdeksan'][POS=NUMERAL][KTN=10][SUBCAT=CARD]
  [NUM=SG][CASE=NOM][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='kymmenen'][POS=NUMERAL][KTN=32][SUBCAT=CARD]
  [NUM=SG][CASE=PAR][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='yhdeksän'][POS=NUMERAL][KTN=10][SUBCAT=CARD]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
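The numeral compound analyses above can also be evaluated numerically from their lemma sequence. The following is a minimal sketch, not omorfi code: the lemma tables and the functions lemmas and numeral_value are assumptions made for this example, and only a few of the long scale names are listed.

```python
import re

UNITS = {"yksi": 1, "kaksi": 2, "kolme": 3, "neljä": 4, "viisi": 5,
         "kuusi": 6, "seitsemän": 7, "kahdeksan": 8, "yhdeksän": 9}
SMALL = {"kymmenen": 10, "sata": 100}
# Long scale: each -iljardi is 1000 times the preceding -iljoona.
LARGE = {"tuhat": 10**3, "miljoona": 10**6, "miljardi": 10**9,
         "biljoona": 10**12, "biljardi": 10**15, "triljoona": 10**18}


def lemmas(analysis):
    """Collect the LEMMA values of an analysis string, in order."""
    return re.findall(r"\[LEMMA='([^']*)'\]", analysis)


def numeral_value(parts):
    """Evaluate a sequence of numeral lemmas as an integer."""
    total = 0    # finished groups of thousands, millions, ...
    group = 0    # value below 1000 collected so far
    pending = 0  # last unit word not yet multiplied or added
    for lemma in parts:
        if lemma in UNITS:
            pending = UNITS[lemma]
        elif lemma == "toista":  # 11..19, e.g. kaksi + toista = 12
            group += 10 + pending
            pending = 0
        elif lemma in SMALL:
            group += (pending or 1) * SMALL[lemma]
            pending = 0
        elif lemma in LARGE:
            group += pending
            total += (group or 1) * LARGE[lemma]
            group = pending = 0
        else:
            raise ValueError("not a known numeral lemma: " + lemma)
    return total + group + pending


KAKSITOISTA = ("[BOUNDARY=LEXITEM][LEMMA='kaksi'][POS=NUMERAL][KTN=31]"
               "[SUBCAT=CARD][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND]"
               "[GUESS=COMPOUND][LEMMA='toista'][POS=NUMERAL][BOUNDARY=LEXITEM]")
print(numeral_value(lemmas(KAKSITOISTA)))  # 12
```

The sketch trusts the analyser to have accepted only well-formed numeral compounds; it does not itself check that the multipliers appear in decreasing order.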

Numeral categories

Numerals have functional subcategories for semantics, which have been used in most of the other systems and retained here as well. The distinction is made between cardinal and ordinal numbers, and is purely semantic:

kolme [BOUNDARY=LEXITEM][LEMMA='kolme'][POS=NUMERAL][KTN=7]
  [SUBCAT=CARD][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

neljäs        [BOUNDARY=LEXITEM][LEMMA='neljäs'][POS=NUMERAL]
  [KTN=45][SUBCAT=ORD][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
SUBCAT  Meaning   Example
CARD    cardinal  kolme (three)
ORD     ordinal   neljäs (fourth)

For some numerals there are special derived forms with approximative meaning. These forms are often not fully inflected, or not inflected at all, and do not participate in compounding:

kuutisen

toistasataa
SUBCAT  Meaning        Example
APPROX  approximative  kuutisen (about six), toistasataa (100–200)

Note

VISK §

Pronouns

Pronouns inflect mostly like nouns, but have their own POS. Pronouns are also the only nominals with an explicit, phonemically distinct accusative marker. Many pronouns have a defective pattern, e.g. only singular or only plural forms, or heteroclitic paradigms. Pronoun analyses are of the same form as those of other nominals:

[POS=PRONOUN][KTN][KAV]?[NUM][CASE][POSS]?[CLIT]*

Pronoun-specific cases

Some of the pronouns have accusative as separate case:

minut [BOUNDARY=LEXITEM][LEMMA='minä'][POS=PRONOUN]
  [SUBCAT=PERSONAL][NUM=SG][CASE=ACC][BOUNDARY=LEXITEM]
CASE  Meaning              Examples
ACC   Accusative (object)  minut (me)

Note

VISK §

Pronoun subcategories

Pronouns are divided into semantic classes by use. The classification is fully copied from the modern grammar:

minä  [BOUNDARY=LEXITEM][LEMMA='minä'][POS=PRONOUN]
  [SUBCAT=PERSONAL][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]


tämä  [BOUNDARY=LEXITEM][LEMMA='tämä'][POS=PRONOUN]
  [SUBCAT=DEMONSTR][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kuka  [BOUNDARY=LEXITEM][LEMMA='kuka'][POS=PRONOUN]
  [SUBCAT=INTERROG][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

joka  [BOUNDARY=LEXITEM][LEMMA='joka'][POS=PRONOUN]
  [SUBCAT=RELATIVE][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kukaan        [BOUNDARY=LEXITEM][LEMMA='kukaan'][POS=PRONOUN]
  [SUBCAT=QUANTOR][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

itse  [BOUNDARY=LEXITEM][LEMMA='itse'][POS=PRONOUN]
  [SUBCAT=REFLEX][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

toinen        [BOUNDARY=LEXITEM][LEMMA='toinen'][POS=PRONOUN]
  [SUBCAT=RECIPROC][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
SUBCAT    Meaning        Examples
PERSONAL  Personal       minä (I)
DEMONSTR  Demonstrative  tämä (this)
INTERROG  Interrogative  kuka (who?)
RELATIVE  Relative       joka (who)
QUANTOR   Quantifier     kukaan (no one)
REFLEX    Reflexive      itse (self)
RECIPROC  Reciprocal     toinen (each other)

Adverbs, adpositions and other ad words

Ad words are typically derived or inflected word forms with lexicalised meanings and defective inflection patterns: habitive adverbs (mainly, but not only, sti derivations) have comparison and clitics; locative adverbs have partial locative cases, possessives and clitics; temporal adverbs have only clitics. Prolatives and similar forms (e.g. yli ~ ylitse) may likewise have only clitics. Many inflected forms of adverbs are further lexicalised into separate adverbs (i.e. all forms of one adverb have dictionary entries). Intensifying adverbs might not take clitics at all. The analysis strings of adverbs therefore vary on a case-by-case basis. Mostly they fall under the simple form:

[POS=ADVERB][CASE]?[POSS]?[CLIT]?

[POS=ADPOSITION][CASE]?[POSS]?[CLIT]?

Note

VISK § 678 (discriminating adverb from adposition) <http://scripta.kotus.fi/visk/sisallys.php?p=678>

Adverbs

As noted earlier, many adverbs are nominals with current or archaic case endings, and the endings may be marked in omorfi as long as they are clear. Also, the sti derivation of adjectives is productive in the class of manner adverbs. Certain types of adverbs that are mostly productively derived may be available in Omorfi:

nopeasti      [BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE]
  [KTN=15][CMP=POS][GUESS=DERIVE][DRV=STI][POS=ADVERB][BOUNDARY=LEXITEM]

meritse       [BOUNDARY=LEXITEM][LEMMA='meri'][POS=NOUN][KTN=24]
  [GUESS=DERIVE][NUM=PL][DRV=TSE][POS=ADVERB][CASE=PRL][BOUNDARY=LEXITEM]

taloittain    [BOUNDARY=LEXITEM][LEMMA='talo'][POS=NOUN][KTN=1]
  [GUESS=DERIVE][NUM=PL][DRV=TTAIN][POS=ADVERB][CASE=DIS][BOUNDARY=LEXITEM]
CASE  Meaning       Example
PRL   prolative     meritse (by sea)
DIS   distributive  taloittain (house by house)

Adpositions

Adpositions are, like adverbs, current or archaic inflectional forms of regular nominals. The adpositions are further sub-categorised by their syntactic behaviour into prepositions and postpositions. Prepositions appear at the front of the adpositional phrase and postpositions at the back. Many adpositions can appear in both positions.

Acronyms

Acronyms in omorfi are those shortened nominals that inflect. These acronyms are inflected by adding a colon to the acronym and attaching most of the inflectional endings after the colon. Acronyms may be inflected in three ways. The endings after the colon may follow either the inflection of the name of the last letter of the acronym or that of the last word of the acronym; the third way, inflection as a regular noun, applies to acronyms that form valid words. Inflection by the last word is only implemented if the lexical source contains information about the last word of the acronym. For example, STT, short for Suomen tietotoimisto (Finland's information office), is inflected as STT:hen in the illative, since the letter tee (T) is teehen in the illative form, but STT:oon is also a valid illative, since -toimisto is -toimistoon in the illative form (the additional o there is an orthographic convention). The analyses are:

STT   [BOUNDARY=LEXITEM][LEMMA='STT'][POS=ACRONYM][NUM=SG]
  [CASE=NOM][BOUNDARY=LEXITEM]

STT:hen       [BOUNDARY=LEXITEM][LEMMA='STT'][POS=ACRONYM]
  [NUM=SG][CASE=ILL][BOUNDARY=LEXITEM]

Acronyms that form phonotactically valid words may often be inflected as regular nouns. Since their inflection follows the regular noun pattern, e.g. KELA (Kansaneläkelaitos, the social security office) is inflected like the noun kela, they should be treated as regular nouns in all parts of morphology. Some of these words lose their acronym interpretation and become regular nouns written in lowercase, such as laser. Lowercase variants are also allowed for other words:

AIDSilla      [BOUNDARY=LEXITEM][LEMMA='AIDS'][POS=NOUN][KTN=5]
  [NUM=SG][CASE=ADE][BOUNDARY=LEXITEM]

The non-inflecting abbreviations are described in their own section.
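The colon convention for inflected acronyms makes it easy to separate the acronym from its ending on the surface form. This is only a small sketch; split_acronym is a hypothetical helper, and forms inflected like regular nouns (such as AIDSilla) carry no colon and are returned unsplit.

```python
def split_acronym(form):
    """Split a colon-inflected acronym such as 'STT:hen' into (acronym, ending).

    A form without a colon is returned with an empty ending.
    """
    acronym, _, ending = form.partition(":")
    return acronym, ending


for form in ("STT", "STT:hen", "STT:oon"):
    print(form, split_acronym(form))
```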

Verb conjugation

Verb conjugation includes voice (in Finnish grammars also verbal genus), tense (tempus), mood (modus), personal endings or a negation marker, and clitics. The analysis strings of verb inflection are not as systematic as those of nouns, since most categories collapse together in the forms: for example, the voice distinction does not exist in all moods and tenses, and the tense distinction exists in only one mood. Rather than leaving analyses underspecified, tags are often omitted altogether, so verb analysis strings vary. Part of the regular derivation of verbs is typically included in the inflection, as has been done in traditional grammars. These infinite (non-finite) forms have nominal declension. The analysis string for finite verb forms is:

[POS=VERB][KTN][KAV]?[VOICE][MOOD][TENSE]?[PRS]?[NEG]?[CLIT]?

The infinite forms of verbs may have voice included. The infinite forms are split into infinitives, participles and derivations. The analysis string after these markers is the same as for all nominals:

[POS=VERB][KTN][KAV]?[VOICE][INF][NUM]?[CASE]?

For participles, the part after [VOICE] is the same as in nominal declension. For infinitives, only some of the CASE values may appear; a full listing of those cases can be found below.
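The two verb templates above can be read as patterns over tag sequences, where ? marks an optional tag. The sketch below turns such a template into a regular expression and checks an analysis against it. It is an illustration under assumptions: the helper names are invented here, the BOUNDARY and LEMMA tags are stripped because the templates leave them out, and a failed match is only a hint, since verb analysis strings vary.

```python
import re

FINITE = "[POS=VERB][KTN][KAV]?[VOICE][MOOD][TENSE]?[PRS]?[NEG]?[CLIT]?"
INFINITE = "[POS=VERB][KTN][KAV]?[VOICE][INF][NUM]?[CASE]?"

# One template item: [NAME] or [NAME=VALUE], optionally followed by ? or *.
ITEM_RE = re.compile(r"\[([A-Z_]+)(?:=([^\]]+))?\]([?*]?)")


def template_to_regex(template):
    """Translate a tag template into a compiled regular expression."""
    parts = []
    for name, value, quantifier in ITEM_RE.findall(template):
        value_pattern = re.escape(value) if value else r"[^\]]+"
        parts.append(r"(?:\[%s=%s\])%s" % (name, value_pattern, quantifier))
    return re.compile("".join(parts))


def strip_frame(analysis):
    """Drop the BOUNDARY and LEMMA tags, which the templates do not mention."""
    return re.sub(r"\[(?:BOUNDARY|LEMMA)=[^\]]*\]", "", analysis)


def matches(template, analysis):
    """True if the whole analysis follows the tag order of the template."""
    return template_to_regex(template).fullmatch(strip_frame(analysis)) is not None


KUDON = ("[BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52][KAV=F]"
         "[VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG1][BOUNDARY=LEXITEM]")
KUTOA = ("[BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52][KAV=F]"
         "[VOICE=ACT][INF=A][NUM=SG][CASE=LAT][BOUNDARY=LEXITEM]")

print(matches(FINITE, KUDON), matches(INFINITE, KUTOA))
```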

Verb subcategories

Verbs have only one special subcategory, for the negation verb ei, which has partial inflection:

en    [BOUNDARY=LEXITEM][LEMMA='ei'][POS=VERB][SUBCAT=NEG]
  [VOICE=ACT][PRS=SG1][BOUNDARY=LEXITEM]
SUBCAT  Meaning        Example
NEG     negation verb  en (I don't)

Note

Marking the negation verb as a specific sub-category of verbs, and marking the verb form that only occurs with it as connegative (conneg), has some history in Fennistics, but I do not know the origin of the practice and it is not in VISK. In fact, this practice was added for interoperability with Sámi language morphologies, which follow the same tagging.

Finite verb inflection

The finite inflection of verbs concerns actual verbal inflection in person, mood and tense.

Person

The personal ending of a verb defines the actors. PRS has seven possible values: six for the singular and plural first, second and third person forms, and one specifically for the passive. The passive personal form is encoded as the fourth person passive, which has been the common practice in earlier systems and is an accurate name:

kudon [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG1][BOUNDARY=LEXITEM]

kudot [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG2][BOUNDARY=LEXITEM]

kutoo [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG3][BOUNDARY=LEXITEM]

kudomme       [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=PL1][BOUNDARY=LEXITEM]

kudotte       [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=PL2][BOUNDARY=LEXITEM]

kutovat       [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=PL3][BOUNDARY=LEXITEM]

kudotaan      [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=PSS][MOOD=INDV][TENSE=PRES][PRS=PE4][BOUNDARY=LEXITEM]
PRS  Meaning              Example
SG1  1st person singular  kudon (I knit)
SG2  2nd person singular  kudot (you knit)
SG3  3rd person singular  kutoo (he/she/it knits)
PL1  1st person plural    kudomme (we knit)
PL2  2nd person plural    kudotte (you knit)
PL3  3rd person plural    kutovat (they knit)
PE4  Passive, 4th person  kudotaan (knitting is being done)

Negated form

Verbs have specific forms that go together with the negation verb (which has partial inflection itself). This form is marked with a NEG tag with the value CON. The existence of the negated form varies between moods, voices and tenses:

kudo  [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][MOOD=INDV][TENSE=PRES][NEG=CON][BOUNDARY=LEXITEM]

kudota        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=PSS][MOOD=INDV][TENSE=PRES][PRS=PE4][NEG=CON]
  [BOUNDARY=LEXITEM]
NEG  Meaning       Example
CON  Negated form  (en) kudo (I don't knit), (ei) kudota (no knitting)

Verbal genus (voice)

Verb inflection has two categories for active and passive voice, marked in tag named VOICE. For finite verb forms active voice is tied to personal forms and passive voice to non-personal verb endings. The voice is also marked in some of the infinite verb forms:

kudon [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG1][BOUNDARY=LEXITEM]

kudotaan      [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=PSS][MOOD=INDV][TENSE=PRES][PRS=PE4][BOUNDARY=LEXITEM]
VOICE  Meaning  Example
ACT    active   kudon (I knit)
PSS    passive  kudotaan (knitting is being done)

Note

VISK § 110 <http://scripta.kotus.fi/visk/sisallys.php?p=110>, on the passive

Tempus (tense)

Verbs may inflect for tense. TENSE has two values. For moods other than the indicative, tense is not distinguished in the surface form and is therefore not marked in the analyses. Finnish morphology only distinguishes past from non-past, which is worth noting since some historical systems have used the terms imperfect and present:

kudon [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG1][BOUNDARY=LEXITEM]

kudoin        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PAST][PRS=SG1][BOUNDARY=LEXITEM]
TENSE  Meaning   Example
PRES   non-past  kudon (I knit)
PAST   past      kudoin (I knitted)

Note

VISK § 112 <http://scripta.kotus.fi/visk/sisallys.php?p=112>, § 111 for tenses and moods collectively

Modus (Mood)

Finite verb forms inflect for mood. Mood is systematically included in analysis strings, even for the unmarked indicative. Only the indicative mood has the full set of temporal and personal inflection; the others have limited inflection in current use. Some forms may also be covered by theoretical or archaic word forms, which are included in some versions of Omorfi. MOOD has four possible values:

kudon [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=INDV][TENSE=PRES][PRS=SG1][BOUNDARY=LEXITEM]

kudo  [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=IMPV][PRS=SG2][BOUNDARY=LEXITEM]

kutoisin      [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=COND][PRS=SG1][BOUNDARY=LEXITEM]

kutonen       [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][MOOD=POTN][PRS=SG1][BOUNDARY=LEXITEM]
MOOD  Meaning      Example
INDV  indicative   kudon (I knit)
IMPV  imperative   kudo (do knit!)
COND  conditional  kutoisin (I would knit)
POTN  potential    kutonen (I might knit)

Note

VISK § 115–118 <http://scripta.kotus.fi/visk/sisallys.php?p=115>, § 111 for tenses and moods collectively

Infinite verb forms

Infinite verb forms are in principle nominal derivations from verbs, included in morphology as inflection by long linguistic tradition. It is especially notable that the A infinitive with lative case marking is still considered the dictionary form of the verb.

Infinitives

INF has 4 possible values. In addition, one fully productive derivational form used to be called an infinitive in old grammars. In traditional grammars the infinitive forms were called the I, II, III, IV and V infinitive; the modern grammar replaces the first three with A, E and MA respectively. The IV infinitive, which has the suffix marker minen, has been reanalysed as derivational, and this is reflected in Omorfi. The V infinitive is also assumed to be mainly derivational, but is included here for reference.

The short form of the A infinitive is in the lative case, which is extinct in nominal declension. The long form of the A infinitive is in the translative, and it requires a possessive suffix. For the E infinitive, the possible cases are inessive and instructive; the possessive suffix is optional for both, but rare for the instructive form. For the MA infinitive, the possible cases are abessive, adessive, elative, illative, inessive and instructive; the possessive ending is very rare, since it usually indicates the agent participle instead. The mAisillA derivation is theoretically already in the adessive case (of the inen derivation of the mA infinitive, but this re-analysis is not performed in omorfi) and therefore has no case inflection; the possessive endings are optional but common. The minen derivation creates a noun root form and has standard nominal inflection:

kutoa [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][INF=A][NUM=SG][CASE=LAT][BOUNDARY=LEXITEM]

kutoen        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][INF=E][NUM=SG][CASE=INS][BOUNDARY=LEXITEM]

kutomatta     [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][VOICE=ACT][INF=MA][NUM=SG][CASE=ABE][BOUNDARY=LEXITEM]

kutominen     [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][DRV=MINEN][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kutomaisillani        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB]
  [KTN=52][KAV=F][VOICE=ACT][DRV=MAISILLA][POSS=SG1][BOUNDARY=LEXITEM]
INF           Meaning        Examples
A             A infinitive   kutoa (to knit)
E             E infinitive   kutoen (by knitting)
MA            MA infinitive  kutomatta (without knitting)
DRV=MINEN     IV infinitive  kutominen (knitting n.)
DRV=MAISILLA  V infinitive   kutomaisillani (I am about to knit)

Note

VISK § 120–121 <http://scripta.kotus.fi/visk/sisallys.php?p=120>, § 119 for infinite forms collectively

Participles

There are 4 participle forms. As with infinitives, the participles that traditional grammars named I and II are called the VA and NUT participles in modern grammars. The agent and negation participles have sometimes been considered to be outside regular inflection, but modern Finnish grammars place them alongside the other participles, so they are included in inflection in omorfi as well. In some grammars the NUT and VA participles have been called past and present participles respectively, drawing parallels from other languages, but these names are misleading and should usually be avoided. The participles work mostly as adjectival or nominal derivations, and may have full nominal inflection:

kutonut       [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB]
  [KTN=52][KAV=F][VOICE=ACT][PCP=NUT][CMP=POS]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

kutova        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB]
  [KTN=52][KAV=F][VOICE=ACT][PCP=VA][CMP=POS][NUM=SG]
  [CASE=NOM][BOUNDARY=LEXITEM]

kutoma        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB]
  [KTN=52][KAV=F][VOICE=ACT][PCP=MA][CMP=POS][NUM=SG]
  [CASE=NOM][BOUNDARY=LEXITEM]

kutomaton     [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB]
  [KTN=52][KAV=F][VOICE=ACT][PCP=NEG][CMP=POS]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
PCP  Meaning             Example
NUT  NUT participle      kutonut (having knitted)
VA   VA participle       kutova (knitting)
MA   Agent participle    kutomani (which I knitted)
NEG  Negated participle  kutomaton (unknitted)

Warning

Be aware that some traditional commercial software for Finnish morphology mistakenly analyses agent participles as MA infinitives, which results in different taggings in some reference corpora you may see. Apart from semantics, the agent participle can be distinguished from the MA infinitive by the fact that it almost always requires a possessive suffix and only rarely specifies the agent by syntactic means. Also, participles allow all cases, whereas the set of cases used with infinitives is limited.

Note

VISK § 122 <http://scripta.kotus.fi/visk/sisallys.php?p=122>, § 119 for infinite forms collectively

Discourse particles (clitics)

Clitics are suffixes which can attach to the end of almost any word form, both verb forms and nominals. They also attach to the end of other clitics, forming theoretically infinite chains. In practice it is usual to see at most three in one word form. Two clitics have limited use: -s only appears in a few verb forms and combined with other clitics, and -kA only appears with a few adverbs and the negation verb. The meaning of clitics also varies greatly with context and even intonation, so the glosses below are only loosely indicative:

valohan       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][CLIT=HAN][BOUNDARY=LEXITEM]

valokaan      [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][CLIT=KAAN][BOUNDARY=LEXITEM]

valokin       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][CLIT=KIN][BOUNDARY=LEXITEM]

valoko        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][CLIT=KO][BOUNDARY=LEXITEM]

valopa        [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=NOM][CLIT=PA][BOUNDARY=LEXITEM]

tules [BOUNDARY=LEXITEM][LEMMA='tulla'][POS=VERB][KTN=67]
  [VOICE=ACT][MOOD=IMPV][PRS=SG2][CLIT=S][BOUNDARY=LEXITEM]

eikä  [BOUNDARY=LEXITEM][LEMMA='ei'][POS=VERB][SUBCAT=NEG]
  [VOICE=ACT][PRS=SG3][CLIT=KA][BOUNDARY=LEXITEM]
CLIT  Meaning               Example
HAN   -hAn (even, also)     valohan (even light)
KAAN  -kAAn (not even)      valokaan (not even light)
KIN   -kin (also, as well)  valokin (also light)
KO    -kO (question)        valoko (light?)
PA    -pA (indeed, esp.)    valopa (light indeed)
S     -s (moderate)         tules (do come)
KA    -kA (negation)        eikä (nor)
Note

VISK § 126– <http://scripta.kotus.fi/visk/sisallys.php?p=126>, § 131 on combinatorics
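Because clitics can chain, an application may want the whole sequence of CLIT tags rather than a single value. The sketch below collects them in order and maps them to the suffix notation of the table above. The function clitic_chain is hypothetical, and the two-clitic analysis string for valokohan is constructed by hand for illustration; it is not actual analyser output.

```python
import re

# Suffix notation for each CLIT value, as listed in the clitic table.
CLITIC_SUFFIX = {"HAN": "-hAn", "KAAN": "-kAAn", "KIN": "-kin", "KO": "-kO",
                 "PA": "-pA", "S": "-s", "KA": "-kA"}


def clitic_chain(analysis):
    """Return the clitic suffixes of an analysis in the order they attach."""
    return [CLITIC_SUFFIX[value]
            for value in re.findall(r"\[CLIT=([A-Z]+)\]", analysis)]


VALOHAN = ("[BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]"
           "[NUM=SG][CASE=NOM][CLIT=HAN][BOUNDARY=LEXITEM]")
# Hand-built example of a chain (assumption, not analyser output).
VALOKOHAN = ("[BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]"
             "[NUM=SG][CASE=NOM][CLIT=KO][CLIT=HAN][BOUNDARY=LEXITEM]")

print(clitic_chain(VALOHAN))
print(clitic_chain(VALOKOHAN))
```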

Other expressions

Many numerals are written in digits or other codified expressions. Even digit sequences inflect and participate in compounding in Finnish.

SUBCAT  Meaning                    Example
DIGIT   numeral written in digits  3,141 (3.141), XIV:ttä (of 14th)

Non-inflecting parts of speech

There are several parts of speech in omorfi that do not have any inflection and do not participate in derivation or compounding. The official grammar uses the name particle for all non-inflecting words; here the syntactic and semantic division into conjunctions, interjections and the rest (named particles here and in old grammars) has been retained.

Conjunctions

Conjunctions are non-inflecting words that join syntactic structures together. The conjunctions have two subcategories according to the type of syntactic relation they make. The analysis string of a conjunction is:

[POS=CONJUNCTION][SUBCAT]

Subcategories of conjunctions: -ordination

The conjunctions are divided into two classes depending on whether they act as subordinating or co-ordinating their respective syntactic units, this is marked by SUBCAT values SUBORD and COORD:

kun   [BOUNDARY=LEXITEM][LEMMA='kun'][POS=CONJUNCTION]
  [SUBCAT=SUBORD][BOUNDARY=LEXITEM]

ja    [BOUNDARY=LEXITEM][LEMMA='ja'][POS=CONJUNCTION]
  [SUBCAT=COORD][BOUNDARY=LEXITEM]
SUBCAT  Meaning        Examples
SUBORD  Subordinating  kun (when)
COORD   Co-ordinating  ja (and)

Note

VISK § 816 <http://scripta.kotus.fi/visk/sisallys.php?p=816> (the classification differs, SUBORD is for unifying with other systems)

Interjections

Interjections are usually characterisations of speech acts, and may often consist of more or less arbitrary series of characters, sometimes onomatopoetic. Also minimal turns in dialogue, mumbling, swearing, and so on are interjections. They always have analysis string:

[POS=INTERJECTION]

Abbreviations

Abbreviations are shortened word forms that do not inflect. Most of the abbreviations are written with lowercase letters and end in full stop. Some of the old abbreviations use colon as marker of omission inside the word. The analysis string must be:

[POS=ABBREVIATION]

Particles

Particles are leftover part of speech for non-inflected words that didn't find their way elsewhere. The analysis string is always:

[POS=PARTICLE]

Derivations

Derivation forming is an experimental feature and not present in all versions and applications using omorfi. The derived forms should be considered guesses at best. The form of derived analysis strings varies depending on the root word, but the typical form is:

[POS][INFLECTIONS...][GUESS][DRV][POS]...

The first POS is the POS of the dictionary word, the second is the POS of the derived form. The following DRV values are currently formed:

nopeasti      [BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE]
  [KTN=15][CMP=POS][GUESS=DERIVE][DRV=STI][POS=ADVERB][BOUNDARY=LEXITEM]

kutoja        [BOUNDARY=LEXITEM][LEMMA='kutoa'][POS=VERB][KTN=52]
  [KAV=F][GUESS=DERIVE][DRV=JA][POS=NOUN]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

valoinen      [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN]
  [KTN=1][GUESS=DERIVE][DRV=INEN][POS=ADJECTIVE]
  [CMP=POS][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

valotar       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN]
  [KTN=1][GUESS=DERIVE][DRV=TAR][POS=NOUN]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

valollinen    [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN]
  [KTN=1][GUESS=DERIVE][DRV=LLINEN][POS=NOUN]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

valoton       [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN]
  [KTN=1][GUESS=DERIVE][DRV=TON][POS=NOUN]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

valoitse      [BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN]
  [KTN=1][GUESS=DERIVE][NUM=PL][DRV=TSE]
  [POS=ADVERB][CASE=PRL][BOUNDARY=LEXITEM]
DRV     Meaning      Examples
STI     manner of A  nopeasti (fast)
JA      actor of V   kutoja (knitter)
INEN    having N     valoinen (lightful)
TAR     feminine N   valotar (lightress)
LLINEN  owner of N   valollinen (lighted)
TON     without N    valoton (lightless)
TSE     via N        valoitse (by light)
VS      N-ness       valous (lightness)

For most applications derivations must be removed from the morphological process and added to lexical data source as needed.

Compounding

Compounding is a productive morphological process in Finnish. Typically any nominals can be joined to form ad hoc compounds as needed. There are many restrictions on the word forms allowed in compounds. Productive nominal compounds are always formed by a chain of nominals in the genitive, nominative or a special compound form, followed by a final nominal word carrying the inflectional suffixes. The nominals may also be nominalised verb forms.

There are also less productive compounds, where the initial parts of the compound may have forms other than those listed above; these should be added to the lexical data, since they are typically lexicalised. There is also a set of adjective-initial compounds where, in standard Finnish, inflection is said to agree across all parts of the compound; these cases are few and becoming rarer in general use, so they should be listed as exceptions.

The numeral compounds agree in all parts, except in the nominative form, where multiplicands take partitive forms. This complexity is hard-coded into the morphology. In numeral compounds the multipliers must also appear in order of decreasing magnitude.

The table below illustrates possible chains by some examples:

talonmies     [BOUNDARY=LEXITEM][LEMMA='talo'][POS=NOUN][KTN=1]
  [NUM=SG][CASE=GEN][BOUNDARY=COMPOUND]
  [GUESS=COMPOUND][LEMMA='mies'][POS=NOUN][KTN=42]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

salaattikastike       [BOUNDARY=LEXITEM][LEMMA='salaatti'][POS=NOUN]
  [KTN=5][KAV=C][NUM=SG][CASE=NOM][BOUNDARY=COMPOUND]
  [GUESS=COMPOUND][LEMMA='kastike'][POS=NOUN][KTN=48]
  [KAV=A][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

isänisänisä   [BOUNDARY=LEXITEM][LEMMA='isä'][POS=NOUN][KTN=10]
  [NUM=SG][CASE=GEN][BOUNDARY=COMPOUND][GUESS=COMPOUND]
  [LEMMA='isä'][POS=NOUN][KTN=10][NUM=SG][CASE=GEN]
  [BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='isä']
  [POS=NOUN][KTN=10][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]

naislääkäri   [BOUNDARY=LEXITEM][LEMMA='nainen'][POS=NOUN]
  [KTN=38][COMPOUND_FORM=S][BOUNDARY=COMPOUND]
  [GUESS=COMPOUND][LEMMA='lääkäri'][POS=NOUN][KTN=6]
  [NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
Compound pattern  Examples
N GEN + N         talonmies (house's man = janitor)
N NOM + N         salaattikastike (salad dressing)
N GEN* N          isänisänisänisän...isä (paternal great great ... grand father)
N CMP + N         naislääkäri (nainen + lääkäri, female doctor)
A X + N X         vanhallepojalle (vanha + poika, old boy = bachelor)
NUM X*            kahdeksisadaksikolmeksikymmeneksineljäksi (into 234)
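A compound analysis can be cut into its parts at the [BOUNDARY=COMPOUND] tags. The sketch below does so and checks the productive nominal pattern described above: every non-final part is in the genitive, the nominative or a special compound form. The function names are assumptions for this example, and numeral compounds, which follow their own agreement rules, are deliberately not covered.

```python
import re

TAG_RE = re.compile(r"\[([A-Z_]+)=([^\]]*)\]")


def compound_parts(analysis):
    """Split an analysis at compound boundaries; return one tag dict per part."""
    parts = []
    for chunk in analysis.split("[BOUNDARY=COMPOUND]"):
        tags = {name: value.strip("'")
                for name, value in TAG_RE.findall(chunk)
                if name != "BOUNDARY"}
        parts.append(tags)
    return parts


def is_productive_pattern(parts):
    """True if all non-final parts are GEN, NOM or in a compound form."""
    return all(part.get("CASE") in ("GEN", "NOM") or "COMPOUND_FORM" in part
               for part in parts[:-1])


TALONMIES = ("[BOUNDARY=LEXITEM][LEMMA='talo'][POS=NOUN][KTN=1]"
             "[NUM=SG][CASE=GEN][BOUNDARY=COMPOUND]"
             "[GUESS=COMPOUND][LEMMA='mies'][POS=NOUN][KTN=42]"
             "[NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]")
NAISLAAKARI = ("[BOUNDARY=LEXITEM][LEMMA='nainen'][POS=NOUN]"
               "[KTN=38][COMPOUND_FORM=S][BOUNDARY=COMPOUND]"
               "[GUESS=COMPOUND][LEMMA='lääkäri'][POS=NOUN][KTN=6]"
               "[NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]")

for analysis in (TALONMIES, NAISLAAKARI):
    parts = compound_parts(analysis)
    print([p["LEMMA"] for p in parts], is_productive_pattern(parts))
```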

Productive compounding is typically required to gain any coverage with the analyser, but it is also an endless source of ambiguity problems. In omorfi, compounds are handled by combining a list of verified compounds with an estimate of the likelihood of a compound in the weighted analyser. End applications may need to ignore productive compounds or decide on a threshold for accepted compounds.
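How an end application might apply such a threshold can be sketched as follows. Everything here is an assumption made for illustration: the readings are given as (analysis, weight) pairs, a lower weight is taken to mean a more likely reading, and the example weights and shortened analysis strings are invented.

```python
def accept(readings, max_compound_weight):
    """Keep all non-guessed readings; keep guessed compounds only when their
    weight does not exceed the threshold. Results are sorted best first."""
    kept = []
    for analysis, weight in readings:
        if "[GUESS=COMPOUND]" in analysis and weight > max_compound_weight:
            continue
        kept.append((analysis, weight))
    return sorted(kept, key=lambda reading: reading[1])


# Invented readings for one word form (hypothetical weights).
readings = [
    ("[LEMMA='talo'][CASE=GEN][BOUNDARY=COMPOUND][GUESS=COMPOUND][LEMMA='mies']", 12.5),
    ("[LEMMA='talonmies'][POS=NOUN][NUM=SG][CASE=NOM]", 1.0),
]

print(len(accept(readings, max_compound_weight=10.0)))
print(len(accept(readings, max_compound_weight=20.0)))
```

Setting the threshold below every compound weight amounts to ignoring productive compounds altogether.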

Guesses

Some analyses include an indication that the word does not have a dictionary-based root form, but contains a root that is generated morphologically. The value of GUESS identifies the process that created the new root form. Currently two processes produce new base forms: compounding and derivation.

VALUE     Meaning     Example
COMPOUND  compound    kissakoira
DERIVE    derivation  kissatar

For most applications guesses are supposed to be moved into the dictionaries as new base forms if recognised as proper used words.
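To find candidates for moving into the dictionary, an application can look for the GUESS tags in each analysis. This is a minimal sketch; guess_sources and is_dictionary_word are hypothetical names, and the analysis strings are the ones printed elsewhere in this document.

```python
import re


def guess_sources(analysis):
    """Return the set of GUESS values (COMPOUND, DERIVE) found in an analysis."""
    return set(re.findall(r"\[GUESS=([A-Z]+)\]", analysis))


def is_dictionary_word(analysis):
    """True if no part of the analysis was produced by a guessing process."""
    return not guess_sources(analysis)


VALO = ("[BOUNDARY=LEXITEM][LEMMA='valo'][POS=NOUN][KTN=1]"
        "[NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]")
NOPEASTI = ("[BOUNDARY=LEXITEM][LEMMA='nopea'][POS=ADJECTIVE][KTN=15][CMP=POS]"
            "[GUESS=DERIVE][DRV=STI][POS=ADVERB][BOUNDARY=LEXITEM]")

for analysis in (VALO, NOPEASTI):
    print(is_dictionary_word(analysis), sorted(guess_sources(analysis)))
```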

Style

Many lexical sources record notes on style or area of usage with the words. This kind of lexical data may be indicated in an additional STYLE value. The existing uses of the style feature classify common misspellings or substandard forms, as well as dialectal, rare and archaic forms:

seitsämän     [BOUNDARY=LEXITEM][LEMMA='seitsemän']
  [STY=NONSTANDARD][POS=NUMERAL][KTN=10][SUBCAT=CARD]
  [NUM=SG][CASE=GEN][BOUNDARY=LEXITEM]

mie   [BOUNDARY=LEXITEM][LEMMA='mie'][POS=PRONOUN]
  [SUBCAT=PERSONAL][NUM=SG][CASE=NOM][BOUNDARY=LEXITEM]
VALUE        Meaning       Example
NONSTANDARD  non-standard  seitsämän (seitsemän, seven)
RARE         rare
DIALECTAL    dialectal     mie (I)
ARCHAIC      archaic

Not all applications and versions of Omorfi include all of these forms.

KTN and KAV

These analyses come from traditional dictionaries released by Research Institute of Languages in Finland (RILF). They are usable in a few applications, so they are retained as analyses, but mostly they only provide slight disambiguation for some words with identical dictionary forms. For the specification of these values, the original documentation may be available from the kotus kaino resource for Nykysuomen sanalista. KTN takes values from 1 to 49 for nominals and 52 to 78 for verbs. 50 and 51 were originally used to mark up compounds, but these markings were mostly removed even from the source lexical data omorfi was built on, so they have not been restored. The example tables on the indicated site are comprehensive enough not to be reproduced here.

The KTN and KAV data is additional data related to words, and may not be included in all future versions of omorfi.
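The KTN ranges given above can be expressed as a small lookup. This sketch only encodes the ranges stated in this section; the function name ktn_class and the returned labels are made up for the example.

```python
def ktn_class(ktn):
    """Classify a KTN paradigm number by the ranges stated in this section."""
    if 1 <= ktn <= 49:
        return "nominal"
    if ktn in (50, 51):
        # Originally compound markers; not restored in omorfi.
        return "compound"
    if 52 <= ktn <= 78:
        return "verb"
    raise ValueError("KTN out of range: %d" % ktn)


for ktn in (1, 31, 52, 67):
    print(ktn, ktn_class(ktn))
```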

Note

VISK § 63 <http://scripta.kotus.fi/visk/sisallys.php?p=63>, gives a short introduction to inflection of Finnish

Uppercasing

Most versions of omorfi can read words written in titlecase or uppercase as variants of the regular lowercased words. For some applications this data is necessary and is saved in omor tagset in tag named CASECHANGE. The casing tag has values NONE, UPFIRST and UPALL for retained case, titlecase and uppercase respectively.
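The three CASECHANGE values can be illustrated by comparing a token with the lowercase form found in the lexicon. This is a sketch under assumptions: the function casechange is hypothetical, and returning None for mixed casing is a choice made here, not documented omorfi behaviour.

```python
def casechange(token, lexicon_form):
    """Return NONE, UPFIRST or UPALL for a token relative to its lexicon form,
    or None if the token is not a recognised case variant of it."""
    if token == lexicon_form:
        return "NONE"
    if token == lexicon_form[:1].upper() + lexicon_form[1:]:
        return "UPFIRST"
    if token == lexicon_form.upper():
        return "UPALL"
    return None


for token in ("valo", "Valo", "VALO", "vALO"):
    print(token, casechange(token, "valo"))
```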

Applications

The main application of omorfi is the morphological analyser: the task of reading text and giving morphological analyses of potential word forms. There are also other applications that use Omorfi; the main ones, such as the writer's tools, are distributed along with Omorfi. More complex ones, such as morphosyntactic disambiguation using Constraint Grammar, rule-based machine translation using apertium, or the spell-checking library voikko, are separate software packages that depend on omorfi. This chapter gives a brief overview of this software and its relation to omorfi.

Morphological analysis

Morphological analysis is provided in omorfi by a number of different finite-state automata, which contain different encodings or analysis styles, and different degrees of certainty for morphological analyses. The different encoding models of analyses are, in addition to the default omorfi tag set, added for the needs of external applications, such as CG or apertium. A third one, FTC, has been made to facilitate comparison of omorfi analyses with those in the commercial corpora of the Finnish text collection.

Weighted disambiguation

Omorfi analyser includes some statistical unigram based as well as some rule based crude disambiguation schemes. This is implemented by simply learning the most common word forms from a corpus and using simple rules to reduce the likelihood of unlikely word forms or compounds. This system is detailed in [LIN09a].

Writer's tools

The writer's tools refer to spell-checking and correction of word errors, and automatic hyphenation. The automata necessary for this basic functionality are provided with Omorfi. For basic spell-checking there is a modified version of omorfi's dictionary. Spelling correction is provided by an error model that is applied in conjunction with the spell-checking dictionary. This system is detailed in [PIR10a]. The practical implementation of this spell-checking system is included in the spell-checking library voikko.

Morphosyntactic disambiguation: CG

Warning

The CG ruleset and parsers are not included in omorfi distribution

The morphosyntactic disambiguation system for Finnish uses Constraint Grammar, which originates from [KAR90]. Omorfi includes a CG-compatible analyser, converted automatically from the main analyser. Constraint Grammar works by extending morphological analyses with syntactic readings and removing illegal readings. Example usage of a CG grammar can be found in apertium's Finnish machine translation pairs.

Machine translation: apertium

Warning

Apertium machine translation is not included in omorfi distribution

The apertium machine translation system contains some Finnish translation pairs which use omorfi for basic morphological analysis. The analyses are decorated with Constraint Grammar, transferred to/from the target/source language, and generated. The omorfi package contains an apertium-compatible analyser, which may be used for these purposes.

[KOS83] Kimmo Koskenniemi (1983), Two-level morphology (doctoral thesis)
[KAR90] Fred Karlsson (1990), Constraint grammar as a framework for parsing running text, in 13th International Conference on Computational Linguistics
[PIR08] Tommi Pirinen (2008), Automatic morphological finite-state analyzer of Finnish language using open source tools (Master's thesis)
[LIN09a] Krister Lindén, Tommi Pirinen (2009), Weighting Finite-State morphological analysers using HFST tools
[PIR10a] Tommi A Pirinen, Krister Lindén (2010), Finite-State Spell-Checking with Weighted Language and Error Models, in LREC 2010 Saltmil workshop
[VISK10] Auli Hakulinen, Maria Vilkuna, Riitta Korhonen, Vesa Koivisto, Tarja Riitta Heinonen and Irja Alho (2010), Iso suomen kielioppi, Verkkoversio