Welcome to Weird World

CRT: Toki Pona

Design equals intention.—Richard Eckersley

Surely the most powerful of all humanity’s beastly urges is the deep-seated lust to edit other people’s writing. You’ve certainly felt it, reading the faux pas in the local paper (“the official spoke on condition of animosity”), or the purple prose in a best seller (“You know your body loves this, Anastasia”), or the typo on the chalkboard at the deli counter (“your welcome to a sample”). But in my case, the urge is directed not at correcting words or even sentences, but at correcting grammars. That is why I studied for so many years to become a CRT. No, not a Cathode Ray Tube, not a Communist Republican Terrorist, not even a Cerebrally Ruptured Trumpite. I am a CRT: a Conlang Repair Technician.

When you think of an ideal repair tech you should think of Darren, the guy who repaired my air-conditioner. There was the poor young man, slaving away on a sunny day in August with a heat index of 110. He wasn’t being creative, wasn’t adding anything, wasn’t improving anything. I didn’t want my central air improved: I wanted not to roast alive in my own living room. And I didn’t, thanks to Darren. As global warming proceeds apace I believe fixers of air-conditioners will come to be worshipped as gods.

A repair technician doesn’t make things better: s/he makes things the same in a better way. They restore function to what should be functional and isn’t. And plenty of conlangs could work more efficiently than they do. By “more efficiently” I don’t mean the endless quibbles about Esperanto’s “Standard European” vocabulary, or which conlang is most speakable/learnable/useful, or what an ideal conlang “should” be. I mean: are the intentions of the conlang’s creator fully realized? And if they aren’t, how can they be made to be so? Any mismatch between design and intention is what a Conlang Repair Technician repairs.

There are about 900 conlangs listed on various websites, and more keep coming. Dothraki, a language of Essos in George R. R. Martin’s Song of Ice and Fire series, is a recent addition to the corpus, as is Na’vi from the film Avatar. But of all this cornucopia of conlangs, surely the easiest to learn is the delightful “minilang” called Toki Pona, the “Language of Good.” Hardly the kind of extremolang we’ve been discussing in these pages, it is nonetheless a marvelous example of what you can do once you realize the vast potential of language as a plaything. And like any language, nat- or con-, Toki Pona is also a window into the human mind.

In doing my repair work I stick closely to Pu, the “official” book about Toki Pona written by its creator, Sonja Lang. jan Sonja, as she is known to her tribe, has written a marvel of scholarship, creativity, and playfulness, inspired—uniquely, as far as I can tell—by the teachings of Lào Zi, the (perhaps imaginary) author of the Dào Dé Jīng. Toki Pona is an “a posteriori” conlang, meaning that its vocabulary and grammar come from other languages. About 30% of the lexicon is borrowed directly or indirectly from English, and the derivations are quite obvious. For example, jaki is translated in the Pu dictionary as “disgusting, obscene, sickly, toxic, unclean, unsanitary”: in a word, yucky. On the other hand, “you” is sina and “strong” is wawa, both of which are Finnish, and pana is “give,” from KiSwahili pa-ana, “give-each.other.” Facebook’s Toki Pona group hosts extensive discussions on the sources Lang used for her vocabulary, and unlike Esperanto or Lojban, Toki Pona’s sources are quite far-ranging.

Like many conlangs, Toki Pona is intended to have a positive effect on the minds of its speakers. The reader already knows my take on Sapir-Whorfism. But really, the “point” of Toki Pona—if a language needs to make a point—is not to conduct an experiment, or to prove a theory, but to have fun. And jan Sonja certainly sets a good example. In referring to her book as Pu she makes a wonderful pun: pu in Classical Chinese is the “uncarved block” of Taoism, the fundamental, unspeakable ground-of-being. But to native English-speakers of a certain temperament, Pu also opens the gateless gate of verbal mischief:

(1) mi wile musi kepeken Pu

I want play using Book

“I like to play with Pu.”

Which, of course, is what I’m doing right here. But more practically, in this chapter I do four things with Toki Pona:

1. interpret one principle

2. discover two morphemes

3. erect three rules, and

4. make four morphemes out of two.

This is not to say that Toki Pona needs fixing: it doesn’t. It already has what it needs to attract a fairly sizable online fanbase: 3,700 members of the Facebook group and counting. But as I develop my arguments in this chapter, I hope it will become clear that, while Toki Pona is a “language of good,” in certain ways it can be better. And your Conlang Repair Technician can make it better without changing a single one of its major features.

One

What does Pu mean by its repeated references to “simplicity”? The “simplicity” jan Sonja intends is a simplification of thought:

Training your mind to think in Toki Pona can lead to deeper insights. If many of life’s problems are created by our excess thoughts, then Toki Pona filters out the noise and points to the center of things. (p. 12)

How is this done? “The wisdom of life consists in the elimination of non-essentials,” as the 20th-century Chinese philosopher Lin Yutang is quoted as saying on Pu’s page 80. Eliminate the “clutter” from a language and what is left will help its speakers have “simpler” minds, more calm, honest, and collected. But what part of language is to be simplified? In a nimi: vocabulary.

There are only 120 or so words in Toki Pona. With so small a word-hoard each item must cover a lot of semantic territory. Toki can mean “communicate, speak, say, talk, use language, think,” while pona translates “good, positive, useful, friendly, peaceful, simple.” The stretch from “good” to “simple” is quite broad, and embodies a very particular view of “goodness,” but jan Sonja isn’t out to construct a “neutral” interspeech like Esperanto or a “logical” tongue like Lojban or Ithkuil. Her intent is to boil down the word-soup of language until only the bones remain, the essential concepts necessary for basic human communication. In training the mind to think in these concepts the fundamental simplicity of reality might then reveal itself whenever you speak. You can’t say “credit default swap” in Toki Pona, but you can say jan li suli mute, mani li suli lili, “people are of much importance, money is of little importance.” (p. 73) This does not seem to me to be a restriction of thought à la Newspeak but an elegant statement of fact: even a Wall Street hustler could understand it, once past capitalist duckspeak.

But Pu cannot get past a basic fact of human speech: any vocabulary of a human language, no matter how small, is bound to contain boatloads of hidden complexity. Recall Chomsky’s remarks on the “richness” of the lexicon:

Internal conditions on meaning are rich, complex, and unsuspected; in fact, barely known. The most elaborate dictionaries do not dream of such subtleties; they provide no more than hints that enable the intended concept to be identified by those who already have it.

As a demonstration of this, take pona as it is used in the common phrase jan pona. Picking out the “friendly” definition of pona, this word is used by the Toki Pona community to mean “friend.” But why this meaning? It could just as easily mean:

(a) “simple person”: one who is genuine, unaffected, honest; or

(b) “peaceful person”: someone like Gandhi, MLK, Kabir, or the Dalai Lama; or

(c) “positive person”: an optimist, one who looks on the bright side; or

(d) “useful person”: someone who can be counted on, who is dependable.

Of course, you could argue that a “friend” is any or all of these things. But jan as used in jan pona appears to pick out one particular meaning from those listed in Pu—the meaning “friendly”—and allows the other meanings to hover in the background. Is this selection process “simple”? And what rules or principles guide speakers in making it? And why translate “friend” as jan pona and not as, say jan pi pilin mi, “person of my heart,” or jan (lon) poka (mi), “person (at) (my) side”?

Or take a more serious issue: the grammaticalization of nurture. Suli is translated “big, large, heavy, long, tall; important; adult.” Its complement lili, however, means “little, small, short; few; a bit; young.” If a jan suli is an important person, is a jan lili unimportant? The word for “parent” is mama, but the word for “child” is . . . jan lili, a compound. Why this asymmetry? There are a number of words for being in charge (mama, lawa, kute) but no words for a charge, for someone under the care of another. Kute means “hear, listen, pay attention to, obey.” All examples in Pu of kute as “obey” refer to children obeying their parents. Do parents ever kute their children? What Inuktitut expresses with -gi- is apparently beyond Toki Pona’s power. Whether it should be or not is beyond the power of a mere CRT. I raise the issue of grammatical nurture to show that the “simplicity” of Toki Pona’s lexicon masks complex presuppositions that are nowhere made explicit. If Chomsky is to be believed, such complexity is as natural as the Way of Heaven, and as inescapable.

Syntax, however, is another matter. The syntactic rules of Toki Pona are refreshingly simple. Basic order is SVO; heads come before modifiers; there are prepositions and a postposition, and a half-dozen form-classes. Simple indeed—but could it be more so? In Chomsky’s current grammatical model, his “Minimalist Program,” the only syntactic processes are merge and move, and all word-order phenomena are explained by just these two. At least to Chomsky, what is “simple” about language is not what most of us would guess after being exposed in school to the “spiders from Mars,” those stick-leggedy parsing trees. It’s word order and relations between words that are the easy part of language, and which should be the easy part of any conlang.

In keeping with the Chomskian view, my first move as Toki Pona CRT is to nail down what “simplicity” means as a conlang design principle. It means this:

Principle of Lexical Complexity

Whenever possible, move complexity out of the syntax and into the lexicon.

I call this principle “LexPlex” for short. It is the foundation of almost all the “repairs” I make to Toki Pona.

Two

“Officially,” Toki Pona has only those morphemes listed in Pu, plus the occasional newcomer accepted by the online kulupu pi toki pona such as the recent kipisi, “cut” (from Inuktitut kipi-, plus antipassive -si-). However, language can be sneakier than even its own inventor might imagine. For Toki Pona’s vocabulary is haunted by—gasp!—invisible morphemes, ghostly presences rising from the darkness of the Uncarved Block. “Looked for, they are not seen; listened for, they are not heard; reached for, they cannot be grasped.” But Lào Zi did not know the master science. These intangible morphemes are quite graspable.

Something from nothing

In the first iteration of Toki Pona published in 2001 there were only three numbers: ala (zero), wan (one), and tu (two). Later, three others were added: luka (five), mute (twenty), and ale (one hundred). Numbers other than these are built up by compounding, exactly as is done in Dyirbal, Inuktitut, or Pawnee. However, numeral compounds are unique, made by a process that is not found elsewhere in the language. Most compounds in Toki Pona are of the head-modifier type: luka wawa, “hand strong,” i.e., a strong hand. But when luka is used as the number “five” it does not serve as the head of a modifier: luka tu does not mean *“five of two” but “five and two.” The silence that links numbers into bigger numbers is not the silence that links non-number words into phrases. Where there is meaning there is morpheme: there is a meaningful distinction between these two silences, therefore the silence between numbers must represent a zero morpheme, ø. (We’ll get to the silent “of” later.)

ø in Toki Pona does not function quite like the Boolean “and” used in logic. In The Lord of the Rings, when Barliman Butterbur tells of a battle with robbers, he does not give the number of casualties as “five” but “three and two”: three Men and two Hobbits. In Bree, good fences make good neighbors, so lumping Men and Hobbits together under a single number just isn’t done. If good master Barliman had spoken Old Toki Pona he’d have counted Merry’s ponies as tu tu wan, but he’d have counted the robbers’ victims as tu wan en tu. That is, he’d have kept the functions of ø and of “and” discrete.

Unlike and, ø is non-commutative: it orders the numbers flanking it in such a way that the larger number is always to its left, the smaller to its right. Though wan tu makes as much sense as tu wan arithmetically, tu wan is the only term for “3” given in Pu. Similarly for all the other numbers listed: the number to the left is always ≥ the number to the right: “13” is luka luka tu wan, and so on. Using one instance of each number, the biggest number you can make is “128,” and it must be stated thus: ale mute luka tu wan.
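The ordering behavior of ø can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the five number words and values given above plus ala for zero, it composes numbers greedily from the largest word down, and the function names to_toki_pona and from_toki_pona are hypothetical, invented for this sketch rather than taken from Pu.

```python
# Number words from the text, largest first. The silent morpheme ø is
# modelled here as the plain space that joins them.
NUMBER_WORDS = [("ale", 100), ("mute", 20), ("luka", 5), ("tu", 2), ("wan", 1)]
VALUES = dict(NUMBER_WORDS)


def to_toki_pona(n):
    """Compose n additively, always using the largest word that still fits."""
    if n < 0:
        raise ValueError("only non-negative whole numbers are covered")
    if n == 0:
        return "ala"
    words = []
    for word, value in NUMBER_WORDS:
        count, n = divmod(n, value)
        words.extend([word] * count)
    return " ".join(words)


def from_toki_pona(phrase):
    """Sum a numeral compound, rejecting any smaller-before-larger order."""
    words = phrase.split()
    if words == ["ala"]:
        return 0
    values = [VALUES[w] for w in words]  # KeyError on a non-number word
    # ø requires each number to be >= the number to its right.
    if any(left < right for left, right in zip(values, values[1:])):
        raise ValueError("ø is not commutative: %r is out of order" % phrase)
    return sum(values)


print(to_toki_pona(13))                        # luka luka tu wan
print(from_toki_pona("ale mute luka tu wan"))  # 128
```

Note that the ordering check lives in the reading function rather than in a separate table of rules, which is the spirit of folding the number rules into ø itself.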

Speakers of Toki Pona do not consider coining new words to be appropriate: o weka e nimi namako, “avoid new words,” to quote a member of the Facebook group. But ø is not namako: it has always been present, though by implication. There are still only 120+ nimi in Toki Pona, if by nimi is meant “(audible) word,” as opposed to “morpheme.” I suspect jan Sonja didn’t list ø in the dictionary simply because she was used to grammars that don’t use zero-morphemes. (English grammars don’t usually describe the singular suffix -ø in “girl-ø.”) Instead of describing ø she wrote a traditional chapter on the rules for making numbers. All I’m doing with ø is folding the rules in Pu’s lesson 12 into a single morpheme. Thus I take complexity out of the grammar and put it into the lexicon, which is where, under the LexPlex principle, complexity belongs.

The sentence initiator

There is one more silent morpheme I would like to recognize for Toki Pona, and that is σ, the sentence initiator. This morpheme (technical name “sigma”) places a beginning and an end to any sentence—and the end is just a new beginning. Its most important job is to set boundaries around any sentence so that the words within it can be more effectively interpreted. Despite its abstract nature it can be pronounced: it is the longest of Toki Pona’s three pause phonemes, short (μ), regular (#), and long (σ).

To see what σ can do for us, let’s look at la. Pu describes the word la as “very powerful. It allows you to link two sentences, or link a fragment to a sentence.” (p. 51). It is also said to “separate context from the main sentence.” The examples show that anything preceding la is used either as an adverbial phrase (tenpo ni la, “now”) or a dependent clause such as the kind made in English with “if” or “when.” So far, ale li pona (“all is good”).

Trouble arises when we try to tell what la is doing in running text. As pause phonemes of any length tend to be elided in ordinary speech, there are several possible interpretations to (2):

(2) jan Sili li lape lili lon supa tenpo suno pini la jan Sili li pona e tomo

“Sili napped on the sofa. Yesterday Sili tidied up the house” (p. 61) or

“If Sili napped on the yesterday sofa, then Sili tidied up the house” or

“Sili, having finished napping on the day sofa, Sili tidied up the house”

The meaning of (2) hinges crucially on the presence or absence of what in writing is a period and in hearing is a long pause. In fact, jan Sonja places a period between supa and tenpo, thus making the first translation of (2) the only one that is plausible. But that period represents a meaning, and where there is meaning there is morpheme. Therefore, the long pause shown in writing with a “.” is in Toki Pona a morpheme, σ.

La is two-sided, or two-faced, if you will: its use extends to the constituents on either side of it. σ puts boundaries on la, limiting its scope such that, for example, the second and third translations of (2) are not allowed. In the examples in lesson 14 jan Sonja puts a comma before la, which tells me that in her mind la has this quality of “binary scope,” and that the constituent to the left is a modifier, the constituent to the right a head. And what is a “constituent,” as far as la is concerned? Anything between itself and σ.
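The way σ bounds la can be sketched as follows. This is a rough illustration, not a parser: it assumes σ is written out explicitly in the input, that each sentence contains at most one la, and that a “constituent” is simply everything between la and the nearest σ. The function name la_constituents is hypothetical.

```python
def la_constituents(text):
    """Split running text at σ, then split each sentence at its la.

    Returns a list of (context, main) pairs; context is None when the
    sentence has no la.
    """
    pairs = []
    for sentence in text.split("σ"):
        words = sentence.split()
        if not words:
            continue
        if "la" in words:
            i = words.index("la")
            # Everything between σ and la is the modifier (context);
            # everything between la and the next σ is the head.
            pairs.append((" ".join(words[:i]), " ".join(words[i + 1:])))
        else:
            pairs.append((None, " ".join(words)))
    return pairs


with_sigma = ("jan Sili li lape lili lon supa σ "
              "tenpo suno pini la jan Sili li pona e tomo")
print(la_constituents(with_sigma))
print(la_constituents(with_sigma.replace("σ ", "")))
```

With σ present, tenpo suno pini alone falls under la; with σ removed, the whole nap is swallowed into the context, which is the source of the unwanted readings of (2).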

Sigma can do something else for the grammar, and that is eliminate the rule that deletes li after mi and sina. The deletion rule can be rewritten as a formula within li’s lexical entry, so:

(α) li → $ / σ {mi, sina} ___

Unlike ø, the silence of li after mi or sina—symbolized “$” for “empty” because the morpheme’s sound has been “emptied” out of it—represents the process called “dropping.” $ is not a separate morpheme but an “allomorph,” another version of a morpheme, as the -s, -z, and -ez plurals in English are allomorphs of a single morpheme. Dropping in Spanish allows $ te amo for yo te amo. Many languages have this option because the verb agrees with the pronoun, and these are called, logically enough, “pro-drop” languages. Empty and zero require distinct symbols because they represent distinct functions: $ is a variation of a morpheme, or of several morphemes, whereas ø is a morpheme in itself.

(α) is a formal way of saying what the dictionary says about the use of li: it stands “between any subject except mi alone or sina alone and its verb” (p. 128). The appearance of σ in (α) merely makes the description simpler. It accounts for the absence of li in mi toki but its presence in sina en mi li toki (“you and I talk”) or in tomo sina li namako (“your house is new”). It is not necessary to make a syntactic rule for this, as all (α) describes is the behavior of li in a particular context, and such “behavior” is a part of its meaning. And all meaning belongs in the lexicon.
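Formula (α) can be read as a small procedure. The sketch below assumes the sentence is given as a list of tokens that begins with an explicit σ, and it treats the empty allomorph $ as simple omission; the function name apply_alpha is hypothetical.

```python
SIGMA = "σ"


def apply_alpha(tokens):
    """Apply (α): li → $ / σ {mi, sina} ___ to a list of tokens."""
    out = []
    for tok in tokens:
        follows_bare_pronoun = (
            len(out) >= 2 and out[-2] == SIGMA and out[-1] in ("mi", "sina")
        )
        if tok == "li" and follows_bare_pronoun:
            continue  # li is realised as its empty allomorph $
        out.append(tok)
    return out


print(apply_alpha(["σ", "mi", "li", "toki"]))
print(apply_alpha(["σ", "sina", "en", "mi", "li", "toki"]))
print(apply_alpha(["σ", "tomo", "sina", "li", "namako"]))
```

Because the context requires σ immediately before mi or sina, the second and third calls keep li: in those sentences the pronoun is not alone in the subject.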

Three

There are only two kinds of words in Toki Pona, function and content. All human languages, con- or nat-, have at least these two word-types; the distinction goes back to the Classical Chinese grammarians, who divided their vocabularies into “empty” words and “full” words. These two basic word-types allow us to recognize three kinds of basic grammatical relations:

between function and content: the phrase relation,

between two function words: the scope relation, and

between two content words: the modifier relation.

Once we’ve accounted for these three relations we have written the grammar of Toki Pona.

The grammar described on Toki Pona’s Wikipedia page lists ten rules of syntax. This sounds pretty simple: Esperanto has a whopping 16, and the grammars of most natlangs have dozens if not hundreds. But your CRT can do better, if by “better” you mean “fewer.” Here is my “repair” of the syntactic rules of Toki Pona:

1. Rule of Templatic Syntax:

All well-formed strings have the underlying form (function word + content word)*

Def. A f(unction-)word is any ∈ {anu, e, en, kepeken, la, li, lon, o^L, o^R, pi, sama, tan, taso, tawa, μ, σ, ø}

Def. * = may be repeated as desired, indefinitely

Cor. redundant words are deleted at the surface

2. Rule of Nested Hierarchy:

F-words group c(ontent)-words into phrases that nest in the Scope Hierarchy:

σ < |la^B, o^L| < en < |li, o^R| < e < {kepeken, lon, sama, tan, tawa} < anu^B < pi < μ

Def. “_” = “must be included in all sentences”

Def. “ < ” = “includes under its scope” or “is interpreted after”

Def. “|…|” = “choose one and only one per sentence from this set”

Def. “{…}” = “choose as many as needed per sentence from this set”

Def. all f-words have rightward scope unless otherwise indicated

3. Rule of Structural Conjunction:

Boolean and is formed by the repetition of any ∈ {e, en, li} within its own scope

Cor. function scope is binary in this context

Like the ten rules listed on Wikipedia, the Three Rules listed here provide instruction on how to make and interpret grammatical utterances in Toki Pona. Unlike the Ten, the Three do so in a way that is easier to memorize and is logically more powerful.

The template

I have borrowed the notion of “template” here from the North Afro-Asiatic languages, where it refers to morphology, not syntax. A “templatic morphology” is one that makes words the way Arabic, Berber, or Hebrew do, with a string of consonants that carry the meaning and vowels inserted to distinguish, for example, the Hebrew katav, “he wrote” from katvah, “text,” or kotev, “writing.” My “templatic syntax” of Toki Pona means that a small number of function words fit into the “template” to form the backbone of the sentence, with content words inserted between them to flesh it out. Now let’s try making a sentence: “I want a drink of water.” We start with our template:

f + c + f + c + f + c . . . .

and some vocabulary:

e, object noun phrase (f)

en, subject noun phrase (f)

li, verb phrase (f)

mi, I/me (c)

moku, eat/drink (c)

telo, water (c)

wile, want (c)

μ, head-modifier relator (f)

σ, sentence initiator (f)

All those words marked “(f)” are members of the function-word class, listed under the definitions of Rule 1. All words not in the f-word set are content words by default, as function and content are the only two options for defining the syntactic role of a word. Two f-words, en and μ, we haven’t got around to explaining yet; we’ll define them more precisely later. The Scope Hierarchy of Rule 2 orders the f-words so that they must appear in a particular sequence. Also, function-words can only occupy the f positions in the template, so there are gaps between them:

f c f c f c f c

σ . . . en . . . li . . . e . . .

E has no equivalent in toki Inli (“English”): it is used whenever a c-word follows another c-word that in turn follows li or o. If e is present the following c-word is the object of the sentence; if it is not, the following c-word is an adverb modifying the preceding verb:

(3) ona li toki ala

she predicate talk not

“she did not talk” (but did something else)

(3’) ona li toki e ala

she predicate talk object not

“she said nothing” (but remained silent)

The distinction between sentences like (3) and (3’) is central to Toki Pona’s grammar. It sometimes causes trouble for learners whose native tongue has no marker of the direct object. Speakers of Biblical Hebrew (‘eth) or Hawai’ian (i) would have no trouble with e.

Having decided that “I” is the subject of our sentence, “drink” and “want” are the verbs, and “water” is the object, we plug these into the relevant c-positions in the template. That is, we put “I” directly after en, the verbs directly after li, and “water” after e, like this:

f (c) f c f c (f) c f c

σ . . . en mi li (wile, moku) e telo

We’re not done yet: there is still the gap between σ and en, plus we have to find the missing f-word between wile and moku. The mystery f-word must be under the scope of li or by Rule 2 it won’t fit into the ordering of the f-words we have already. Also, it must be consistent with the meanings we have chosen—it must be on our list. The only morpheme that fits both criteria is μ, so we plug it in. This morpheme lets us know that wile is the head of the li phrase and moku is modifying wile: “want” is what we’re doing, and “consume” restricts the meaning of “want” to a particular type of wanting: a wanting to consume something. Now, after inserting μ and applying (α), we have

σ . . . en mi [li] wile [μ] moku e telo,

with brackets around words that are present but silent (li because it has been dropped, μ because it is inherently silent). The only remaining piece of the puzzle is what to do with the c-word missing between σ and en. There’s a hole in the template that needs to be filled, but no meaning available to fill it. When this happens in a natural language we postulate the existence of what is called an “empty category.” In English, the sentence “he would like you to come” does not contain an empty category because all the syntactic slots are filled. The sentence “he would like to come” does contain an empty category: the unpronounced subject of the verb “come.” In other words, an empty category in English is a noun without a pronunciation, but which you can guess is there because the sentence’s meaning and grammar require it.

Toki Pona’s empty category is a c-word with no pronunciation and no meaning. It is simply a place-holder, as “0” is a place holder in numbers like “10” or “1001.” I’ll list it here as “ø^C” to distinguish it from “ø^F,” the compound number-maker. Now we can write our sentence as:

[σ] [ø^C] en mi [li] wile [μ] moku e telo

We still have one more step to go. In all known examples of Toki Pona text, en never appears at the beginning of a sentence. To account for this we write a formula similar to (α):

(β) en → $ / ø^C ___.

This does two things at once: it explains the lack of sentence-initial en, and it restricts what we can do with ø^C. (β) implies that whenever a sentence does not contain an initial en, en and ø^C are both present but silent: ø^C is inherently silent, and en has been dropped. (β) preserves the (f + c) template while explaining why the sentence as pronounced/written appears to violate it. And with these various steps and procedures followed we now have our surface output, with silent morphemes left unwritten:

(4) mi wile moku e telo.

There! Wasn’t that simple?

No? Then don’t bother with any of the stuff I’ve just described. Seriously: if your native language has features similar to Toki Pona’s—short words, few inflections, an SVO core syntax, no ergativity, no evidentials, a language like English, Mandarin, or /Xam—then you might be better off playing with Pu. If “simplicity” for you means “don’t make me memorize a bunch of stuff” then you can learn and speak Toki Pona quite nicely by fitting it to your preconceptions: that the subject comes before the verb and the object after it, and so on. There will be a price, of course: by going the easy route and not digging deeper into what makes Toki Pona tick, you’ll miss out on methods of analysis you can apply to other languages, including your own. You’ll speak the language, you’ll comprehend the language, but you won’t understand it. In other words: you’ll miss out on a lot of cool stuff. Your loss, ya big linguistic weeny!

You know it’s not enough for a chemist to mix baking soda and vinegar together and watch them fizz—she’ll want to know why bases and acids neutralize each other. And that means atoms and molarity and valence shells and what “pH” means. And as any chemist will tell you, atoms are just plain weird. The same is true of linguists: it’s not enough to memorize rules or write descriptions. Why do subject and object come before the verb in Tibetan, but after it in Hawai’ian? Why does KiSwahili have prepositions but Japanese has postpositions? Why does Mandarin have no adjectives and English no evidentials? There’s no way to answer those questions without entering the Temple of Supreme Weirdness: the mind. In the split second between thought and speech something amazing is happening, something that happens nowhere else in the universe, and linguists just love being amazed. All the technical jargon, parsing diagrams, and arcane theorizing has one goal: to explore the most amazing thing in the known universe. Your mind. jan Sonja designed her language to do just this.

Complexity is a relative term. What is “simple” to a speaker of English may be anything but to a speaker of Salish. The reverse is also true: Pu spends a lot of ink showing us how to make nouns into verbs, verbs into adjectives, and so on. Chief Seattle would have needed no such guidance.
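For readers who like to see the gears turn, the derivation of (4) from its underlying template can be sketched in Python. Several things here are my assumptions for illustration only: the ASCII-style spellings ø^C, ø^F, o^L, and o^R for the tokens, the function names fits_template and surface, and the decision to state the context of (α) over the full underlying string, so that li drops when the subject slot holds nothing but mi or sina.

```python
# F-word set from Rule 1; everything else counts as a content word.
F_WORDS = {"anu", "e", "en", "kepeken", "la", "li", "lon", "o^L", "o^R",
           "pi", "sama", "tan", "taso", "tawa", "μ", "σ", "ø^F"}
# Morphemes that are never pronounced.
SILENT = {"σ", "μ", "ø^C", "ø^F"}
# Underlying openings after which li is dropped by (α).
BARE_PRONOUN_SUBJECTS = (["σ", "ø^C", "en", "mi"], ["σ", "ø^C", "en", "sina"])


def fits_template(tokens):
    """True if tokens alternate f, c, f, c, ... as Rule 1 requires."""
    return len(tokens) % 2 == 0 and all(
        (tok in F_WORDS) == (i % 2 == 0) for i, tok in enumerate(tokens)
    )


def surface(tokens):
    """Derive the pronounced string from an underlying (f + c)* string."""
    if not fits_template(tokens):
        raise ValueError("string does not fit the (f + c)* template")
    out = []
    for i, tok in enumerate(tokens):
        if tok == "en" and i > 0 and tokens[i - 1] == "ø^C":
            continue  # (β): en → $ / ø^C ___
        if tok == "li" and tokens[:i] in BARE_PRONOUN_SUBJECTS:
            continue  # (α): li is dropped after bare mi or sina
        if tok in SILENT:
            continue  # inherently silent morphemes are left unwritten
        out.append(tok)
    return " ".join(out)


underlying = ["σ", "ø^C", "en", "mi", "li", "wile", "μ", "moku", "e", "telo"]
print(surface(underlying))  # mi wile moku e telo
```

Feeding the same function the underlying form of (3’), with ona in place of mi, leaves li in place, since ona is not one of the two pronouns named in (α).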

Scope

The key concept of Rule 2 is “scope.” All function-words in Toki Pona (and for that matter, in all other languages, nat- or con-) combine with content-words to form phrases. A “noun phrase” is one that is initiated by a nominal f-word such as “a” or “the” in English, ka or nā in Hawai’ian, or e or lon in Toki Pona; similarly for verb phrases. Because the affected content words follow them we say such f-words have “rightward scope.” Other f-words establish a phrase by standing at the end of a constituent rather than at its beginning: in Toki Pona, o makes a vocative phrase by following a noun. These functions have “leftward scope.” A few f-words such as anu (“or”) make phrases out of the words on either side of them: they have “binary scope.”

This is how f and c relate to form phrases. But how do phrases relate? We saw how when building (4): the f-words in phrases (and therefore, the phrases themselves) must follow one another in a particular order. The li phrase must follow the en phrase because li is lower in “rank” than en, as symbolized by the “<” in Rule 2. Another way to say this is that an en phrase includes a li phrase within its own meaning; similarly, a li phrase contains an e phrase within its meaning. When one phrase is included within the meaning of another we say it is “nested” within it. A sentence is like a matryoshka doll, constituents inside of other constituents like dolls inside of dolls, and getting smaller the deeper you go from sentence to clause to phrase to word.

Having the concept of “scope” under our belts, the “power” of la mentioned in Pu (p. 51) can now be more precisely defined: it is the second-highest-ranking function-word in the hierarchy. This means that it can cover a lot of territory: it can include more than one phrase within its scope. The only f-word of higher rank is the highest-ranking morpheme of all, σ.
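One possible reading of the nesting described in this section can be sketched with a stack. The sketch is deliberately simplified and rests on assumptions of mine: the numeric ranks are just the left-to-right positions of the Scope Hierarchy, every f-word is treated as having rightward scope (so the binary and leftward cases are ignored), a repeated f-word is shown as a sibling of the first rather than nested under it, and the function name nest is hypothetical.

```python
# Position in the Scope Hierarchy of Rule 2: lower number = wider scope.
RANK = {"σ": 0, "la": 1, "o^L": 1, "en": 2, "li": 3, "o^R": 3, "e": 4,
        "kepeken": 5, "lon": 5, "sama": 5, "tan": 5, "tawa": 5,
        "anu": 6, "pi": 7, "μ": 8}


def nest(tokens):
    """Bracket an underlying string so that each f-word's phrase sits
    inside the phrase of the nearest preceding f-word of wider scope."""
    out = []
    open_ranks = []  # ranks of the phrases that are still open
    for tok in tokens:
        if tok in RANK:
            # An f-word closes every open phrase of equal or narrower scope.
            while open_ranks and open_ranks[-1] >= RANK[tok]:
                open_ranks.pop()
                out.append("]")
            open_ranks.append(RANK[tok])
            out.append("[" + tok)
        else:
            out.append(tok)
    out.extend("]" * len(open_ranks))  # close whatever is still open
    return " ".join(out).replace(" ]", "]")


underlying = ["σ", "ø^C", "en", "mi", "li", "wile", "μ", "moku", "e", "telo"]
print(nest(underlying))
# [σ ø^C [en mi [li wile [μ moku] [e telo]]]]
```

The output shows the dolls inside dolls: the e phrase and the μ phrase sit inside the li phrase, which sits inside the en phrase, which sits inside σ.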

Three²

And since we’re dealing with threes in this section, there are three morphemes in Toki Pona that change their meanings by changing their scope. These are:

e^R, the direct object preposition, but e^B, the object-phrase “and”

en^R, the subject preposition, but en^B, the subject-phrase “and”

li^R, the predicate marker, but li^B, the verb-phrase “and”

Besides marking the direct object, e is used to add another object to an object phrase:

(5) ona li seli e soweli e pan

s/he predicate fire object animal object grain

“she cooked the hares and some rice” (p. 61)

The first e in (5) serves to mark off the object(s) of the sentence from the subject and the verb. Because it applies only to the words following it, it has rightward scope. The second e, however, applies to the words on either side of it: it has binary scope. It has also changed its meaning, from object-marker to conjunction, and changed its place in the Scope Hierarchy, being under the scope of the first e. It seems that we must “split” e into two morphemes here, as the two uses (object-marker and conjunction) differ in meaning and in syntax. But e is not the only morpheme that acts this way.

En is an interesting morpheme as it is part of a rather drastic asymmetry in Toki Pona’s lexicon. Of the Boolean (“logical”) operations and, not, and or, the disjunct and negative operations are each expressed with a single word: not is ala, and or is anu. The conjunct operation, on the other hand, is represented by three words:

e: within direct object phrases, as we have just seen in (5)

en: within a subject phrase, the only function of en mentioned in Pu (p. 56, 57)

li: within verb + verb constructions to indicate “and also,” or “and then”

En is only used in subject phrases, and is used in exactly the same way as e is used in object phrases. That is, to express “you and I washed” you say sina en mi li telo, but to say “washed clothes and dishes” is li telo e len e ilo moku, not *li telo e len en ilo moku. To link verbs rather than nouns it is li that is repeated: li telo li seli, “cleaned and cooked.”

E is basically a preposition forming direct object phrases. We can also look at li as a “preposition” forming verb phrases. It follows, then, that we can interpret en as the marker of the subject phrase: the subject preposition, as it were. Interpreting en as a subject marker allows us to replace the syntactic notion “subject” with a lexical notion “subject marker,” eliminating the need for a separate rule to define what a “subject” is. In this way we are able to move another bit of complexity into the lexicon. The location of an en-phrase can be fixed in front of the predicate (that is, the li-phrase) by letting en have scope over li, and this is what is done in the Scope Hierarchy described in Rule 2.
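The idea that en, li, and e each mark their own phrase, and that repeating the marker is what joins items within that phrase, can be sketched as follows. The names MARKER, conjoin, and sentence are hypothetical, and the sketch makes two simplifying assumptions: it always drops the sentence-initial en, and it does not apply the dropping of li after bare mi or sina.

```python
# Each phrase type is introduced, and internally conjoined, by one marker.
MARKER = {"subject": "en", "predicate": "li", "object": "e"}


def conjoin(role, items):
    """Prefix every item with the marker for its phrase type."""
    marker = MARKER[role]
    return " ".join(marker + " " + item for item in items)


def sentence(subjects, predicates, objects=()):
    """Build a surface sentence; repetition of a marker expresses 'and'."""
    parts = [conjoin("subject", subjects), conjoin("predicate", predicates)]
    if objects:
        parts.append(conjoin("object", objects))
    underlying = " ".join(parts)
    # The sentence-initial en is present underlyingly but never pronounced.
    return underlying[len("en "):]


print(sentence(["sina", "mi"], ["telo"]))           # sina en mi li telo
print(sentence(["ona"], ["seli"], ["soweli", "pan"]))  # ona li seli e soweli e pan
print(sentence(["ona"], ["telo", "seli"]))          # ona li telo li seli
```

Nothing in the sketch mentions a word for “and”: the conjunction falls out of applying the same marker to each item in turn.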

Interpreting e, en, and li as phrase-markers and as conjunctions means there is no overt and in Toki Pona: and is expressed structurally, by the repetition of the appropriate f-word. This is not such an odd notion: Mandarin has a structural or, formed by repeating the verb with a different object. Although Pu does not explicitly license other forms of and, I would think we could use any right-handed f-word as a conjunction:

(6) ?mi tawa tomo esun tawa tomo lipu

I (go) to building business to building book

“I went to the store and to the library”

(7) ?mi lon telo suli lon poka telo

I at water big at side water

“I am at the sea, (specifically) at the shore”

(8) ?sina pali e ni kepeken ilo palisa kepeken ilo kiwen

you do object this use tool wood use tool stone

“you made this using tools of wood and tools of stone”

(9) ?ona li tawa sama waso sama kala

s/he predicate go like bird like fish

“she went gliding like a bird, and also like a fish”

(10) ?o tawa o pali

command go command do

“go do it!”

I do not feel that such an extension is within my competence as a mere repair tech, so I must leave it to the Toki Pona community to rule on this matter. If the community is pleased to find that such sentences as (6) - (10) are grammatical, all we have to do is change {e, en, li} in Rule 3 to “any f ^R” and we’ve covered all bases. If we wished, we could even eliminate Rule 3 entirely and instead cover the conjunctive uses of e, en, etc. by making lexical rewrite formulae, so:

e → and / e … ___ … {preposition, σ}.

By erecting Rule 3 as I do, I take some complexity out of the lexicon and put it into the grammar, but I do this in the name of ease of memorization, and to point up the similarities in the uses of e, en, and li. Rule 3 as stated also makes extensions such as those shown in (6) - (10) easier to accommodate, should the kulupu pi toki pona so desire.

Four

It is the custom in conlang repair to spend lots of time futzing with function morphemes: critics of Esperanto, for example, tend to dislike the accusative case-marker -n. However, I don’t want to eliminate anything from Pu, nor do I wish to change any established meanings. What I want to do is tease apart meanings that don’t belong together. There are two morphemes in Toki Pona that can each be seen as performing two discrete roles, but which differ in ways that cannot easily be captured within rules. I propose to split each of these morphemes in two:

o splits to become o^R (irreal mode) and o^L (vocative case)

pi splits to become pi^R (genitive case) and μ^B (modifier relation)

Once again: I am not conjuring morphemes out of nowhere. I am finding morphemes that have always been there, and writing descriptions of these morphemes that are consistent with what is already known about the language.

O^L and o^R

According to Pu, the f-word o is used in three contexts: “1. after a noun phrase to show who is being called or addressed; 2. before a verb to express a command or request, and 3. after the subject (and replacing li) to express a wish or desire” (p. 41).

The first use of o is quite different from the other two. It is used to indicate a nominal case, the “vocative.” This is a case used in Latin, Sanskrit, Hawai’ian, and other languages to hail someone: “Hey!” or “o Such-and-so!” Unlike all other case-markers in Toki Pona the scope of vocative o extends leftward, over the words that precede it. In this it patterns with la, which is also left-handed and also terminates its scope at σ. From the examples I’ve seen, o^L and la are in complementary distribution, and so may be grouped together in the Scope Hierarchy: |la, o|^L.

There is a single technical term to cover uses 2 and 3 of o, and that is “irreal.” This is the verbal mode used to indicate that the event in question does not represent something describable with the S-prime know. Instead, an irreal event is something represented by feel, if, think, want, or not.want. Irreal events include the future tense (“will”), desideratives (“wish that,” “intend to”), conditionals (“if/then,” “assuming that,” “might be”), subjunctives (“would that it were,” “that X may”), jussives (“let’s”), and commands (“you must,” “do it!”). My view of o^R as an irreal marker is reinforced by the use jan Sonja makes of it in her translation of a Bahá’í prayer:

(11) I bear witness O my God, that You have created me to know You and to worship You.

sewi mi o! mi toki wawa e ni: sina pali e mi tawa seme? mi o sona e sina. mi o olin e sina. (p. 89)

divine my vocative I say strongly object this: you make object me for what? I irreal know object you. I irreal love object you.

It seems here that o has a subjunctive sense, and should be translated “that I may.” That is, the speaker does not yet know God, but feels and wants that she might, or should. Li and o^R are in complementary distribution, and in the Scope Hierarchy are grouped together into a set like this: |li, o|^R. If o before a verb is the irreal marker, then it follows that li marks the real mode, statements asserted as known. As li may be dropped but o may not, we can say that real is the default mode: all sentences in Toki Pona are assumed to describe the world as it is known unless we are given to believe otherwise by o^R.

Pi and μ (and la)

The most common binary-scope function word in Toki Pona is pi. In any phrase word₁ + pi + word₂, pi lets us know that word₁ is the “head” of the phrase and that word₂ modifies word₁ in much the same way that an adjective modifies a noun or an adverb modifies a verb in a natlang like English. Between two nouns, pi translates the English word “of.” However, pi is not used in phrases with only two c-words: to say “a good person” you do not say *jan pi pona, “a person of goodness.” Pi is only used when three or more c-words occur in a phrase:

(12) jan pona toki pona: “a friend good to talk to”

(13) jan pona pi toki pona: “a proponent of Toki Pona”

Example (12) shows how the modifier relation operates in strings of three or more words: “When another word is added to a noun phrase, it describes the sum of all previous words” (p. 44). That is, in any sequence of content words, the last is interpreted as a modifier of all the words that precede it; the second to the last modifies the words that precede it but not the word it precedes, and so on. That is, the “scope” of a modifier is all the content words to the left of it, and the modifier scopes “nest” like this: (((head + modifier) + modifier) + … ). If the head is what we are used to calling a “noun,” the modifiers are “adjectives”; if the head is a “verb” the modifiers are “adverbs.” The underlying structure of (12) looks like this:

(12a) (((jan pona) toki) pona)

We go from left to right: jan means “person” and pona means “good,” so we have “a good person,” i.e., a “friend.” Adding the next word toki gives us a “friend speaking,” or “a friend who speaks.” The final pona modifies all that comes before it, so we have “a good speaking-friend,” a friend who is good to speak to.

Adding pi to the mix changes things. We start with (13) as we did with (12), interpreting the head-modifier phrases first. This time, however, whereas toki and pona₂ are linked, pona₁ and toki are not, and the bracketing looks like this:

(13a) ((jan pona) pi (toki pona)).

Pi not only separates jan pona from toki pona, it treats each h-m phrase as a unit that may in turn be bound together into another unit, a pi-phrase. The constituent to the left of pi is the head, and the constituent to its right is the modifier. Pi thus creates a head-modifier phrase, but at a higher (that is, more inclusive) level than the head-modifier phrases with nothing between the words.
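The two bracketings can be reproduced mechanically. The following Python sketch is an illustration only: the function names nest and bracket are hypothetical, and the treatment of a phrase with more than one pi (folding leftward) is my assumption rather than anything stated in Pu.

```python
def nest(words):
    """Left-nest a string of content words: (((head mod) mod) mod)."""
    result = words[0]
    for modifier in words[1:]:
        # Each new word modifies everything to its left.
        result = f"({result} {modifier})"
    return result


def bracket(phrase):
    """Bracket a phrase, treating pi as a higher-level head-modifier link."""
    groups, current = [], []
    for word in phrase.split():
        if word == "pi":
            # pi closes one head-modifier unit and opens the next.
            groups.append(current)
            current = []
        else:
            current.append(word)
    groups.append(current)
    if any(not group for group in groups):
        raise ValueError("pi needs content words on both sides")
    result = nest(groups[0])
    for group in groups[1:]:
        # Assumption: several pi-phrases fold leftward, like plain modifiers.
        result = f"({result} pi {nest(group)})"
    return result


print(bracket("jan pona toki pona"))     # (((jan pona) toki) pona)
print(bracket("jan pona pi toki pona"))  # ((jan pona) pi (toki pona))
```

The only difference between the two outputs is where the units close: pi makes toki pona a unit of its own before it is attached to jan pona.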

Or is this nothing a something? That is: do we need a rule that says “an adjective follows the noun,” or “when another word is added to a noun phrase, it describes the sum of all previous words”? There is another option, the LexPlex option: split pi into two morphemes, one of which forms h-m relations on the deepest level of the syntax, and another which does the same thing but on the next higher level. The higher-level morpheme is pi itself; the other is a morpheme that establishes the deepest-level head-modifier relation, a silent morpheme which I call “mem” and symbolize as “μ.”

Both pi and μ can be translated “of”: toki μ pona, “language of good.” Where pi and μ differ most crucially is in their ability to nest: μ-phrases can nest inside of pi-phrases, but pi-phrases cannot nest inside of μ-phrases: μ is lower on the Scope Hierarchy than pi. Also, μ can nest under itself: like and, it is “recursive”:

(12b) (((jan μ pona) μ toki) μ pona)

We see that phrases differ according to the presence or absence of pi, and that we can capture this difference by interpreting the absence of pi as the presence of μ. But μ and ø^F also differ: though both are silent, non-commutating, and of binary scope, μ binds any adjacent c-words into a h-m relation, whereas ø^F binds adjacent words into an and relation and appears only between words used as numbers. We may also say that la enacts the h-m relation but on the level of the clause and in the opposite direction: any string of words after σ and before la is a dependent clause (i.e., a modifier-clause), while anything after la and before σ is an independent clause (i.e., a head-clause). La, μ, and pi all do essentially the same thing but at different levels and in different directions.

Mem has a pronunciation: it is a short pause between words, shorter than the pause between f and c that occurs in li sona or e jan. Compound words are formed by deleting μ: jan μ pona, “a good person,” jan pona, “friend.” That is, jan pona as “friend” is treated by the lexicon as a single word, not as a string of two c-words (regardless of how it is written).

Details

In addition to the four “repairs” I describe above, there are other changes that can be made to how words in Toki Pona are described within the lexicon. By “lexicon” here I mean not just a listing of words and meanings (as found in Pu, pp. 125-134), but labels accompanying words to help describe their syntactic behavior. For example, a detailed lexicon of Toki Pona would specify which words are functions, and the scope of each. We can do more than this, however.

Exclamations can be listed within a rule, but LexPlex would suggest they be specified in the lexicon. Any word from the set {a, ala, ike, jaki, mu, o, pakala, pona, toki} may be used as an exclamation. Most of these words change their meaning when so used: μ pakala, “broken,” σ pakala, “sorry!” I propose including seme, the question-marker, in this set, with the meaning “huh?” “what the?”

The only form-classes in Toki Pona are function-words and content-words. The labels “noun,” “(pre-)verb,” “adjective,” “adverb,” and “preposition” are unnecessary, as any c-word may play any of these roles depending on what f-word it stands under:

toki =

“hello!” / σ ___ σ

“say” / |li, o| ___

“language” / {e, en, kepeken, lon, sama, tan, tawa} ___

“linguistic” / μ ___

“speaking of” / ___ la

sona =

“know (something)” /|li, o| ___ e

“know how (to do something)” / |li, o| ___ μ

To make things more convenient we can use the traditional labels to indicate sets of words of related function. The set {e, en, kepeken, lon, sama, tan, tawa} can be replaced with {preposition}, the set |li, o|^R with |mode|, and so on. We can also rearrange items into different sets as needed the way I do in Rule 2, where I specify e and en separately from the other prepositions to allow subject phrases to precede the verb and objects to follow it.
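The lexical entry for toki given above can be read as a small lookup keyed on the neighboring f-words. Here is a minimal Python sketch of that reading; the function name gloss_toki and the order in which the contexts are tested are my assumptions, and the silent morphemes σ and μ are passed in as ordinary strings.

```python
# Named sets standing in for the labels {preposition} and |mode|.
PREPOSITIONS = {"e", "en", "kepeken", "lon", "sama", "tan", "tawa"}
MODE = {"li", "o"}


def gloss_toki(left, right):
    """Pick a gloss for toki from the f-words on either side of it.

    Hypothetical helper: left and right are the neighboring f-words,
    or None when the neighbor is irrelevant to the entry.
    """
    if left == "σ" and right == "σ":
        return "hello!"          # σ ___ σ
    if left in MODE:
        return "say"             # |li, o| ___
    if left in PREPOSITIONS:
        return "language"        # {preposition} ___
    if left == "μ":
        return "linguistic"      # μ ___
    if right == "la":
        return "speaking of"     # ___ la
    return None                  # context not covered by the entry


print(gloss_toki("li", None))  # say
print(gloss_toki("e", None))   # language
```

Nothing in the sketch says that toki “is” a noun or a verb; the role falls out of whichever f-word it stands under.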

You’ll recall my argument about the phrase jan pona and why it means “friend” and not, say, “buddha.” We can make use of the power granted us by lexical rewrite formulas to narrow down the meaning of any word according to its context:

pona =

“friend” / jan + ___

“simple” / toki + ___

“good” / lape + ___ (“good night” = σ lape pona σ)

and so on.

Some words in Toki Pona are “underspecified” for class. That is, they are inherently neither f- nor c-words, but take on their class assignment according to where they fit in the template: lon before a c-word is a preposition (i.e., an f-word) meaning “at,” but before an f-word it is a c-word meaning “exist,” among other things:

lon =

“at” / ___ c-word

“exist” / |li, o| ___

“true” / μ ___

“yes” / σ ___

It sometimes happens that when lon and tawa are used as verbs the underlying noun-phrase that follows will begin with the same word used as a preposition. That is, “to be at” is li lon lon and “go to” is li tawa tawa. You could say that such expressions are “overspecified”—they contain more information than necessary. Whenever this happens the redundant preposition is dropped:

(γ) {lon, tawa} → $ / {lon, tawa} ___

A similar rule drops o^L when it appears before o^R. Redundancy is incompatible with simplicity so away it goes, into the great void of $.
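Formula (γ) amounts to a single pass over the words of a sentence. The Python sketch below follows the prose reading of the rule, dropping lon or tawa only when it immediately repeats the same word; the function name is hypothetical.

```python
def drop_redundant_preposition(words):
    """Apply (γ): delete lon after lon, and tawa after tawa."""
    surface = []
    for word in words:
        if word in {"lon", "tawa"} and surface and surface[-1] == word:
            # The verb already carries the preposition's meaning.
            continue
        surface.append(word)
    return surface


underlying = "ona li tawa tawa tomo".split()
print(" ".join(drop_redundant_preposition(underlying)))  # ona li tawa tomo
```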

Several sections ago I claimed to discover two silent morphemes in Toki Pona, the f-words ø^F and σ. However, I then proceeded to sneak in a third silent morpheme, ø^C, to represent the empty category. My perspicacious readers undoubtedly discovered this deception many pages ago and have been chuckling in anticipation of seeing my comeuppance in The New York Times Book Review. But your author will get the last laugh! ø^F and ø^C are in cold fact a single zero, whose apparent duality comes about because it is unspecified for the f/c distinction. Just as suno means “sun,” “sunny,” or “shine” depending on context, so ø takes on one or the other role according to contexts specified in the lexicon:

ø → ø^F / larger number ___ smaller number (additive function)

ø → ø^C / σ ___ en (empty category)

We don’t have to specify ø as f or c until the template demands it, and until it does, ø, like lon, is a single morpheme despite the reader’s detective genius. You’ve got to get up early if you’re going to get the drop on the Cunning Linguist! A a a!

Mem is not the only f-word that can appear inaudibly between two c-words. Consider this sentence:

(14) sina toki ala toki e toki Inli?

you talk not talk object talk English

According to what we know of silent morphemes and of dropping we can deduce that sina toki has the underlying form sina li toki and that toki Inli has the underlying form toki μ Inli. But what of the three c-word string toki ala toki? “Not speaking talkatively” might be the translation if the underlying form was toki μ ala μ toki and this were a saying of Master Lào. However, the string verb + ala + verb, where both verbs are identical, is actually the way Toki Pona asks questions requiring a “yes” or “no” answer. (The format is borrowed from Mandarin.) A more accurate translation might be something along the lines of “Do you speak English, or do you not speak (it)?” In other words, there’s a Boolean or hidden in this construction. The logical place to put it is after ala, so that the full underlying sentence is:

(14a) f c f c f c f c f c f c

[en] sina [li] toki μ ala [anu] toki e toki μ Inli?

“do you speak English?”

Anu is dropped from the surface output by the formula

(δ) anu → $ / li c₁ ala ___ c₂, where c₁≡ c₂.

as the ala between the identical verbs makes anu redundant, and thus droppable.
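Formula (δ) can be sketched the same way. In this Python illustration the function name drop_anu is hypothetical, the silent μ of (14a) is left out of the word list, and the separate dropping of li after sina is not modeled, so the output still shows li.

```python
def drop_anu(words):
    """Apply (δ): delete anu in the context li c1 ala ___ c2, where c1 == c2."""
    surface = []
    for i, word in enumerate(words):
        in_context = (
            word == "anu"
            and i >= 3
            and i + 1 < len(words)
            and words[i - 3] == "li"
            and words[i - 1] == "ala"
            and words[i - 2] == words[i + 1]  # the two verbs are identical
        )
        if in_context:
            continue  # ala between identical verbs makes anu redundant
        surface.append(word)
    return surface


underlying = "sina li toki ala anu toki e toki Inli".split()
print(" ".join(drop_anu(underlying)))  # sina li toki ala toki e toki Inli
```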

ø^C is not the only c-word that can appear inaudibly between two f-words. Sina, for example, is always deleted before o^R in commands:

(ε) sina → $ / ___ o^R when used as imperative,

or, to state it another way:

o^R → command / sina → $ ___

Every language must navigate between the Scylla and Charybdis of “say what you mean” and “don’t take too long saying it.” (γ), (δ), and (ε) are three examples of how to shoot the rapids. More rewrite formulas can be written for various other contexts, and more uses can be made of ø^C, but I will leave finding these for the amusement of the reader. O musi! Enjoy!

How do you say “What did you do to her?” in Toki Pona? Most English-speakers might come up with something like:

(16a) sina pali e seme tawa ona?

you do object what? to her

This is the translation to be found in Pu:

(16b) sina seme e ona? (p. 32)

Although I don’t think jan Sonja speaks any Australian languages, the use of seme as a verb “do what?” is exactly parallel to the use of the verb wiyamal in Dyirbal. (16b) is a neat example of what you can do with a conlang once you’ve freed yourself from the habit-patterns of your native tongue. And you can do this even if—gasp!—you don’t speak any Native Australian at all.

Sapir-Whorf . . . and beyond

jan Sonja intends her language to be an experiment involving our old buddy the Sapir-Whorf Hypothesis. She set up Toki Pona to be a benign Newspeak, a way of simplifying thought to bring about spiritual (or at least, psychological) insight. Rather than come to love Big Brother, jan Sonja wants us to love the “simplicity” extolled in the Dào Dé Jīng:

More words count less

Hold fast to the center (chapter 5)

It is most important

To see simplicity

To realize one’s true nature

To cast off selfishness

And to temper desire. (chapter 19)

Have little and gain

Have much and be confused (chapter 22)

I believe I have shown throughout The Cunning Linguist that Sapir-Whorfian notions of linguistic determinism simply won’t wash. But I think jan Sonja has accomplished something much more interesting than yet another S-W experiment: she has created a CSM—a conlang semantic metalanguage.

In the chapter “Atoms For Peace” I discussed the S-prime, which some linguists believe is the “atom of thought,” or at least, of language. The best-known Natural Semantic Metalanguage is the one described by Wierzbicka and Goddard, which makes use of about 65 primes. Toki Pona’s 120+ root-words perform very much the same role as S-primes: “Toki Pona is a language that breaks down advanced ideas to their most basic elements” (p. 9). The process of translating from a natlang to Toki Pona is, in effect, a way of performing semantic analysis by means of a metalanguage. There’s a bit of CSM on page 12 of Pu: “What is a ‘bad friend’? The Toki Pona expression for friend is jan pona, or literally ‘good person.’ You quickly realize that a bad friend is a contradiction in itself.” Which you do, that is, assuming you know that pona and ike are antonyms (nowhere described as such in the dictionary, but easily deduced), or that you know that pona is how Toki Pona says good and ike is how it says bad. Knowing these things, the contradiction in *jan pona ike instantly becomes obvious. Perhaps we might even state this as

Rule 4: Non-Contradiction

Two content-words that are antonyms to each other may not modify the same head.
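As a rough illustration, Rule 4 can be checked mechanically once the antonym sets are listed. The Python sketch below takes |ike, pona| and |pimeja, suno| as its sets, treats the first word of a plain string of c-words as the head and every later word as a modifier of it, and uses a hypothetical function name; none of this is an official test of grammaticality.

```python
# Antonym sets to be specified in the lexicon; only two are listed here.
ANTONYM_SETS = [{"ike", "pona"}, {"pimeja", "suno"}]


def violates_rule_4(phrase):
    """Return True if two antonyms modify the same head."""
    words = phrase.split()
    modifiers = set(words[1:])  # assumption: first word is the head
    # A violation occurs when a whole antonym set appears among the modifiers.
    return any(pair <= modifiers for pair in ANTONYM_SETS)


print(violates_rule_4("jan pona ike"))  # True
print(violates_rule_4("jan pona"))      # False
```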

We specify in the lexicon which words are antonyms and thus in complementary distribution: |ike, pona|, |pimeja, suno|, and so on. Contradictions such as Ayn Rand’s “rape by invitation” are tolerated in English, but in Toki Pona it seems they are not only not tolerated, they are not grammatical. In giving us a language in which words represent so directly the “atoms of thought,” jan Sonja may have made her most wonderful contribution to the wonderful world of conlangs.

And speaking of wonderful, let’s see what the Story of the Girl looks like in the Language of Good:

ni la mama meli mi li toki e mi: meli lili li insa e luka ona lon ko seli, ona li pana e ko seli tawa sewi. ona li toki e ko seli, “ko seli lon ma ni la ale pi ona o ante tawa nasin pi sewi pimeja . . .”

I use the compound: ko seli, “fire powder” to mean “ashes.” And nasin pi sewi pimeja, “path of the dark above,” is the Milky Way. Get on the ‘Net, buy Sonja Lang’s lovely little book, and keep the story going.

Tuesday, January 8, 2019