
> Chinese characters are not some kind of alphabet.

Yes, they are. Modern Hanzi are a very bad phonetic alphabet.

While a minority of characters are indeed pure logograms (小, 大, 田, etc.), most modern Chinese words are disyllabic, and the individual syllables often have no meaningful connection to the meaning of the word: 东西 (literally "east-west", but it means "a thing, object"). Some characters have lost _any_ semantic meaning in most words ("子"), and many more can only be used as part of another word ("bound forms", e.g. "据").

Classical Chinese was more logographic and less phonetic, but modern Chinese is not really close to it anymore.



> Modern Hanzi are a very bad phonetic alphabet

alphabets, universally have one common property: they are sortable.

I challenge you to sort Chinese characters.

This is an idea from James Gleick's The Information: the Chinese might never have invented Morse code on their own, because encoding Chinese script is extremely hard, even today (think of the massive number of CJK code points in Unicode, with duplicates and errors).

Chinese text on the Internet may emulate phonemes to some extent, but it's never been systematically standardized. It just borrows some aspects here or there.


> I challenge you to sort Chinese characters.

Chinese characters are in fact definitely sortable. There are multiple keys, the most popular ones being by stroke or by sound.

Example: https://en.m.wikipedia.org/wiki/Stroke-based_sorting


Chinese dictionaries have been sorted in various ways for at least two millennia, but there are some aspects which make alphabetic order sorting simpler:

1. Less ambiguous order: With classic Kangxi radicals for example, it's not always clear which radical to pick, and there is no clear order when there are multiple characters with the same radical and stroke count. There are other, more modern systems out there, but they all have some ambiguities.

2. Phonetic lookup: If you hear a word and don't know how to write it, you can just try to look it up phonetically. Unless the writing system is extremely perverse (I'm looking at you Ongloti, er, I mean English), you can kinda guess how it's written or at least how it starts. With Chinese characters that is not possible. Sure, Chinese dictionaries often have Pinyin or Zhuyin (Bopomofo) indexes, but Pinyin and Zhuyin are alphabets.


good luck dealing with duplicates and hand-written variants.


That's a problem in most alphabets as well. Several Latin letters (and the number symbols we use as well) have significant differences between printed and handwritten versions, and there are several handwritten versions around (g and z have some of the most variations).


> alphabets, universally have one common property: they are sortable.

Isn't this just an arbitrary order? Why could I not assign numbers to chinese characters and sort them? I know next to nothing about Chinese.


The sort order of the alphabet symbols is arbitrary, but since every word is an ordered sequence of those symbols, sorting the words relative to one another is trivial.
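
A minimal sketch of that point in Python. The alphabet order and the `sort_key` helper are made up for illustration; any fixed ordering of the letters would work the same way.

```python
# Hypothetical alphabet order: deliberately not the usual a-z order,
# to show that the order of the letters themselves is arbitrary.
ALPHABET = "zyxwvutsrqponmlkjihgfedcba"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def sort_key(word):
    """Map a word to the list of ranks of its letters."""
    return [RANK[ch] for ch in word]

words = ["cab", "abc", "ca", "bca"]
# Lists compare element by element, so this is plain lexicographic
# order under the chosen alphabet.
ordered = sorted(words, key=sort_key)
print(ordered)
```

Once the per-letter ranks exist, the word order follows mechanically; nothing else has to be agreed upon.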


> Isn't this just an arbitrary order

Yeah, but there is a very limited number of alphabetic letters and a commonly agreed order by convention.

There's no such thing in Chinese. For example, you can't easily sort names A-Z in Chinese except by Pinyin (or by Unicode code points, for that matter).
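
For what it's worth, a rough Python sketch of the two options mentioned here. Python compares strings by code point, so `sorted()` on single characters gives Unicode order. The tiny `PINYIN` table is hand-written for this example; a real program would need a full pronunciation dictionary, which is not shown.

```python
surnames = ["王", "李", "张"]

# Python compares str values by Unicode code point.
by_codepoint = sorted(surnames)

# Hand-written, illustrative lookup table (toneless Pinyin).
PINYIN = {"王": "wang", "李": "li", "张": "zhang"}
by_pinyin = sorted(surnames, key=lambda ch: PINYIN[ch])

print([f"U+{ord(ch):04X}" for ch in by_codepoint])
print(by_codepoint)
print(by_pinyin)
```

The two keys give different orders for the same three names, and the Pinyin one only works because an alphabetic transcription is supplied from outside.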


Dictionaries written in Chinese exist. They are in a sorted order, just like English dictionaries, so users can quickly look up the word they have in mind.

https://en.wikipedia.org/wiki/Chinese_character_orders


The thing is, they were only sorted that way after Pinyin was invented, which sorta proves the point.

You can't easily compile an encoding out of it, but for alphabets it's intuitive to map each letter to a sequence of dots and dashes, as Morse code does. It's extremely difficult to do so for Chinese.

Back to the topic: OP talks about "character amnesia". If you think of Chinese characters as emoji, then yeah, you can talk about the actions an emoji represents but forget exactly how it was drawn. You can't sort emoji, and emoji don't generally have a sound.
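
To illustrate the alphabet half of that claim, here is a toy sketch in Python. The table covers only a handful of letters and the `encode` helper is made up for this comment; the dot-dash patterns are the standard International Morse ones.

```python
# Partial International Morse table; a full Latin table needs only 26 entries.
MORSE = {
    "E": ".", "T": "-", "S": "...", "O": "---",
    "A": ".-", "N": "-.",
}

def encode(text):
    """Encode text letter by letter, separating letters with spaces."""
    return " ".join(MORSE[ch] for ch in text.upper())

message = encode("sos")
print(message)
```

A script with thousands of distinct characters would need a table with thousands of entries to do the same thing.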


An alphabet is a very specific thing: it's a small set of letters (usually fewer than 30) where each letter usually represents a single phoneme.

Sometimes a letter might represent a phoneme cluster (such as the letters "x" and "j" in English, that usually represent the consonant clusters /ks/ and /dʒ/ respectively). Sometimes there might be some ambiguity, like two letters being used for the same sound (both "c" and "k" can produce the sound /k/ in English) or one letter having two different pronunciations ("c" can be pronounced as either /k/ or /s/).

What distinguishes alphabets from all other similar written systems is that a single letter cannot represent a combination of a consonant and a vowel and that vowels can be independently represented by letters.

Another similar type of script is the abjad (like ancient Hebrew), where letters only represent consonants and vowels are implied from the context. The Ancient Hebrew script (which is different from the square Aramaic alphabet used to write Hebrew after circa 300 BC) is a later variant of the Proto-Canaanite script, an abjad which served as the basis for all later European alphabets (Etruscan, Greek, Latin, Runic and Cyrillic) and other Near Eastern alphabets (such as Aramaic, Arabic and Syriac). The only pre-modern alphabet (or abjad or abugida) I'm aware of that is not derived from Proto-Canaanite is Hangul (which is a true alphabet, unlike the two Japanese Kana).

Modern Hebrew and Arabic are mixed alphabets, since some vowels can be represented by consonant letters, but not all of the vowels, and the letters that represent a vowel leave some ambiguity with regard to which vowel they represent (or whether they represent a vowel or a consonant).

The next type of similar system is the abugida, which covers most of the Ethiopian, South Asian and South-East Asian scripts (Ge'ez, Devanagari, Tamil, Tibetan, Thai, Burmese, Khmer and many more). These are all probably derived from the Aramaic alphabet. In abugidas most letters represent a consonant that comes with a default vowel (e.g. क in Devanagari, as used to write Hindi, represents /ka/), but there are special diacritics that can modify a letter to have a different vowel (e.g. के represents /ke/ in Devanagari) or even insert extra consonants or glides before the vowel. These combined forms together with the diacritics can get fairly irregular (especially in Ge'ez), and consonant clusters can become quite unwieldy (and then about 80% of the consonants just get dropped in Tibetan). But that's the general idea.
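
The consonant-plus-diacritic structure is visible in how Unicode encodes Devanagari. A small Python sketch using the standard `unicodedata` module (the variable names are just for illustration):

```python
import unicodedata

ka = "\u0915"            # base consonant, carries the inherent vowel /a/
vowel_sign_e = "\u0947"  # dependent vowel sign that replaces the inherent vowel

syllable = ka + vowel_sign_e  # rendered as a single written unit

names = [unicodedata.name(ch) for ch in syllable]
print(syllable, len(syllable), names)
```

What looks like one written unit is stored as two code points: the consonant letter followed by a vowel sign, which is the abugida principle in miniature.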

Then you've got syllabaries: these are pretty straightforward systems, where every letter represents a combination of a consonant (or a consonant cluster) and a vowel (sometimes a diphthong or a vowel with a glide). These scripts require you to remember more letters, but the combinations are simpler and more regular than in most alphabets (let alone abjads and abugidas). This is the kind of writing system you see getting developed independently more often than others: Linear B, Japanese Kana, Cherokee, Vai, Yi.

Chinese characters are none of these. Characters never represent a single consonant or a stand-alone vowel that can combine with another consonant. In fact, bar a few exceptions (such as 儿 in Mandarin), every character represents a full syllable and does not combine with others to form a syllable. But Chinese characters are not a syllabary either, since there are many characters that can be used to write each sound and they are not interchangeable with each other. A specific character has to be used based on the meaning of the word. This is how logographic writing systems work, and modern Chinese writing is a logographic system par excellence.

To appreciate that, you have to compare Chinese characters with other logographic writing systems. Let's take Akkadian cuneiform (the writing system used for writing Babylonian and Assyrian) for example.

Cuneiform was first developed to write Sumerian, but this language was mostly dead by the time of Hammurabi (18th century BC), and it was a far-gone relic during the heyday of the Neo-Babylonian Empire of Nebuchadnezzar II. The Akkadians (i.e. the various Eastern Semitic language speakers of Mesopotamia) needed to write their own language with characters that represented Sumerian concepts, and they used the same methods modern Chinese (or Japanese) speakers use today: using a single Sumerian logogram in its own original meaning (but with Akkadian pronunciation), transcribing a word using syllables that represent different words with the same sounds, and combining multiple logograms to form a new meaning. Like Japanese (but unlike Chinese), Akkadian cuneiform characters can represent a multi-syllable word, and multiple logograms can combine into a new word with a completely different (and unexpected) pronunciation. Akkadian also commonly uses logograms as word classifiers (e.g. to indicate geographical locations, gender, type of object and many other things[1]). These classifiers were written, but rarely (if ever?) pronounced.

Egyptian hieroglyphs, which I am even less familiar with than cuneiform, seem to have a far more developed system of classification (determinatives). They also seem to exhibit combinations of logograms to denote new meanings and phonetic writing from a very early stage. In fact, Egyptian hieroglyphs, the quintessential "pictographic" script in the contemporary imagination, are mostly phonetic. Each hieroglyph generally represents a cluster of 1-3 consonants, which probably came from the original pronunciation of the word it represented.

But this is like an abjad! And yes, the Proto-Canaanite abjad probably originated in a simplification of Egyptian hieroglyphs. And like abjads, which developed into mixed scripts (like Modern Hebrew and Arabic) and developed optional diacritics, Egyptian hieroglyphs also needed a method to disambiguate the multitude of similar-sounding words. And for that reason most phonetic Egyptian words (as far as I know) are accompanied by a logographic determinative [2] (classifier) that signifies whether it's the name of a god, a city, a house, a lotus flower, a lotus bud, another part of the lotus (stem, stalk or rhizome) or fox skins. Yeah, these classifiers get rather specific. [3]

No system out there (including Japanese Kanji) is exactly like Chinese characters as used in Mandarin Chinese and other modern Chinese languages, but what I want to show here is that even though Modern Chinese is quite different from Classical Chinese, the writing system is still logographic. All logographic systems (including classical Chinese) have some phonetic features, at the very least in order to account for words that have no agreed-upon logogram. But what makes them logographic is the pervasive use of logograms in a semantic role to disambiguate meanings.

[1] https://sumerianlanguage.tumblr.com/post/167245277900/hi-i-w...

[2] https://www.ucl.ac.uk/museums-static/digitalegypt/writing/sy...

[3] http://web.ff.cuni.cz/ustavy/egyptologie/pdf/Gardiner_signli...



