Is there any specification on how "far" a zero-width joiner can reach? I thought a ZWJ would only affect the codepoints directly adjacent to it, but here we have a sequence of 5 codepoints and only one ZWJ for all of them (which is also the only codepoint that signals that "special treatment" should be applied here).
So how would a parser know all five codepoints make up a grapheme cluster and not just the inner three?
Edit: Ok, didn't realize the "Fitzpatrick" codepoints are modifiers that I guess always refer to the previous codepoint. So there is essentially an "operator precedence" defined between modifiers and ZWJs.
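For anyone wondering what that "precedence" looks like in practice, here is a rough sketch in the spirit of two of the UAX #29 grapheme cluster rules: GB9 (never break before an Extend character or a ZWJ) and GB11 (never break between a ZWJ and a following pictographic when the ZWJ itself follows a pictographic plus optional Extends). It is not a full segmenter. The function names are made up, and the two property tables are hard-coded stand-ins covering only the codepoints mentioned in this thread; a real implementation would read the Unicode data files.

```python
ZWJ = 0x200D

def is_extend(cp):
    # Assumption: tiny stand-in for Grapheme_Cluster_Break=Extend, covering only
    # VARIATION SELECTOR-16 and the five Fitzpatrick skin tone modifiers.
    return cp == 0xFE0F or 0x1F3FB <= cp <= 0x1F3FF

def is_pictographic(cp):
    # Assumption: tiny stand-in for Extended_Pictographic, covering only the
    # base emoji mentioned in this thread.
    return cp in {0x1F926, 0x2642, 0x1F468, 0x1F9B2, 0x1F9B9}

def ends_in_joinable_zwj(cluster):
    # True if the cluster looks like: pictographic, Extend*, ZWJ (the GB11 prefix).
    if cluster[-1] != ZWJ:
        return False
    i = len(cluster) - 2
    while i >= 0 and is_extend(cluster[i]):
        i -= 1
    return i >= 0 and is_pictographic(cluster[i])

def clusters(codepoints):
    out = []
    for cp in codepoints:
        if out:
            current = out[-1]
            # GB9-like: modifiers, VS16 and ZWJ always attach to what came before.
            if is_extend(cp) or cp == ZWJ:
                current.append(cp)
                continue
            # GB11-like: a pictographic right after "pictographic Extend* ZWJ" stays attached.
            if is_pictographic(cp) and ends_in_joinable_zwj(current):
                current.append(cp)
                continue
        out.append([cp])
    return out

facepalm = [0x1F926, 0x1F3FC, 0x200D, 0x2642, 0xFE0F]
print([len(c) for c in clusters(facepalm)])      # [5]

without_zwj = [0x1F926, 0x1F3FC, 0x2642, 0xFE0F]
print([len(c) for c in clusters(without_zwj)])   # [2, 2]
```

So the ZWJ does not have to "reach" anywhere: the skin tone modifier and the variation selector each glue themselves to the previous codepoint, and the ZWJ only bridges the two resulting pieces.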
I was curious whether the Zero Width Joiner works ad hoc with whatever codepoints you give it, or only with a list of specific combinations. It turns out to be specific combinations, and the list is rather impressive.
Unicode.org has a list of the recommendations [1]. It's somewhat long and takes a bit to load. There's also a bunch that are "really" long. "man facepalming: medium-light skin tone" is great: U+1F926 U+1F3FC U+200D U+2642 U+FE0F. Now I can do a Picard in Unicode. Now if only they had "man who is bald facepalming." Pretty sure that would mean adding U+1F9B2 somewhere, maybe along the lines of U+1F468 U+200D U+1F9B2.
I also like U+1F9B9 U+200D U+2642 U+FE0F, "man who is supervillain". The current version looks more like something from the Minions movie, and the suggested sample looks more like "mime vampire who is stuck in plastic action figure container".
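If you want to poke at these sequences yourself, a small sketch like the following turns the "U+XXXX" notation into an actual string and lists what each codepoint is. `from_codepoints` is just a helper name I made up; the names printed come from Python's bundled `unicodedata` tables, so codepoints newer than your Python's Unicode version fall back to a placeholder.

```python
import unicodedata

def from_codepoints(spec):
    # Hypothetical helper: "U+1F926 U+200D" -> the corresponding string.
    return "".join(chr(int(token[2:], 16)) for token in spec.split())

facepalm = from_codepoints("U+1F926 U+1F3FC U+200D U+2642 U+FE0F")

for ch in facepalm:
    # Falls back to "<unnamed>" if this Python's Unicode data lacks the codepoint.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))

# Five codepoints, but one user-perceived character when the font supports it.
print(len(facepalm), "codepoints,", len(facepalm.encode("utf-8")), "UTF-8 bytes")
```

Whether the result shows up as a single glyph depends entirely on the font and renderer; the string itself is always five codepoints.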