What is the number 840?

jacquesm · on June 19, 2014

6*140. But as your sibling comment points out, Twitter is not based on octets but on code points, so there is actually a lot more information in there. I assumed the limit was the 140-byte limit from GSM messages, with a 'payload' of roughly 6 bits per character position, but as was pointed out, it is in fact now 140 Unicode code points.

https://dev.twitter.com/docs/counting-characters
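
To make the octets-versus-code-points distinction concrete, here is a minimal Python sketch. The helper names `tweet_length` and `utf8_octets` are made up for illustration, and counting after NFC normalization is an assumption based on my reading of the linked page, not a statement of how Twitter actually implements it.

```python
import unicodedata


def tweet_length(text: str) -> int:
    """Count Unicode code points after NFC normalization (assumed counting rule)."""
    return len(unicodedata.normalize("NFC", text))


def utf8_octets(text: str) -> int:
    """Count the octets the same text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "a" * 140
japanese_text = "\u3042" * 140  # HIRAGANA LETTER A, three octets in UTF-8

for label, text in (("ascii", ascii_text), ("japanese", japanese_text)):
    print(label, tweet_length(text), "code points,", utf8_octets(text), "octets")

# A decomposed "e" + combining acute accent collapses to one code point under NFC.
print(tweet_length("e\u0301"))
```

Under these assumptions both strings count as 140 characters, but the Japanese one takes three times as many octets, which is why a per-octet estimate undercounts what fits in a tweet.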

markburns · on June 19, 2014

Yeah, I'm familiar with tweeting in Japanese, and how much more information you can get across in a tweet, which made me question the number. However, I'm still not sure why (in an imaginary world where Twitter doesn't encode in UTF-8) it might be 6 bits. 7 or 8 I can understand. I'm just curious about what your thinking was, not trying to do any one-upmanship.

jacquesm · on June 20, 2014

Well, 'ascii' is ' ' to 'del'; above that I can't even type on this keyboard with any reliability, so that gives an upper boundary for me of 96 characters. Of those, the actual information is carried mostly by the letters: A-Z, twice if you want to count uppercase and lowercase, for 52 letters, plus 10 digits and a space. So that's 63 characters. Round up to 64 (maybe add the @ character or the # if you want, those are pretty prevalent in tweets as well). That's approximately 2^6. If you drop the lowercase/uppercase distinction then you can fit most of the other ASCII glyphs and punctuation marks in the remaining slots of your imaginary 6-bit code. You can't enter anything below 32 in a tweet, other than a linefeed.
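
The counting above can be checked with a short Python sketch. The names `ALPHABET`, `CASELESS`, `SIX_BIT_RANGE`, and `bits_needed` are hypothetical, chosen only for this illustration; the character sets come from the standard `string` module.

```python
import string


def bits_needed(symbols: int) -> int:
    """Smallest number of bits that can give each of `symbols` values a distinct code."""
    return (symbols - 1).bit_length()


# Letters in both cases, digits, and a space: 52 + 10 + 1 = 63 symbols.
ALPHABET = string.ascii_letters + string.digits + " "

# ' ' (0x20) through DEL (0x7F): the 96-character upper bound mentioned above.
PRINTABLE_AND_DEL = [chr(c) for c in range(0x20, 0x80)]

# Caseless variant: uppercase letters, digits, space, plus all ASCII punctuation.
CASELESS = string.ascii_uppercase + string.digits + " " + string.punctuation

# The 64 consecutive ASCII codes from ' ' (0x20) to '_' (0x5F).
SIX_BIT_RANGE = [chr(c) for c in range(0x20, 0x60)]

print(len(ALPHABET), bits_needed(len(ALPHABET)))       # symbols, bits per symbol
print(140 * bits_needed(len(ALPHABET)))                # bits in 140 positions
print(len(PRINTABLE_AND_DEL), bits_needed(len(PRINTABLE_AND_DEL)))
print(len(CASELESS), len(SIX_BIT_RANGE))
```

By this count, 63 symbols need 6 bits each, and 140 positions at 6 bits give the 840 figure. The full ' ' to DEL range would need 7 bits. The caseless set with every ASCII punctuation mark comes to 69 symbols, slightly over 64, so a 6-bit code would have to leave out a handful of marks; the range from ' ' to '_' is one set that fits exactly.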