lingo.lol is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for linguists, philologists, and other lovers of languages.

#unicode

Got a bug report for @novelwriter from someone who uses Cuneiform text in their work. These are 4-byte Unicode characters, and they turned out to be very tricky to handle. 😅

The app is built with Python, which switches a string's internal storage to UCS-4 when it contains such characters, so each character always has a single index in the string.

However, the Qt library uses UTF-16, where 4-byte characters are stored as a surrogate pair and take up two slots. This creates a mismatch in indices between the two representations.

#Python #Qt #Code
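To make the index mismatch concrete, here is a minimal sketch of converting between Python string indices (one per code point) and UTF-16 code unit offsets (the unit Qt counts in). This is not novelWriter's actual fix; the function names `py_to_utf16` and `utf16_to_py` are hypothetical, and the only rule used is that any code point above U+FFFF takes two UTF-16 units.

```python
def py_to_utf16(text: str, index: int) -> int:
    """Convert a Python (code point) index into a UTF-16 code unit offset."""
    # Every code point above U+FFFF is a surrogate pair, i.e. one extra unit.
    return index + sum(1 for ch in text[:index] if ord(ch) > 0xFFFF)


def utf16_to_py(text: str, offset: int) -> int:
    """Convert a UTF-16 code unit offset back into a Python index.

    An offset pointing into the middle of a surrogate pair maps to the
    character after that pair.
    """
    units = 0
    for i, ch in enumerate(text):
        if units >= offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


text = "a\U00012000b"  # U+12000 CUNEIFORM SIGN A between two ASCII letters
print(len(text))                           # 3 code points, as Python counts
print(len(text.encode("utf-16-le")) // 2)  # 4 UTF-16 code units, as Qt counts
print(py_to_utf16(text, 2))                # 3: "b" sits after the surrogate pair
print(utf16_to_py(text, 3))                # 2
```

The linear scan is the simplest correct approach; for long documents, a precomputed table of surrogate positions would avoid rescanning the string on every conversion.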

Did you know that new #Emoji can be proposed by anyone, simply by following guidelines laid out by the #Unicode Consortium? There's a window each year when they accept proposals, and a select few might make it into future sets.

This year I turned one in: "Circuit Board", which I was surprised to find 1. didn't exist and 2. had not been proposed before (though CPU and Microchip have both been submitted and declined in the last 5 years).

You can read my proposal here:
storage.googleapis.com/greg-ke

and you can see the Unicode emoji proposal guidelines here:
unicode.org/emoji/proposals.ht

Anyway, the odds of getting accepted aren't great, but if it IS accepted, you can say "hey! I know the guy who submitted that one!"

Attached are the sample images I drew up for the proposal - which, incidentally, are now Public Domain as well. Enjoy!

The macOS Character Viewer previews all Unicode space characters as a blank space, so finding any particular space can be a bit of a chore. Here's me trying to find PUNCTUATION SPACE.
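As a workaround for look-alike blanks, Python's standard `unicodedata` module can list every character in the Unicode "space separator" category (Zs) by name, or jump straight to one by its official name. A minimal sketch; note that the Character Viewer's own grouping of "space" characters may not match category Zs exactly (for example, ZERO WIDTH SPACE is not in Zs).

```python
import sys
import unicodedata


def space_characters():
    """Yield (code point, name) for every character in Unicode category Zs."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Zs":
            yield cp, unicodedata.name(ch, "<unnamed>")


# List all space separators with their names, so they can be told apart.
for cp, name in space_characters():
    print(f"U+{cp:04X}  {name}")

# Or look one character up directly by its official Unicode name.
target = unicodedata.lookup("PUNCTUATION SPACE")
print(f"U+{ord(target):04X}")  # U+2008
```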

The iOS 18.5 SDK finally came out and the only change for Unicode coverage is the *removal* of a bunch of Sinhala codepoints:

ඁ෦෧෨෩෪෫෬෭෮෯𑇡𑇢𑇣𑇤𑇥𑇦𑇧𑇨𑇩𑇪𑇫𑇬𑇭𑇮𑇯𑇰𑇱𑇲𑇳𑇴

(Those of you on iOS 18.4: Enjoy seeing those glyphs while you can!)
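For anyone who can't see the glyphs, the characters above appear to be U+0D81, U+0DE6 to U+0DEF, and U+111E1 to U+111F4 (our own decoding of the post, so treat it as an assumption). A short sketch that prints each code point's plane and name using Python's `unicodedata`; names depend on the Unicode version bundled with your Python, hence the fallback string.

```python
import unicodedata

# Code points decoded from the glyphs in the post (an assumption on our part).
removed = [0x0D81, *range(0x0DE6, 0x0DF0), *range(0x111E1, 0x111F5)]

for cp in removed:
    ch = chr(cp)
    # Code points above U+FFFF live outside the Basic Multilingual Plane.
    plane = "BMP" if cp <= 0xFFFF else "supplementary"
    print(f"U+{cp:04X}  {plane:13}  {unicodedata.name(ch, '<unknown to this Python>')}")
```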

It sucks that #Linux tech is fragmented when it comes to fonts. The most sensible behavior is to use one main font and fall back to other fonts when a given #Unicode codepoint is not found in the main one. But my window manager uses raw #X11 APIs for text output, and these are primitive enough to just display ??? when they encounter an unfamiliar glyph. Which is... too often, because I have a lot of #Russian and a bit of #Armenian in the stuff I window manage. So I have to guess what all these question marks mean. Not cool.
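The fallback behavior described above boils down to walking an ordered chain of fonts and picking the first one that has a glyph for each code point. Here is a deliberately simplified sketch: the `Font` class, the font names, and their coverage ranges are all hypothetical, and real font systems also weigh style, language, and matching rules that are ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Font:
    """A hypothetical font, reduced to a name and the code points it has glyphs for."""
    name: str
    coverage: set[int] = field(default_factory=set)


def pick_font(ch: str, chain: list[Font]) -> Font | None:
    """Return the first font in the fallback chain that covers ch, or None."""
    cp = ord(ch)
    for font in chain:
        if cp in font.coverage:
            return font
    return None


def render_plan(text: str, chain: list[Font]) -> list[tuple[str, str]]:
    """Pair each character with the font that would draw it ('?' if none can)."""
    plan = []
    for ch in text:
        font = pick_font(ch, chain)
        plan.append((ch, font.name if font else "?"))
    return plan


# Hypothetical chain: a Latin-only main font plus a Cyrillic fallback, no Armenian.
chain = [
    Font("MainLatin", set(range(0x20, 0x7F))),
    Font("FallbackCyrillic", set(range(0x0400, 0x0500))),
]
text = "Hi \u041c\u0438\u0440 \u0531"  # "Hi Мир Ա"
for ch, font_name in render_plan(text, chain):
    print(repr(ch), font_name)
```

With a single-font path and no chain, every Cyrillic letter here would also come out as "?", which is the situation the post complains about.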