Mudlet 4.0 internalization roadmap

SlySven
Posts: 1019
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Mudlet 4.0 internalization roadmap

Post by SlySven »

Jor'Mox, that width calculation will not work with combining diacritical marks, which have little or no spacing requirement of their own because they adorn the glyph they are combined with.

This also brings up the subject of (re)normalisation: the code point sequence the MUD server sends may not be the one the user entered into their triggers to detect it. We have to normalise both, but the Unicode people do have algorithms for doing that to an incoming byte stream, and they only need to keep a store of at most thirty code points, IIRC (read part 13!).
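To make the normalisation point concrete, here is a minimal Python sketch (not Mudlet's C++ code). It uses the standard `unicodedata` module to bring both the trigger pattern and the incoming line to the same normalisation form before comparing them. The helper names `normalise` and `trigger_matches` are hypothetical, the choice of NFC is an assumption, and it normalises whole strings rather than a byte stream.

```python
import unicodedata


def normalise(text, form="NFC"):
    """Return text in the given Unicode normalisation form (NFC assumed here)."""
    return unicodedata.normalize(form, text)


def trigger_matches(pattern, line):
    """Hypothetical substring trigger: normalise both sides, then compare."""
    return normalise(pattern) in normalise(line)


# The same visible word encoded two different ways.
precomposed = "caf\u00e9"   # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# What the user typed into the trigger versus what the server sent.
user_pattern = precomposed
server_line = "You enter the " + decomposed + "."

naive_hit = user_pattern in server_line                   # raw comparison
normalised_hit = trigger_matches(user_pattern, server_line)
```

Without normalisation the raw comparison misses, because the two encodings differ code point by code point even though they render identically.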

Jor'Mox
Posts: 1142
Joined: Wed Apr 03, 2013 2:19 am

Re: Mudlet 4.0 internalization roadmap

Post by Jor'Mox »

If we are trying to force fixed-width behavior, then I would assume that diacritical marks would be assigned zero width, because they attach to an existing character for which horizontal space has already been allocated. That said, I would assume that most of the languages you are thinking of that use diacritical marks already have fixed-width fonts: their scripts may not fit perfectly in the Latin character set, but they are very similar in nature, so creating a fixed-width typeface for them isn't a challenge.

The more likely problem languages are East Asian ones, Chinese for example, where you are no longer working with an alphabet and may have thousands of distinct glyphs to represent. That makes generating a fixed-width typeface much more daunting, though it has quite possibly been done already (from what I understand, all Chinese characters are ideally supposed to fit in a square of a fixed size anyway). Either way, while fixed-width text is a reasonable ideal, I would suggest that we not stress about "what if we can't find a suitable fixed-width font for language X" until we actually fail to find such a font; we can cross that bridge when we get there.
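A rough Python sketch of the zero-width idea (again, not Mudlet code): count display cells, giving combining marks zero width. Treating East Asian wide and fullwidth characters as two cells is an extra assumption on my part, in line with the "fixed-size square" remark. The function names `cell_width` and `display_width` are hypothetical, and control characters and East Asian "ambiguous" widths are not handled specially.

```python
import unicodedata


def cell_width(ch):
    """Number of fixed-width cells assumed for a single code point."""
    # Combining marks sit on the previous glyph, so they take no extra space.
    if unicodedata.combining(ch) != 0 or unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # Assumption: wide (W) and fullwidth (F) characters occupy two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text):
    """Total number of cells needed to show text in a fixed-width grid."""
    return sum(cell_width(ch) for ch in text)


width_decomposed = display_width("cafe\u0301")   # e + combining acute accent
width_precomposed = display_width("caf\u00e9")   # single code point for é
width_chinese = display_width("\u4e2d\u6587")    # two Chinese characters
```

With these rules the decomposed and precomposed spellings of the same word take the same number of cells, which is what column alignment needs.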

Vadi
Posts: 5035
Joined: Sat Mar 14, 2009 3:13 pm

Re: Mudlet 4.0 internalization roadmap

Post by Vadi »

I think fixed width fonts are a given. They're the foundation of a MUD and have worked well for Mudlet's entire lifetime. No need to change that.

I agree, no need to stress.

Vadi
Posts: 5035
Joined: Sat Mar 14, 2009 3:13 pm

Re: Mudlet 4.0 internalization roadmap

Post by Vadi »

I've ticketed the original implementation steps in the 4.0 milestone on GitHub. Tickets are sorted in order of priority. Let's get to work ;)

Vadi
Posts: 5035
Joined: Sat Mar 14, 2009 3:13 pm

Re: Mudlet 4.0 internalization roadmap

Post by Vadi »

I think we'll have to think really hard about how to make fixed-width stuff work. I don't think we should go non-fixed-width - MUDs are a lot about formatting. It's not Word-style formatting, it's a different type of formatting where being able to align things is quite vital, and I don't think we can afford to lose that. I've looked at some web clients that don't use a fixed-width font and haven't been sold that it's the right way to go.

SlySven
Posts: 1019
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Mudlet 4.0 internalization roadmap

Post by SlySven »

AC (Ahmed Charles) did make a valid point when they pointed out that we ought to be handling displays in terms of graphemes - which are the discrete units of a written text.

So I think yours truly will be looking to create a TGString class around Qt's QString class, so that we can have grapheme-based C++ code using things like TGString::mid(int start, int length = -1), where start = 5 and length = 1 will return the sequence of QChars (effectively a QString, as far as the rest of the code is concerned) that makes up the fifth grapheme in the original string. This sort of thing will make a useful intermediate for the existing C++ code, which assumes that each element of a QString (i.e. a QChar) is a character (= a grapheme), when a non-BMP grapheme takes a pair of them, and a sequence of diacritical accents applied to a Latin letter can take several! By dropping this new TGString class in place of the existing QString at some key points we should be able to lever in the required extra functionality, at the cost of it being a little slower because of the overhead implicit in tracking and maintaining the internal state...
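To illustrate the idea only, here is a small Python sketch of a grapheme-indexed wrapper. It is not the proposed TGString and not real Qt code: `GraphemeString`, `split_graphemes` and `utf16_units` are hypothetical names, positions are assumed to be 0-based as in QString::mid, and the segmentation is a deliberate simplification (a base code point plus any following combining marks), not the full Unicode grapheme cluster rules. Note also that a Python string indexes by code point, whereas a QString indexes by UTF-16 code unit, so `utf16_units` is there to show how many QChars the same text would need.

```python
import unicodedata


def split_graphemes(text):
    """Simplified segmentation: attach combining marks to the preceding base."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch      # mark adorns the previous glyph
        else:
            clusters.append(ch)     # start of a new grapheme
    return clusters


def utf16_units(text):
    """How many 16-bit units (QChars, in Qt terms) the text would occupy."""
    return len(text.encode("utf-16-le")) // 2


class GraphemeString:
    """Hypothetical wrapper whose positions count graphemes, not code units."""

    def __init__(self, text):
        self._clusters = split_graphemes(text)

    def __len__(self):
        return len(self._clusters)

    def mid(self, start, length=-1):
        """Return `length` graphemes from 0-based `start`; -1 means to the end."""
        if start < 0 or start >= len(self._clusters):
            return ""
        end = len(self._clusters) if length < 0 else start + length
        return "".join(self._clusters[start:end])


# 'a', 'e' with two accents, 'i', a non-BMP emoji, 'o'
sample = GraphemeString("ae\u0301\u0323i\U0001F600o")
accented = sample.mid(1, 1)   # the whole accented 'e', marks included
emoji = sample.mid(3, 1)      # one grapheme, but two UTF-16 units
```

The segmentation is done once, when the object is built, which is one place the "little slower" overhead mentioned above would show up.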

prool

Re: Mudlet 4.0 internalization roadmap

Post by prool »

Hello, sir Vadi!

I sent you a message on forum.bylins.su.

I did not manage to build Mudlet on Ubuntu 16.10 :-(

With best regards, Serge

Vadi
Posts: 5035
Joined: Sat Mar 14, 2009 3:13 pm

Re: Mudlet 4.0 internalization roadmap

Post by Vadi »

Okay, I'll have a look!
