Mudlet 4.0 internalization roadmap

Post by **Vadi** » Wed Apr 05, 2017 1:18 pm

Jor'Mox, would you have a recommendation for a translation webservice we could utilise? My only experience was with Launchpad years ago. That seemed good, but I've seen there are newer services available out there. Main feature of Launchpad I liked was how it could offer translations for similar text so it was easier to translate something consistently or even just tick 'approve' on text that was translated by some open-source project elsewhere and is same as the one you've used.

Post by **SlySven** » Wed Apr 05, 2017 2:48 pm

Does that not rely on matching complete sentences already having been translated elsewhere? I am not sure what the hit rate, i.e. the percentage of sentences that match other projects' work, would be. I think we will need the input of multilingual users...

Post by **ftpd** » Wed Apr 05, 2017 3:02 pm

One more thing from me: if you guys decide to translate function names, you need to remember encodings. For example, Polish users mostly type special characters (ś, ą, ó, ż etc.) in UTF-8, but Microsoft pushed its own encoding, cp1250, for years, and it is still the default (and sometimes the only possible) encoding on older Windows systems. So what happens if user1 on a Mac uses 'jeżeli' (Polish for 'if') in UTF-8 and user2 on Windows XP (yup, still...) tries to import his code?
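A minimal Python sketch of the mismatch described above, using only the standard codecs. It shows that the same word produces different bytes in UTF-8 and cp1250, and that reading one as the other either garbles the text or fails outright. The function names are hypothetical and exist only for this illustration.

```python
WORD = "jeżeli"  # Polish for 'if'

def as_utf8(text: str) -> bytes:
    """Bytes a UTF-8 editor (e.g. on a Mac) would save."""
    return text.encode("utf-8")

def as_cp1250(text: str) -> bytes:
    """Bytes an older Windows system defaulting to cp1250 would save."""
    return text.encode("cp1250")

def read_utf8_as_cp1250(data: bytes) -> str:
    """Misread UTF-8 bytes as cp1250: this succeeds but yields mojibake."""
    return data.decode("cp1250")

def read_cp1250_as_utf8(data: bytes) -> str:
    """Misread cp1250 bytes as UTF-8: strict decoding raises an error."""
    return data.decode("utf-8")

utf8_bytes = as_utf8(WORD)      # 'ż' becomes two bytes: C5 BC
cp1250_bytes = as_cp1250(WORD)  # 'ż' becomes one byte: BF

garbled = read_utf8_as_cp1250(utf8_bytes)

try:
    read_cp1250_as_utf8(cp1250_bytes)
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
```

Either way, an identifier such as 'jeżeli' would no longer match between the two machines unless the file's encoding is recorded or normalised on import.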

Polish is just an example; I have no idea how it works in, let's say, Chinese. But I'm afraid

Post by **Jor'Mox** » Wed Apr 05, 2017 7:39 pm

Vadi wrote:Jor'Mox, would you have a recommendation for a translation webservice we could utilise? My only experience was with Launchpad years ago. That seemed good, but I've seen there are newer services available out there. Main feature of Launchpad I liked was how it could offer translations for similar text so it was easier to translate something consistently or even just tick 'approve' on text that was translated by some open-source project elsewhere and is same as the one you've used.

Unfortunately, I don't have any ideas for translation services. I was doing translations from Arabic to English for my then employer, but that was based on my having been trained in Arabic, not outsourcing the translations. My prior comment was primarily based on my own experience with less experienced translators making inaccurate translations due to failing to account for grammar differences.

Post by **SlySven** » Thu Apr 06, 2017 2:48 am

@Jor'Mox - that sounds like what I was thinking, that translations can only really be done a sentence at a time so that all the elements of that can be juggled/considered as appropriate, with perhaps some regard to adjoining sentences to maintain the context. Whilst it may be possible to find other uses of some of the same sentences elsewhere out in the wild and be able to reuse such things I am sceptical that many such instances will occur in practice. I would be very tempted to make a first pass through with Google translate and then find users who are fluent in the target language to point out what doesn't make sense or seem to fit or is glaringly wrong to them to suggest corrections...

Post by **Vadi** » Thu Apr 06, 2017 5:27 am

Not many comments on the proposed implementation steps so I'll take it they seem reasonable to people.

I would not use Google Translate for the first pass; it would make for a hilariously awful translation that people will laugh at. Internationalisation is not as easy as just using Google Translate, sorry. This will be a giant effort to get everything translated properly.

We can settle on a translation platform later on though, I think the #1 critical thing that's actually blocking people from using Mudlet is the fact that it doesn't support anything other than the Latin encoding in the renderer. Does that sound like a reasonable goal to focus on first then? @SlySven

Post by **SlySven** » Thu Apr 06, 2017 12:00 pm

You mean TConsole / TBuffer / TTextEdit classes - and the code that assumes that a single QChar from a QString represents a single, mono-spaced, glyph on the display when it is actually one of a pair (of High-surrogate/Low-surrogate) Utf-16 16-bit values? Yeah - but we must also do stuff with the cTelnet class to properly handle (in a state-ful manner) the multi-byte sequences of Utf-8 instead of handling each byte at a time as we do now.
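To illustrate the stateful handling of multi-byte UTF-8 sequences mentioned above, here is a short Python sketch. It is not Mudlet's cTelnet code; it only shows the general technique using the standard library's incremental decoder, which remembers a partial sequence between network reads. The function names are made up for the example.

```python
import codecs
from typing import Iterable, List

def decode_stream(chunks: Iterable[bytes]) -> str:
    """Decode network chunks statefully, so a character split
    across two reads is still reassembled correctly."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts: List[str] = []
    for chunk in chunks:
        # The decoder buffers an incomplete trailing sequence internally.
        parts.append(decoder.decode(chunk))
    # Flush: any sequence still incomplete at the end becomes U+FFFD.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

def decode_bytewise(data: bytes) -> str:
    """The stateless approach: decode each byte on its own.
    Every byte of a multi-byte sequence turns into U+FFFD."""
    return "".join(bytes([b]).decode("utf-8", errors="replace") for b in data)

data = "jeżeli".encode("utf-8")
# Split in the middle of the two-byte sequence for 'ż'.
chunks = [data[:3], data[3:]]

stateful = decode_stream(chunks)
stateless = decode_bytewise(data)
```

Here `stateful` recovers the original word, while `stateless` contains two replacement characters where 'ż' should be.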

A golden rule to remember going forward, everyone: just because you are using a mono-spaced font it does not mean each code-point takes up the same space on screen!

Also, QString is Utf-16, so the methods for "character" positions, i.e. indexing through the string by "position", mean diddly-squat compared with how end users will count the glyphs of their language once the text goes off the BMP. Read up on the QTextBoundaryFinder class...!
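A small Python sketch of why UTF-16 positions and user-visible characters disagree. It is only an illustration of the encoding arithmetic, not Qt code: Python strings index by code point, so the UTF-16 code-unit count is computed explicitly. The helper names are hypothetical.

```python
from typing import Tuple

def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16
    (what a UTF-16 string class would report as its length)."""
    return len(text.encode("utf-16-le")) // 2

def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

emoji = "\U0001F600"   # one code point outside the BMP
accented = "e\u0301"   # 'e' plus a combining acute accent

emoji_units = utf16_units(emoji)       # two code units, one glyph
accented_points = len(accented)        # two code points, one glyph
high, low = surrogate_pair(0x1F600)
```

Python's standard library has no grapheme segmentation, so this sketch stops at code points; finding the boundaries of what a user perceives as one character is the job of a dedicated boundary finder.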

Post by **Vadi** » Thu Apr 06, 2017 12:45 pm

Yep, I do.

The original intent behind Mudlet allowing only monospaced fonts was an optimisation measure so that every character's size wouldn't have to be computed. I wonder if we'll be able to keep that optimisation? I suspect not, because a Chinese character by its nature would have to take up more space than a Latin character, hey. Actually, I suspect this is why the spacing when Chinese and Latin characters are mixed often looks wrong and terrible. Getting this right will be an art.

I'll put the roadmap into tickets and milestones on GitHub so we can branch off selective conversations there now that the roadmap is OK.

Post by **Garagoth** » Thu Apr 06, 2017 10:26 pm

Translating the API is troublesome: no help available, no way to exchange scripts... UTF-8 support in strings is a must of course, and maybe, if there is serious demand, allow UTF-8 variable names, but we have lived without this for ages. Lack of support for displaying national characters (and sending them, naturally) is a much bigger issue.

And... most automatic translators do a very poor job translating into some languages. For example, Google Translate from English to Polish usually creates something that just cannot be understood, because the translator picks the wrong meanings of words. My favorite example is the translation of "tank" (an armored tracked vehicle, but also a container for liquids): the translator always picks the second meaning, so we end up with 'war of "liquid containers"' when translated back. I usually cannot understand descriptions in the Google Play store when they are translated into Polish.

Also, stitching messages together from parts ends up very poorly when languages have different word orders. Take verbs: in German the verb is always the second word (or the last in some cases); in Polish it can be almost anywhere, but the meaning can change completely depending on position; Japanese is very rigorous and verbs are always at the end... Good luck making it work; it looks like a maintenance nightmare to me.
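The word-order problem can be sketched in Python as follows. Instead of gluing fragments together in a fixed order, each language gets one whole-sentence template with named placeholders, so the translator decides where each value goes. The template table, the message key, and the `render` function are all hypothetical, made up for this illustration.

```python
# One complete sentence per language; placeholders may appear in any order.
TEMPLATES = {
    "en": {"connected": "Connected to {host} on port {port}."},
    "de": {"connected": "Mit {host} auf Port {port} verbunden."},
}

def render(lang: str, key: str, **values) -> str:
    """Fill a whole-sentence template for the given language."""
    return TEMPLATES[lang][key].format(**values)

def render_from_parts(host: str, port: int) -> str:
    """The fragile approach: fragments joined in English word order.
    Translating the two fragments separately cannot move the verb."""
    return "Connected to " + host + " on port " + str(port) + "."

english = render("en", "connected", host="example.org", port=23)
german = render("de", "connected", host="example.org", port=23)
```

In the German template the participle lands at the end of the sentence, which the fragment-joining version has no way to express.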

As for the fonts... um, it might be that monospaced is sometimes hard to read (I am testing on my kids, heh. It simply looks different from what is in books and they read much slower). And glyphs from, for example, kanji are bigger. Luckily, most ASCII-based national characters (so, European-based?) fit in size with 'normal' chars.
But hey, we play TEXT games on computers with multiple cores and clocks in GHz. We can afford to compute sizes for wrapping purposes... it will no longer be measured in chars but in... hm... not sure. Container size, pixels, ..?
Not sure what to put in "wrap lines after..." preferences option. Maybe skip it and control wrap by setting margins properly?

TL;DR:
Do not translate APIs. Maybe provide a hook so a scripter can catch those and display a replacement.
Either have a human verify translated error messages or do not translate them. Use what the OS provides (most system errors are already translated; Linux offers locale support, Windows too).
Big ++++ for allowing wider charset in output and input.
Use margins for wrap control if you drop monospaced fonts.

G.

Post by **Jor'Mox** » Thu Apr 06, 2017 10:37 pm

You could also just calculate the width of ALL characters in a font set that isn't technically mono-spaced, but is close enough, and then use the largest value as the width allotted to each character. As for allowing UTF8 in strings but not in variable names, I think that would be problematic, because strings can be used to declare variable names in tables, so things would probably get a bit tricky. It would be a lot easier to just allow a wider character set throughout the code, rather than making a special case just for strings.

I definitely agree that allowing for input and output of larger character sets has got to be the priority, but it probably has to be in concert with changing the scripting engine to support them as well, otherwise you are basically turning Mudlet into Telnet...
