Encoding in Cyrillic KOI-8 R.

Jor'Mox
Posts: 1146
Joined: Wed Apr 03, 2013 2:19 am

Re: Encoding in Cyrillic KOI-8 R.

Post by Jor'Mox »

If you managed to find, or coax out of said author, a description of what he did to convert Mudlet to a Unicode-supporting version, you could probably convince someone here to do the same thing using newer Mudlet code.

sangeet
Posts: 6
Joined: Fri Sep 13, 2013 7:52 pm

Re: Encoding in Cyrillic KOI-8 R.

Post by sangeet »

yep, i'll give it a try :)

Phazeus
Posts: 10
Joined: Thu Apr 01, 2010 12:49 am

Re: Encoding in Cyrillic KOI-8 R.

Post by Phazeus »

Hello!
I use Linux, and Mudlet looks nice to use; I am looking for a client with Lua support. But there is a problem with the Linux version of the client:
./mudlet: symbol lookup error: /home/tux/mudlet-3.0.0-gamma/bin/libQt5Core.so.5: undefined symbol: u_strToUpper_52
OK. I downloaded the Windows version and ran it in Wine successfully, but the text encoding is incorrect. I think it is trying to output the text as Unicode, but what is needed is Windows cp1251. Is it possible to add a small fix that allows the encoding to be changed?

Here is a screenshot:
http://itmages.ru/image/view/2179106/1109f438

Thanx.

Vadi
Posts: 5042
Joined: Sat Mar 14, 2009 3:13 pm

Re: Encoding in Cyrillic KOI-8 R.

Post by Vadi »

I'm afraid it's not a little fix.

Phazeus
Posts: 10
Joined: Thu Apr 01, 2010 12:49 am

Re: Encoding in Cyrillic KOI-8 R.

Post by Phazeus »

Can I help? Using a different encoding should not be a problem: it is just an additional layer, for example iconv... But it is not normal for a client not to support encodings... All the other clients work with cp1251 correctly, but have no Lua support :(((
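To illustrate the "additional layer" idea, here is a minimal Python sketch of what such a conversion step does: it takes the raw bytes from the server and decodes them with a chosen one-byte codepage. The function name `decode_server_bytes` is hypothetical and this is not Mudlet code; it only shows why text looks wrong when cp1251 bytes are read as Latin-1.

```python
def decode_server_bytes(raw: bytes, encoding: str = "cp1251") -> str:
    """Decode raw server bytes with the given codepage (hypothetical helper).

    Undecodable bytes become U+FFFD instead of raising an exception.
    """
    return raw.decode(encoding, errors="replace")


# Simulate what a cp1251 server would send for the word "Привет".
raw = "Привет".encode("cp1251")

# Reading the same bytes as Latin-1 gives the familiar garbage.
print(raw.decode("latin-1"))               # Ïðèâåò
print(decode_server_bytes(raw))            # Привет

# The same layer handles KOI8-R if the codec name is changed.
koi = "Привет".encode("koi8_r")
print(decode_server_bytes(koi, "koi8_r"))  # Привет
```

The decoding itself is the easy part; the posts that follow explain why wiring it into the rest of the client is the larger job.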

SlySven
Posts: 1023
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Encoding in Cyrillic KOI-8 R.

Post by SlySven »

The main issues are that:
  • The Lua subsystem that the user uses to interface with the main application was not constructed to use anything other than ASCII / Latin1 encoding and needs converting and testing to pass UTF-8 both ways - that is doable and is slowly underway, but there are something like 2,000-3,000 places where this happens, so it will take a while.
  • The MUD server output parsing and Telnet handling was built to handle single bytes at a time, so it needs an overhaul to interpret the Telnet protocol wrapping the data stream before the latter can be assembled back into something that can be validated as a UTF-8 data stream and then renormalised to enable consistent pattern matching by the trigger engine.
  • Parts of the display code are based upon the assumption that, as we are using a mono-spaced font, each byte (unless it is a tab or carriage-return/linefeed pair) is a single character that takes up a single space on screen - with UTF-8 that isn't the case. Unicode handling internally is done via the QString class, which stores text as individual QChars, but there is some breakage because a single QChar can only represent a Basic Multilingual Plane character, so for Unicode glyphs such as 'letter not accepted by Forum' or words like "word not accepted by Forum" each glyph is a pair of QChars (high and low surrogates) and not one.
As I say there is still plenty to do...

Edit: demonnic! Why can't I post non-BMP characters such as {U+1F4A9 = PILE OF POO} or Old English in range {U+10200 to U+1034A} here?
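The surrogate-pair point can be checked outside Qt. The Python sketch below (helper names are hypothetical, and it uses UTF-16 encoding as a stand-in for QString's 16-bit storage) counts how many 16-bit units a character needs and computes the high and low surrogates for a non-BMP code point.

```python
from typing import Tuple


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(utf16_units("a"))           # 1 (BMP character, one unit)
print(utf16_units("\U0001F4A9"))  # 2 (non-BMP character, two units)

high, low = surrogate_pair(0x1F4A9)
print(hex(high), hex(low))        # 0xd83d 0xdca9
```

Any display code that assumes "one 16-bit unit = one screen cell" will therefore miscount such characters by one.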

Phazeus
Posts: 10
Joined: Thu Apr 01, 2010 12:49 am

Re: Encoding in Cyrillic KOI-8 R.

Post by Phazeus »

==anything other than ASCII / Latin1

NO. Lua works with strings in binary format, so this is not a Lua issue. Just add the other CP125* codepages and a font that supports these codepages.
I write Lua code, and Lua works fine with ANY one-byte encoding.

Vadi
Posts: 5042
Joined: Sat Mar 14, 2009 3:13 pm

Re: Encoding in Cyrillic KOI-8 R.

Post by Vadi »

It does yeah, except the string functions which don't.

We definitely want to achieve this; if you'd like to help, pitch in.
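A rough illustration of the "string functions" caveat, written in Python rather than Lua: Python's `bytes.upper()` only touches ASCII letters, which is assumed here to be comparable to byte-oriented string functions running in the default C locale. The helper names are hypothetical. Bytes pass through untouched, but case conversion and length only give the expected answer once the encoding is known.

```python
def upper_bytewise(raw: bytes) -> bytes:
    """Upper-case a byte string without knowing its encoding (ASCII only)."""
    return raw.upper()


def upper_decoded(raw: bytes, encoding: str) -> bytes:
    """Upper-case by decoding first, then re-encoding in the same codepage."""
    return raw.decode(encoding).upper().encode(encoding)


raw = "привет".encode("cp1251")

# Byte-wise: the Cyrillic bytes are left exactly as they were.
print(upper_bytewise(raw) == raw)                              # True

# Encoding-aware: the result is the cp1251 bytes for "ПРИВЕТ".
print(upper_decoded(raw, "cp1251") == "ПРИВЕТ".encode("cp1251"))  # True

# Length in bytes also differs from length in characters under UTF-8.
print(len("привет"), len("привет".encode("utf-8")))            # 6 12
```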

Phazeus
Posts: 10
Joined: Thu Apr 01, 2010 12:49 am

Re: Encoding in Cyrillic KOI-8 R.

Post by Phazeus »

These functions are very easy to fix so that they support any one-byte encoding. I can send functions for the cp1251 encoding.

But why are Lua functions needed to display text from the Telnet server? I mean, what is needed is a font that supports the cp125* encodings :)

SlySven
Posts: 1023
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Encoding in Cyrillic KOI-8 R.

Post by SlySven »

All of the Mudlet-specific Lua commands that we provide for you have got to be overhauled to handle non-ASCII characters. That is proving to be an issue already: some users on Windows have problems because even getMudletHomeDirectory() doesn't work, as that platform allows non-ASCII characters in user names and, by default, in the home directory, so they cannot use any scripts that need to find out where to save or load things...

The Lua functions are a different problem from the Telnet stuff. The latter is just not up (yet) to handling the variable number (1-4) of bytes that make up each "character" of a UTF-8 stream. And of course using the word "character" is a bit of a misnomer, as some of those "characters" can merge together to form a single grapheme (a single visual "entity" on the screen) {think of a single lower case 'a' followed by one (or more) "acute" accents that combine to produce a "lower case Latin A with acute accent" character.} We have to do the re-normalisation whilst we extract this data from an ongoing Telnet data stream, which may split some of those variable numbers of bytes between successive packets AND include other out-of-band traffic that various MUD servers provide to "enhance" the Mudding experience.
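The packet-splitting and re-normalisation problem can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `decode_packets` is hypothetical, the input is assumed to be pure UTF-8 with the Telnet commands and out-of-band data already stripped, and normalisation is applied once at the end rather than on a live stream.

```python
import codecs
import unicodedata
from typing import Iterable


def decode_packets(packets: Iterable[bytes]) -> str:
    """Decode UTF-8 that may be split mid-character across packets.

    The incremental decoder buffers an incomplete byte sequence until the
    next packet arrives; the joined text is then normalised to NFC so that
    'a' + combining acute becomes the single precomposed character.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(packet) for packet in packets]
    pieces.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return unicodedata.normalize("NFC", "".join(pieces))


# 'a' followed by U+0301 COMBINING ACUTE ACCENT is b"a\xcc\x81" in UTF-8;
# here the two bytes of the accent arrive in different packets.
text = decode_packets([b"a\xcc", b"\x81"])
print(len(text), hex(ord(text)))  # 1 0xe1

# A two-byte Cyrillic letter split across packets is reassembled as well.
print(decode_packets([b"\xd0", b"\xbf"]))  # п
```

A real stream is harder than this: a combining accent can arrive in a later packet than its base letter, so the client cannot finalise the last grapheme of a packet until it has seen what follows.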

All in all, the cTelnet class is a vital piece of the Mudlet code - and we have to go through it and rewrite chunks without breaking it - not a job for the faint-hearted! :shock: :geek:
