Input character encoding

Mikal
Posts: 3
Joined: Sat Feb 01, 2014 3:56 pm

Input character encoding

Post by Mikal »

Hi everyone

I'm new to Mudlet. I've been using zMUD for years, but as it doesn't play well with Windows 8, I've decided to experiment with other clients. So far I'm very impressed with the speed of Mudlet, not to mention actual regular expressions in triggers/aliases as well as a proper scripting language.

However, funny things happen when I use specific regional letters.
For example, if I send:
say æøå
The mud echoes back:
You say: Ã¦Ã¸Ã¥
My guess is that it has something to do with the character encoding Mudlet uses to send input to the mud, as I can receive text containing those letters:
Shinra tells you: æøå
So I guess my question is whether it is possible to change the character encoding anywhere in Mudlet?

Thanks in advance.

SlySven
Posts: 1034
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Input character encoding

Post by SlySven »

That looks like a UTF-8 encoding issue. The characters you are looking at are encoded into bytes thus:

'æ' is Unicode point U+00E6 LATIN SMALL LETTER AE and is represented in UTF-8 by the bytes: 0xC3 0xA6
As ISO-8859-1 (Latin1) those two bytes would show as: "Ã¦" LATIN CAPITAL A WITH TILDE, BROKEN BAR.

'ø' is Unicode point U+00F8 LATIN SMALL LETTER O WITH STROKE and is represented in UTF-8 by the bytes: 0xC3 0xB8
As ISO-8859-1 (Latin1) those two bytes would show as: "Ã¸" LATIN CAPITAL A WITH TILDE, CEDILLA.

'å' is Unicode point U+00E5 LATIN SMALL LETTER A WITH RING ABOVE and is represented in UTF-8 by the bytes: 0xC3 0xA5
As ISO-8859-1 (Latin1) those two bytes would show as: "Ã¥" LATIN CAPITAL A WITH TILDE, YEN SIGN.
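
To check the byte values above for yourself, here is a minimal sketch in Python (used purely for illustration, it has nothing to do with Mudlet's own code). The helper name mojibake is made up for this example; it encodes text as UTF-8 and then deliberately misreads the resulting bytes as ISO-8859-1:

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


for ch in "æøå":
    raw = ch.encode("utf-8")
    # Show the code point, the UTF-8 bytes in hex, and the Latin1 misreading.
    print(f"U+{ord(ch):04X} -> {raw.hex(' ')} -> {mojibake(ch)}")
```

Each letter should come out as two bytes starting with c3, and the last column should match the two-character Latin1 renderings described above.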

For those of you watching in ASCII this may look weird - welcome to the world of Mojibake - try switching your browser to UTF-8 encoding ( Firefox/IceWeasel: "View"->"Character Encodings"->"UTF-8" ) and ensure you are using a decent font...

I think we do have issues with Unicode - a big one is support for it in Lua, but I'd be surprised if that is the issue at this point. May I enquire which MUD you are connecting to? Some do not handle anything other than ASCII from their users.

Vadi
Posts: 5050
Joined: Sat Mar 14, 2009 3:13 pm

Re: Input character encoding

Post by Vadi »

To be fair, there are 278 cases of toLatin1 in Mudlet's code. It seems like it is Mudlet that is the one encoding it wrong.

Mikal
Posts: 3
Joined: Sat Feb 01, 2014 3:56 pm

Re: Input character encoding

Post by Mikal »

The mud in question is deepertrouble.org:4242

I hate to make comparisons to other clients, but I didn't have problems with non-ASCII characters in zMUD/CMUD.

Mikal
Posts: 3
Joined: Sat Feb 01, 2014 3:56 pm

Re: Input character encoding

Post by Mikal »

The mud itself is in English and doesn't make use of those characters. But a lot of the players, including myself, are Danish and use them when speaking to each other.

I really hope there's a way I can make it work. As I mentioned, I'm very impressed with the features and speed of Mudlet, but not being able to use those letters is a dealbreaker for me :(

SlySven
Posts: 1034
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: Input character encoding

Post by SlySven »

I am currently looking at Mudlet's Lua handling of UTF-8 and, um, currently it doesn't do much ☹. However, as Vadi points out, there are many places where the Lua system is being sent Latin-1 encoded data via toLatin1() calls instead of toUtf8(), which I think we need to use. We also need to handle the case where the user inputs non-ASCII characters (reach for your <Compose> or <Multi_key>, often mapped to Alt-Gr on your ⌨). There is a lot of stuff to review and check, so, yep, it doesn't work at the moment, but it is certainly on my want-to-do list.

As a related but actually separate matter, if I can get the kinks ironed out and improve the performance of a hack on the 2D mapper, you'll also be able to enter pretty much any displayable grapheme as a marker letter (character) for a room symbol so you could actually use as a marker for a death-trap...!
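
To make the toLatin1() versus toUtf8() difference concrete, here is a rough sketch in Python. It is only an analogy: the function names to_latin1, to_utf8 and decodes_as_utf8 are hypothetical stand-ins, Python's str.encode is assumed to behave closely enough to the Qt conversions for this purpose, and none of it is taken from Mudlet's actual code.

```python
def to_latin1(text: str) -> bytes:
    # Assumed stand-in for a Latin-1 conversion: one byte per character,
    # with '?' substituted for anything Latin-1 cannot represent.
    return text.encode("latin-1", errors="replace")


def to_utf8(text: str) -> bytes:
    # Assumed stand-in for a UTF-8 conversion: every code point is kept.
    return text.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    # True if a receiver expecting UTF-8 could read these bytes as sent.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


command = "say æøå"
print(to_latin1(command))                   # b'say \xe6\xf8\xe5'
print(to_utf8(command))                     # b'say \xc3\xa6\xc3\xb8\xc3\xa5'
print(to_latin1("☹"))                       # b'?'
print(decodes_as_utf8(to_latin1(command)))  # False
print(decodes_as_utf8(to_utf8(command)))    # True
```

The point of the sketch is that the two conversions put different bytes on the wire for the same text, so whichever one the sender picks has to match what the other end expects, and anything outside Latin-1 is lost entirely in the Latin-1 path.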
