getMudletHomeDir() and encoding

Jor'Mox
Posts: 1101
Joined: Wed Apr 03, 2013 2:19 am

Re: getMudletHomeDir() and encoding

Post by Jor'Mox »

Just make a thread here in the help forum that is clearly named and that spells out exactly what the problem is.

Akaya
Posts: 414
Joined: Thu Apr 19, 2012 1:36 am

Re: getMudletHomeDir() and encoding

Post by Akaya »

Or submit via Launchpad at: https://bugs.launchpad.net/mudlet/+filebug

SlySven
Posts: 985
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: getMudletHomeDir() and encoding

Post by SlySven »

Ah, just a quick thought: the Lua sub-system shipped with Mudlet may not handle, or may not be receiving, non-ASCII characters from the core. By rights we ought to be passing UTF-8 data between the two (UTF-8 is a character encoding that is a superset of ASCII and leaves pure ASCII unchanged), but I think that in some cases a from/toLatin1() conversion is done, which may be causing this issue.
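
A minimal sketch of how a Latin-1 round trip could damage such a path, written in Python rather than in Mudlet's own C++ or Lua. The user name is hypothetical, and `errors="replace"` only imitates the substitution a Latin-1 conversion may perform; this is an illustration of the suspected mechanism, not Mudlet's actual code.

```python
# Hypothetical Windows-style home directory; the user name is invented.
# \u00e1 is 'a with acute', \u010d is 'c with caron'.
path = "C:/Users/Kov\u00e1\u010d"


def round_trip(text: str, encoding: str) -> str:
    """Encode text to bytes and decode it again, substituting '?' for
    any character the encoding cannot represent."""
    return text.encode(encoding, errors="replace").decode(encoding)


utf8_result = round_trip(path, "utf-8")
latin1_result = round_trip(path, "latin-1")

# UTF-8 carries every character, so the path comes back unchanged.
print(utf8_result == path)
# Latin-1 keeps U+00E1 but has no slot for U+010D, so the path changes.
print(latin1_result == path)
```

If the core hands Lua a path that went through a conversion like the second one, the directory name no longer matches what is on disk.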

Please do flag this up via Launchpad, but in the meantime the safest workaround is to use ONLY ASCII for path and file names. There are additional Lua modules that will work with UTF-8 encoded data, but at present we do not include them, nor, I think, do we present data in a form that they could use.

Disclaimer: note that these are my initial musings and may not accurately reflect the official position/situation...!

SlySven
Posts: 985
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: getMudletHomeDir() and encoding

Post by SlySven »

Referring to the example you give, the user name has two accented characters: 'á' U+00e1 {Latin small letter a with acute} and 'č' U+010d {Latin small letter c with caron}. The first, although strictly a non-ASCII character, is within the Unicode Latin-1 Supplement block and is within range of an unsigned character variable (i.e. a single byte), though as UTF-8 it would be represented as the two bytes 0xc3 0xa1; internal to the Mudlet core it would be stored as the (UTF-16) QChar 0x00e1. The second is within the Unicode Latin Extended-A block and as UTF-8 would be encoded as 0xc4 0x8d; internally it would be stored as a QChar of value 0x010d. These sequences assume that the characters have been (re)normalised (decomposed and recomposed into standardised sequences) and are not presented as, for the latter case say, a Latin small letter c followed by a combining caron ...! :geek:

That might make the situation clear to the casual viewer. :wink:
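
For anyone who wants to check the byte values for these two characters, here is a small Python sketch (Python is used only for convenience; it is not part of Mudlet). The helper name `describe` is made up for this example. It reports the code point, the UTF-8 bytes, the UTF-16 code unit, and the decomposed (NFD) sequence.

```python
import unicodedata


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


def describe(ch: str) -> dict:
    """Describe one character in its composed (NFC) and decomposed (NFD) forms."""
    composed = unicodedata.normalize("NFC", ch)
    decomposed = unicodedata.normalize("NFD", ch)
    return {
        "code_point": f"U+{ord(composed):04X}",
        "utf8": hex_bytes(composed.encode("utf-8")),
        "utf16": hex_bytes(composed.encode("utf-16-be")),
        "decomposed": [f"U+{ord(c):04X}" for c in decomposed],
    }


a_acute = describe("\u00e1")  # Latin small letter a with acute
c_caron = describe("\u010d")  # Latin small letter c with caron

for info in (a_acute, c_caron):
    print(info)
```

The composed forms give the UTF-8 bytes c3 a1 and c4 8d; the decomposed forms are a base letter followed by a combining mark (U+0301 and U+030C respectively), which is the un-normalised case mentioned above.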

Angie
Posts: 51
Joined: Fri May 02, 2014 11:43 pm

Re: getMudletHomeDir() and encoding

Post by Angie »

Thanks, I reported it through Launchpad.
Angie @ Midnight Sun 2
Alayla @ God Wars 2

SlySven
Posts: 985
Joined: Mon Mar 04, 2013 3:40 pm
Location: Deepest Wiltshire, UK
Discord: SlySven#2703

Re: getMudletHomeDir() and encoding

Post by SlySven »

I've posited a one-line fix on Launchpad that might solve the problem, but can you modify, compile and test it? I can't really duplicate the environment: I found a bug in a Debian system utility while trying to create a user with that name on my system, but it isn't Windows, so it isn't really a valid test environment for your issue.

Angie
Posts: 51
Joined: Fri May 02, 2014 11:43 pm

Re: getMudletHomeDir() and encoding

Post by Angie »

Erm, I'm not sure I would know how to do that. I'm not really a coder.

fetaera
Posts: 191
Joined: Sat Aug 03, 2013 7:00 pm

Re: getMudletHomeDir() and encoding

Post by fetaera »

You should be able to fix this by creating a symbolic link to a path not containing a special character.

https://code.google.com/p/symlinker/ seems to be a decent tool for Windows.
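
For those who would rather script it, a rough Python sketch of the same idea follows. The function name `make_ascii_alias` and all paths are hypothetical. On Windows, creating symbolic links may require administrator rights or Developer Mode, and whether Mudlet can then be made to use the alias is not confirmed in this thread.

```python
import os
import tempfile


def make_ascii_alias(real_dir: str, alias: str) -> str:
    """Create a directory symlink at `alias` that points to `real_dir`.

    The alias path is rejected if it contains any non-ASCII character,
    since the point is to offer an ASCII-only route to the directory.
    """
    if not alias.isascii():
        raise ValueError("alias path must contain only ASCII characters")
    os.symlink(real_dir, alias, target_is_directory=True)
    return alias


if __name__ == "__main__":
    # Demonstration inside a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "Kov\u00e1\u010d")  # invented non-ASCII name
        os.mkdir(real)
        alias = make_ascii_alias(real, os.path.join(tmp, "mudlet_home"))
        # Both paths should resolve to the same directory.
        print(os.path.samefile(real, alias))
```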

Angie
Posts: 51
Joined: Fri May 02, 2014 11:43 pm

Re: getMudletHomeDir() and encoding

Post by Angie »

I mean I will gladly test it for you in the next build or if you give me a test version, but I have never compiled anything in my life.
