Re: getMudletHomeDir() and encoding

Posted: Thu May 22, 2014 10:59 pm
by Jor'Mox
Just make a thread here in the help forum that is clearly named and that spells out exactly what the problem is.

Re: getMudletHomeDir() and encoding

Posted: Fri May 23, 2014 1:26 am
by Akaya
Or submit via Launchpad at: https://bugs.launchpad.net/mudlet/+filebug

Re: getMudletHomeDir() and encoding

Posted: Fri May 23, 2014 8:11 pm
by SlySven
Ah, just a quick thought: the Lua sub-system shipped with Mudlet may not handle, and may not be receiving, non-ASCII characters from the core. By rights we ought to be passing UTF-8 data between the two (UTF-8 is a character encoding that is a superset of ASCII and encodes pure ASCII unchanged), but I think that in some cases a from/toLatin1() conversion is done, which may be causing this issue.
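To illustrate the kind of damage a Latin-1 conversion can do, here is a minimal sketch in Python. It is only an analogy: Python's codecs stand in for the Qt from/toLatin1() calls, the function names and the sample path are made up for the example, and it does not show what Mudlet actually does internally.

```python
# Illustration only: Python codecs stand in for the Qt conversions named above.
# All function names and the sample path are hypothetical.


def latin1_round_trip(text: str) -> str:
    """Squeeze text through Latin-1 bytes and back, replacing what does not fit."""
    raw = text.encode("latin-1", errors="replace")  # unencodable chars become b"?"
    return raw.decode("latin-1")


def utf8_read_as_latin1(text: str) -> str:
    """Encode as UTF-8 but decode the bytes as Latin-1 (a mismatched pair)."""
    return text.encode("utf-8").decode("latin-1")


def utf8_round_trip(text: str) -> str:
    """Encode and decode with the same encoding, so nothing is lost."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    # Hypothetical path containing U+00E1 and U+010D, not the one from the report.
    sample = "C:/Users/\u00e1\u010d/mudlet"
    # ascii() keeps the output printable on any console encoding.
    print(ascii(latin1_round_trip(sample)))
    print(ascii(utf8_read_as_latin1(sample)))
    print(utf8_round_trip(sample) == sample)
```

Either kind of mismatch yields a path that no longer names the real directory, which would be consistent with the symptom reported here, though that remains a guess until someone confirms it in the source.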

Please do flag this up via Launchpad, but in the meantime the safest workaround is to use ONLY ASCII for path and file names. There are additional Lua modules that will work with UTF-8 encoded data, but at present we do not include them or, I think, present data in a form that they could use.

Disclaimer: note that these are my initial musings and may not accurately reflect the official position/situation...!

Re: getMudletHomeDir() and encoding

Posted: Fri May 23, 2014 8:33 pm
by SlySven
Referring to the example you give, the user name has two accented characters: 'á' U+00e1 {Latin small letter a with acute} and 'č' U+010d {Latin small letter c with caron}. The first, although strictly a non-ASCII character, is within the Unicode Latin-1 Supplement block and within the range of an unsigned character variable (i.e. a single byte). As UTF-8 it would be represented as the two bytes 0xc3 0xa1, and internal to the Mudlet core it would be stored as the (UTF-16) QChar 0x00e1. The second is within the Unicode Latin Extended-A block, so it does not fit in a single Latin-1 byte; as UTF-8 it would be encoded as 0xc4 0x8d, and internally it would be stored as a QChar of value 0x010d. These sequences assume that the characters have been (re)normalised (decomposed and recomposed into standardised sequences) and are not presented as, for the latter case say, a Latin small letter c followed by a combining caron ...! :geek:

That might make the situation clear to the casual viewer. :wink:
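For anyone who wants to check the byte values for these two characters, a short Python sketch follows. The helper name `describe` is made up for the example; the 16-bit UTF-16 code units it prints are the same size as a single QChar, but the code itself has nothing to do with Mudlet's internals.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarise how one character is represented in several encodings."""
    utf16 = ch.encode("utf-16-be")
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": " ".join(f"0x{b:02x}" for b in ch.encode("utf-8")),
        # 16-bit code units, the size of a single QChar
        "utf16_units": " ".join(
            f"0x{utf16[i]:02x}{utf16[i + 1]:02x}" for i in range(0, len(utf16), 2)
        ),
        "fits_in_latin1_byte": ord(ch) <= 0xFF,
    }


if __name__ == "__main__":
    a_acute = "\u00e1"  # Latin small letter a with acute
    c_caron = "\u010d"  # Latin small letter c with caron
    print(describe(a_acute))
    print(describe(c_caron))

    # Normalisation: the precomposed form versus 'c' + combining caron (U+030C).
    decomposed = unicodedata.normalize("NFD", c_caron)
    recomposed = unicodedata.normalize("NFC", decomposed)
    print([f"U+{ord(c):04X}" for c in decomposed])
    print(recomposed == c_caron)
```

The decomposed form is two code points and three UTF-8 bytes, so a path that arrives unnormalised would not compare equal byte for byte to the precomposed one.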

Re: getMudletHomeDir() and encoding

Posted: Fri May 23, 2014 10:06 pm
by Angie
Thanks, I reported it through Launchpad.

Re: getMudletHomeDir() and encoding

Posted: Sun May 25, 2014 1:18 am
by SlySven
I've suggested a one-line fix on Launchpad that might solve the problem, but can you modify, compile and test it? I can't really duplicate the environment: I found a bug in a Debian system utility while trying to create a user with that name on my system, but it isn't Windows, so it isn't really a valid test environment for your issue.

Re: getMudletHomeDir() and encoding

Posted: Sun May 25, 2014 2:31 am
by Angie
Erm, I'm not sure I would know how to do that. I'm not really a coder.

Re: getMudletHomeDir() and encoding

Posted: Sun May 25, 2014 10:19 am
by phasma
You should be able to fix this by creating a symbolic link to a path not containing a special character.

https://code.google.com/p/symlinker/ seems to be a decent tool for Windows.
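For those who would rather script the link than use a GUI tool, a rough Python sketch of the same idea follows. The function name `make_ascii_alias` and the directory names are invented for the example. On Windows, creating symbolic links normally needs administrator rights or Developer Mode, and whether Mudlet can then be pointed at the alias is not something this thread confirms.

```python
import os
import tempfile


def make_ascii_alias(real_dir: str, alias: str) -> str:
    """Create a directory symlink `alias` pointing at `real_dir`.

    Refuses to create the link if the alias path itself is not pure ASCII,
    since that would defeat the purpose of the workaround.
    """
    if not alias.isascii():
        raise ValueError("alias path must contain only ASCII characters")
    os.symlink(real_dir, alias, target_is_directory=True)
    return alias


if __name__ == "__main__":
    # Demo in a throwaway directory; the names here are hypothetical.
    base = tempfile.mkdtemp()
    real = os.path.join(base, "\u00e1\u010d")  # directory with accented name
    os.mkdir(real)
    alias = make_ascii_alias(real, os.path.join(base, "ascii_alias"))
    # Both paths now refer to the same directory.
    print(os.path.samefile(real, alias))
```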

Re: getMudletHomeDir() and encoding

Posted: Mon May 26, 2014 9:54 am
by Angie
I mean I will gladly test it for you in the next build or if you give me a test version, but I have never compiled anything in my life.