Consider switching from ANSI to Unicode encoding | Naval Warfare Simulations Forums

DeMatt
New Member

Posts: 47

Post by DeMatt on Jul 22, 2019 3:29:07 GMT -6

I refer both to the text encoding used inside files, such as GermanyShipNames.dat, and to the encoding used to name files. It's quite disconcerting to see mojibake in Windows Explorer:

Or to be told that I can't use an armor thickness of "ｽ", like this:

As a bonus, it would make it easier to add foreign-script names and text to the game.
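
For anyone wondering how a "½" can turn into "ｽ": here is a minimal sketch of one way it can happen. It assumes the file stores "½" as the single byte 0xBD in a Western code page (cp1252) and that the machine reading it uses a Japanese code page (cp932). Both code pages are guesses for illustration, and the helper name `reinterpret` is made up; nothing here is taken from the game's actual files.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one legacy code page, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


half = "\u00bd"  # VULGAR FRACTION ONE HALF

# Assumption: the file was written on a Western-locale machine (cp1252).
print(half.encode("cp1252"))  # b'\xbd'

# Assumption: the file is read on a Japanese-locale machine (cp932).
# Byte 0xBD is a halfwidth katakana letter there, so the fraction is lost.
print(reinterpret(half, "cp1252", "cp932"))  # ｽ

# With a Unicode encoding on both sides, the text round-trips unchanged.
print(half.encode("utf-8").decode("utf-8") == half)  # True
```

The same single byte means different characters depending on the reader's locale, which is exactly the ambiguity a Unicode encoding removes.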

rimbecano
Veteran Member

Posts: 1,230

Post by rimbecano on Jul 22, 2019 4:09:57 GMT -6

On Windows this is actually more difficult than it should be because of decisions Microsoft made 30 years ago. Most other systems use UTF-8 for Unicode, which is backward-compatible with ASCII (any valid ASCII file is also valid UTF-8 with the same content), so their transition to Unicode was mostly smooth and invisible, and by this point is pretty much complete. Microsoft chose a 16-bit encoding (now UTF-16), which is not backward-compatible with ASCII or with any of the ASCII-extended code pages Microsoft was using before. As a result, Microsoft had to duplicate the text-handling functions in the Windows API: one version for legacy 8-bit text, one for UTF-16. On UTF-8 systems, many programs require no changes at all to use Unicode instead of ASCII. On Windows, at least some changes are necessary even in the best case, and if you depend on third-party text-handling libraries that have no Unicode support and no source code access, you may have to rewrite the program around libraries from another vendor that do support Unicode.
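
The compatibility point can be checked in a few lines. This is only a sketch; the sample string and the helper name `encode_all` are made up for illustration.

```python
def encode_all(text: str) -> dict:
    """Return the bytes produced for the same text by three encodings."""
    return {
        "ascii": text.encode("ascii"),
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),  # little-endian, no byte-order mark
    }


encoded = encode_all("Bismarck")

# Pure-ASCII text is byte-for-byte identical in UTF-8 ...
print(encoded["ascii"] == encoded["utf-8"])  # True

# ... but UTF-16 uses two bytes per character here, so code that
# expects 8-bit text cannot read it unchanged.
print(len(encoded["ascii"]), len(encoded["utf-16-le"]))  # 8 16
print(encoded["utf-16-le"][:4])  # b'B\x00i\x00'
```

The embedded zero bytes are why a program built around 8-bit, NUL-terminated strings cannot simply be handed UTF-16 data.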

cabalamat
Veteran Member

Posts: 760

Post by cabalamat on Jul 27, 2019 9:21:51 GMT -6

I would never use non-ASCII characters in filenames; it's just asking for trouble. (Actually, I just stick to letters, digits and underscores.)

Yes, it may work on some systems, but when you move to other systems, OSes or differently set up machines, or use the names in web URLs etc., it's bound to cause problems somewhere down the line.
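
A rule like that is easy to check mechanically. Here is a rough sketch; the function name `is_portable_name` is made up, and allowing a single dot-separated extension is an assumption added for illustration, not part of the rule as stated.

```python
import re

# ASCII letters, digits and underscores, optionally followed by one
# extension made of the same characters (the extension is an assumption).
PORTABLE_NAME = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)?")


def is_portable_name(filename: str) -> bool:
    """Return True if the whole filename uses only the conservative character set."""
    return PORTABLE_NAME.fullmatch(filename) is not None


print(is_portable_name("GermanyShipNames.dat"))  # True
print(is_portable_name("K\u00f6ln.dat"))         # False (non-ASCII letter)
print(is_portable_name("ship names.dat"))        # False (space)
```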

rimbecano
Veteran Member

Posts: 1,230

Post by rimbecano on Jul 27, 2019 12:41:19 GMT -6

These days, you're unlikely to have much trouble with filenames from one system being unsupported on another. Particular applications may still have trouble, but those are generally legacy software. 10 or 15 years ago, though, it was something you still had to worry about.

The one thing you still have to worry about with filenames these days is that Windows folds case when comparing them, so filenames that differ only in case, and are therefore distinct on other systems, may collide on Windows.
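
If you want to check a set of files for this before moving them to Windows, something like the following would do as a sketch. The function name `case_collisions` is made up, and Python's `str.casefold()` is only an approximation of the comparison Windows performs (Windows uses its own case mapping), so treat the result as a warning list rather than an exact prediction.

```python
from collections import defaultdict


def case_collisions(names):
    """Group filenames that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        # casefold() approximates a case-insensitive comparison (assumption).
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["Readme.txt", "README.TXT", "ships.dat"]))
# [['Readme.txt', 'README.TXT']]
```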