⚠ This page contains old, outdated, obsolete, … historic or WIP content! No warranties e.g. for correctness!
All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
izabera did make a good point in IRC the other day for why we will need to have two locales at the very least in MirBSD – C and C.UTF-8 (the latter being widespread enough by now, thanks to me, interestingly enough. He uses code which leads to unexpected results…
$ generate() { tr -dc "[:alnum:]" < /dev/urandom | dd bs="$len" count=1; } $ len=10; echo $(generate 2>/dev/null) Ut流54Ȫf
… because tr(1) was the first utility I converted to Unicode, to explore possibilities and craft the OPTU encoding and, thus, “流” is, indeed, an alphanumeric character.
This implies two things: we need to change MirBSD libc locale functions back to support two charsets (and make setlocale(3) match), and mksh(1) should implement locale tracking (to change set ±U whenever one of the relevant parameters (${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}) changes in the session; users could still set utf8-mode manually though). For this to not break anything, we’ll have to audit scripts in MirBSD though (usually adding export LC_ALL=C at their begin is enough, and we need this for portable scripts anyway) and remove all occurrences of #ifndef __MirBSD__ before setlocale(3) calls in applications. This will take a while.
Secondly, I opened an issue with POSIX about handling of the (deprecated, and for good reason) `-style command substitutions. The GNU autoconf texinfo manual gives good advice for portable shell scripts, and we all knew that foo="bar `echo \"baz\"`" wasn’t portable due to use of more than one set of double quotes, but my (and the yash authors’) reading of the standard (and mksh R52’s POSIX mode) make it set $foo to bar "baz" instead of the historic bar baz now, and I wish to get this clarified (and, possibly, the standard changed to match historic practice, as this breaks at least the Acrobat Reader 5 start script). Nothing has been decided yet (due to the holidays, I’m sure), but we got input from some other people involved in shell.
So, if any #!/bin/sh scripts break or behave weirdly with R52, you now know why. I’m waiting for an official statement.