I’ve been using “a Unicode (and ASCII) field separator” for my SSV flavour of CSV. I thought I should be using the FS control character, since much documentation expands “FS” to “field separator”.
Turns out most Unicode control characters have shitty official names and/or acronyms/abbreviations… such as…
- PLD: partial line forward (not: partial line down)
- SPA: start of guarded area (not: start of protected area)
- VTS: line tabulation set (not: vertical tabulation set)
- DC1: device control one (not: XON)
- RI: reverse line feed (not: reverse index)
- NP: form feed (probably for “new page”)
- NL: line feed (not newline, but we weren’t expecting that either, as an ASCII newline is CR+LF plus Unicode C1 has NEL (next line)…)
- Adding insult to injury, U+0080, U+0081, U+0084 and U+0099 do not even have a name, only Unicode “name aliases”: an acronym (which, of course, WTF knows about) and at least one longer name.
… and so forth. There are separators, too!
- FS: [U+001C] [␜] INFORMATION SEPARATOR FOUR [file separator]
- GS: [U+001D] [␝] INFORMATION SEPARATOR THREE [group separator]
- RS: [U+001E] [␞] INFORMATION SEPARATOR TWO [record separator]
- US: [U+001F] [␟] INFORMATION SEPARATOR ONE [unit separator]
And guess what… in both ASCII and Unicode, FS is the file separator (US, the unit separator, is the one to use between fields). Oops. Sorry.
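For illustration, here is a rough Python sketch of the four separators and of a US-between-fields, RS-between-records layout. This is not the SSV spec: the `encode` and `decode` helpers and the choice of RS as the record delimiter are assumptions made up for this example.

```python
# The four C0 information separators, by code point.
FS = "\x1c"  # INFORMATION SEPARATOR FOUR, file separator
GS = "\x1d"  # INFORMATION SEPARATOR THREE, group separator
RS = "\x1e"  # INFORMATION SEPARATOR TWO, record separator
US = "\x1f"  # INFORMATION SEPARATOR ONE, unit separator


def picture(control: str) -> str:
    """Return the Control Pictures glyph (U+2400 block) for a C0 control."""
    return chr(0x2400 + ord(control))


def encode(records):
    """Join fields with US and records with RS (hypothetical layout, not the SSV spec)."""
    return RS.join(US.join(fields) for fields in records)


def decode(text):
    """Inverse of encode: split on RS for records, then on US for fields."""
    return [record.split(US) for record in text.split(RS)]


rows = [["name", "note"], ["Ada", 'likes commas, and "quotes"']]
encoded = encode(rows)
assert decode(encoded) == rows  # round-trips without any quoting rules
```

Since commas and quotes are ordinary data here, no escaping is needed, as long as the fields themselves never contain US or RS.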
So… I guess when I use SSV next I’ll update (change in an incompatible way) the spec. Again, sorry about that.
It’s only in another 48 minutes but enjoy the Solstice! Blessed be!