Developers’ Weblog

Sponsored by
HostEurope Logo

Developers’ Weblog

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

pkgsrccon 2013

24.03.2013 by bsiegert@
Tags: pkgsrc

On Saturday March 23, this year's pkgsrc conference (pkgsrccon 2013) took place in Berlin. Julian Fagir organized it with unending energy, even though pkgsrc is not the primary focus of his NetBSD work. He just took matters in his hands because no one else stepped forward. A big thanks for that!

The flight from Zurich to Berlin was uneventful. It was my first flight to TXL airport (I normally arrive at SXF), and arriving there is incredibly quick and convenient compared to the latter. The terminal is very small, and it takes just five minutes to go from the plane to a bus to the city.

Now for the conference itself: we started at 12pm on Saturday with a program of talks but no fixed schedule. Due to this, the conference took a long time (we finished only at 9pm or so) but on the other hand, it allowed for lots of interesting and fruitful discussion. At no point did we have to cut a question short because of a lack of time. Overall, I think that this was an excellent choice and made the conference more useful and productive.

We were about 21 people – mostly pkgsrc developers (of course) but also a Debian Developer (Ralf Treinen, who presented his work on Mancoosi), a FreeBSD dev and some interested users. I won't give an exhaustive recollection of all talks here but simply comment on a few ones that I found particularly interesting.

The most important theme of the conference was virtualization and cloud computing. Jonathan Perkin and Filip Hajny gave a talk about their company's product, SmartOS, and how it uses pkgsrc. SmartOS is a "cloud OS" based on OpenSolaris. It boots from a read-only medium (such as a CD) into a lean system that only does the administration of all the zones that it runs. All useful work happens in zones, which are a sort of lightweight VM solution specific to OpenSolaris. The zone images include access to a very complete set of pkgsrc packages for things such as a compiler. They can also run other OSes (NetBSD!) by setting up a zone that runs KVM. Joyent runs a large public cloud with SmartOS, where customers purchase virtual machines by the hour. This is similar to Amazon EC2 but with a focus on high performance.

Hubert Feyrer gave another talk about a similar theme. He described the use of Ansible for provisioning and setting up VMs. Ansible can automatically create VMs on EC2, gather the necessary information (such as the IP address) and do various setup tasks without further user interaction. This was all very impressive, even though the live demo failed. This was for two reasons: Somebody deleted the sudo package for amd64 from the NetBSD ftp server (boo), and the i386 VM failed to come up, the kernel paniced on startup. Joerg speculated that this was due to _some_ machines in their DC not having PAE enabled, while the i386 kernel uses PAE. This was interesting, as I had noticed the very same problem when I set up the netbsd-386-bsiegert continuous builder for Go.

Amitai Schlair alias Schmonz came out in a passionate defense of the venerable pkglint. He put the source on github and started refactoring the code and adding tests. He calls this approach TED for "Test Eventually Development" ;) and advocated a similar approach for the pkgsrc infrastructure: Every time a developer takes five minutes to understand a part of the infrastructure (when making a change, for instance), he or she should write a test for it. This is a very pragmatic and doable approach, in my opinion, and we should all do this.

I gave a slightly amended version of the "Go on NetBSD" talk I had given at FOSDEM 2013. There were a lot of valuable questions and discussion, both about the language and about how to package software written in it.

Aleksej Saushev ended the day with a talk that was not in the program about the Google Code-In and the problems that developers and particularly new contributors face. If pkgsrc can get more contributors, it gets more fixes, which in turn makes it more useful to users. More usefulness leads to more users, leading to more contributors. We should do more to get into this virtuous circle. There are about five different mechanisms to build and/or deploy packages in pkgsrc: build directly with "make package", pkg_chk, pkg_comp, the old bulk build scripts and pbulk. The basic frustration that should be overcome is the following: you want to upgrade a set of packages, the old ones are removed, new ones are rebuilt, and the build fails. Rolling back is difficult in general. pbulk could be a valuable solution to this, but its standard config is heavily tailored for a different use case, and its _two_ separate pieces of documentation are contradictory, incomplete and confusing. So the talk contained a call for action to fix those minor annoyances and generally document things better, which makes it easier for everybody.

My take-home message – and my next project idea – is the following: each time that I do a MirBSD bulk build using pbulk, I have to do a lot of painful steps to set up the right build environment on all my machines. This time, I will try to automate this process with Ansible, making up the recipes as I go along, and then (more importantly) publish these recipes for others to use and to share.

Himbeerichüechli!

04.03.2013 by tg@
Tags: twitxr

Himbeerquarkschnitte *winkt Ventilator*

Natürlich mit MirKaffee (enthält Milch, Kakao, Kaffee, Rohrzucker)!

Too much

23.02.2013 by tg@
Tags: personal

I’ve been doing too much lately, which has led to reduced performance and enjoyment. Also I’ve not been able to work the full hours of my dayjob, reducing what I had on my overtime account. I’ll be taking a step back and try to un-load. This is my notice, I’m not explicit on where, and I’m not cancelling anything special (not even those mentioned in the next paragraphs).

I’m disappointed with Google/Nianticproject Ingress. It’s frustrating (nothing lasts; also read this posting), buggy, battery-draining, sometimes too time-consuming (especially with only GPRS) and I don’t get warm with the Android 2.3 based Cyanogenmod on the borrowed device. Using it without a big screen device having the Intel map next to you is futile. I could go into detail but won’t. I won’t stop playing, as it’s a good excuse to go outside and combines somewhat with geocaching (unless you’re trying to actually play Ingress, in which case you’ll just be walking/cycling/driving between portals at maximum speed). And there’s that connection with Liferay…

Fun is important in securing volunteer work; bugs and other random happenings (example) can drain the fun.

To end on a positive note, I’m absolutely, totally happy with mksh user and distributor feedback, including the bug reports and feature requests, how well almost all people deal with feature rejection, and the speed of integration of mksh(1) updates lately. The only thing I’m unhappy wrt. mksh is my own lack of speed regarding implementing the cool new things I’ve been, as an mksh user, waiting for because I want and even need them for some cool programs written in mksh I would love to write, so I can use them.

I’ve got roughly 350 mails in my INBOX (all read, but most of them being action items; some due… before this weekend, evilly enough, the one I’m thinking of is GnuPG/MIME encrypted, which means extra effort to read it). Just so you know. (And a couple of other things that really could use some fixing, which I can, in theory, do. And lots of requests for spending real life time with.)

I’m still reachable via eMail and IRC (mostly), will respond, will try to persuade my employer to send me to CLT 2013 next month… just, don’t deadline me right now. I’m not taking a VACation either (though I probably should, had I money).

GNU autotools generated files

20.02.2013 by tg@
Tags: debian rant

On Planet Debian, Vincent Bernat wrote:

The drawback of this approach is that if you rebuild configure from the released tarball, you don’t have the git tree and the version will be a date. Just don’t do that.

Excuse me‽

This is totally inacceptable. Regenerating files like aclocal.m4 and Makefile.in (for automake), configure (for autoconf), and the likes is one of the absolute duties of a software package. Things will break sooner or later if people do not do that. Additionally, generated files must be remakable from the distfile, so do not break this!

May I suggest, constructively, an alternative? (People – rightfully, I must admit – complain I’m “just” ranting too much.)
When making a release from git, write the “git describe” output into a file. Then, use that file instead of trying to run the git executable if .git/. is not a directory (“test -d .git/.”). Do not call git, because, in packages, it’s either not installed or/and also undesired.

Couldn’t comment on your blog, but felt strongly enough about this I took the effort of writing a full post of my own.

(But thanks for the book recommendation.)

PSA: Referring to Unicode codepoints.

If your Unicode codepoint is, numerically, between 0 and 65533, inclusive, convert it to hexadecimal and zero-pad it to four nibbles. For example, the Euro sign € is Unicode codepoint #8364 which is 20AC hex; the Eszett ß is 223 which is DF hex, padded 00DF.
Then write an uppercase ‘U’, a plus sign ‘+’, and the four nibbles: U+20AC U+00DF
In mksh, JSON, etc. it’s a backslash ‘\’, a lower-case ‘u’ and four nibbles.

Otherwise, your Unicode codepoint will be, numerically, between 65536 and 1114111, inclusive, that is hex 10000 to 10FFFF. (There’s nothing on 65534 and 65535, nor above these figures.) In this case, convert it to hex, zero-pad it to eight nibbles and write it as an uppercase ‘U’, a hyphen-minus ‘-’ and the eight nibbles. In C-like escapes for environments supporting the Unicode SMP, that’s a backslash ‘\’, an upper-case ‘U’ and eight nibbles. Do not, in either case, use less (or more) hex digits than specified here. For example, there’s a famous Unicode codepoint U-0001F4A9 “PILE OF POO”. That’s not the same as U+1F4A9. The latter reads as U+1F4A “GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA” and a digit 9 (Ὂ9). Be educated.

Since this wlog runs on MirBSD, which limits itself to the Unicode BMP voluntarily, and as nōn-BMP is not widespread anyway, I cannot reproduce the “PILE OF POO” here, but you can just duckduckgo it.

Let’s start a convention: bare-metal machines have the linguistic male gender („der Computer“, he needs to be rebooted), whereas VMs have the linguistic female gender („die virtuelle Maschine“, she runs better since the last upgrade of Linux-KVM), and neutral linguistic gender is used when you cannot or do not want or need to make such distinction.
This is, of course, entirely unrelated to human gender, but not unrelated to #debian-68k (on OFTC) discussions ;-)

ObRant: DO NOT USE xz COMPRESSION LEVELS ABOVE 6! (For -7 we can make exceptions, for example in Debian *-dbg or *-source packages.) You may use -e if you absolutely need the better compression, but please think of the poor sods who have to create the archives. You must not use the highest compression levels -8 or -9 since they have absolutely insane memory requirements on compression and will still hinder machines with less RAM on decompression. (Using -e only affects CPU usage at compression time; decompression is exactly as fast and memory-consuming as without.) Furthermore, DO NOT CHOOSE A COMPRESSION LEVEL WITH A DICTIONARY SIZE MUCH LARGER THAN THE DATA TO COMPRESS, as that makes absolutely no sense and will rather worsen than improve compression. As a reminder, xz uses the following dictionary sizes:

  • 256 KiB at -0 (compresses better than gzip(1) and faster than either gzip(1) or bzip2)
  • 1 MiB at -1
  • 2 MiB at -2 (compresses better than gzip(1) and bzip2 without losing much speed)
  • 4 MiB at -3 and -4 (the difference is in the match finder between these two levels)
  • 8 MiB at -5 and -6
  • 16 MiB at -7 (186 MiB RAM used to compress a file)
  • 32 MiB at -8 (370 MiB RAM used to compress a file)
  • 64 MiB at -9 (674 MiB RAM used to compress a file)

Decompression uses less than 1 MiB more than the dictionary size, but the dictionary must always be allocated wholly. (You’re fine to use custom presets, but mind the RAM usage!) As a general rule, if you have something of up to 20 MiB to compress, -4 is fine, and -5 will only be better if you have similar data spread across the whole of the file instead of close to each other. When I make mksh distfiles, I instead put files close to each other that have related content, which improves compression much more nicely without penalising low-memory systems; for example, you could put documentation, Makefiles, scripts, m4(1) files, and C source code into groups before archiving, instead of doing it alphabetically.

Another note on bzip2: its decompression is slow. I see no reason to use it any more, at all. Use gzip(1) if you care for compatibility or have an issue with xz not having a free copyright licence, and xz otherwise.

mksh made quite some waves (machine translation of the third article) recently. Let’s state it’s not just Amigas – ara5 is a buildd running the Atari kernel, an emulated though. On the other hand, the bare-metal Ataris used to be the fastest buildds, so I expect we get them back online soonish. I’m currently fighting with some buildd software bugfixes, but once they’re in, we will make more of them. Oh, and porterboxen! Does anyone want to host a VM with a porterbox? Requirements: wheezy host system (can be emulated), 1 GiB RAM, one CPU core with about 6500 BogoMIPS or more (so the emulated system has decent speed; an AMD Phenom II X4 3.2 GHz does just fine). Oh, and mksh is ported to more and more platforms, like 386BSD 0.0 with GCC 1.39, and QNX 4 with Watcom… and more bugfixes are also being worked on. And let’s not forget features!

jupp got refreshed: it’s got a bracketed paste mode, which is even auto-enabled on xterm-xfree86 (though the xterm(1) in MirBSD’s a tad too old to know it; will update that later, just imported sendmail(8) 8.14.6 and lynx(1) 2.8.8dev.15 into base, more to come) and will be enhanced later (should disable auto-indent, wordwrap, status line updates, and possibly more), lots of new functions and bindings, now uses mkstemp(3) to create backup files race-free, and more (read the NEWS file).

In MirBSD, Benny and I just added a number of errnos, mostly for SUSv4 compliance and being able to compile more software from pkgsrc® without needing to patch. This is being tested right now (although I should probably go out and watch fireworks in less than a half-hour), together with the new imports and the bunch of small fixes we accumulate (even though most development in MirBSD is currently in mksh(1) and similar doesn’t mean that all is, or worse, we were dead, which we aren’t). I’ll publish a new snapshot some time in January. The Grml 2012.12 also contains a pretty up-to-date MirBSD, with a boot(8/i386)loader that now ignores GUID partition table entries when deciding what to use for the ‘a’ slice.

If you haven’t already done so, read Benjamin Mako Hill’s writings!

Der heilige… Frieden?

15.12.2012 by tg@
Tags: debian politics

(Apologies for putting this on Planet Debian, but it says the one or other non-English post is okay as long as it’s an exception. I feel I need to reach more people with this, but don’t feel like translating this into English right now.)
Update: Tanguy asked for a short English summary: it’s me ranting against the rioting against muslims and the call for more CCTV surveillance after a possible bomb was found at the train station.

In Bonn herrscht immer noch „Bombenstimmung“, wenn man z.B. auf die Webseite der Lokalzeitung schaut – von dem Amoklauf in Connecticut, über den sich im IRC gewunder wird, ist immer noch nichts zu sehen, dafür wird fleißig wider „Islamisten“ gehetzt.

Ich finde das besorgniserregend, muß doch jetzt jeder Angehörige des Islams fürchten, verfolgt oder benachteiligt zu werden. Das reizt doch erst recht zum Gegenschlag, bei dem dann auch Menschen, die absolut nicht mit der hier vorherrschenden Meinung und Politik übereinstimmen, getroffen werden können.

Ich persönlich habe kein Problem mit Menschen anderen Glaubens oder anderer Weltanschauung, solange wir friedlich miteinander leben können. Ich teile eure Unzufriedenheit mit dem herrschenden Staat, der immer weitergehenden Überwachung, Unterdrückung von Leuten, die nicht dem vorherrschenden Menschenbild entsprechen (egal an welchen Kategoriën), und bitte die, die dies lesen, nochmal nachzudenken, bevor sie etwas tun, was hinterher Unschuldige trifft oder gar in „friendly fire“ ausartet.

Hat eigentlich wer die in Bad Godesberg ausgegebenen Koran-Bücher sich mal angeschaut? Als ich davon las, war ich ja zugegebenermaßen neugierig, weil ich vom Koran leider eher wenig kenne, weiß aber nicht, wie neutral oder eben nicht die Übersetzung gehalten ist. Anhand dessen, was ich bereits mitbekam, sollte das eher friedlicher sein als was durch spätere Theologen festgelegt wurde – wie ja auch zum Beispiel im Christentum, aber über die Horrorepisoden der christlichen Kirche will ich jetzt auch nicht mich auslassen, in der Hoffnung, daß auch diese sich mit den Jahren gebessert hat. (Ist nur halt das Problem mit den Leuten, die die „alten Hetzparolen“ jetzt noch verbreiten. Ist wie im Netz mit den Groupies von Theo de Raadt, die noch asiger zu Leuten sind als er selber.) (Außerdem muß man ja befürchten, durch Besitz eines Korans schon vorverurteilt zu werden heutzutage *seufz*… ich finde das nicht gut!)

Update (ich vergaß): auch der Ruf nach mehr Videoüberwachung ist nur Panikmache. Das geht nur zu Lasten des Normalbürgers. Vielleicht lassen sich noch Kleinstdelikte wie Taschendiebstahl damit abschrecken, aber gerade diese Bomben und dergleichen sind doch oft von Leuten, die vor Konsequenzen keine Angst haben, organisiert. Die werden dann maximal Märtyrer. Ich wiederhole nochmal für die Politiker und die ganz langsamen unter den Lesern: Überwachung verhindert keine Straftat.

Update 11.01.2013: Mittlerweile hat auch Fefe was dazu.

There is one week left to submit your talk proposals for the BSD devroom at FOSDEM 2013. We still have quite a few slots open, so do not be shy! See the original announcement below:

FOSDEM 2013 will take place on February 2-3, 2013, in Brussels, Belgium. Just like in the last years, there will be both a BSD booth and a developer's room (on Sunday).

The topics of the devroom include all BSD operating systems. Every talk is welcome, from internal hacker discussion to real-world examples and presentations about new and shiny features. The talks will be 45 minutes including discussion. Feel free to ask if you want to have a longer or shorter slot.

If you want to do a talk, please submit your proposal to

bsiegert at google.com

and include the following information:

  • Your name
  • The title of your talk (please be descriptive, as titles will be listed with ~400 from other projects)
  • A short abstract of one to two paragraphs
  • A short biography introducing yourself
  • Links to related websites/blogs etc.

The deadline for submissions is December 17, 2012. The talk committee, consisting of Daniel Seuffert, Marius Nünnerich and Benny Siegert, will consider the proposals. If yours has been accepted, you will be informed by e-mail within one week of the submission deadline.

Before we begin, everyone should read up on hashtables and what open addressing / closed hashing is. The context is lines 111‥190 of Python’s Objects/dictobject.c as of today (so we get the line numbers straight).

(I’ve reworded this wlog entry a bit; I originally wrote it too late at night for it to read coherent.) Basically, I’ve got an application where I’d like to use a hashtable for a number of things – not as generic as Python, and with focus on small footprint. I’d like to offer associative arrays in a scripting language, where the keys are always arbitrary byte strings excluding NUL. Also, I’d like to use the hashtable as backend for indexed arrays, where the keys are uint32_t and the usual use case is sequential. Finally, I’m using it for several internal tables, such as a list of keywords, one of builtins, one of special variables, etc. which is a reason for me to not use a self-balancing binary tree as data structure (reading further below might suggest that, but getting a sorted list of hashtable keys is not the focus, though not unimportant).
My questions on this are:

① Why is the shift on perturb done after its first use? In my experiments (using 32-bit width everywhere), for the pathological case of an 8-element (i = 3) table with three entries 0, 0x40000000 and 0x800000000, the “second round” yields 1 for all three, so it cannot have to do with the upper bits. My lookup looks like:

	mask = 2ⁱ - 1;
	j = perturb = hash(key);
	goto find_first_slot;

	 find_next_slot:
	j = (j << 2) + j + perturb + 1;
	perturb >>= PERTURB_SHIFT;
	/* FALLTHROUGH */

	 find_first_slot:
	entry = table[j & mask];
	if (!match(entry)) goto find_next_empty_slot;
 

This means that my first check is always the bare hash (so “only do it if needed” is no reason) and, since I’m using gotos, I could just move the perturb >>= PERTURB_SHIFT; line before the line recalculating the next j to use. This seems to make more sense, even in the face of Python. (I actually looked at the Python file’s comments again today because I thought to use a different resolution, but they have a good rationale for using the multiplication by 5.)

② Why can’t we just use i as the PERTURB_SHIFT? Sure, this changes a shift-right by a constant, which can possibly be encoded as immediate value in assembly (unless you’re on a pre-80186, which can only do SHR AX,1 and SHR AX,CL but not SHR AX,4, but that’s outside of mksh’s scope) into a right-shift by a variable, but i is already known, and I think the behaviour is better (it wouldn’t eat any bits; assume the same 8-entry hashtable and pathologic keys 0, 8 and 16). Again: who do I think I am to go against the wisdom of the Python people, who seem to have shed more thought on this than everyone else I saw, asked, read about (including Spammipedia). That’s why I’m asking here. On that reference: I don’t support spammers or people nagging for donations or premium accounts, like Xing and Groundspeak/Geocaching.COM, at all. In fact, I urge others to do the same, so it really hurts them; it may be their business model, but not if they spam me. Besides, OpenCaching.DE exists.

Another thing is: to avoid CVE-2011-4815, I’m randomising the hash used, with one “seed” value per hashtable, changed before a resize operation. I originally thought to seed it with nonzero, but then I have to rehash on hashtable resize, so I’ll be XORing the final hash value instead (thanks ciruZ for the idea). I’m thinking of omitting that for indexed arrays, as an attacker almost certainly cannot determine the keys there. (To directly use the indexed array keys, which are already uint32_t, as hashes makes using i from ② even more important.) The hash I’m using is a modified Jenkins one-at-a-time called NZAAT: it’s my new generic standard nōn-cryptographic hash, and the changes are thus: while adding a byte, another increment of the hash is done (so NUL counts), and the finaliser got prefixed with the shift-left-add+shift-right-xor sequence of the adder (but not adding any value or the +1), to get best avalanche for all bytes. I actually compiled several versions of Hash.cs on a Windows® VM at work to analyse the original one-at-a-time and all of my modifications; these turned out to be the simplest ones (I originally had added 0x100 instead of 1, but the effect was the same, and +1 is usually cheaper on most CPUs).

Also, to avoid people being able to get to the seed, a user will always get only a sorted list of hashtable keys (numeric for indexed arrays, ASCIIbetically otherwise; see also my thoughts on JSON from the previous wlog entry). What algorithm do I use? For strings, comparisons are much more expensive, so I’d like to keep them low. Memory use is also a factor; allocating one large(r) block is better than many small ones due to the pool allocator overhead and due to portability to ancient Unicēs (which is another reason for me to use a hashtable which is a small struct plus an array of pointers, and then pass the list of keys as array of string pointers, instead of a tree). For both reasons, I’m thinking a relatively simple MergeSort: I need to allocate the result array anyway, so I can just get two and free the one that isn’t the end result, and it’s AFAICT the cheapest on comparisons other than Tree Sort (which nobody really seems to use, and which would effect to using a balanced binary tree again). Since keys are unique, stability and duplicate handling is never an issue. I’d like to use only one algorithm and one data structure, not a combination, as compactness is a design goal.

Please drop your thoughts on Freenode, e.g. by /msg MemoServ send mirabilos your text here or per eMail to the domains debian, freewrt or mirbsd, which are organisations, with the localpart tg. Or just contact me as usual, if you’re already acquainted. Or lookup 0xE99007E0. Thanks in advance! (Especially, Python Developers’ thoughts are welcome.)

The following proposal extends the JSON specification, with the idea of using JSON as an information interchange format, rather than just a way of writing certain ECMAscript values. They do not add anything but only restrict valid JSON content and encoders with some rationale.

First of, I’d like to remind everyone, including JSON’s author, that JSON is case-sensitive, except in the four hexdigits after a backslash-u sequence in a String.

Second, I’d like to remind everyone that JSON is not binary-safe. No way around that, it implements Unicode (actually, 16-bit UCS-2, and it doesn’t guarantee that UTF-16 surrogates are correctly paired) text. I also consider only UTF-{8,{16,32}{B,L}E} valid encodings for JSON. (No PDP endian, either. Sorry, guys.)

For my first proposal, I’d like to point out CVE-2011-4815 which was about overflowing hashtables. The obvious fix is to randomise the hash per hashtable; to ensure this doesn’t leak, we sort ASCIIbetically the keys of an Object in the encoder. (Using Unicode is good here – we can just sort the keys as UTF-8 strings by their uint8_t value or as Unicode (UCS-2 or even UCS-4 or UTF-16) strings by the codepoints.) JSON was never preserving the order of elements in an Object anyway so we make it standardised (we still accept any order, and, when parsing, in collision cases, the later value wins). This also helps diffs.

For my second proposal, I’d like to forbid \u0000, \uFFFE, \uFFFF in strings. The first because many implementations use C strings, and for an information interchange format this is better; it also has security implications to allow NUL in a string. The other two, but not unpaired UTF-16 surrogates (as ECMAscript uses UCS-2 and got UTF-16 only later) because they’re not valid Unicode; JSON was not binary-safe already so why bother. Among other benefits, this also helps implementations.

For my third proposal, I’d like to agree that implementations should impose a nesting depth limit that may be user-defined, and in the face of which, cyclic checking may be ignored by an encoder. I emit nesting depth overflows as literalnull; might also throw an error. Since I was asked, the common “standard” value is to restrict nesting depth is 32, unless the user specified one. (I also saw 8, but 32 WFM pretty well.) Most seem to use it even if it may seem low at first. Only specialised applications probably need more, and they can always pass a value.

For my forth proposal, backslash-escape U+007F‥U+009F always. It may upset humans, editors, databases, etc. (This paragraph is newly added, after some IRC discussion.)

All these do not permit anything that wasn’t accepted to be accepted afterwards. I’ve got a fifth proposal which changes acceptance rules – but only for a subset of parsers: formally JSON is defined in ECMA-262 as industry standard that, in contrast to RFC 4627, always allowed any Value as top-level element of a JSON text. I’d like to make it so, and ignore the RFC’s requirement for it to be an Object or Array. Even so, the first two characters (after the BOM, if any) of a JSON text always are in the non-NUL 7-bit ASCII range, allowing for encoding detection. (This is done by the NUL octet pattern in the first four octets.)

JSON has only taken off because it’s a tightly defined simple format that can be used “everywhere” and isn’t too awful for humans (escaping not needed for U+0020‥U+D7FF and U+E000‥U+FFFD after all, although I’d also take the C1 control characters out, see my forth proposal above). I’ve started to use a trailing comma in indexed and associative arrays in code I write at work, when the array values are one a line, to help version control systems to do their diffs, but refrain from asking for a JSON extension to permit that in order to not endanger compatibility any (no comment needed, it’s just not worth it), but I’d like my above proposals to be followed by implementators (and I’m one of them).

Some more discussion with Jonathan pointed out that JSON5 allows for trailing commata in Object and Array; IMHO the only feature of it that is not bad or outright harmful. I’ll probably keep from accepting them because, on their own, they’re not that useful, and I usually would run JSON texts, even configs, through a parser/encoder roundtrip to pretty-print them which would lose them anyway.

As for binary-safeness: probably best to just use base64 and let the outer layers worry about compression. The data is usually unrelated to the JSON-encoded structure, and even if it’s related to other data the base64 representation is usually similar (unless misaligned).

Update 02.12.2012 – Wrong I was about the first two characters: “"€"” is a valid JSON text. Still possible to peek at four octets and determine the encoding by ordering the tests; updated my notes.

Go on NetBSD

12.10.2012 by bsiegert@
Tags: golang

Starting today, I am running continuous builders for Go on NetBSD/386 and NetBSD/amd64. Both are running fine, so Go is now (semi-officially) supported on NetBSD. You need at least version 5.99.51 or, even better, a NetBSD-6.0 release candidate. In addition, the latest Go release (1.0.3) does not have the NetBSD support, so you must build from source on tip.

Go 1.1, which is expected for January 2013, will support NetBSD on x86 officially.

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

MirOS Logo