Developers’ Weblog


Let’s start a convention: bare-metal machines take the masculine grammatical gender („der Computer“: he needs to be rebooted), whereas VMs take the feminine („die virtuelle Maschine“: she runs better since the last upgrade of Linux-KVM), and the neuter is used when you cannot, or do not want or need to, make such a distinction.
This is, of course, entirely unrelated to human gender, but not unrelated to #debian-68k (on OFTC) discussions ;-)

ObRant: DO NOT USE xz COMPRESSION LEVELS ABOVE 6! (For -7 we can make exceptions, for example in Debian *-dbg or *-source packages.) You may use -e if you absolutely need the better compression, but please think of the poor sods who have to create the archives. You must not use the highest compression levels, -8 and -9: they have absolutely insane memory requirements for compression and will still hinder machines with little RAM on decompression. (Using -e only costs CPU time during compression; decompression is exactly as fast and needs exactly as much memory as without it.) Furthermore, DO NOT CHOOSE A COMPRESSION LEVEL WITH A DICTIONARY SIZE MUCH LARGER THAN THE DATA TO COMPRESS, as that makes absolutely no sense and will worsen rather than improve compression. As a reminder, xz uses the following dictionary sizes:

  • 256 KiB at -0 (compresses better than gzip(1) and faster than either gzip(1) or bzip2)
  • 1 MiB at -1
  • 2 MiB at -2 (compresses better than gzip(1) and bzip2 without losing much speed)
  • 4 MiB at -3 and -4 (these two levels differ in the match finder)
  • 8 MiB at -5 and -6
  • 16 MiB at -7 (186 MiB RAM used to compress a file)
  • 32 MiB at -8 (370 MiB RAM used to compress a file)
  • 64 MiB at -9 (674 MiB RAM used to compress a file)
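
To make the dictionary-size rule a bit more concrete, here is a minimal Python sketch. It copies the list above into a table and returns an upper bound for the preset: the first dictionary size that covers the whole input, capped at -6. The names `DICT_SIZE`, `max_useful_preset` and the `cap` parameter are made up for this illustration, and the heuristic is just one reading of the rule, not something xz does on its own.

```python
KIB = 1024
MIB = 1024 * KIB

# Dictionary size per xz preset, taken from the list above.
DICT_SIZE = {
    0: 256 * KIB,
    1: 1 * MIB,
    2: 2 * MIB,
    3: 4 * MIB,
    4: 4 * MIB,
    5: 8 * MIB,
    6: 8 * MIB,
    7: 16 * MIB,
    8: 32 * MIB,
    9: 64 * MIB,
}


def max_useful_preset(data_size, cap=6):
    """Return the highest preset worth considering for data_size bytes.

    Illustrative heuristic (an assumption, not xz behaviour): find the
    smallest dictionary that holds the whole input, take the highest
    preset using that dictionary, and never exceed cap.
    """
    for preset in sorted(DICT_SIZE):
        if DICT_SIZE[preset] >= data_size:
            needed = DICT_SIZE[preset]
            break
    else:
        # Input is larger than every dictionary: fall back to the cap.
        return cap
    best = max(p for p, size in DICT_SIZE.items() if size == needed)
    return min(best, cap)
```

The result is only a ceiling; a lower preset is often good enough. In Python it could be handed to the standard library as `lzma.compress(data, preset=max_useful_preset(len(data)))`.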

Decompression uses less than 1 MiB more than the dictionary size, but the dictionary must always be allocated in full. (Custom presets are fine, but mind the RAM usage!) As a general rule, if you have up to 20 MiB to compress, -4 is fine, and -5 will only do better if similar data is spread across the whole file instead of sitting close together. When I make mksh distfiles, I instead put files with related content next to each other, which improves compression much more nicely without penalising low-memory systems; for example, you could sort documentation, Makefiles, scripts, m4(1) files, and C source code into groups before archiving, instead of archiving them alphabetically.
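
The grouping idea can be sketched in a few lines of Python. This is not the script used for the mksh distfiles; the groups follow the example in the paragraph above, while the suffix lists and the names `group_rank` and `archive_order` are assumptions made up for this illustration.

```python
import os


def group_rank(path):
    """Map a file to a coarse content group (lower ranks are archived first)."""
    name = os.path.basename(path)
    ext = os.path.splitext(name)[1].lower()
    if name.startswith("Makefile") or ext == ".mk":
        return 1  # Makefiles
    if name.upper().startswith("README") or ext in (".txt", ".md", ".1"):
        return 0  # documentation
    if ext in (".sh", ".pl"):
        return 2  # scripts
    if ext == ".m4":
        return 3  # m4(1) files
    if ext in (".c", ".h"):
        return 4  # C source code
    return 5  # everything else


def archive_order(paths):
    """Sort paths by group first and by name within each group."""
    return sorted(paths, key=lambda p: (group_rank(p), p))
```

For example, `archive_order(["main.c", "Makefile", "README.txt", "aclocal.m4", "build.sh", "util.h"])` puts `README.txt` first and the two C files last. The resulting list can then be added file by file, in that order, to an archive opened with Python's `tarfile` module.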

Another note on bzip2: its decompression is slow. I see no reason to use it any more, at all. Use gzip(1) if you care about compatibility or have an issue with xz not having a free copyright licence, and xz otherwise.
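
If you would rather measure the decompression speed yourself than take my word for it, a rough sketch like the following works with nothing but Python's standard `gzip`, `bz2` and `lzma` modules. The function name `time_decompression` and the `repeat` parameter are made up for this illustration, the modules' default compression levels are used, and the numbers depend heavily on the machine and on the data, so treat them as an indication only.

```python
import bz2
import gzip
import lzma
import time


def time_decompression(data, repeat=3):
    """Return {codec: (compressed_size, best_decompression_seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            unpacked = decompress(packed)
            best = min(best, time.perf_counter() - start)
        # Make sure the round trip really reproduced the input.
        assert unpacked == data
        results[name] = (len(packed), best)
    return results
```

Feed it the contents of a real distfile (for instance `open(path, "rb").read()`, with a path of your own) rather than synthetic data, since highly repetitive input flatters every compressor.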
