New Thoughts About Package Management
On my Linux computers, I like to use only the system-wide package management to
install software. On Fedora that is RPM; on Ubuntu it was Debian packages.
Software that is in the repository can be installed with a simple
dnf install PROGRAM and uninstalled with dnf erase PROGRAM. One can be certain
that all files that have been installed are removed again; the only leftovers
can be configuration files, which one sometimes wants to keep.
This works because the Linux file system has the directories bin, lib and
share. All programs get installed into bin, all libraries (DLL files on
Windows) get installed into lib, and auxiliary files like pictures and
documentation get installed into share. Nowadays those directories live in
/usr, so one has /usr/bin, /usr/lib and /usr/share.
An RPM package is basically an archive (technically a compressed cpio archive
rather than a ZIP file) which contains the files with their full paths and gets
extracted into the root directory. The system-wide package management tool
records this list of files and can uninstall them later on. For the package
xss-lock, these files are included:
```
/usr/bin/xss-lock
/usr/share/bash-completion/completions
/usr/share/bash-completion/completions/xss-lock
/usr/share/doc/xss-lock
/usr/share/doc/xss-lock/NEWS
/usr/share/doc/xss-lock/dim-screen.sh
/usr/share/doc/xss-lock/transfer-sleep-lock-generic-delay.sh
/usr/share/doc/xss-lock/transfer-sleep-lock-i3lock.sh
/usr/share/doc/xss-lock/xdg-screensaver.patch
/usr/share/licenses/xss-lock
/usr/share/licenses/xss-lock/LICENSE
/usr/share/man/man1/xss-lock.1.gz
/usr/share/zsh
/usr/share/zsh/site-functions
/usr/share/zsh/site-functions/_xss-lock
```
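The bookkeeping idea behind this can be illustrated with a toy sketch in bash. The functions toy_install and toy_uninstall and the manifest directory are hypothetical and much simpler than RPM, which keeps its own database and also tracks directories, checksums and configuration files. On an actual Fedora system, rpm -ql xss-lock prints the recorded list shown above.

```shell
#!/usr/bin/env bash
# Toy sketch (hypothetical, not how RPM is implemented): copy a staged
# file tree into a root directory and remember every file in a manifest,
# so that uninstalling removes exactly those files again.

# toy_install NAME STAGEDIR ROOTDIR DBDIR
toy_install() {
    local name=$1 stage=$2 root=$3 db=$4
    mkdir -p "$db" "$root"
    # Record paths relative to the root, e.g. usr/bin/xss-lock.
    (cd "$stage" && find . -type f | sed 's|^\./||' | sort) > "$db/$name.files"
    # Copy the tree into the root, keeping the directory layout.
    cp -R "$stage/." "$root/"
}

# toy_uninstall NAME ROOTDIR DBDIR
toy_uninstall() {
    local name=$1 root=$2 db=$3 path
    # Remove every recorded file. Directories are left in place here,
    # whereas RPM also records the directories a package owns.
    while IFS= read -r path; do
        rm -f -- "$root/$path"
    done < "$db/$name.files"
    rm -f -- "$db/$name.files"
}

# Demo in a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/stage/usr/bin" "$work/stage/usr/share/doc/demo"
printf '#!/bin/sh\necho demo\n' > "$work/stage/usr/bin/demo"
echo 'release notes' > "$work/stage/usr/share/doc/demo/NEWS"
toy_install demo "$work/stage" "$work/root" "$work/db"
cat "$work/db/demo.files"    # lists usr/bin/demo and usr/share/doc/demo/NEWS
toy_uninstall demo "$work/root" "$work/db"
rm -rf "$work"
```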
I like this approach because I can uninstall a whole piece of software with one command. Even better, updates can be installed for every single program on my system because they are all RPM packages: when an update comes in, the old version is removed and the new version is installed. I can also keep a list of all the packages that I want to have. On a fresh Linux installation, I just let the package manager install all of them and wait for the download and installation to finish.
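Such a package list can be an ordinary text file. A minimal sketch, assuming a hypothetical file packages.txt with package names separated by whitespace and # starting a comment; the helper names read_package_list and install_from_list are made up for this example, and only dnf install is the real command.

```shell
#!/usr/bin/env bash
# Sketch: reinstall a personal package selection on a fresh Fedora system.
# The list format and the function names are assumptions for this example.

# Print the package names from the list file, one per line.
read_package_list() {
    local line
    local -a words
    # "|| [[ -n $line ]]" also handles a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%%#*}                  # drop comments
        read -r -a words <<< "$line"      # split on whitespace
        if (( ${#words[@]} > 0 )); then
            printf '%s\n' "${words[@]}"
        fi
    done < "$1"
}

# Hand all names to dnf in a single transaction.
install_from_list() {
    local -a pkgs
    mapfile -t pkgs < <(read_package_list "$1")
    if (( ${#pkgs[@]} == 0 )); then
        echo "no packages found in $1" >&2
        return 1
    fi
    sudo dnf install -y "${pkgs[@]}"
}
```

Usage on the new machine would be install_from_list packages.txt. Passing everything to one dnf call lets dnf resolve all dependencies together instead of package by package.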
On Windows, my impression is that every software installer is a self-extracting ZIP archive that spews files wherever it likes. It records the installed files somewhere and provides an uninstaller that reads that list again. This is not standardised, and Windows is not aware of the installed programs and their versions to the extent that a Linux distribution is. This makes updating software a mess: one has to download new installers from the various websites and click through their steps again. Installing all software on a fresh Windows installation is a lot of work.
There are other ways to manage software on a Linux system than RPM packages:
Download the source code and run make && make install or python setup.py install or something else, depending on the programming language and build system used.
Use a package tool that is specific to the programming language. Those use a language-specific repository which has no relation to your Linux distribution. This means that one has to use a different tool for each language. Examples are:
- Ruby packages can be installed with gem, from RubyGems.
- Python packages can be installed with
easy_install, from PyPI.
- TeX packages with the TeX Package Manager from CTAN
- PHP packages with pear and other tools
- Rust packages with Cargo
- R packages from CRAN
Use one of the new portable app approaches, as implemented in Snappy, Flatpak (previously XDG-App) or macOS application bundles. Such a bundle contains all the libraries it depends on (and gets quite large) in compatible versions, so that you just put one file somewhere and can start it.
This sounds easy to use but has major drawbacks: disk space is wasted by copies of the same library used by different programs. When a library gets a security update, every program that bundles it needs to be updated. Since there is no central update mechanism, it will take weeks or months until most of the programs are updated.
In conjunction with RPM packages, this is yet another mechanism to update software and I think that it is inferior to RPM packages.
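To get a feeling for how much is duplicated, one can look for identical shared libraries across several bundle directories. A rough sketch, assuming GNU coreutils (sha256sum, and the -w and -D options of uniq, which are GNU extensions); the function name is made up and the directories are whatever bundles you want to compare.

```shell
#!/usr/bin/env bash
# Sketch: list shared libraries that exist with identical content in
# more than one place. Requires GNU coreutils for "uniq -w" and "uniq -D".

find_duplicate_libs() {
    # Hash every *.so* file, sort by hash, and print all lines whose first
    # 64 characters (the SHA-256 hex digest) occur more than once.
    find "$@" -type f -name '*.so*' -exec sha256sum {} + |
        sort |
        uniq -w 64 -D
}

# Usage: find_duplicate_libs /path/to/bundle-a /path/to/bundle-b
```

Every group of lines with the same hash is one library shipped several times, and each extra copy is also one more place that needs the next security fix.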
In every case, the complexity is a lot higher. It gets worse when you mix
Python packages from the distribution package management with packages
installed by pip. When it comes to upgrading a package, which tool do you use,
pip or dnf? Debian used to have really ancient LaTeX packages.
People would then uninstall every LaTeX package from the Debian package
management and install everything using the TeX Package Manager. This at
least gave a clear separation of responsibilities. When you start installing some
Ruby Gems and a couple of Python packages this way, upgrading becomes a mess.
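One practical way to answer which tool is responsible for a Python package is to ask Python where it imports the module from and, for files under /usr, to ask RPM with rpm -qf FILE. A small sketch, assuming python3 is on the PATH; the helper names and the four labels are my own, and the path rules are a heuristic rather than anything pip or dnf define.

```shell
#!/usr/bin/env bash
# Sketch: find out where Python imports a module from, as a hint which
# tool (dnf or "pip install --user") should be used to upgrade it.

# Print the file a module is loaded from (fails for built-in modules,
# which have no __file__).
module_file() {
    python3 -c 'import importlib, sys; print(importlib.import_module(sys.argv[1]).__file__)' "$1"
}

# Classify a path by who presumably installed it (heuristic).
classify_path() {
    case $1 in
        "$HOME"/*)    echo user ;;     # e.g. pip install --user
        /usr/local/*) echo local ;;    # e.g. make install by hand
        /usr/*)       echo system ;;   # distribution package, check with rpm -qf
        *)            echo other ;;
    esac
}

# Usage: classify_path "$(module_file json)"
```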
On Windows, all those programs have a legitimate purpose: there is no system-wide package management. It is therefore fine to install third-party software somewhere on your system; it is a mess of third-party software anyway. And having at least one tool for all Python packages is an improvement over nothing at all.
My stance so far has been that those third-party package managers have no purpose on my Linux distribution, since there I can use the superior package management to install everything with one unified mechanism.
That is, until I have to do serious work on a supercomputer. There you do not get any administrative permissions. Actually, I would not want them because I would not want to be responsible for endangering other people’s work. So it is relaxing to be a simple user on that large system without having to worry about the other users. The problem is that I cannot just install software using the package management. I cannot do anything outside of my personal directory, really.
At this point programs like pip are really handy because they offer
pip install --user. Instead of installing the files into /usr, it installs
them into a directory in my home directory. I use $HOME/local; other people use
.local (pip's default) or something else. It does not really matter; what
matters is that it is in my home directory, where I can do whatever I want. The
trouble is that I now have programs in $HOME/local/bin as well as in /usr/bin.

But it gets worse: the system runs Red Hat Enterprise Linux 6, which ships what
is, by my standards, ancient software. I cannot use that software for my work
because it is simply too old. The C++ library I need to compile uses C++11,
which needs at least GCC 4.9, but the system-wide installation is GCC 4.4. On
the system, there are multiple versions of software packages installed in
different paths. A utility called module then allows you to pick the version
that you want. So now we have lots of places where software is installed. On
JUQUEEN, for instance, several gcc versions are installed side by side this
way.
This is needed because the system packages of Red Hat Enterprise Linux are all
compiled with GCC 4.4. Users who need a more recent compiler can then select
whether they want 4.8 or 4.9 for their project. If I wanted version 6.2, which
I have on my Fedora machine, I would have to install it from source, and then it
would reside in my home directory. The C++ libraries that I need are not installed on the system either, so I need to compile them in my home directory. RPM packages are built in a clean environment using a build script (the spec file) that makes the build reproducible. This means that you know which compilers and options were used, and one can reason about the system. When I install something into my home directory, I have no idea which files got installed, I don't know the version and I don't know the exact compilation commands. There is no way to cleanly uninstall. All the files are thrown together and one cannot separate them again. See this section of the directory structure:
```
local
|-- bin
|   |-- cfgtransf
|   |-- chroma
|   |-- chroma-config
|   |-- const_hmc
|   |-- hmc
|   |-- lime_contents
|   |-- lime_extract_record
|   |-- lime_extract_type
|   |-- lime_pack
|   |-- lime_unpack
|   |-- print_nodeset
|   |-- print_xpath
|   |-- purgaug
|   |-- qdp++-config
|   |-- qio-config
|   |-- qio-convert-mesh-ppfs
|   |-- qio-copy-mesh-ppfs
|   |-- qmp-config
|   |-- replace_xpath
|   |-- spectrum_s
|   |-- t_leapfrog
|   |-- t_lwldslash_array
|   |-- t_lwldslash_new
|   |-- t_lwldslash_pab
|   |-- t_lwldslash_sse
|   |-- t_meas_wilson_flow
|   |-- t_mesplq
|   |-- t_minvert
|   |-- t_ritz_KS
|   |-- xml2-config
|   |-- xmlcatalog
|   `-- xmllint
|-- include
|   |-- actions
|   |   |-- actions.h
|   |   |-- boson
|   |   |   |-- boson.h
|   |   |   `-- operator
|   |   |       `-- klein_gord.h
|   |   |-- ferm
|   |   |   |-- fermacts
|   |   |   |   |-- asqtad_fermact_params_s.h
|   |   |   |   |-- asqtad_fermact_s.h
|   |   |   |   |-- clover_fermact_params_w.h
|   |   |   |   |-- eo3dprec_s_cprec_t_clover_fermact_w.h
|   |   |   |   |-- eo3dprec_s_cprec_t_wilson_fermact_w.h
|   |   |   |   |-- eoprec_clover_extfield_fermact_w.h
|   |   |   |   |-- eoprec_clover_fermact_w.h
```
Everything is mixed together and cannot be taken apart any more. In order to
make the compilation reproducible, I have written a compilation script that does
a fraction of what RPM would do for me. The snippet that compiles GMP looks
like this:
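```shell
# Stop at the first failing command, so that build-succeeded is only
# written after a successful build and installation.
set -e

# These variables are set earlier in the full script; the fallbacks below
# only make this excerpt self-contained.
: "${prefix:=$HOME/local}"
: "${base_configure:=--prefix=$prefix}"
: "${make_smp_flags:=-j$(nproc)}"
: "${color_flags:=}"

base_flags="-O2 -finline-limit=50000 -Wall -Wpedantic $color_flags"

if ! [[ -d gmp-6.1.1 ]]; then
    wget https://gmplib.org/download/gmp/gmp-6.1.1.tar.xz
    tar -xf gmp-6.1.1.tar.xz
fi

pushd gmp-6.1.1
cflags="$base_flags"
cxxflags="$base_flags"
if ! [[ -f Makefile ]]; then
    # CXXFLAGS only matters if GMP is configured with --enable-cxx.
    ./configure $base_configure CFLAGS="$cflags" CXXFLAGS="$cxxflags"
fi
if ! [[ -f build-succeeded ]]; then
    nice make $make_smp_flags
    make install
    touch build-succeeded
    pushd "$prefix/lib"
    rm -f *.so *.so.*  # Force static linkage.
    popd
fi
popd
```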
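The missing file list could be recovered with the DESTDIR convention: install into a staging directory first, record what lands there, then copy it into the prefix. A sketch, assuming the project's Makefile honours DESTDIR (autotools-generated Makefiles such as GMP's do) and that PREFIX matches the --prefix given to configure; the function names staged_install and staged_uninstall and the manifest location are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run "make install" into a staging directory, record the installed
# files in a per-package manifest, then copy them into the real prefix.

# staged_install NAME PREFIX [MAKE...]   (run inside the build directory)
staged_install() {
    local name=$1 prefix=$2 f
    shift 2
    local -a make_cmd=("$@")
    (( ${#make_cmd[@]} )) || make_cmd=(make)
    local stage manifest_dir="$prefix/share/manifests"
    stage=$(mktemp -d) || return 1
    "${make_cmd[@]}" install DESTDIR="$stage" || { rm -rf "$stage"; return 1; }
    mkdir -p "$manifest_dir"
    # Files end up under $stage$prefix; store them as absolute paths.
    find "$stage$prefix" \( -type f -o -type l \) |
        while IFS= read -r f; do printf '%s\n' "${f#"$stage"}"; done |
        sort > "$manifest_dir/$name.files"
    cp -R "$stage$prefix/." "$prefix/"
    rm -rf "$stage"
}

# staged_uninstall NAME PREFIX: remove exactly the recorded files.
staged_uninstall() {
    local name=$1 prefix=$2 f
    while IFS= read -r f; do
        rm -f -- "$f"
    done < "$prefix/share/manifests/$name.files"
    rm -f -- "$prefix/share/manifests/$name.files"
}
```

In the GMP snippet, make install would then become staged_install gmp "$prefix", and the manifest answers both "which files belong to GMP?" and "how do I remove it again?".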
If I just had to install one library, I would not really need this script. But
the library of interest, Chroma, needs QDP++ and GMP. There is a system-wide
installation of GMP, but its headers are meant for GCC 4.4 and not for GCC 4.9.
Therefore I need to build it from source with GCC 4.9. QDP++ needs QMP and
libxml2, so those need to be built first, bottom up, in order to compile
the libraries that depend on them. At least I did not need that many libraries
to get it running.
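Building "bottom up" means following a topological order of the dependency graph. The coreutils tool tsort computes such an order from pairs of the form "dependency dependent". The pairs below encode only the dependencies named above, and the short names (gmp, qmp, libxml2, qdpxx, chroma) are my own labels.

```shell
#!/usr/bin/env bash
# Sketch: derive a build order from "dependency dependent" pairs.
# Only the dependencies mentioned in the text are listed here.

build_order() {
    tsort <<'EOF'
gmp     chroma
qdpxx   chroma
qmp     qdpxx
libxml2 qdpxx
EOF
}

# Build each library only after everything it depends on.
for lib in $(build_order); do
    echo "would build: $lib"    # a per-library build step would go here
done
```

tsort guarantees only that every library appears after its dependencies; when several orders are valid (for example GMP before or after QMP), it picks one of them.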
On the supercomputer I now have a few folders of source code, a shell script
that compiles it and a local folder where I have installed my stuff. It
works, but it feels nowhere near as clean as RPM packages on my laptop; the two
could hardly be further apart. Installing more software is virtually
impossible because I need to install all the dependent libraries myself.
Installing htop was not possible because libinput is missing. I gave up on that
because htop is not that important to me. So now I just install
the software that I absolutely need and do everything else on my laptop.
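Using such a home-directory prefix means telling the shell, the dynamic linker and pkg-config about it by hand, which is roughly what module load does for the alternative versions in the system-wide paths. A minimal sketch for ~/.bashrc, assuming the prefix $HOME/local from above; the helpers prepend_path and use_prefix are hypothetical. PYTHONUSERBASE makes pip install --user use the same prefix instead of the default ~/.local.

```shell
#!/usr/bin/env bash
# Sketch: make software installed under a prefix visible, similar in
# spirit to what "module load" does. The helper names are hypothetical.

# Prepend DIR to the colon-separated variable named VAR, unless present.
prepend_path() {
    local var=$1 dir=$2
    local current=${!var-}
    case ":$current:" in
        *":$dir:"*) ;;                                      # already there
        *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
    esac
    export "$var"
}

# Some builds install libraries into lib64 instead of lib; adjust if needed.
use_prefix() {
    local prefix=$1
    prepend_path PATH            "$prefix/bin"
    prepend_path LD_LIBRARY_PATH "$prefix/lib"
    prepend_path PKG_CONFIG_PATH "$prefix/lib/pkgconfig"
}

use_prefix "$HOME/local"
# Let "pip install --user" install into the same prefix.
export PYTHONUSERBASE="$HOME/local"
```

The ${current:+:$current} expansion avoids a trailing empty entry when a variable was unset, since an empty entry in LD_LIBRARY_PATH or PATH would mean the current directory.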
What I do there feels like Linux without package management. On Windows, you at least have installers that install stuff somewhere, and you can get some programs in a portable version. But on that supercomputer, I am basically doing Linux From Scratch, which feels like what Linux supposedly was 20 years ago.
There must be a better way, right? I fear that right now, there is not much that I can do. The software installation on Linux distributions assumes that you have administrative permissions and that only one version is installed at a time. For most personal computers, this is indeed a sensible solution. On a supercomputer where updating anything will break something, this is not sensible. People depend on certain software versions and do not want to spend their time adjusting their code every month. Multiple versions need to be installed in parallel. People need to be able to install software for their own use.
The Linux package management is “all or nothing”. So far I was in the “all” camp at home, but now I am thrown into the “nothing” camp on the supercomputer. There is not much in between, and that is not very cool. Before, I thought that Snappy and Flatpak were trying to solve a problem which does not exist. Now I do see that this problem exists, and I look forward to the proposed solutions.