Unprivileged containers: mount namespace

The mount namespace isolates the set of filesystem mount points that a process sees, so processes in different mount namespaces can have different views of the filesystem hierarchy.

Code

We are using the same base class as in part 2 of this series. All code is available on Bitbucket.

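import os

# ContainerBase and libc come from the earlier parts of this series.
from base import ContainerBase
from system import libc

cb = ContainerBase()
# Request a new user namespace and a new mount namespace.
cb.namespace_flags = libc.CLONE_NEWUSER | libc.CLONE_NEWNS
# Run bash inside the new namespaces and wait for it to exit.
cb.run(os.system, 'bash')
cb.wait()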

Note that the flag for a mount namespace is CLONE_NEWNS, not CLONE_NEWMOUNT or similar. This is for historical reasons: the mount namespace was the first kind of namespace added to Linux.

Running this short script will put you in a shell with both a new user namespace and a new mount namespace.
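If you do not have the base class from the earlier parts to hand, the same effect can be approximated with a direct call to unshare(2). The following is only a minimal sketch, not the series' ContainerBase: the function name unshare_user_and_mount is made up for illustration, it uses unshare rather than whatever the base class does internally, and it assumes a single-threaded process on a kernel that allows unprivileged user namespaces.

```python
import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # new mount namespace
CLONE_NEWUSER = 0x10000000  # new user namespace


def unshare_user_and_mount():
    """Move the calling process into new user and mount namespaces.

    Hypothetical helper for illustration; not part of the series' code.
    """
    # Remember the real ids first; they change meaning after unshare().
    uid, gid = os.getuid(), os.getgid()

    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # Map our real uid/gid to root (0) inside the new user namespace.
    # setgroups must be denied before an unprivileged gid_map write.
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")
    with open("/proc/self/uid_map", "w") as f:
        f.write(f"0 {uid} 1")
    with open("/proc/self/gid_map", "w") as f:
        f.write(f"0 {gid} 1")
```

Calling unshare_user_and_mount() and then os.system('bash') should give a shell comparable to the one started by the script above.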

You can now create a new directory and try various mount commands. It will become apparent that not all commands will work as a regular user. The following sections will discuss what will work and what will not.

Directory and file bind mounts

It is possible to use bind mounts. A directory can be bind mounted onto another directory, and its contents will then appear in both places in the file system hierarchy.

$ mkdir mnt
$ mount --bind /etc mnt
$ ls mnt
DIR_COLORS                grub.d        printcap
DIR_COLORS.256color       grub2.cfg     profile
DIR_COLORS.lightbgcolor   gshadow       profile.d
GREP_COLORS               gshadow-      protocols
ImageMagick-6             gss           pulse
....

You can also bind mount a file onto a file:

$ echo > fmnt
$ mount --bind /etc/passwd fmnt
$ head -2 fmnt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin

From another terminal in the root namespace you will just see the empty file and the empty directory.
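One way to compare the two views is to read /proc/self/mountinfo in each terminal. The sketch below parses that file into (mount point, filesystem type, source) tuples; the function names are hypothetical, and only the fields needed here are extracted. The link target of /proc/self/ns/mnt identifies the mount namespace itself, so it should differ between the two shells.

```python
import os


def parse_mountinfo(text):
    """Return a list of (mount_point, fs_type, source) tuples.

    Each mountinfo line has a " - " separator: the mount point is the
    fifth field before it, the type and source are the first two after it.
    """
    mounts = []
    for line in text.splitlines():
        if " - " not in line:
            continue  # skip blank or malformed lines
        left, right = line.split(" - ", 1)
        mount_point = left.split()[4]
        fs_type, source = right.split()[:2]
        mounts.append((mount_point, fs_type, source))
    return mounts


def current_mounts():
    """Parse the mount table of the calling process."""
    with open("/proc/self/mountinfo") as f:
        return parse_mountinfo(f.read())


def mount_namespace_id():
    """Return an identifier such as 'mnt:[...]' for our mount namespace."""
    return os.readlink("/proc/self/ns/mnt")
```

Inside the namespaced shell, the entries for mnt and fmnt should be present in current_mounts(); in the root namespace they should be absent.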

When done detach the mount points with umount:

$ umount mnt
$ umount fmnt

Alternatively, simply exiting the namespaced shell will also destroy the mount points.

The file or directory that you are covering does not need to be owned by you, or to have any particular permissions.

$ ls -l /etc/shadow
----------. 1 nfsnobody nfsnobody 1722 Sep 29 14:18 /etc/shadow
$ mount --bind passwd /etc/shadow
$ cat /etc/shadow
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin

Of course, this change to /etc/shadow is only visible in your namespace; it has no effect on the rest of the system!

Limitations

If you are really root, rather than just root in a user namespace, then you can bind mount a directory that has other file systems mounted beneath it. The bind mount does not include those other file systems, so any files in the underlying directories that they were hiding become visible.

For example, if /home is a separate file system and you bind mount / onto a directory 'mnt', you would usually see an empty directory at mnt/home.

In a user namespace, however, it is simply not possible to bind mount, by itself, a file system that has other file systems mounted beneath it. You can, however, use the --rbind option to recursively bind the file system together with the other file systems beneath it.

Why? I speculate that this is done to prevent revealing files that are beneath the mounted filesystem.
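At the system call level, the difference between --bind and --rbind is a single flag passed to mount(2). The sketch below shows this with ctypes; bind_flags and bind_mount are hypothetical names used only for illustration. As far as I can tell from the mount(2) manual page, the non-recursive call is expected to fail with EINVAL in an unprivileged mount namespace when it would reveal files underneath a submount, which fits the speculation above.

```python
import ctypes
import os

# Flag values from <sys/mount.h>.
MS_BIND = 4096   # create a bind mount
MS_REC = 16384   # apply recursively to submounts


def bind_flags(recursive=False):
    """Return the mount(2) flags for --bind or, if recursive, --rbind."""
    return MS_BIND | (MS_REC if recursive else 0)


def bind_mount(source, target, recursive=False):
    """Bind mount source onto target; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong,
                           ctypes.c_char_p]
    # The filesystem type and data arguments are ignored for bind mounts.
    rc = libc.mount(os.fsencode(source), os.fsencode(target),
                    None, bind_flags(recursive), None)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

Inside the namespaced shell, bind_mount('/', 'mnt', recursive=True) corresponds to mount --rbind / mnt.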

Other file systems that can be mounted

At this stage you are also able to mount the tmpfs, ramfs and devpts filesystems. Later on in this series it will be possible to mount the proc and sysfs filesystems.
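As an illustration, a tmpfs can be mounted from Python inside the namespaced shell with a direct mount(2) call. This is a rough sketch under the same assumptions as before: the helper names are made up, and the size and mode options are just example values.

```python
import ctypes
import os


def tmpfs_options(size_mb, mode=0o755):
    """Build the option string passed as mount(2) data for a tmpfs."""
    return f"size={size_mb}m,mode={mode:o}"


def mount_tmpfs(target, size_mb=16):
    """Mount a fresh tmpfs on target; raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong,
                           ctypes.c_char_p]
    # The source is an arbitrary label for tmpfs; the type selects the driver.
    rc = libc.mount(b"tmpfs", os.fsencode(target), b"tmpfs", 0,
                    tmpfs_options(size_mb).encode())
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

This is roughly what mount -t tmpfs -o size=16m,mode=755 tmpfs mnt does from the command line.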

It is worth noting that overlayfs filesystems can be mounted on Ubuntu but not on Fedora, as Ubuntu modifies the standard kernel to allow that.

What is not possible

  • You cannot mount regular filesystems, even from file images that you can access.
  • In a mount namespace created without a user namespace (that is, as real root) you can unmount a filesystem that is mounted in the parent namespace. You cannot do this in a user namespace.