Namespaces give a process, or a group of processes, the illusion of being the only process or group of processes in the system.
They are a way to detach processes from one instance of a kernel resource and assign them to a new one. In other words, they are indirection layers for global resources.
Think of this as an extension of the classical chroot() system call. When a new root is set with chroot(), the kernel isolates the new branch of the filesystem tree from the existing one, which in effect gives the process a new namespace for its view of the filesystem.
Namespaces now provide the basis for a complete lightweight virtualization system, in the form of containers.
Currently, Linux supports the following namespaces:
Namespace   Since kernel   Description
UTS         2.6.19         domain name and hostname
IPC         2.6.19         message queues, semaphores, and shared memory
PID         2.6.24         process IDs
NS          2.4.19         mount points (filesystems)
NET         2.6.24         IP stack, routes, network devices…
USER        3.8            UIDs, GIDs…
I will leave the explanation of each individual namespace at that for now; the details are for another day.
Namespaces API
In the Linux kernel there is no distinction between the implementation of processes and threads; threads are just lightweight processes. Threads are also created by calling clone(), but with different arguments (mainly CLONE_VM). From the kernel's point of view, a process or thread is a task.
Namespaces can be nested (this applies to PID and user namespaces). The nesting limit is 32 levels.
Namespaces can be created or modified with the clone(), unshare() and setns() system calls. None of them is a POSIX system call, so they are only available on Linux.
The clone() system call is a more general, finer-grained version of the fork() and vfork() system calls. They are alike because all three are implemented by calling the do_fork() function with different flags and arguments.
do_fork() calls copy_process(), which in turn calls copy_namespaces(). If no namespace flags are present, it simply reuses the parent's namespaces (the fork() and vfork() behavior).
In unshare(), the CLONE_NEW* flags have the same effect as they have in clone(). All the other flags that unshare() accepts have the reverse effect of their clone() counterparts, since they were copied in the create_new_namespaces example above.
When a task ends, every namespace it belonged to that has no other process attached is cleaned up. This means mounts are unmounted, network interfaces are destroyed, and so on.
In setns(), the fd argument specifies the namespace to join. It is translated to an ns_common struct by calling get_proc_ns(file_inode(file)). We will cover these fds in the next point.
The namespaces of each process can be found under /proc/$pid/ns.
$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 net -> net:[4026531962]
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 pid -> pid:[4026531836]
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 user -> user:[4026531837]
lrwxrwxrwx 1 ubuntu ubuntu 0 Jan 5 21:12 uts -> uts:[4026531838]
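The same listing can be collected with a small loop over the ns directory. This is only a minimal sketch: list_ns is a hypothetical helper name of mine, not a standard command, and it assumes a Linux system with /proc mounted.

```shell
# list_ns is a hypothetical helper, not a standard command.
# Prints "<name> <target>" for every namespace link of the given PID
# (defaults to the current shell).
list_ns() {
    local pid="${1:-$$}"
    local link
    for link in /proc/"$pid"/ns/*; do
        # Skip silently if the directory is missing or the glob did not match.
        [ -L "$link" ] || continue
        printf '%s %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

list_ns "$$"
```

Calling it with the PID of another process requires permission to read that process's ns directory.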
Each namespace has an inode number that corresponds to a namespace struct. If two tasks show the same number, they belong to the same namespace. The inode number of the namespace is not the one that stat -c %i shows for the files under ns: those files are symbolic links, so stat reports the inode of the link itself.
For namespaces, as for sockets and pipes, the inode number is shown in the form type:[inode].
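To make the type:[inode] form concrete, here is a minimal sketch that extracts the inode number from a link target and checks whether two tasks share a namespace. ns_inode and same_ns are hypothetical helper names, and same_ns assumes a Linux /proc and permission to read both processes' ns directories.

```shell
# ns_inode and same_ns are hypothetical helper names.

# Extract the inode number from a link target such as "mnt:[4026531840]".
ns_inode() {
    local target="$1"
    target="${target#*:\[}"          # drop the "type:[" prefix
    printf '%s\n' "${target%\]}"     # drop the trailing "]"
}

# Succeed if two PIDs are in the same namespace of the given type
# (mnt, net, pid, ...). Returns 2 if a link cannot be read.
same_ns() {
    local a
    local b
    a="$(readlink "/proc/$1/ns/$3" 2>/dev/null)" || return 2
    b="$(readlink "/proc/$2/ns/$3" 2>/dev/null)" || return 2
    [ "$a" = "$b" ]
}

ns_inode 'mnt:[4026531840]'    # prints 4026531840
same_ns "$$" "$$" mnt && echo "same mount namespace"
```

Comparing the whole link target, rather than only the number, also guards against comparing namespaces of different types by mistake.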
# for pid in 643 23681 32178 ; do readlink /proc/$pid/ns/mnt ; done
mnt:[4026531840]
mnt:[4026532430]
mnt:[4026532430]
# for pid in 643 23681 32178 ; do md5sum /proc/$pid/mounts ; done
8dddf7d919672a56849bb487840b94e0 /proc/643/mounts
70159c37e8c8f16c61ceaa047ad9528a /proc/23681/mounts
70159c37e8c8f16c61ceaa047ad9528a /proc/32178/mounts
# unshare -m /bin/bash
# readlink /proc/$$/ns/mnt
mnt:[4026532493]
# for pid in 643 23681 32178; do stat -c %i /proc/$pid/ns/mnt ; done
1308868
1308731
1308686
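The session above shows that plain stat reports the inode of the /proc symlink, not of the namespace. A minimal sketch of the difference follows. It assumes a stat that supports -c and -L (GNU coreutils does; -L makes stat follow the link), and show_ns_inodes is a hypothetical helper name.

```shell
# show_ns_inodes is a hypothetical helper name.
# Usage: show_ns_inodes [pid] [type]   (defaults: current shell, mnt)
show_ns_inodes() {
    local link="/proc/${1:-$$}/ns/${2:-mnt}"
    [ -L "$link" ] || { echo "no such namespace link: $link" >&2; return 1; }
    echo "readlink: $(readlink "$link")"        # type:[inode] of the namespace
    echo "stat:     $(stat -c %i "$link")"      # inode of the /proc symlink itself
    echo "stat -L:  $(stat -L -c %i "$link")"   # follows the link to the namespace
}

show_ns_inodes "$$" mnt || echo "namespace links not available here"
```

On a Linux system I would expect the stat -L number to match the one inside the brackets printed by readlink, while the plain stat number differs.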
The namespace files under /proc are implemented through the proc_ns_operations struct.
Two commands are available, one for each of these system calls (as with many shell commands that are simply named after the system call they wrap): unshare for unshare() and nsenter for setns().
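As a rough sketch of how the two commands are used, the snippet below starts a command in a new UTS namespace and compares its namespace link with the one of the calling shell. It assumes the util-linux unshare with its --user, --map-root-user and --uts options. Unprivileged user namespaces may be disabled on a given system, so the sketch falls back to a message instead of failing. The variable names are mine.

```shell
# Namespace link of the current shell's UTS namespace.
parent_uts="$(readlink "/proc/$$/ns/uts" 2>/dev/null)"

# Run readlink inside a new UTS namespace. Creating one normally needs
# privileges, so a user namespace is created as well and the current user
# is mapped to root inside it (--user --map-root-user).
child_uts="$(unshare --user --map-root-user --uts \
    readlink /proc/self/ns/uts 2>/dev/null)" || child_uts=""

if [ -n "$child_uts" ]; then
    echo "parent: $parent_uts"
    echo "child:  $child_uts"
else
    echo "could not create a new namespace here (missing privileges or tools)"
fi

# nsenter is the counterpart for setns(): it joins the namespaces of a
# running process. Left as a comment because it needs root and a real PID:
#   nsenter --target <pid> --mount --uts /bin/bash
```

When the unshare call succeeds, the two lines should show different inode numbers, since the child runs in a freshly created UTS namespace.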