Linux containers
Different containers can use different subsets of:
- Namespaces
  - Mount (filesystem hierarchy) (docker run -v ...)
  - Network (devices, IP addresses, routing) (docker run --net=...)
  - Process IDs (docker run --pid=...)
  - User and group IDs (currently not used by Docker)
  - UTS (hostname, domainname) (docker run's --uts and -h)
  - IPC (SysV IPC, message queues) (docker run --ipc=...)
- Control groups (cgroups): setting limits
- SELinux (use --selinux-enabled with Docker daemon)
- iptables (use --icc=false with Docker daemon)
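The namespaces a process belongs to can be inspected on any Linux host through /proc, with or without Docker. A minimal sketch, assuming a Linux /proc filesystem; the helper name list_namespaces is made up for this illustration:

```shell
#!/bin/sh
# list_namespaces [PID]: print "type -> identifier" for each namespace of PID.
# Hypothetical helper; it relies only on the /proc/<pid>/ns symlinks.
list_namespaces() {
    pid="${1:-self}"
    for ns in /proc/"$pid"/ns/*; do
        # Skip the unexpanded glob when /proc/<pid>/ns does not exist.
        [ -L "$ns" ] || continue
        # Each entry is a symlink whose target names the namespace instance.
        printf '%s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
    done
}

list_namespaces self
```

Running the same function on the host and inside a container should show different identifiers for the namespaces that the container does not share with the host.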
Building image
- Dockerfile:

    FROM fedora
    RUN dnf -y install httpd && dnf clean all
    RUN dnf -y install /usr/bin/ps /usr/sbin/ip && dnf clean all
    RUN echo "Test Server" > /var/www/html/index.html
    CMD [ "/usr/sbin/httpd", "-DFOREGROUND" ]

- Build image:

    host$ docker build -t httpd .
    Sending build context to Docker daemon 2.048 kB
    Step 0 : FROM fedora
    [...]
    Successfully built 911aa3a3937c
    host$ docker images
    REPOSITORY         TAG      IMAGE ID       CREATED         VIRTUAL SIZE
    httpd              latest   911aa3a3937c   2 minutes ago   203.7 MB
    docker.io/fedora   latest   ded7cd95e059   4 months ago    186.5 MB
Running container
- Start (run) new container from this image:

    host$ docker run --name httpd httpd &
    [1] 1556
    host$ AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.4. Set the 'ServerName' directive globally to suppress this message
    host$ docker ps
    CONTAINER ID   IMAGE   COMMAND                  CREATED          STATUS          PORTS   NAMES
    9312c1c40a92   httpd   "/usr/sbin/httpd -DFO"   41 seconds ago   Up 40 seconds           httpd
    host$ docker inspect -f '{{ .NetworkSettings.IPAddress }}' httpd
    172.17.0.4
    host$ curl http://172.17.0.4/
    Test Server

- Just the one process (plus the worker children) is run.
Namespaces
- PID namespace:

    host$ docker exec httpd ps ax
      PID TTY      STAT   TIME COMMAND
        1 ?        Ss     0:00 /usr/sbin/httpd -DFOREGROUND
        8 ?        S      0:00 /usr/sbin/httpd -DFOREGROUND
        9 ?        S      0:00 /usr/sbin/httpd -DFOREGROUND
       10 ?        S      0:00 /usr/sbin/httpd -DFOREGROUND
       11 ?        S      0:00 /usr/sbin/httpd -DFOREGROUND
       12 ?        Rs     0:00 ps ax

- Network namespace:

    host$ docker exec httpd ip route
    default via 172.17.42.1 dev eth0
    172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.4

- View namespace transitions on the host:

    host# pstree -S | grep docker
    |-docker(mnt)-+-httpd(ipc,mnt,net,pid,uts)---4*[httpd]
    |             `-12*[{docker}]
Filesystems and volumes
- The image is mounted as root:

    host$ docker exec httpd findmnt -n /
    / /dev/mapper/docker-253:0-396043-bb73e8f8e3c3e06d7480d4d0f7dd890ad8e7eda186b541cb79a70ba2eba3cc8a[/rootfs] ext4 rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c224,c877",stripe=16,data=ordered

- We get data / configuration to the container by bind-mounting volumes:

    host$ mkdir /tmp/data
    host$ echo "Test serving data from volume" > /tmp/data/index.html
    host$ docker run --name httpd -v /tmp/data:/var/www/html:Z httpd &
    host$ docker inspect --format '{{ .HostConfig.Binds }}' httpd
    [/tmp/data:/var/www/html:Z]
    host$ ls -aZ /tmp/data | cut -d ' ' -f 1,4,5
    drwxr-xr-x. system_u:object_r:svirt_sandbox_file_t:s0:c206,c497 .
    drwxrwxrwt. system_u:object_r:tmp_t:s0 ..
    -rw-r--r--. system_u:object_r:svirt_sandbox_file_t:s0:c206,c497 index.html
    host$ curl http://172.17.0.8/
    Test serving data from volume
Getting application to container
Usual advice:
- One service / daemon per container.
- Start the daemon / process directly.
- Connect containers to form desired setup (docker run --link ...).
- Bind-mount volumes with configs / data to containers.
- Configuration in build-time.
Containerized setup
Usual setup:
    Container:             Container:                Container:
    one service,   ⇐ link  one service,      ⇐ link  one service,
    e.g. database          e.g. application          e.g. Web server
      ↑    ↖ bind mounts      ↖ bind mount             ⇑ port mapping ⇓
    volume   volume            volume
    ---------------------------- host ---------------------------------
                                                     published ports
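The usual setup pictured above could be started along these lines. This is only a dry-run sketch: the commands are printed rather than executed, and the image names (example/db, example/app, example/web) and the host paths under /srv are hypothetical.

```shell
#!/bin/sh
# Dry run of a three-container setup: the docker commands are printed,
# not executed. Image names and host paths are made up for illustration.
run() { echo "$@"; }

print_setup() {
    # Database: data and configuration come from bind-mounted volumes.
    run docker run -d --name db -v /srv/db/data:/var/lib/db -v /srv/db/conf:/etc/db example/db
    # Application: linked to the database container.
    run docker run -d --name app --link db:db -v /srv/app:/etc/app example/app
    # Web server: linked to the application, port published on the host.
    run docker run -d --name web --link app:app -p 80:80 example/web
}

print_setup
```

Redefining run as `run() { "$@"; }` would execute the commands instead of printing them. The order matters, because a link can only point to a container that already exists.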
Multi-container setup
Concerns with multi-container setups:
- Starting the individual components (containers) manually is error-prone.
  - There may be dozens of them in the setup.
- Components may expect Unix sockets and not support TCP.
  - The external layer has to provide those shared socket locations.
- There is a wealth of knowledge in
  - SysV init scripts;
  - systemd unit files;
  - config / setup tools that assume everything is on one machine.
- Extensive initial setup might be needed at run time.
Complex application
FreeIPA server:
- Dozens of daemons.
  - They form an integrated solution.
  - They expect to run on a single machine.
  - Their libraries share config files (nss, krb5, ...).
- The ipa-server-install or ipa-replica-install setup tools play an important role.
  - They currently do not support multi-host setups.
- The domain / realm for which FreeIPA is set up is reflected in LDAP schemas and in directory and file names.
  - Only known at run time.
Approach
- All parts in one container.
- Minimize number of volumes to one: /data.
- In build time:
  - Install software (yum install -y freeipa-server).
  - Move directories and files that will populate a fresh volume aside, to /data-template.
  - On image, create symlinks pointing to /data-template.
- During first run, when empty volume is detected:
  - Populate the volume, mounted to /data, with the template content.
  - Run ipa-server-install (the setup tool).
- During subsequent runs, just start the services.
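The first-run versus subsequent-run decision can be sketched as a small shell function. This is an illustration under assumptions, not the project's actual entrypoint: the function names are made up, the paths are variables so the logic can be tried outside a container, and the setup tool and services are only announced, not run.

```shell
#!/bin/sh
# Sketch of the start-up decision. Defaults follow the layout described
# above; override DATA and TEMPLATE to experiment in a scratch directory.
DATA="${DATA:-/data}"
TEMPLATE="${TEMPLATE:-/data-template}"

volume_is_empty() {
    # True when the directory has no entries (ignoring . and ..).
    [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

start_container() {
    if volume_is_empty "$DATA"; then
        # First run: seed the volume from the template, then configure.
        cp -a "$TEMPLATE"/. "$DATA"/
        echo "first run: would run ipa-server-install"
    else
        echo "subsequent run: would start the services"
    fi
}
```

An entrypoint script would call start_container as its last step. Because the decision depends only on the content of the volume, the same image serves both a fresh install and a restart.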
Layout
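    +------------------------------------------------------------------------+
    | Single FreeIPA server container                                        |
    | 389 | KDC | DNS server | D-Bus | PKI/CA | HTTP Server | SSSD | ...     |
    | Single image with symlinks to → /data                                  |
    +------------------------------------------------------------------------+
                                      ↑ bind mount
                   volume for one FreeIPA server instance
    --------------------------------- host -----------------------------------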
Initial instance configuration
      PID TTY      STAT   TIME COMMAND
        1 ?        Ss     0:00 /bin/bash /usr/sbin/ipa-server-configure-first
       43 ?        S      0:00 xargs /usr/sbin/ipa-server-install -U
       44 ?        S      0:01 \_ /usr/bin/python2 -E /usr/sbin/ipa-server-install -U --ds-password=Secret12345 --admin-password=Secret12345 -r EXAMPLE.COM --setup-dns --forwarder=192.168.100.1
       74 ?        S      0:00 \_ /usr/bin/perl /usr/sbin/setup-ds.pl --silent --logfile - -f /tmp/tmpPjmUla
       89 ?        S      0:00 \_ sh -c /var/lib/dirsrv/scripts-EXAMPLE-COM/ldif2db -n userRoot -i '/var/lib/dirsrv/boot.ldif' 2>&1
       90 ?        S      0:00 \_ /bin/sh /var/lib/dirsrv/scripts-EXAMPLE-COM/ldif2db -n userRoot -i /var/lib/dirsrv/boot.ldif
       91 ?        S      0:00 \_ /bin/sh ./ldif2db -n userRoot -i /var/lib/dirsrv/boot.ldif -Z EXAMPLE-COM
      119 ?        Sl     0:00 \_ /usr/sbin/ns-slapd ldif2db -D /etc/dirsrv/slapd-EXAMPLE-COM -n userroot -i /var/lib/dirsrv/boot.ldif
       66 ?        Ss     0:00 /usr/sbin/ntpd -u ntp:ntp -g -x
FreeIPA container running
      PID TTY      STAT   TIME COMMAND
        1 ?        Ss     0:00 /bin/bash /usr/sbin/ipa-server-configure-first
     1470 ?        Ss     0:00 /bin/dbus-daemon --system --fork
     1479 ?        Ss     0:00 /usr/sbin/certmonger -S -p /var/run/certmonger.pid -n
     2010 ?        Ss     0:00 /usr/sbin/kadmind -P /var/run/kadmind.pid
     2020 ?        Ssl    0:00 /usr/bin/memcached -d -s /var/run/ipa_memcached/ipa_memcached -u apache -m 64 -c 1024 -P /var/run/ipa_memcached/ipa_memcached.pid
     2043 ?        Ss     0:00 /usr/bin/perl /bin/systemctl-socket-daemon /var/run/krb5kdc/DEFAULT.socket 0600 ipa-otpd@.service
     2225 ?        Sl     0:01 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i /var/run/dirsrv/slapd-EXAMPLE-COM.pid -w /var/run/dirsrv/slapd-EXAMPLE-COM.startpid
     2274 ?        Ss     0:00 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid
     2502 ?        Ss     0:00 sh -c export TOMCAT_CFG_LOADED="1"; export TOMCATS_BASE="/var/lib/tomcats/"; export JAVA_HOME="/usr/lib/jvm/jre"; export CATALINA_HOME="/usr/share/tomcat"; export CATALINA_TMPDIR="/var/cache/tomcat/temp"; export SECURITY_MANAGER="false"; export CATALINA_BASE="/var/lib/pki/pki-tomcat"; export CATALINA_TMPDIR=/var/lib/pki/pki-tomcat/temp; export JAVA_OPTS="-DRESTEASY_LIB=/usr/share/java/resteasy"; export TOMCAT_USER="pkiuser"; export SECURITY_MANAGER="true"; export CATALINA_PID="/var/run/pki/tomcat/pki-tomcat.pid"; export TOMCAT_LOG="/var/log/pki/pki-tomcat/tomcat-initd.log"; export PKI_VERSION=10.2.1; export TOMCAT7_USER="pkiuser"; export TOMCAT7_SECURITY="true"; export NSS_ENABLE_PKIX_VERIFY=1; export NAME=pki-tomcat; /usr/sbin/runuser -g pkiuser -u pkiuser -- /usr/libexec/tomcat/server start
     2503 ?        S      0:00 \_ /usr/sbin/runuser -g pkiuser -u pkiuser -- /usr/libexec/tomcat/server start
     2504 ?        Sl     0:11 \_ /usr/lib/jvm/jre/bin/java -DRESTEASY_LIB=/usr/share/java/resteasy -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.security.manager -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy org.apache.catalina.startup.Bootstrap start
     2635 ?        Ssl    0:00 /usr/sbin/named-pkcs11 -u named
     2645 ?        Ss     0:00 sh -c export LANG=C; /usr/sbin/httpd $OPTIONS -DFOREGROUND
     2646 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2647 ?        S      0:00 \_ /usr/libexec/nss_pcache 458756 off /etc/httpd/alias
     2648 ?        Sl     0:01 \_ /usr/sbin/httpd -DFOREGROUND
     2649 ?        Sl     0:01 \_ /usr/sbin/httpd -DFOREGROUND
     2650 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2651 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2652 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2653 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2654 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2685 ?        S      0:00 \_ /usr/sbin/httpd -DFOREGROUND
     2733 ?        Ss     0:00 /usr/sbin/sssd -D -f
     2738 ?        S      0:00 \_ /usr/libexec/sssd/sssd_be --domain example.com --uid 0 --gid 0 --debug-to-files
     2740 ?        S      0:00 \_ /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
     2741 ?        S      0:00 \_ /usr/libexec/sssd/sssd_sudo --uid 0 --gid 0 --debug-to-files
     2742 ?        S      0:00 \_ /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
     2743 ?        S      0:00 \_ /usr/libexec/sssd/sssd_pac --uid 0 --gid 0 --debug-to-files
Implementation
- Iterative process.
  - Used docker diff to check that changes are confined to the volume.
- FreeIPA's setup tools use systemctl heavily.
  - No systemd in the container.
  - systemctl replacement written, tailored to FreeIPA's services.
- Multiple services on multiple ports:

    EXPOSE 53/udp 53 80 443 389 636 88 464 88/udp 464/udp 123/udp 7389 9443 9444 9445

  - Exposed by mapping to host's ports (docker run -p ...).
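To give an idea of what a systemctl replacement tailored to a fixed set of services might look like, here is a minimal dry-run sketch. It is not the replacement used in the project: the function names and the unit names are assumptions, and the daemon command lines are copied from the process listing above.

```shell
#!/bin/sh
# Hypothetical stand-in for systemctl: translate "start UNIT" into the
# daemon's command line. Unit names and function names are assumptions.
unit_command() {
    case "$1" in
        krb5kdc.service) echo "/usr/sbin/krb5kdc -P /var/run/krb5kdc.pid" ;;
        kadmin.service)  echo "/usr/sbin/kadmind -P /var/run/kadmind.pid" ;;
        sssd.service)    echo "/usr/sbin/sssd -D -f" ;;
        *) return 1 ;;
    esac
}

systemctl_shim() {
    action="$1"
    unit="$2"
    case "$action" in
        start)
            cmd=$(unit_command "$unit") || {
                echo "unknown unit: $unit" >&2
                return 1
            }
            # Dry run: print the command. A real shim would execute it here.
            echo "would run: $cmd"
            ;;
        *)
            echo "unsupported action: $action" >&2
            return 1
            ;;
    esac
}
```

For example, `systemctl_shim start sssd.service` prints the sssd command line, while an unknown unit or an unsupported action returns a non-zero status so that the calling setup tool notices the failure.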
Security implications
- All processes in the container share one SELinux domain.
  - Host's SELinux policy is used.
  - No isolation of components.
- Capabilities need to be given to the initial process.
- NTP example:
  - FreeIPA can set up and run an NTP server.
  - Capability needed (docker run --cap-add=SYS_TIME).
  - Custom SELinux policy needed to allow sys_time capability to svirt_lxc_net_t.
Naming, host, and localhost
FreeIPA can include a DNS server as well.
- It needs to set its own (public) A record in the zone.
- What is the public IP address of the service in the container?
  - Cannot be determined from inside the container.
  - Be explicit, pass the value to the container.
- Forwarder to host's 127.0.0.1?
- Or resolv.conf pointing to host's 127.0.0.1?
  - Bind to bridge address on the host, or to public interface.
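Being explicit about the address can be as simple as requiring an environment variable at container start. A minimal sketch; the variable name IPA_SERVER_IP and the function name are assumptions made for this example:

```shell
#!/bin/sh
# Sketch: refuse to guess the public address and require it to be passed in.
# IPA_SERVER_IP is a variable name assumed for this illustration.
public_ip() {
    if [ -z "${IPA_SERVER_IP:-}" ]; then
        echo "IPA_SERVER_IP is not set; pass it with docker run -e" >&2
        return 1
    fi
    echo "$IPA_SERVER_IP"
}
```

The value would be supplied when the container is created, for instance with docker run -e IPA_SERVER_IP=192.0.2.10 (a documentation-range address used here only as a placeholder).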
Upgrades in containers
    | Container                                                              |
    | Image (Built using [...])     | Volume (Bind-mounted in runtime)       |
    | Host                                                                   |

- Build new image (with yum install).
- Remove the old container and run a new one:

    | New container                                                          |
    | New image (Built using [...]) | Original volume content                |
    |                               | (Bind-mounted in runtime)              |
    | Host                                                                   |
Approach to upgrades
- Since software was installed and not upgraded, upgrade (postinstall) scriptlets in rpms never kick in.
- Config files or data still need to be brought up-to-date.
- Generate /etc/build-id in image.
  - Copy it to /data.
  - Upon next run, detect that different image is used.
- When upgrade is needed during container start:
  - If standalone upgrade tool is available in the project, use it.
  - Parse rpm scriptlets and just run them.
    - They really need to be idempotent.
  - Populate missing (new) locations in volume that new image expects.
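The build-id comparison can be sketched as follows. This is an illustration, not the project's code: the function names are made up, the file name build-id inside the volume is an assumption, and the paths are variables so the logic can be tried outside a container.

```shell
#!/bin/sh
# Sketch of upgrade detection through a build id stored in the volume.
IMAGE_BUILD_ID="${IMAGE_BUILD_ID:-/etc/build-id}"
DATA="${DATA:-/data}"

needs_upgrade() {
    # True when the id recorded in the volume differs from the image's id.
    # A missing copy in the volume also counts as different; the empty
    # volume (first run) case is assumed to be handled before this check.
    ! cmp -s "$IMAGE_BUILD_ID" "$DATA/build-id"
}

record_build_id() {
    cp "$IMAGE_BUILD_ID" "$DATA/build-id"
}

check_and_record() {
    if needs_upgrade; then
        echo "upgrade needed"
        # The upgrade steps would run here, before the new id is recorded.
        record_build_id
    else
        echo "volume matches image"
    fi
}
```

Any value that is unique per image build works as the content of the build-id file. Recording the id only after the upgrade steps finish means an interrupted upgrade is retried on the next start, which is one more reason the steps need to be idempotent.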
Future work
- Revisit the possibility of systemd in the FreeIPA container.
- In FreeIPA, support running some components on separate hosts.
  - Needs to be done in upstream.
  - Orchestration needed.
Side note: complex service
- Even a single-daemon service can be complex.
- SSSD (System Security Services Daemon): client-side to FreeIPA.
  - Data on host, mounted to numerous locations:

        -v /etc/ipa/:/etc/ipa/:ro \
        -v /etc/krb5.conf:/etc/krb5.conf:ro \
        -v /etc/krb5.keytab:/etc/krb5.keytab:ro \
        -v /etc/nsswitch.conf:/etc/nsswitch.conf:ro \
        -v /etc/openldap/:/etc/openldap/:ro \
        -v /etc/pam.d/:/etc/pam.d/:ro \
        -v /etc/passwd:/etc/passwd.host:ro \
        -v /etc/pki/nssdb/:/etc/pki/nssdb/:ro \
        -v /etc/ssh/:/etc/ssh/:ro \
        -v /etc/sssd/:/etc/sssd/:ro \
        -v /etc/systemd/system/sssd.service.d:/etc/systemd/system/sssd.service.d:ro \
        -v /etc/sysconfig/authconfig:/etc/sysconfig/authconfig:ro \
        -v /etc/sysconfig/network:/etc/sysconfig/network:ro \
        -v /etc/sysconfig/sssd:/etc/sysconfig/sssd:ro \
        -v /etc/yp.conf:/etc/yp.conf:ro \
        -v /var/cache/realmd/:/var/cache/realmd/ \
        -v /var/lib/authconfig/last/:/var/lib/authconfig/last/:ro \
        -v /var/lib/ipa-client/sysrestore/:/var/lib/ipa-client/sysrestore/:ro \
        -v /var/lib/samba/:/var/lib/samba/ \
        -v /var/lib/sss/:/var/lib/sss/ \
        -v /var/log/sssd/:/var/log/sssd/ \
        -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \

  - Configures multiple libraries on the machine.
  - Setup in runtime: ipa-client-install.
  - Special SELinux domain needed to bridge the host/container boundary.
- Separation of install and runtime.
  - The atomic tool uses LABEL INSTALL / LABEL RUN to capture the glue (docker run options) needed.
Conclusion
- Running multiple services in one container is possible.
- Easy testing of wild upgrade scenarios.
- Even across operating systems.
- Just pair the image with data.
- The state of the volume can drive the behaviour.
- Install when empty.
- Upgrade when created with different image.
- Unless you can depend on orchestration, minimize number of containers and volumes.
- Use native init system or work around it.
References
- https://github.com/adelton/docker-freeipa
- https://www.freeipa.org/page/Docker
- https://github.com/fedora-cloud/Fedora-Dockerfiles/tree/master/sssd