Container quickstart
- Dockerfile:
FROM fedora
RUN yum -y install httpd && yum clean all
RUN echo "Test Server" > /var/www/html/index.html
CMD [ "/usr/sbin/httpd", "-DFOREGROUND" ]
- Build image:
host$ docker build -t httpd .
Sending build context to Docker daemon
Step 0 : FROM fedora
[...]
Successfully built 4b46d7c43d40
- Run a new container based on the image and talk to it:
host$ docker run --name httpd-c httpd &
host$ docker inspect -f '{{ .NetworkSettings.IPAddress }}' httpd-c
172.17.0.3
host$ curl http://172.17.0.3/
Test Server
Technologies involved
- Namespaces
  - Mount (filesystem hierarchy)
  - Network (devices, IP addresses, routing)
  - Process IDs
  - User and group IDs (currently not used by Docker)
  - UTS (hostname, domainname)
  - IPC (SysV IPC, message queues)
- Control groups (cgroups): setting limits
- SELinux (use --selinux-enabled with the Docker daemon)
- iptables (use --icc=false with the Docker daemon)
Namespacing examples
- PID namespace:
host$ docker exec httpd-c ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /usr/sbin/httpd -DFOREGROUND
12 ? S 0:00 /usr/sbin/httpd -DFOREGROUND
13 ? S 0:00 /usr/sbin/httpd -DFOREGROUND
14 ? S 0:00 /usr/sbin/httpd -DFOREGROUND
15 ? S 0:00 /usr/sbin/httpd -DFOREGROUND
50 ? Rs 0:00 ps ax
- Network namespace:
host$ docker run fedora tail -n +2 /proc/net/route
eth0 00000000 012A11AC 0003 0 0 0 00000000 0 0 0
eth0 000011AC 00000000 0001 0 0 0 0000FFFF 0 0 0
- View namespace transitions on the host:
host# pstree -S | grep docker
|-docker(mnt)-+-httpd(ipc,mnt,net,pid,uts)---4*[httpd]
|             `-12*[{docker}]
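The hex columns in the /proc/net/route output above are IPv4 addresses stored as 32-bit numbers in host byte order, so on a little-endian host the bytes read backwards. A minimal decoding sketch, assuming a little-endian machine such as x86_64; route_hex_to_ip is a hypothetical helper name:

```shell
# Decode a 32-bit hex address from /proc/net/route into dotted-quad form.
# Assumes a little-endian host, where the first octet is the lowest byte.
route_hex_to_ip() {
    local n=$(( 0x$1 ))
    printf '%d.%d.%d.%d\n' \
        $(( n & 0xFF )) \
        $(( (n >> 8) & 0xFF )) \
        $(( (n >> 16) & 0xFF )) \
        $(( (n >> 24) & 0xFF ))
}

route_hex_to_ip 012A11AC   # gateway column, prints 172.17.42.1
route_hex_to_ip 000011AC   # destination column, prints 172.17.0.0
route_hex_to_ip 0000FFFF   # mask column, prints 255.255.0.0
```

Decoded this way, the container has a default route via 172.17.42.1 and a directly connected 172.17.0.0/16 network on its own eth0: it sees its own interface and routing table, not the host's.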
Filesystems and volumes
- The image is mounted as root:
host$ docker exec httpd-c mount | head -1
/dev/mapper/docker-252:17-8193-600d0ac578e0b955c25632be5398921c2ee1e1d6288b7c687335488f99cb4c28 on / type ext4 (rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c264,c680",discard,stripe=16,data=ordered)
- Bind-mounting a volume:
host$ mkdir /tmp/data
host$ echo "Test serving data from volume" > /tmp/data/index.html
host$ docker run --name httpd-c -v /tmp/data:/var/www/html:Z httpd &
host$ docker inspect --format '{{ .HostConfig.Binds }}' httpd-c
[/tmp/data:/var/www/html:Z]
host$ ls -aZ /tmp/data | cut -d ' ' -f 1,4,5
drwxr-xr-x. system_u:object_r:svirt_sandbox_file_t:s0:c206,c497 .
drwxrwxrwt. system_u:object_r:tmp_t:s0 ..
-rw-r--r--. system_u:object_r:svirt_sandbox_file_t:s0:c206,c497 index.html
host$ curl http://172.17.0.8/
Test serving data from volume
Approach to containerization
Typical advice when moving an application into a container:
- One daemon/service per component.
- Containers can run with their own network and UTS namespaces, so they can act as separate machines.
- Use docker run --link to connect them together.
- Bind-mount volumes with configuration/data into the directories where programs expect them.
- Install and configure at build time.
- At run time, just start the daemon.
Typical setup
| Container: one service | ← link | Container: one service | ← link | Container: one service |
| ↑ ↖ bind mounts | ↗ bind mount |
| volume | volume | volume |
| host |
Nontrivial application
Running a single daemon like the httpd above is easy.
- Especially when it does not require any runtime-specific configuration.
- And when it does not store state and can be stopped at any moment.
- What about an application that consists of a dozen daemons?
- An application that needs to do heavy initialization upon its first run?
- Individual components use their own paths for configuration and data.
- Their startup needs to be synchronized.
- There is a common configuration tool that assumes everything is on a single machine.
- FreeIPA is such an application: an umbrella on top of multiple services.
Containerizing nontrivial application
If components do not know how to communicate across the network, separating them into individual containers might not be feasible.
- Perhaps Unix sockets are used.
- Or the installer simply assumes everything is on localhost.
- Communication across the network also raises security and authentication concerns.
Locations of files that the programs work with might be hardcoded.
- For OS-level tools, they are often standardized.
- For others, they are not really documented.
- Bind-mounting dozens of directories increases the chance of a mismatch.
- Components might only be able to finalize their setup at run time.
- Startup and shutdown procedures were polished to perfection by maintainers for individual distributions over the years.
In the case of FreeIPA ...
- Configuration tools like ipa-server-install or ipa-replica-install are a major part of the whole benefit of the project.
  - We want to use them, not duplicate their logic.
  - They assume all parts are local.
- The domain and realm are only known once the container is run, and only then can LDAP, Kerberos, DNS, or the CA be properly set up.
- A large number of various directories and files, spread all over the filesystem.
- FreeIPA uses the native init system and systemd unit files to start and stop services.
The data and configuration
To minimize the number of volumes that need to be bind-mounted, all data directories and files live under /data.
- At build time, install the software with yum install freeipa-server.
- Then move the directories and files that will hold the instance configuration and data (and thus define the instance) to /data-template.
- And create symlinks from the original locations to paths under /data.
- The container is run with docker run -v /opt/ipa-data:/data ...
- Upon the first run, when an empty /data is detected, copy the vanilla content from /data-template to /data, populating the volume.
- We used docker diff during the work to verify that no unexpected changes get written outside the volume, into the container's own filesystem.
- Eventually, we might want to put at least the logs on a separate volume.
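The build-time relocation and the first-run population can be sketched as two shell functions. This is an illustration under stated assumptions, not the scripts used by the project: the function names are hypothetical, and the DATA_DIR/TEMPLATE_DIR overrides exist only so the logic can be tried outside a container.

```shell
# Hypothetical helpers for the /data layout. Defaults follow the text;
# DATA_DIR and TEMPLATE_DIR can be overridden, e.g. for a dry run in a scratch dir.
DATA_DIR=${DATA_DIR:-/data}
TEMPLATE_DIR=${TEMPLATE_DIR:-/data-template}

# Build time: move one file or directory under $TEMPLATE_DIR (keeping its full
# path) and leave behind a symlink to the matching location under $DATA_DIR.
move_to_template() {
    local path=$1
    mkdir -p "$TEMPLATE_DIR$(dirname "$path")"
    mv "$path" "$TEMPLATE_DIR$path"
    ln -s "$DATA_DIR$path" "$path"
}

# Run time: an empty bind-mounted $DATA_DIR means this is the first run,
# so seed it with the vanilla content from $TEMPLATE_DIR.
populate_data_if_empty() {
    if [ -z "$(ls -A "$DATA_DIR")" ]; then
        cp -a "$TEMPLATE_DIR/." "$DATA_DIR/"
        echo first-run
    else
        echo existing
    fi
}
```

In a Dockerfile, move_to_template would be called from a RUN step for each instance-specific path (such as /etc/dirsrv), and populate_data_if_empty would run early in the startup script, whose printed result tells it whether ipa-server-install still has to run.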
FreeIPA setup
| Single container |
| 389 | KDC | DNS server | D-Bus | PKI/CA | HTTP Server | SSSD | ... |
| Single image with symlinks to → | /data |
| ↑ | bind mount |
| volume |
| host |
Using the native configuration tool
- The process run as PID 1 is a bash script that distinguishes the initial (setup) run from a routine startup.
- For the initial run, ipa-server-install is executed.
  - The configuration and data get stored in the volume, via the symlinks.
  - We had to cheat a bit in some cases: for example, keytab files have to be created in the image and copied over afterwards.
- The setup tool uses systemctl heavily, but there is no systemd running: a systemctl replacement was scripted to start services directly, while observing the systemd unit files.
  - It only supports the syntax used by our services.
  - We might want to use native systemd once it runs seamlessly in Docker containers.
- For subsequent startups, the script just starts the enabled services.
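A systemctl replacement of this kind can be reduced to reading ExecStart= from the unit file and running the command directly. The sketch below is an illustration under assumptions (hypothetical function names, default unit directory /usr/lib/systemd/system), not the project's implementation, and like the original it handles only a narrow subset of unit-file syntax:

```shell
# Hypothetical, much-reduced systemctl stand-in for a container without systemd.
UNIT_DIR=${UNIT_DIR:-/usr/lib/systemd/system}

# Print the ExecStart= command line from the [Service] section of a unit.
# Accepts "name" or "name.service".
unit_exec_start() {
    sed -n '/^\[Service\]/,/^\[/{s/^ExecStart=//p;}' "$UNIT_DIR/${1%.service}.service"
}

# Start a unit by running its ExecStart= command in the background.
# Not handled: "-" prefixes, specifiers like %i, EnvironmentFile=, Type=,
# dependencies and socket activation.
systemctl_start() {
    local cmd
    cmd=$(unit_exec_start "$1") || return 1
    if [ -z "$cmd" ]; then
        echo "no ExecStart= found for $1" >&2
        return 1
    fi
    sh -c "$cmd" &
}

# When installed in place of systemctl, dispatch on the first argument.
case ${1:-} in
    start) systemctl_start "${2:?unit name required}" ;;
esac
```

A fuller replacement would also need to record which units are enabled, since the routine-startup path starts exactly those.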
Initial instance configuration
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /bin/bash /usr/sbin/ipa-server-configure-first
43 ? S 0:00 xargs /usr/sbin/ipa-server-install -U
44 ? S 0:01 \_ /usr/bin/python2 -E /usr/sbin/ipa-server-install -U --ds-password=Secret12345 --admin-password=Secret12345 -r EXAMPLE.COM --setup-dns --forwarder=192.168.100.1
74 ? S 0:00 \_ /usr/bin/perl /usr/sbin/setup-ds.pl --silent --logfile - -f /tmp/tmpPjmUla
89 ? S 0:00 \_ sh -c /var/lib/dirsrv/scripts-EXAMPLE-COM/ldif2db -n userRoot -i '/var/lib/dirsrv/boot.ldif' 2>&1
90 ? S 0:00 \_ /bin/sh /var/lib/dirsrv/scripts-EXAMPLE-COM/ldif2db -n userRoot -i /var/lib/dirsrv/boot.ldif
91 ? S 0:00 \_ /bin/sh ./ldif2db -n userRoot -i /var/lib/dirsrv/boot.ldif -Z EXAMPLE-COM
119 ? Sl 0:00 \_ /usr/sbin/ns-slapd ldif2db -D /etc/dirsrv/slapd-EXAMPLE-COM -n userroot -i /var/lib/dirsrv/boot.ldif
66 ? Ss 0:00 /usr/sbin/ntpd -u ntp:ntp -g -x
FreeIPA container running
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /bin/bash /usr/sbin/ipa-server-configure-first
1470 ? Ss 0:00 /bin/dbus-daemon --system --fork
1479 ? Ss 0:00 /usr/sbin/certmonger -S -p /var/run/certmonger.pid -n
2010 ? Ss 0:00 /usr/sbin/kadmind -P /var/run/kadmind.pid
2020 ? Ssl 0:00 /usr/bin/memcached -d -s /var/run/ipa_memcached/ipa_memcached -u apache -m 64 -c 1024 -P /var/run/ipa_memcached/ipa_memcached.pid
2043 ? Ss 0:00 /usr/bin/perl /bin/systemctl-socket-daemon /var/run/krb5kdc/DEFAULT.socket 0600 ipa-otpd@.service
2225 ? Sl 0:01 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i /var/run/dirsrv/slapd-EXAMPLE-COM.pid -w /var/run/dirsrv/slapd-EXAMPLE-COM.startpid
2274 ? Ss 0:00 /usr/sbin/krb5kdc -P /var/run/krb5kdc.pid
2502 ? Ss 0:00 sh -c export TOMCAT_CFG_LOADED="1"; export TOMCATS_BASE="/var/lib/tomcats/"; export JAVA_HOME="/usr/lib/jvm/jre"; export CATALINA_HOME="/usr/share/tomcat"; export CATALINA_TMPDIR="/var/cache/tomcat/temp"; export SECURITY_MANAGER="false"; export CATALINA_BASE="/var/lib/pki/pki-tomcat"; export CATALINA_TMPDIR=/var/lib/pki/pki-tomcat/temp; export JAVA_OPTS="-DRESTEASY_LIB=/usr/share/java/resteasy"; export TOMCAT_USER="pkiuser"; export SECURITY_MANAGER="true"; export CATALINA_PID="/var/run/pki/tomcat/pki-tomcat.pid"; export TOMCAT_LOG="/var/log/pki/pki-tomcat/tomcat-initd.log"; export PKI_VERSION=10.2.1; export TOMCAT7_USER="pkiuser"; export TOMCAT7_SECURITY="true"; export NSS_ENABLE_PKIX_VERIFY=1; export NAME=pki-tomcat; /usr/sbin/runuser -g pkiuser -u pkiuser -- /usr/libexec/tomcat/server start
2503 ? S 0:00 \_ /usr/sbin/runuser -g pkiuser -u pkiuser -- /usr/libexec/tomcat/server start
2504 ? Sl 0:11 \_ /usr/lib/jvm/jre/bin/java -DRESTEASY_LIB=/usr/share/java/resteasy -classpath /usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar:/usr/lib/java/commons-daemon.jar -Dcatalina.base=/var/lib/pki/pki-tomcat -Dcatalina.home=/usr/share/tomcat -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/lib/pki/pki-tomcat/temp -Djava.util.logging.config.file=/var/lib/pki/pki-tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.security.manager -Djava.security.policy==/var/lib/pki/pki-tomcat/conf/catalina.policy org.apache.catalina.startup.Bootstrap start
2635 ? Ssl 0:00 /usr/sbin/named-pkcs11 -u named
2645 ? Ss 0:00 sh -c export LANG=C; /usr/sbin/httpd $OPTIONS -DFOREGROUND
2646 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2647 ? S 0:00 \_ /usr/libexec/nss_pcache 458756 off /etc/httpd/alias
2648 ? Sl 0:01 \_ /usr/sbin/httpd -DFOREGROUND
2649 ? Sl 0:01 \_ /usr/sbin/httpd -DFOREGROUND
2650 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2651 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2652 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2653 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2654 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2685 ? S 0:00 \_ /usr/sbin/httpd -DFOREGROUND
2733 ? Ss 0:00 /usr/sbin/sssd -D -f
2738 ? S 0:00 \_ /usr/libexec/sssd/sssd_be --domain example.com --uid 0 --gid 0 --debug-to-files
2740 ? S 0:00 \_ /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
2741 ? S 0:00 \_ /usr/libexec/sssd/sssd_sudo --uid 0 --gid 0 --debug-to-files
2742 ? S 0:00 \_ /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
2743 ? S 0:00 \_ /usr/libexec/sssd/sssd_pac --uid 0 --gid 0 --debug-to-files
Publicly accessible server
- The FreeIPA server provides multiple services on multiple ports:
EXPOSE 53/udp 53 80 443 389 636 88 464 88/udp 464/udp 123/udp 7389 9443 9444 9445
- Even if bridge networking is used, -p options to docker run can map ports on the host's public interface to the container.
- But our server is also a DNS server, and it holds a record about itself that clients will query.
- From within the container, we have no way to find out the host's IP address.
- Solution: be explicit; the host's preferred IP address is passed in via an environment variable.
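Making the host address explicit could look like the helper below, which builds both the environment variable for the container and -p options that publish each EXPOSEd port on that same address. It is a sketch: publish_args is a hypothetical name, and IPA_SERVER_IP is an assumed variable name that the container's startup script would read.

```shell
# Hypothetical helper: print docker run options that pass the host's preferred
# IP address into the container and publish each port on that address.
publish_args() {
    local ip=$1 spec port proto
    shift
    printf '%s' "-e IPA_SERVER_IP=$ip"
    for spec in "$@"; do
        port=${spec%/*}                # "53/udp" -> "53"
        case $spec in
            */*) proto=/${spec#*/} ;;  # keep "/udp" when given
            *)   proto= ;;             # plain number: Docker's default, TCP
        esac
        printf ' -p %s:%s:%s%s' "$ip" "$port" "$port" "$proto"
    done
    printf '\n'
}

publish_args 192.0.2.10 53/udp 53 80 443
# prints: -e IPA_SERVER_IP=192.0.2.10 -p 192.0.2.10:53:53/udp -p 192.0.2.10:53:53 -p 192.0.2.10:80:80 -p 192.0.2.10:443:443
```

The output is meant to be spliced unquoted into a docker run command line, so the address the DNS record advertises and the address the ports are published on cannot drift apart.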
The resolv.conf and localhost
- With FreeIPA, a DNS server (bind) can be run in the container.
- We rewrite nameserver in the container's /etc/resolv.conf to point to 127.0.0.1.
- What if we wanted to use a DNS server on the host's localhost? Inside the container's network namespace, 127.0.0.1 is the container's own loopback.
- No good answer: use either the bridge address or the host's public IP address.
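One possible way to do the resolv.conf rewrite, sketched as a shell function (use_local_dns is a hypothetical name). It replaces all nameserver entries with 127.0.0.1 and keeps search and options lines:

```shell
# Point the resolver at the DNS server running inside the container.
# The file is a parameter (default /etc/resolv.conf) so it can be tried on a copy.
use_local_dns() {
    local conf=${1:-/etc/resolv.conf} tmp
    tmp=$(mktemp) || return 1
    # New first line, then everything except the old nameserver entries.
    {
        echo "nameserver 127.0.0.1"
        grep -v '^[[:space:]]*nameserver' "$conf"
    } > "$tmp"
    # Overwrite in place instead of renaming: Docker bind-mounts
    # /etc/resolv.conf into the container, so mv cannot replace it.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```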
NTP in container
- FreeIPA can set up and run NTP; Kerberos requires clocks to be in sync.
- By default, processes in a container do not have the capability to set the time.
- Use --cap-add=SYS_TIME to add the capability back.
- Even then, the attempt results in an SELinux AVC denial.
- A custom SELinux policy is needed to allow the sys_time capability for svirt_lxc_net_t.
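To see whether --cap-add=SYS_TIME actually took effect, the effective capability mask of a process inside the container can be inspected. A sketch with hypothetical helper names; it only checks the capability, while the SELinux denial is enforced separately and does not show up in this mask:

```shell
# CAP_SYS_TIME is capability number 25 in <linux/capability.h>.
CAP_SYS_TIME=25

# Succeed if the hex capability mask (as printed on the CapEff: line of
# /proc/<pid>/status) has the CAP_SYS_TIME bit set.
has_sys_time() {
    [ -n "$1" ] && [ $(( (0x$1 >> $CAP_SYS_TIME) & 1 )) -eq 1 ]
}

# Effective capability mask of the current shell (Linux only).
current_capeff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$$/status"
}
```

Inside a container started with --cap-add=SYS_TIME, has_sys_time "$(current_capeff)" should succeed; a mask like 00000000a80425fb, which lacks bit 25, makes it fail.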
How do upgrades work?
| Container |
| Image (Built using | Volume (Bind-mounted in runtime) |
| Host |
- Build a new image (with yum install).
- Remove the old container and run a new one:
| New container |
| New image (Built using | Original volume content (Bind-mounted in runtime) |
| Host |
Upgrades
- Upgrade (postinstall) scriptlets in RPMs never kick in, because the packages are freshly installed into the new image at build time, when the volume with the existing data is not mounted.
- The script that handles the initial population needs to detect and handle the upgrade situation as well.
  - If a standalone upgrade process is available in the project, use it.
  - Parsing and running the rpm scriptlets also works.
  - It helps if the existing mechanisms are idempotent.
- Generate /etc/build-id to easily detect a different image.
- Make sure /data has all the locations that symlinks in the new image expect to exist.
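The build-id comparison and the check that /data has everything the new image's symlinks expect might be scripted as follows. This is a sketch under assumptions: the copy of the ID kept in the volume ($DATA_DIR/build-id), the function names and the overridable paths are all hypothetical.

```shell
# Hypothetical upgrade check for the PID 1 script, run before services start.
DATA_DIR=${DATA_DIR:-/data}
TEMPLATE_DIR=${TEMPLATE_DIR:-/data-template}
IMAGE_BUILD_ID=${IMAGE_BUILD_ID:-/etc/build-id}

# Succeed when the volume was last used with a different (or unknown) image.
needs_upgrade() {
    ! cmp -s "$IMAGE_BUILD_ID" "$DATA_DIR/build-id"
}

# Copy template entries that the new image's symlinks expect but the volume
# lacks. Existing files are never overwritten; find lists parents first, so a
# missing directory is copied whole and its children are then skipped.
fill_missing_from_template() {
    (cd "$TEMPLATE_DIR" && find . -mindepth 1) | while IFS= read -r rel; do
        if [ ! -e "$DATA_DIR/$rel" ] && [ ! -L "$DATA_DIR/$rel" ]; then
            cp -a "$TEMPLATE_DIR/$rel" "$DATA_DIR/$rel"
        fi
    done
}

# Remember which image the volume has now been brought up to date with.
record_build_id() {
    cp "$IMAGE_BUILD_ID" "$DATA_DIR/build-id"
}
```

The intended order is needs_upgrade, then fill_missing_from_template, then the project's upgrade step or the rpm scriptlets, and only then record_build_id. An interrupted upgrade is therefore retried on the next start, which is where idempotent upgrade mechanisms pay off.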
Conclusion
- Running multiple services in one container is possible.
- Maximize the number of steps done at build time.
- If your init system works in a container, use it; otherwise, work around it.
- Minimize the number of volumes that the user has to deal with.
References
- https://github.com/adelton/docker-freeipa
- https://www.freeipa.org/

