<center>
# Monadical Docker Guide
</center>
[TOC]
## Intro
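- what docker actually is, and how containers are implemented under the hood (briefly): syscall filtering (seccomp) + kernel namespaces that remap resources like PIDs, mounts, network, and users, with cgroups limiting resource usage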
- using docker well in practice is really about understanding these 3 separate technologies independently:
    - a container image building spec: OCI / dockerfile
    - a container runtime: containerd
    - a container orchestration system: docker-compose
## Part 1: How to write a good Dockerfile, best practices for building images
**Part 1 Video (~1hr): https://www.youtube.com/watch?v=CCFQFQ3vPfE**
<a href="https://www.youtube.com/watch?v=CCFQFQ3vPfE"><img src="https://docs.monadical.com/uploads/upload_40cc41d7c513cc58c9be911d8af1c42a.png" width="450px"/></a>
- package installation: in what order, and with which flags (e.g. `--no-install-recommends`, `--no-cache-dir`)? (system -> lang global -> lang local)
- should you `pip3 install --upgrade pip` inside a `python:3`-based Dockerfile? i.e. should you upgrade packages provided by the image you're inheriting from? (no)
- clearing package manager caches: how big a difference does it make? (~50-200 MB)
- multi-stage builds: when to use them and when not to (usually worth it for projects beyond ~500 LOC)
- user setup, permissions, UID / GID (most common sticking point, get this right)
- `COPY` / `ADD` / `VOLUME` and how they're different (volumes aren't mounted at build time)
- many thin layers or a few fat layers? (usually few fat, with heavy trimming by using multi-stage builds)
- should you mount the codebase as a volume or build it into the image or both? (both)
- entrypoint vs cmd (entrypoint should almost always be bash, cmd should be your `build && runserver` equivalent)
- precedence order for dockerfile options, highest to lowest: CLI args -> docker-compose file -> Dockerfile defaults
- what is `dumb-init`, and why would you want an init system inside docker? (keep only one service per container, but if your command spawns subprocesses you need `dumb-init` as PID 1 to forward signals to them, so they're stopped when the container stops, and to reap them so you [don't get zombie processes](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/))
- what is `gosu`, and why not use `sudo`? (it's simpler and more secure; use `exec gosu ...` as well, so no unnecessary bash process is left wrapping yours)
- strategies to make smaller images: multi-stage, cache removal, [docker-slim](https://github.com/docker-slim/docker-slim), etc.
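
The sketch below pulls several of the points above into one Dockerfile: install order, cache clearing, a multi-stage build, a fixed-UID/GID user, `dumb-init` as PID 1, and a bash `ENTRYPOINT` with the "build && runserver" step in `CMD`. It assumes a hypothetical Python/Django-style project with `requirements.txt` and `manage.py` at the repo root; the `python:3.12-slim` tag, the `app` user, the default UID/GID of 1000, and the `collectstatic && runserver` command are illustrative stand-ins, not requirements.

```dockerfile
# syntax=docker/dockerfile:1

# ---- build stage: compilers and build-only tooling stay in here ----
FROM python:3.12-slim AS builder

# system packages first: they change least often, so this layer stays cached longest.
# --no-install-recommends skips optional extras; removing the apt lists trims the layer
RUN apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# python deps next, into a venv that can be copied out as one directory.
# uses the pip the base image ships instead of upgrading it
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# ---- runtime stage: only what's needed to run the app ----
FROM python:3.12-slim

RUN apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends dumb-init \
    && rm -rf /var/lib/apt/lists/*

# non-root user with a fixed, overridable UID/GID so files written to
# bind mounts get predictable ownership on the host
ARG APP_UID=1000
ARG APP_GID=1000
RUN groupadd --gid "$APP_GID" app \
    && useradd --uid "$APP_UID" --gid "$APP_GID" --create-home --shell /bin/bash app

# only the finished venv crosses over; build-essential stays behind in the builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# project code last, so code edits don't invalidate the dependency layers above
WORKDIR /app
COPY --chown=app:app . /app
USER app

# dumb-init runs as PID 1: it forwards signals to its children and reaps zombies.
# bash -c runs whatever command string CMD (or `docker run <image> "..."`) supplies
ENTRYPOINT ["dumb-init", "--", "bash", "-c"]
# hypothetical "build && runserver" step; exec replaces bash with the server process
CMD ["python manage.py collectstatic --noinput && exec python manage.py runserver 0.0.0.0:8000"]
```

To match your host user you could build with something like `docker build --build-arg APP_UID=$(id -u) --build-arg APP_GID=$(id -g) -t myapp .` (`myapp` is a placeholder tag). Because the entrypoint is `bash -c`, `docker run --rm myapp "python manage.py check"` swaps in a different command while keeping `dumb-init` and bash in front of it.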
## Part 2: The container runtime, best practices on running containers
**Part 2 Video (~1hr): https://www.youtube.com/watch?v=jbM3ybCKNgM**
<a href="https://www.youtube.com/watch?v=jbM3ybCKNgM"><img src="https://docs.monadical.com/uploads/upload_40cc41d7c513cc58c9be911d8af1c42a.png" width="450px"/></a>
- storage driver: `overlay2` / `zfs` / etc. (just use the default, or `zfs` if you want)
- what volumes are, why you should never use named volumes, and volume modes `:ro`, `:rw`, `:z`, etc.
- best practices for backups: take an instant filesystem snapshot, or use a script to dump the db to a file + an rsync-style incremental backup (never incrementally back up a database data dir directly)
- best practices for permissions: run as root, or on linux pass the host UID/GID in and have the entrypoint `chown` the data dirs, then drop privileges before executing your process
- best practices for networking: use `expose:` to declare which ports other containers use (it's a no-op: containers on the same network can reach every port on each other anyway; it's just documentation for humans)
- sidecar containers: for networking, filesystems, etc.
- handling ingress: sidecar containers for argo, tailscale, wireguard, socks, caddy, traefik, etc.
- `cap_add`, `privileged: true`, what they mean and when you need them
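- stdout / stderr handling (different defaults: `docker run` needs `-it` to get an interactive TTY, while `docker-compose run` / `exec` allocate one by default and need `-T` to disable it)
- log handling (use the `json-file` driver with `max-size` / `max-file` rotation, or use supervisord to capture the output of `docker-compose up` and do log rotation)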
- init systems: inside the container or outside the container or both? (`supervisord` for multi-project control, docker-compose within a project, and no init system inside the container)
- clearing stopped containers and orphan containers with `docker system prune --all`, and `run --rm` and when it's necessary
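
One way to implement the "pass UID/GID and drop privileges in the entrypoint" approach from the permissions bullet, sketched as a single Dockerfile that embeds its entrypoint script with a heredoc (heredocs need BuildKit with Dockerfile syntax 1.4 or newer, which the `# syntax=docker/dockerfile:1` line requests). The `PUID` / `PGID` variable names, the `app` user, and the `/data` path are hypothetical conventions chosen for this example.

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim

# gosu switches from root to another user and then exec()s the target command,
# so (unlike sudo) it doesn't stay around as a parent process
RUN apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends gosu \
    && rm -rf /var/lib/apt/lists/*

# hypothetical unprivileged user; its IDs get remapped at container start
RUN groupadd --gid 1000 app \
    && useradd --uid 1000 --gid 1000 --create-home --shell /bin/bash app \
    && mkdir -p /data

# the quoted "EOF" delimiter disables build-time variable expansion,
# so $PUID / $PGID are read when the container starts, not when it's built
COPY <<"EOF" /usr/local/bin/docker-entrypoint.sh
#!/usr/bin/env bash
set -euo pipefail

PUID="${PUID:-1000}"
PGID="${PGID:-1000}"

# remap the app user/group to the host IDs that own the bind mount
groupmod --non-unique --gid "$PGID" app
usermod --non-unique --uid "$PUID" --gid "$PGID" app

# fix ownership of the mounted data dir while still running as root
chown -R app:app /data

# drop privileges; exec replaces this script, so no bash wrapper is left over
exec gosu app "$@"
EOF
RUN chmod +x /usr/local/bin/docker-entrypoint.sh

# no USER directive: the entrypoint must start as root to chown, then drops to app
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["bash"]
```

You would run it with something like `docker run --rm -e PUID=$(id -u) -e PGID=$(id -g) -v "$PWD/data:/data" <image>`, where `<image>` is a placeholder. Note that `chown -R` walks the whole mount on every start, which can be slow on large data dirs.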
## Part 3: Orchestration w/ Compose, best practices for multi-container services
- version 2.4, 3.x, or 3.9+ and what is the future with COMPOSE_SCHEMA?
- how to reuse images between containers in a project (`image:` defines the name, `build: .` defines how to build the image, build order matters)
- how to reuse data volumes between containers in a project (mount it as a bind volume, use `:z` if necessary, never use named volumes)
- `expose` vs `ports` vs `network_mode` inter-container networking (prefer `expose`, use ports only if you have a firewall)
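- how DNS works inside containers (compose attaches services to a user-defined network, where Docker's embedded DNS server at `127.0.0.11` resolves service names to container IPs; those IPs aren't static unless you pin them)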
- defining service startup order / dependencies (using healthchecks and `depends_on: <service>: condition: service_healthy` instead of entrypoint scripts)
- `docker-compose up/down` vs `start/stop` and when to use `-d` (always `down` twice, check for orphan containers and networks)
- how to handle ingress (sidecar containers; lock down all ports and don't bind any ports publicly)
- how to handle backups (mount data volume and use a periodic script to snapshot or rsync away)
- using supervisord around docker-compose to manage multiple projects (and footguns like `stopasgroup=false`, `restart: always`, etc.)
- monitoring and alerting (use digitalocean alerts for host resources, use docker healthchecks and statping for service status)
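
For healthcheck-based startup ordering, the check can live in the image itself, so every compose file that uses the image gets the same definition (a compose `healthcheck:` block can still override it). A minimal sketch, using Python's built-in `http.server` as a stand-in for a real app and `/` as a stand-in for a real health endpoint:

```dockerfile
FROM python:3.12-slim
WORKDIR /srv

# like compose's `expose:`, EXPOSE is documentation only; it doesn't publish the port
EXPOSE 8000

# exit 0 = healthy, exit 1 = unhealthy: urlopen raises (so python exits 1)
# if the server is down or answers with an HTTP error status
HEALTHCHECK --interval=10s --timeout=3s --start-period=15s --retries=3 \
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/', timeout=2)"]

# stand-in for a real app server
CMD ["python", "-m", "http.server", "8000", "--bind", "0.0.0.0"]
```

A dependent service would then declare `depends_on: {web: {condition: service_healthy}}` (where `web` is whatever this service is named), and compose waits for the check to pass before starting it, with no wait-for scripts in the entrypoint.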
---
## Bonus Content
- WTF is argo (Cloudflare's Argo Tunnel, since renamed Cloudflare Tunnel)?
    - what are ingress tunnels?
    - why are they good?
    - why argo in particular vs rolling your own?
    - load balancing / performance improvements in practice
    - other cloudflare benefits you get access to: redirect rules, rate limiting, load balancing, caching, image resizing, minification, CDN, scraping protection, analytics, alerting, etc.; the list goes on
- Creating docker images (techtalk): https://drive.google.com/file/d/1qOwTsIuEglyryzE2B1iSxQ1Om1s0lEV8/view?usp=share_link
- speed up repeated docker builds by >20x by adding granular caching with `RUN --mount=type=cache,sharing=locked ...` (see https://github.com/docker/buildx/issues/549)
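```dockerfile
# syntax=docker/dockerfile:1
# (RUN --mount needs BuildKit, the default builder since Docker Engine 23.0)

# automatic platform ARGs only have a value inside a stage if redeclared after its FROM
ARG TARGETPLATFORM
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
...
# stop the base image's docker-clean hook from deleting downloaded .debs,
# so they persist in the cache mount between builds
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
...
# one apt cache per target arch/variant; sharing=locked makes concurrent builds
# wait for the cache instead of using it simultaneously (apt needs exclusive access)
RUN --mount=type=cache,id=apt-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/apt \
    apt-get update -qq \
    && apt-get install -qq -y --no-install-recommends \
        apt-transport-https ca-certificates apt-utils gnupg2 curl wget ... \
    && rm -rf /var/lib/apt/lists/*
```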
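
The same cache-mount trick applies to language-level package managers. A sketch for pip, assuming a hypothetical project with a `requirements.txt` and a build running as root (so pip's cache lives at `/root/.cache/pip`):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app

# copy only the dependency list first, so this layer is rebuilt only when it changes
COPY requirements.txt .

# pip's download/wheel cache lives in the cache mount, outside the image layers,
# so it's reused across builds without bloating the image
# (don't add --no-cache-dir here, or pip won't use the cache at all)
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

COPY . .
```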
---
## Further Reading
- https://github.com/wagoodman/dive
- https://depot.dev/dockerfile-explorer
- https://github.com/remorses/docker-phobia
- https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html
- https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
- https://depot.dev/blog/buildkit-in-depth
- https://privsec.dev/posts/linux/docker-and-oci-hardening/
- https://github.com/docker-slim/docker-slim
- https://github.com/docker/docker-bench-security
- https://docs.docker.com/storage/storagedriver/select-storage-driver/
- http://jdlm.info/articles/2016/03/06/lessons-building-node-app-docker.html
- https://blog.gougousis.net/file-permissions-the-painful-side-of-docker/
- https://vsupalov.com/docker-env-vars/
- https://docs.docker.com/engine/reference/builder/