The bad old days
Applications run businesses. If applications break, businesses break. Sometimes they
even go bust. These statements get truer every day!
Most applications run on servers. And in the past, we could only run one application
per server. The open-systems world of Windows and Linux just didn’t have the
technologies to safely and securely run multiple applications on the same server. Every time the business needed a
new application, IT would go out and buy a new server. And most of the time nobody
knew the performance requirements of the new application! This meant IT had to
make guesses when choosing the model and size of servers to buy.
As a result, IT did the only thing it could do — it bought big fast servers with lots of
resiliency. After all, the last thing anyone wanted, including the business, was under-powered servers. Under-powered servers might be unable to execute transactions,
which might result in lost customers and lost revenue. So, IT usually bought big.
This resulted in huge numbers of servers operating as low as 5-10% of their potential
capacity. A tragic waste of company capital and resources!
Hello VMware!
Amid all of this, VMware, Inc. gave the world a gift — the virtual machine (VM).
And almost overnight, the world changed into a much better place! We finally had a technology that would let us safely and securely run multiple business applications
on a single server. Cue wild celebrations!
This was a game changer! IT no longer needed to procure a brand new oversized
server every time the business asked for a new application. More often than not,
they could run new apps on existing servers that were sitting around with spare
capacity.
All of a sudden, we could squeeze massive amounts of value out of existing corporate
assets, such as servers, resulting in a lot more bang for the company’s buck ($).
VM warts
But… and there’s always a but! As great as VMs are, they’re far from perfect!
The fact that every VM requires its own dedicated OS is a major flaw. Every OS
consumes CPU, RAM and storage that could otherwise be used to power more
applications. Every OS needs patching and monitoring. And in some cases, every
OS requires a license. All of this is a waste of op-ex and cap-ex.
The VM model has other challenges too. VMs are slow to boot, and portability
isn’t great — migrating and moving VM workloads between hypervisors and cloud
platforms is harder than it needs to be.
Hello Containers!
For a long time, the big web-scale players, like Google, have been using container
technologies to address the shortcomings of the VM model.
In the container model, the container is roughly analogous to the VM. The major
difference is that every container does not require its own full-blown OS. In fact, all
containers on a single host share a single OS. This frees up huge amounts of system
resources such as CPU, RAM, and storage. It also reduces potential licensing costs
and reduces the overhead of OS patching and other maintenance. Net result: savings
on the cap-ex and op-ex fronts.
Containers are also fast to start and ultra-portable. Moving container workloads from
your laptop, to the cloud, and then to VMs or bare metal in your data center, is a
breeze.
At a high level, a container is just a wrapper around an isolated system process.
Windows containers vs Linux containers
It’s vital to understand that a running container shares the kernel of the host machine
it is running on. This means that a containerized app designed for a Windows kernel
will not run on a Linux host. At a high level you can think of it like this — Windows
containers require a Windows host, and Linux containers require a Linux host.
However, it’s not that simple…
At the time of writing, it is possible to run Linux containers on Windows machines.
For example, Docker for Windows (a product offering from Docker, Inc. designed for
Windows 10) can switch modes between Windows containers and Linux containers.
This is an area that is developing fast and you should consult the Docker documentation for the latest.
Docker
Docker is software that runs on Linux and Windows kernels. It creates, manages and
orchestrates containers. The software is developed in the open as part of the Moby
open-source project on GitHub.
Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. The Docker client and daemon can run on the same system, or you can connect a Docker client to a remote Docker daemon.
When you install Docker, you get two major components:
- The Docker client.
- The Docker daemon (sometimes called “server” or “engine”).
The Docker Engine is the core container runtime: the infrastructure plumbing software
that runs and orchestrates containers.
Images
It’s useful to think of a Docker image as an object that contains an OS filesystem
and an application. If you work in operations, it’s like a virtual machine template. An image contains enough of an operating
system (OS), as well as all the code and dependencies to run whatever application
it’s designed for.
Images are made up of multiple layers that get stacked on top of each other and
represented as a single object. Inside of the image is a cut-down operating system
(OS) and all of the files and dependencies required to run an application. Because
containers are intended to be fast and lightweight, images tend to be small. Images
and layers are also immutable, which makes it easy to identify any changes made to either.
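For example, the following commands pull the official ubuntu:latest image from Docker Hub and then list the images stored locally on the host.

docker image pull ubuntu:latest
docker image ls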
Containers
Now that we have an image pulled locally, we can use the docker container run
command to launch a container from it.
For Linux:
docker container run -it ubuntu:latest /bin/bash
The docker container run command tells
the Docker daemon to start a new container. The -it flags tell Docker to make
the container interactive and to attach our current shell to the container’s terminal
(we’ll get more specific about this in the chapter on containers).
The
container engine then takes OS resources such as the process tree, the filesystem,
and the network stack, and carves them up into secure isolated constructs called
containers. Each container looks, smells, and feels just like a real OS. Inside of each
container we can run an application.
The lifecycle of a container is simple. You can stop, start, pause, and restart
a container as many times as you want, and it’ll all happen really fast. But the
container and its data will always be safe. It’s not until you explicitly delete a
container that you run any chance of losing its data. And even then, if you’re storing
the data in a volume, it will persist even after the container has gone.
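As a rough sketch of that lifecycle (mycontainer is just a hypothetical container name), the following commands stop, restart, stop, and finally delete a container.

docker container stop mycontainer
docker container start mycontainer
docker container stop mycontainer
docker container rm mycontainer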
Attaching to running containers
You can attach your shell to the terminal of a running container with the docker
container exec command.
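For example, the following starts a new Bash shell inside a running container. The name mycontainer is hypothetical, and the container’s image needs to include Bash for this particular command to work.

docker container exec -it mycontainer bash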
Self-healing containers with restart policies
It’s often a good idea to run containers with a restart policy. It’s a form of self-healing
that enables Docker to automatically restart them after certain events or failures have
occurred.
Restart policies are applied per-container, and can be configured imperatively on
the command line as part of docker container run commands, or declaratively in
Compose files for use with Docker Compose and Docker Stacks.
At the time of writing, the following restart policies exist:
• always
• unless-stopped
• on-failure
The always policy is the simplest. It will always restart a stopped container unless
it has been explicitly stopped, such as via a docker container stop command. An
easy way to demonstrate this is to start a new interactive container with the --restart always
policy and tell it to run a shell process. When the container starts you will be
attached to its shell.
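A minimal sketch looks like this (the container name neversaydie is just an example). If you then type exit from the shell, the shell process terminates, the container stops, and the always policy immediately restarts it. Running docker container ls afterwards will show the container up and running again.

docker container run --name neversaydie -it --restart always alpine sh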
Dockerfile
A Dockerfile is a plain-text file containing instructions that tell Docker how to build
an app into a Docker image.
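As a hedged sketch, a Dockerfile for a simple Node.js app might look something like the following. The base image, file layout, and port are hypothetical and will depend entirely on your app.

FROM node:16-alpine
WORKDIR /src
COPY . .
RUN npm install
EXPOSE 8080
ENTRYPOINT ["node", "./app.js"]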
Use the docker image build command to create a new image using the instructions
in the Dockerfile.
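For example, the following builds a new image called myapp:latest (an example tag) from the Dockerfile in the current working directory.

docker image build -t myapp:latest .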
The build process used by Docker has the concept of a cache. The best way to
see the impact of the cache is to build a new image on a clean Docker host, then repeat the same build immediately after. The first build will pull images and take
time building layers. The second build will complete almost instantaneously. This is
because artefacts from the first build, such as layers, are cached and leveraged by
later builds.
As we know, the docker image build process iterates through a Dockerfile one line at a time, starting from the top. For each instruction, Docker looks to see if it already
has an image layer for that instruction in its cache. If it does, this is a cache hit and
it uses that layer. If it doesn’t, this is a cache miss and it builds a new layer from the
instruction. Getting cache hits can hugely speed up the build process.
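If you ever want to ignore the cache and force every layer to be rebuilt, you can pass the --no-cache=true flag to the build (the image tag below is the same hypothetical one used earlier).

docker image build --no-cache=true -t myapp:latest .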
Docker Compose
Deploying and managing lots of services can be hard. This is where Docker Compose
comes into play.
Instead of gluing everything together with scripts and long docker commands,
Docker Compose lets you describe an entire app in a single declarative configuration
file. You then deploy it with a single command.
Once the app is deployed, you can manage its entire lifecycle with a simple set of
commands. You can even store and manage the configuration file in a version control
system!
Compose uses YAML files to define multi-service applications. JSON is a subset of
YAML, so you can also use JSON. However, all of the examples in this chapter will be
YAML.
The default name for the Compose YAML file is docker-compose.yml. However, you
can use the -f flag to specify custom filenames.
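As a minimal sketch, the following Compose file describes a hypothetical two-service app: a web front-end based on the stock nginx image, published on port 8080 of the host, and a redis cache. The service names, images, and ports are all illustrative.

version: "3.8"
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
  cache:
    image: redis:alpine

With this saved as docker-compose.yml, you would deploy the app from the same directory with:

docker-compose up -d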
Compose also makes it easy to deploy multiple isolated copies of an app, such as separate dev and test environments, on the same Docker host.
Service discovery
Service discovery allows an application or component to discover information about its environment and its neighbors.
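For example, containers attached to the same user-defined Docker network can resolve each other by name via Docker’s embedded DNS. A rough sketch (the network and container names are hypothetical):

docker network create mynet
docker container run -d --name web --network mynet nginx:latest
docker container run -it --network mynet alpine ping web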
Volumes and persistent data
There are two main categories of data. Persistent and non-persistent.
Persistent is the stuff you need to keep: things like customer records, financials,
bookings, audit logs, and even some types of application log data. Non-persistent is
the stuff you don’t need to keep.
Both are important, and Docker has options for both.
Every Docker container gets its own non-persistent storage. It’s automatically
created, alongside the container, and it’s tied to the lifecycle of the container. That
means deleting the container will delete this storage and any data on it.
If you want your container’s data to stick around (persist), you need to put it on a
volume. Volumes are decoupled from containers, meaning you create and manage
them separately, and they’re not tied to the lifecycle of any container. Net result, you
can delete a container with a volume, and the volume will not be deleted.
The recommended way to persist data in containers is with volumes.
At a high-level, you create a volume, then you create a container, and you mount the
volume into it. The volume gets mounted to a directory in the container’s filesystem,
and anything written to that directory is written to the volume. If you then delete
the container, the volume and its data will still exist.
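A rough sketch of that workflow (the volume and container names are hypothetical):

docker volume create myvol
docker container run -it --name voltainer --mount source=myvol,target=/vol alpine sh

Anything the container writes under /vol is stored in the myvol volume, and it survives docker container rm voltainer.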
Integrating Docker with external storage systems makes it easy to share the same external
storage between cluster nodes. A major concern with a configuration like this is data
corruption: if containers on different nodes write to the shared volume at the same time
without coordinating, the data can end up corrupted.
Namespaces
Kernel namespaces are at the very heart of containers! They let us slice up an
operating system (OS) so that it looks and feels like multiple isolated operating
systems. This lets us do really cool things like run multiple web servers on the same
OS without having port conflicts. It also lets us run multiple apps on the same OS
without them fighting over shared config files and shared libraries.
A couple of quick examples:
• You can run multiple web servers, each on port 443, on a single OS. To do this
you just run each web server app inside of its own network namespace. This
works because each network namespace gets its own IP address and full range
of ports. You may have to map each one to a separate port on the Docker host,
but each can run without being re-written or reconfigured to use a different
port (see the sketch after this list).
• You can run multiple applications, each requiring their own particular version
of a shared library or configuration file. To do this you run each application
inside of its own mount namespace. This works because each mount namespace can have its own isolated copy of any directory on the system (e.g. /etc,
/var, /dev etc.).
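As a sketch of the first example above (container names and host ports are hypothetical), each of the following containers gets its own network namespace, so both can listen on the same container-side port. The stock nginx image listens on port 80 rather than 443, but the principle is identical; the -p flag maps each container to a different port on the Docker host.

docker container run -d --name web1 -p 8080:80 nginx
docker container run -d --name web2 -p 8081:80 nginx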
Control Groups
If namespaces are about isolation, control groups (cgroups) are about setting limits.
Think of containers as similar to rooms in a hotel. Yes, each room is isolated, but each
room also shares a common set of resources — things like water supply, electricity
supply, shared swimming pool, shared gym, shared breakfast bar etc. Cgroups let us
set limits so that (sticking with the hotel analogy) no single container can use all of
the water or eat everything at the breakfast bar.
In the real world, not the silly hotel analogy, containers are isolated from each other
but all share a common set of OS resources — things like CPU, RAM and disk I/O.
Cgroups let us set limits on each of these so that a single container cannot use all of
the CPU, RAM, or storage I/O of the host.
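Docker exposes these cgroup limits as flags on docker container run. The values below are purely illustrative: this hypothetical container is capped at half a CPU and 256MB of RAM.

docker container run -d --name limited --cpus 0.5 --memory 256m nginx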