Rationale for using Docker to containerize applications

  • I'm trying to get a better understanding of the reasons to use [and not use] Docker based on specific use cases.

    From my current understanding, Docker helps to isolate applications and their dependencies within containers. This is useful to ensure consistent reproducible builds in varied environments.

    However, I'm struggling to understand the rationale of using Docker where the environments are essentially the same, and the applications are relatively simple.

    Say I have the following:

    • a cloud VM instance (DigitalOcean, Vultr, Linode, etc.) with 1 GB RAM running Ubuntu 20.
    • a Node.js Express app (nothing too complicated)

    The following issues come to the fore:

    1. Dockerizing this application will produce an image of ~100 MB after optimization (without optimization, probably 500 MB or more based on my research). The app itself could be 50 KB, so the image needed to run it is larger by a factor of up to 10,000 or more. This seems very unreasonable from an optimization standpoint.

    2. I have to push this container image to a hub before I can use Docker to consume it. So that's 500 MB up to the hub, and then 500 MB down to my VM instance: a total of about 1 GB of bandwidth per build. Multiply this by the number of times the build needs to be updated and you could be approaching terabytes of bandwidth usage.

    3. I read in a DigitalOcean tutorial (https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04) that before I can run my container image, I have to do the following:

    docker pull ubuntu

    This pulls an image of Ubuntu. But, I'm already on Ubuntu, so does this mean I'm running a container that's running Ubuntu inside an existing VM that is running Ubuntu? This appears to be needless duplication, but I'd appreciate clarification.

    4. The Docker Desktop installation docs (https://docs.docker.com/desktop/install/linux-install/) specify that I should have 4 GB RAM. This means I have to use more expensive VM instances even when my application does not necessarily require it.

    How exactly does containerization [using Docker or similar] optimize and enhance the DevOps experience, especially on an ongoing basis?

    I'm not quite getting it but I'm open to clarification.

  • "I'm struggling to understand the rationale of using Docker where the environments are essentially the same, and the applications are relatively simple."

    In reality, it is highly unlikely that any development environment on any project would ever be anywhere near the same as staging/production.

    • Services running in staging/production will nearly always be hosted and managed somewhere that is not intended to be operated interactively by a human day-to-day, with an IT/security profile to match.
    • The nature of development work, and even internal build/testing, typically requires a different IT profile from that of a production server.
    • Developers rarely have control over the underlying infrastructure or the organisation's IT/security policies.

    There are many ways in which the IT profile of developer environments, including build agents and even test machines, can deviate from production:

    • Users/permissions or other security settings.
    • Installed tools, SDKs, runtimes, debug/test tools, OS features/packages, and other dependencies operating with debug/test configurations enabled.
    • Environment variables
    • Filesystem structure and the content of files in globally shared directories
    • Configurations of globally-installed dependencies such as web servers.
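
    As a rough illustration of how such drift adds up, here is a minimal sketch that compares two environment profiles. Every key and value below is made up for the example (hypothetical user names, versions, and settings); it is not taken from any real project.

```python
def diff_profiles(dev: dict, prod: dict) -> dict:
    """Return {setting: (dev_value, prod_value)} for every setting that differs."""
    keys = sorted(set(dev) | set(prod))
    return {k: (dev.get(k), prod.get(k)) for k in keys if dev.get(k) != prod.get(k)}


# Hypothetical profiles, invented purely for illustration.
dev_machine = {
    "os": "Ubuntu 20",
    "user": "alice",
    "NODE_ENV": "development",
    "node_version": "18",
    "debug_tools": True,
    "web_server_config": "default",
}
prod_server = {
    "os": "Ubuntu 20",
    "user": "svc-app",
    "NODE_ENV": "production",
    "node_version": "16",
    "debug_tools": False,
    "web_server_config": "hardened",
}

drift = diff_profiles(dev_machine, prod_server)
for setting, (dev_value, prod_value) in drift.items():
    print(f"{setting}: dev={dev_value!r} prod={prod_value!r}")
```

    Even with the same OS on both machines, five of the six settings in this made-up example differ, and any one of them could change how the app behaves.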

    Furthermore, consider the nature of physical devices and VMs:

    • They are stateful and mutable
    • Every change to any aspect of a device or VM, including installed software and configuration changes, potentially affects its entire state for all processes running on it.
    • Physical devices and VMs typically run many processes/services concurrently; it would usually not be considered economical to have a whole server or VM just for a single running process.
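
    To make the shared, mutable state problem concrete, here is a toy model (not a real tool). The `Host` class, the "runtime" package name, and the version numbers are all hypothetical, chosen only to show how a global change made for one app can silently break another.

```python
class Host:
    """Toy model of a server or VM with one global, mutable set of packages."""

    def __init__(self):
        self.global_packages = {}

    def install(self, name, version):
        # A global install overwrites whatever version was there before.
        self.global_packages[name] = version

    def satisfies(self, requirements):
        # True only if every required package is present at the exact version.
        return all(self.global_packages.get(n) == v for n, v in requirements.items())


# Hypothetical apps and versions, for illustration only.
app_a_needs = {"runtime": "14"}
app_b_needs = {"runtime": "18"}

host = Host()
host.install("runtime", "14")
a_ok_before = host.satisfies(app_a_needs)

host.install("runtime", "18")  # change made on behalf of app B
a_ok_after = host.satisfies(app_a_needs)
b_ok_after = host.satisfies(app_b_needs)

print(a_ok_before, a_ok_after, b_ok_after)  # True False True
```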

    What containers provide:

    • Isolation from the host device/VM and from an organisation's IT, network and infrastructure policies.
    • Isolation from each other - for example, consider the issue of requiring multiple versions of globally-installed runtime dependencies or modifications to shared host resources such as environment variables or local files.
    • Developers typically have full control over their choices of container images and the networking/orchestration inside the container runtime.
    • Images are based on immutable layers, meaning it is not possible for the state of any layer in an image to change, so a published image should always be a good, known, valid starting point.
    • The size of a parent image tends to be inconsequential because there's typically no reason to duplicate or re-download it unless a new version of that parent image is published.
    • A container is its own thin, mutable layer on top of an image, usually negligible in size, and uses the parent image for all dependencies.
    • If a container ends up in an invalid state, it can be disposed of quickly and cheaply, and replaced almost instantly by creating a fresh container layer on top of the same image.
    • The cheap, light-weight nature of containers makes it very efficient to run a single process per-container.
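
    The point about parent image size can be sketched with some back-of-the-envelope arithmetic. Docker only transfers layers that the destination does not already have, so the model below counts bytes per pull against a cache of layer ids. The sizes are loosely based on the figures in the question (~100 MB base, ~50 KB app); the layer names are hypothetical, and the model assumes one base layer plus one app layer and ignores compression and dependency layers, which a real image would have.

```python
def bytes_transferred(layers, cache):
    """Sum the sizes of layers missing from the cache, then add them to it.

    layers: list of (layer_id, size_in_bytes); cache: set of layer ids.
    """
    total = 0
    for layer_id, size in layers:
        if layer_id not in cache:
            total += size
            cache.add(layer_id)
    return total


MB = 1000 * 1000
KB = 1000

# Assumed sizes: a ~100 MB base image and a ~50 KB app layer.
base_layer = ("base-runtime", 100 * MB)

vm_cache = set()  # layers already present on the VM
transfers = []
for build in range(1, 4):
    app_layer = (f"app-build-{build}", 50 * KB)  # only this layer changes per build
    image = [base_layer, app_layer]
    transfers.append(bytes_transferred(image, vm_cache))

print(transfers)  # [100050000, 50000, 50000]
```

    Under these assumptions the base image is paid for once; every later build moves only the small app layer, in both the push and the pull direction.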
