Tao

Deep Dive into Docker cp Command: From Basics and Internals to Security Best Practices

The docker container cp command is something we use frequently in our daily Docker interactions. While it appears simple on the surface, most of us lack a deep understanding of its internal implementation and security risks. This article takes you through the command end to end: from basic usage to core source code, then to security vulnerabilities and best practices, helping you truly master it.

The primary function of docker container cp (or simply docker cp) is to copy files or directories in either direction between the host file system and a container’s file system. The container may be running or stopped.

Its syntax is very similar to the familiar scp or cp commands.

bash

# Copy from container to host
docker cp <containerId>:<path/in/container> <path/on/host>

# Copy from host to container
docker cp <path/on/host> <containerId>:<path/in/container>

Examples:

bash

# Copy the /etc/nginx/nginx.conf file from container my-nginx to the current directory on host
docker cp my-nginx:/etc/nginx/nginx.conf .

# Copy the my-app.jar file from host to the /app directory in container my-app
docker cp my-app.jar my-app:/app/

Many people assume docker cp is a direct file system operation, but it is actually an API call in a client-server (C/S) architecture, with tar archive streams at its core.

Whether you’re copying files from or to a container, the entire process follows this flow:

  1. Client Parsing: The docker CLI (client) parses your command.
  2. API Request:
    • Copy to container: The client first packages the files or directories to be copied into a local tar archive, then calls the Docker Daemon’s PUT /containers/{id}/archive API, sending this tar stream as the request body to the Docker server.
    • Copy from container: The client calls the GET /containers/{id}/archive API, specifying which path’s files are needed from the server.
  3. Server Processing: Docker Daemon (server) handles the request:
    • Copy to container: The server receives the tar stream and directly extracts it within the container’s file system.
    • Copy from container: The server packages the specified files/directories from the container into a tar stream and sends it back as an HTTP response to the client.
  4. Client Completion:
    • Copy to container: After the server completes extraction, the API call ends.
    • Copy from container: The client receives the server’s tar stream response and extracts it locally, completing the file copy operation.
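The pack-then-extract flow above can be sketched without Docker at all. The following is a minimal illustration using only Go's standard library; packFiles and unpackFiles are hypothetical names, and an in-memory buffer stands in for the HTTP body that carries the tar stream between client and daemon.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// packFiles writes name -> content pairs into an in-memory tar archive,
// standing in for the packaging step on the sending side.
func packFiles(files map[string]string) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	for name, content := range files {
		hdr := &tar.Header{
			Name:     name,
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(len(content)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(content)); err != nil {
			return nil, err
		}
	}
	// Close writes the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// unpackFiles reads a tar stream entry by entry, standing in for the
// extraction step on the receiving side.
func unpackFiles(r io.Reader) (map[string]string, error) {
	out := make(map[string]string)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		out[hdr.Name] = string(data)
	}
}

func main() {
	stream, err := packFiles(map[string]string{"app/my-app.jar": "fake jar bytes"})
	if err != nil {
		panic(err)
	}
	files, err := unpackFiles(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(files), files["app/my-app.jar"])
}
```

In the real command the two halves run in different processes: one side of the copy produces the stream and the other consumes it, with the daemon's archive endpoint in between.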

The Docker CLI (docker/cli) implements the command itself and delegates the HTTP call to the Go API client’s CopyToContainer method, where the process is easy to follow. Here’s a simplified Go code snippet showing the core logic of CopyToContainer.

go

// (Simplified. The CLI command lives in docker/cli/cli/command/container/cp.go;
// CopyToContainer itself belongs to the Go API client package.)

// CopyToContainer copies content to a container
func (cli *Client) CopyToContainer(ctx context.Context, container, path string, content io.Reader, options CopyToContainerOptions) error {
    // Prepare API request
    // The core here is the API endpoint: /containers/{id}/archive
    apiPath := "/containers/" + container + "/archive"

    query := url.Values{}
    query.Set("path", path)
    // Unless the caller explicitly allows it, do not let a directory be
    // overwritten by a non-directory (or vice versa)
    if !options.AllowOverwriteDirWithFile {
        query.Set("noOverwriteDirNonDir", "true")
    }

    // Make a PUT request, with the request body being the tar archive stream (content).
    // No extra headers are needed here, so nil is passed.
    resp, err := cli.putRaw(ctx, apiPath, query, content, nil)
    if err != nil {
        return err
    }
    ensureReaderClosed(resp)
    return nil
}

Explanation: The client’s CopyToContainer function essentially makes a PUT request to /containers/{id}/archive, sending the packaged local files as an io.Reader (tar stream) in the request body.
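To make the shape of that request concrete, here is a minimal sketch that builds (but never sends) such a PUT request with Go's standard library. newArchivePutRequest and the http://localhost base URL are assumptions for illustration: the real client talks to the daemon over its configured socket and adds an API version prefix to the path.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// newArchivePutRequest builds a PUT request shaped like the one described
// above. baseURL is a placeholder, not the address the real client uses.
func newArchivePutRequest(baseURL, container, dstPath string, tarStream io.Reader) (*http.Request, error) {
	query := url.Values{}
	query.Set("path", dstPath)
	query.Set("noOverwriteDirNonDir", "true")

	endpoint := baseURL + "/containers/" + url.PathEscape(container) + "/archive?" + query.Encode()
	// The tar stream becomes the request body.
	return http.NewRequest(http.MethodPut, endpoint, tarStream)
}

func main() {
	body := strings.NewReader("tar bytes would go here")
	req, err := newArchivePutRequest("http://localhost", "my-app", "/app/", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The destination path travels as a query parameter, while the files themselves travel only inside the tar body, which is why the daemon must decide where each archive entry is allowed to land.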

In the Moby project (moby/moby) source code, we can find the server-side logic that handles this API request.

go

// (Code located in moby/moby/api/server/router/container/container_routes.go)

// putContainersArchive handles PUT requests for copying files to containers
func (s *containerRouter) putContainersArchive(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
    // ... parse parameters ...
    container, err := s.backend.ContainerGet(vars["name"])
    if err != nil {
        return err
    }

    // Call the backend's ExtractToDir method, where r.Body is the tar stream from the client
    if err := s.backend.ContainerExtractToDir(container.Name, query.Get("path"), r.Body); err != nil {
        return err
    }

    w.WriteHeader(http.StatusOK)
    return nil
}

Explanation: The server’s routing passes the request to the putContainersArchive function, whose core action is calling ContainerExtractToDir, which directly extracts the request body (r.Body, i.e., the tar stream) at the specified path within the container.
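The server side can be imitated in a few lines with net/http/httptest. This is a toy stand-in, not the daemon's code: archiveHandler and putOneFile are hypothetical names, the "container file system" is just a map, and no validation of entry names is performed.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"path"
)

// archiveHandler treats the request body as a tar stream and records each
// entry under the requested destination path. A real daemon extracts into
// the container's root file system instead of a map.
func archiveHandler(store map[string][]byte) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPut {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		dst := r.URL.Query().Get("path")
		tr := tar.NewReader(r.Body)
		for {
			hdr, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			data, err := io.ReadAll(tr)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			// Toy behavior: entry names are trusted as-is.
			store[path.Join(dst, hdr.Name)] = data
		}
		w.WriteHeader(http.StatusOK)
	}
}

// putOneFile sends a single-file tar archive to the handler in-process
// and returns the HTTP status code.
func putOneFile(store map[string][]byte, dst, name, content string) (int, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(content))}
	if err := tw.WriteHeader(hdr); err != nil {
		return 0, err
	}
	if _, err := tw.Write([]byte(content)); err != nil {
		return 0, err
	}
	if err := tw.Close(); err != nil {
		return 0, err
	}
	req := httptest.NewRequest(http.MethodPut, "/containers/demo/archive?path="+url.QueryEscape(dst), &buf)
	rec := httptest.NewRecorder()
	archiveHandler(store).ServeHTTP(rec, req)
	return rec.Code, nil
}

func main() {
	store := map[string][]byte{}
	code, err := putOneFile(store, "/etc/nginx", "nginx.conf", "events {}")
	if err != nil {
		panic(err)
	}
	fmt.Println(code, string(store["/etc/nginx/nginx.conf"]))
}
```

Note that the handler decides the final location from two client-supplied inputs, the path parameter and the entry names inside the archive, which is exactly where a privileged extractor has to be careful.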


When application logs are not exposed to the host through a data volume, we can use the docker cp command to copy them from inside the container to a specified directory on the host.

bash

# Retrieve application log files from a container named "my-prod-app"
docker cp my-prod-app:/var/log/app.log ./app-debug.log

docker cp can also update an application’s configuration in place, without restarting the container.

bash

# Update Nginx configuration file
docker cp nginx.conf my-nginx:/etc/nginx/nginx.conf

# Reload the Nginx configuration inside the container
docker exec my-nginx nginx -s reload

It is also sometimes used in CI/CD or deployment scripts to transfer build artifacts.

bash

#!/bin/bash
CONTAINER_ID=$(docker ps -qf "name=my-web-server")
BUILD_ARTIFACT="./dist/index.html"

if [ -n "$CONTAINER_ID" ]; then
  echo "Copying $BUILD_ARTIFACT to $CONTAINER_ID:/usr/share/nginx/html/"
  docker cp "$BUILD_ARTIFACT" "$CONTAINER_ID:/usr/share/nginx/html/"
  echo "Copy complete."
else
  echo "Error: Container not found."
  exit 1
fi

Because every file is packed into a tar archive on one side and unpacked on the other, docker cp handles large numbers of small files poorly: each entry gets its own header and is processed serially, so per-file overhead dominates. For large files, throughput is usually acceptable, but the data still passes through the archive step and the daemon API instead of being copied directly, which costs extra CPU, I/O, and buffering.
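The per-file cost is visible in the archive size alone. In the tar format every entry carries a 512-byte header and its data is padded to a multiple of 512 bytes, so tiny files are dominated by overhead. The sketch below, with a hypothetical tarSize helper, builds such an archive in memory and compares it with the raw payload; it measures size only, not the time spent on system calls or on the API round trip.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
)

// tarSize returns the size in bytes of an in-memory tar archive holding
// n regular files of fileSize bytes each.
func tarSize(n, fileSize int) (int, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	payload := bytes.Repeat([]byte("x"), fileSize)
	for i := 0; i < n; i++ {
		hdr := &tar.Header{
			Name:     fmt.Sprintf("f%05d.txt", i),
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(fileSize),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return 0, err
		}
		if _, err := tw.Write(payload); err != nil {
			return 0, err
		}
	}
	if err := tw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	const n, fileSize = 1000, 10
	size, err := tarSize(n, fileSize)
	if err != nil {
		panic(err)
	}
	fmt.Printf("payload: %d bytes, archive: %d bytes\n", n*fileSize, size)
}
```

If many small files have to be moved regularly, a volume or bind mount avoids this archive step entirely.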

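By default, docker cp does not preserve file ownership: files copied into a container are created with the root user’s UID:GID, and files copied to the host are owned by the user who invoked docker cp. Only with the -a (--archive) flag does it keep the UID/GID from the source. If you run docker as root or through sudo, files copied out of a container therefore end up owned by root on the host. This can lead to permission issues, requiring a manual chown to change ownership.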
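Ownership travels through the tar stream as numeric UID/GID fields in each entry header; whether the extracting side applies those IDs or substitutes its own is a policy decision. A minimal sketch of that decision, where ownerFor and demoOwner are hypothetical helpers and 1000:1000 is an assumed destination user:

```go
package main

import (
	"archive/tar"
	"fmt"
)

// ownerFor picks the UID/GID to apply to an extracted entry: either the IDs
// recorded in the archive header, or the IDs of the destination user.
func ownerFor(hdr *tar.Header, preserve bool, destUID, destGID int) (int, int) {
	if preserve {
		return hdr.Uid, hdr.Gid
	}
	return destUID, destGID
}

// demoOwner resolves ownership for a root-owned archive entry being
// extracted on behalf of an assumed destination user 1000:1000.
func demoOwner(preserve bool) (int, int) {
	hdr := &tar.Header{Name: "app.log", Uid: 0, Gid: 0}
	return ownerFor(hdr, preserve, 1000, 1000)
}

func main() {
	uid, gid := demoOwner(true)
	fmt.Println("preserve source owner:", uid, gid)
	uid, gid = demoOwner(false)
	fmt.Println("use destination owner:", uid, gid)
}
```

Numeric IDs are not translated between container and host, so a preserved UID may map to a different user name, or to no user at all, on the other side.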

docker cp behavior with symbolic links requires special attention:

  • When copying to container: If the source is a symbolic link, it copies the link itself, not the target file the link points to, unless you pass -L (--follow-link).
  • When copying from container: If the path in the container is a symbolic link, the daemon has to resolve it inside the container’s file system, and with -L the content the link points to (files or directories) is copied. Because this resolution is performed by a privileged daemon on paths the container controls, mistakes here are the root cause of some security vulnerabilities.
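Whether a copy operates on a link or on its target comes down to whether the path is inspected as-is or resolved first. The sketch below shows that distinction with Go's standard library on a throwaway temporary directory; describe and demo are hypothetical names, and the code assumes a platform where creating symbolic links is permitted.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// describe inspects p twice: Lstat looks at the link itself, while Stat
// follows the link and reports on its target.
func describe(p string) (isLink bool, targetSize int64, err error) {
	linkInfo, err := os.Lstat(p)
	if err != nil {
		return false, 0, err
	}
	targetInfo, err := os.Stat(p)
	if err != nil {
		return false, 0, err
	}
	return linkInfo.Mode()&os.ModeSymlink != 0, targetInfo.Size(), nil
}

// demo creates a 5-byte file plus a symlink to it and describes the link.
func demo() (bool, int64, error) {
	dir, err := os.MkdirTemp("", "cp-symlink")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "real.conf")
	if err := os.WriteFile(target, []byte("hello"), 0o644); err != nil {
		return false, 0, err
	}
	link := filepath.Join(dir, "link.conf")
	if err := os.Symlink(target, link); err != nil {
		return false, 0, err
	}
	return describe(link)
}

func main() {
	isLink, size, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("is symlink:", isLink, "target size:", size)
}
```

A copier that only ever uses the first view transfers the link; one that uses the second transfers whatever the link points to, which is harmless for your own files and risky when someone else controls the link.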

docker cp relies on a running Docker Daemon, because every copy goes through its API, but the container itself does not need to be running. It is a common misconception that it does: you can copy data from a stopped container because its file system still exists.


The biggest issue with docker cp lies in its security model - it’s a classic example of the “Confused Deputy” problem.

  • You (the attacker): A low-privilege user who can only operate containers.
  • The Deputy (Docker Daemon): A process with the highest system privileges (root).
  • The Confused Action: You send what appears to be a harmless request to the high-privilege Docker Daemon through the docker cp command. But due to implementation flaws, this high-privilege “deputy” can be deceived into operating host resources it shouldn’t access while executing your request, achieving container escape.
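One defensive habit for any privileged "deputy" that handles caller-supplied paths is to confirm that the final path stays inside the directory it is supposed to serve. The sketch below uses a hypothetical resolveWithin helper and a made-up /rootfs directory. It is a purely lexical check: it catches ../ tricks but not symbolic links planted inside the root, which is why real implementations also resolve paths within a restricted root.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

var errEscapesRoot = errors.New("path escapes root")

// resolveWithin joins an untrusted name onto root and rejects any result
// that lexically lands outside root. It does not follow symbolic links.
func resolveWithin(root, name string) (string, error) {
	root = filepath.Clean(root)
	full := filepath.Join(root, name)
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errEscapesRoot
	}
	return full, nil
}

func main() {
	for _, name := range []string{"etc/nginx/nginx.conf", "../../etc/shadow"} {
		p, err := resolveWithin("/rootfs", name)
		fmt.Printf("%q -> %q, err=%v\n", name, p, err)
	}
}
```

The check belongs on the privileged side of the API, since the caller's own validation cannot be trusted.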

CVE-2019-14271, which affected Docker 19.03.0 and was fixed in 19.03.1, perfectly illustrates the above risks.

  • Vulnerability Rating: Critical, CVSS v3.1 Score: 9.8
  • Vulnerability Mechanism:
    1. When Docker executes docker cp, it starts a helper process called docker-tar. This process chroots into the container’s root file system but is not otherwise containerized: it keeps running in the host’s namespaces with the host’s root user identity.
    2. An attacker can replace normal system libraries (like libnss_*.so) with malicious library files inside the container.
    3. When docker cp is triggered, the docker-tar process starts and attempts to load these libnss libraries.
    4. Since it loads the attacker’s malicious libraries and runs with root privileges, the malicious code executes on the host with root permissions.
  • Consequences: Attackers can completely control the host from what appears to be an isolated container, achieving complete container escape.

Given the various issues with docker cp, we should use safer, more efficient alternatives in most situations.

| Method | Advantages | Disadvantages | Best Use Cases |
| --- | --- | --- | --- |
| Dockerfile COPY/ADD | Declarative, immutable, secure | Only usable during build time | Embedding application code, static configurations into images. |
| Data Volumes | High performance, persistence, decoupling | Requires advance planning and management | Database files, user uploads, application runtime state. |
| Bind Mounts | Real-time sync, development convenience | Couples with host paths, permission risks | Local development environments, real-time code changes. |
| docker exec + tar | Strong control, can bypass cp limitations | Complex commands, manual operation | Emergency manual packaging and transfer of complex data. |

  • Dockerfile COPY/ADD: This is the preferred method for putting code and resources into images. It aligns with immutable infrastructure principles, creating self-contained and predictable images.
  • Data Volumes: This is the standard approach for managing container runtime data. Volumes are managed by Docker and decoupled from the host file system, perform well, and are easy to back up and migrate.
  • Bind Mounts: Primarily used in development environments, directly mapping a host directory into the container. While very convenient, it breaks container isolation and may introduce permission issues. Not recommended for production environments.
  • docker exec + tar: This is the manual version of docker cp, more flexible but more complex.

    bash

    # Copy from container to host
    docker exec my-container tar czf - /path/in/container > local-archive.tar.gz
    
    # Copy from host to container
    tar czf - ./local-file | docker exec -i my-container tar xzf - -C /path/in/container

docker cp is a tool designed for emergency diagnostics and manual intervention - it’s like a sharp but dangerous scalpel. In automated workflows, CI/CD pipelines, or any routine production environment operations, it should be strictly prohibited.

Best Practice Guidelines:

  1. Use COPY for build-time data: All static code and configurations deployed with the application should use the COPY command in Dockerfile.
  2. Use Volumes for runtime data: All data that needs persistence and is generated by applications at runtime (like database files, user uploads) must use data volumes.
  3. Use Bind Mounts for development environments: Only consider bind mounts for local development when convenient real-time coding is needed.
  4. Treat docker cp as a last resort: Only use it when you need to urgently extract logs or diagnostic files from a “black box” container without configured data volumes.
  5. Protect the Docker Socket: Never mount Docker’s Unix socket (/var/run/docker.sock) into untrusted containers. Doing so is equivalent to handing over the host’s root privileges, and it lets the container issue docker cp and every other API call against the daemon.

By following these principles, you can build safer, more robust, and more maintainable containerized applications, fundamentally avoiding the potential risks that docker cp brings.
