Deep Dive into Docker cp Command: From Basics and Internals to Security Best Practices
The docker container cp command is something we use frequently in our daily Docker interactions. While it appears simple on the surface, most of us lack a deep understanding of its internal implementation and security risks. This article takes you through the command end to end: from basic usage to core source code, then to security vulnerabilities and best practices, helping you truly master it.
1. Basic Usage
Command Purpose
The primary function of docker container cp (or simply docker cp) is to copy files or directories in either direction between the host file system and a container’s file system. The container does not need to be running.
Command Syntax
Its syntax is very similar to the familiar scp or cp commands.
# Copy from container to host
docker cp <containerId>:<path/in/container> <path/on/host>
# Copy from host to container
docker cp <path/on/host> <containerId>:<path/in/container>
Examples:
# Copy the /etc/nginx/nginx.conf file from container my-nginx to the current directory on host
docker cp my-nginx:/etc/nginx/nginx.conf .
# Copy the my-app.jar file from host to the /app directory in container my-app
docker cp my-app.jar my-app:/app/
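The syntax above raises a small parsing question: how does the client tell a container path from a host path? The following Go sketch shows one plausible rule, under the assumption that an argument is local when it is absolute, has no colon, or starts with a dot. The function name splitCopyArg is hypothetical and the real CLI may handle more cases (for example Windows drive letters).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitCopyArg is a hypothetical helper that splits a docker cp argument
// into a container name and a path. An empty container name means the
// argument refers to a path on the host.
func splitCopyArg(arg string) (container, path string) {
	// An absolute path such as /tmp/a:b is always treated as local.
	if filepath.IsAbs(arg) {
		return "", arg
	}
	name, rest, found := strings.Cut(arg, ":")
	// No colon, or an explicit relative path such as ./odd:name.txt, is local too.
	if !found || strings.HasPrefix(name, ".") {
		return "", arg
	}
	return name, rest
}

func main() {
	args := []string{
		"my-nginx:/etc/nginx/nginx.conf",
		"my-app.jar",
		"./odd:name.txt",
		"/tmp/a:b",
	}
	for _, arg := range args {
		container, path := splitCopyArg(arg)
		fmt.Printf("%q -> container=%q path=%q\n", arg, container, path)
	}
}
```

One consequence of a rule like this: a host file whose name contains a colon has to be written with a leading ./ or as an absolute path, otherwise the part before the colon is read as a container name.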
2. Implementation Principles & Source Code Analysis
Many people assume docker cp is a direct file system operation, but it’s actually a client-server (C/S) API call with tar archive streams at its core.
Core Mechanism: API & Tar Streams
Whether you’re copying files from or to a container, the entire process follows this flow:
- Client Parsing: The docker CLI (client) parses your command.
- API Request:
  - Copy to container: The client first packages the files or directories to be copied into a local tar archive, then calls the Docker Daemon’s PUT /containers/{id}/archive API, sending this tar stream as the request body to the Docker server.
  - Copy from container: The client calls the GET /containers/{id}/archive API, specifying which path’s files are needed from the server.
- Server Processing: Docker Daemon (server) handles the request:
  - Copy to container: The server receives the tar stream and extracts it within the container’s file system.
  - Copy from container: The server packages the specified files/directories from the container into a tar stream and sends it back as an HTTP response to the client.
- Client Completion:
  - Copy to container: After the server completes extraction, the API call ends.
  - Copy from container: The client receives the server’s tar stream response and extracts it locally, completing the file copy operation.
Source Code Perspective: Client Implementation (Go)
The Docker CLI (docker/cli) delegates the upload to the API client’s CopyToContainer method. Here’s a simplified Go code snippet showing the core logic of CopyToContainer.
// (Code located in docker/cli/cli/command/container/cp.go and client/interface.go)
// CopyToContainer copies content to a container
func (cli *Client) CopyToContainer(ctx context.Context, container, path string, content io.Reader, options CopyToContainerOptions) error {
    // Prepare API request
    // The core here is the API endpoint: /containers/{id}/archive
    apiPath := "/containers/" + container + "/archive"
    query := url.Values{}
    query.Set("path", path)
    // By default, refuse to replace a directory with a non-directory (and vice versa)
    if !options.AllowOverwriteDirWithFile {
        query.Set("noOverwriteDirNonDir", "true")
    }
    // Make a PUT request, with the request body being the tar archive stream (content).
    // No extra headers are needed, so nil is passed.
    resp, err := cli.putRaw(ctx, apiPath, query, content, nil)
    // Release the response on every path, including errors
    defer ensureReaderClosed(resp)
    if err != nil {
        return err
    }
    return nil
}
Explanation: The client’s CopyToContainer function essentially makes a PUT request to /containers/{id}/archive, sending the packaged local files as an io.Reader (tar stream) in the request body.
Source Code Perspective: Server Implementation (Go)
In the Moby project (moby/moby) source code, we can find the server-side logic that handles this API request.
// (Code located in moby/moby/api/server/router/container/container_routes.go)
// putContainersArchive handles PUT requests for copying files to containers
func (s *containerRouter) putContainersArchive(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
    // ... parse parameters (elided; assumed to populate query from the request URL) ...
    container, err := s.backend.ContainerGet(vars["name"])
    if err != nil {
        return err
    }
    // Call the backend's ExtractToDir method, where r.Body is the tar stream from the client
    if err := s.backend.ContainerExtractToDir(container.Name, query.Get("path"), r.Body); err != nil {
        return err
    }
    w.WriteHeader(http.StatusOK)
    return nil
}
Explanation: The server’s routing passes the request to the putContainersArchive function, whose core action is calling ContainerExtractToDir, which extracts the request body (r.Body, i.e., the tar stream) at the specified path within the container.
3. Practical Use Case Examples
Copying Logs from Containers
When application logs aren’t exposed through a data volume, we can use the docker cp command to copy them from inside the container to a specified directory on the host.
# Retrieve application log files from a container named "my-prod-app"
docker cp my-prod-app:/var/log/app.log ./app-debug.log
Uploading Configuration Files to Containers
Dynamically update an application’s configuration without restarting the container.
# Update Nginx configuration file
docker cp nginx.conf my-nginx:/etc/nginx/nginx.conf
# Reload the Nginx configuration inside the container
docker exec my-nginx nginx -s reload
Automation in Scripts
Used in CI/CD or deployment scripts for transferring build artifacts.
#!/bin/bash
CONTAINER_ID=$(docker ps -qf "name=my-web-server")
BUILD_ARTIFACT="./dist/index.html"
if [ -n "$CONTAINER_ID" ]; then
    echo "Copying $BUILD_ARTIFACT to $CONTAINER_ID:/usr/share/nginx/html/"
    docker cp "$BUILD_ARTIFACT" "$CONTAINER_ID:/usr/share/nginx/html/"
    echo "Copy complete."
else
    # Report the failure on stderr so callers can separate it from normal output
    echo "Error: Container not found." >&2
    exit 1
fi
4. Limitations & Common Issues
Performance Bottlenecks
Because every copy is packed into a tar archive on one side and unpacked on the other, docker cp performs poorly when handling large numbers of small files. Each file is added to the archive serially and carries its own header, so per-file overhead dominates. Large files fare better, but the data still makes a full trip through tar packing, the API connection, and extraction, which costs more CPU and I/O than a direct copy.
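The per-file overhead is easy to observe with the standard library alone. In the tar format each entry takes at least a 512-byte header, and its content is padded to a multiple of 512 bytes. The sketch below (the helper name archiveSize is illustrative) packs many tiny files in memory and compares the payload size with the archive size. It measures only archive size, not the daemon's actual throughput.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
)

// archiveSize packs n files of fileSize bytes each into an in-memory tar
// archive and returns the archive's total size in bytes.
func archiveSize(n, fileSize int) (int, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	payload := bytes.Repeat([]byte("x"), fileSize)
	for i := 0; i < n; i++ {
		hdr := &tar.Header{
			Name:     fmt.Sprintf("file-%05d.txt", i),
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(fileSize),
		}
		// Every entry gets its own header block before its content.
		if err := tw.WriteHeader(hdr); err != nil {
			return 0, err
		}
		if _, err := tw.Write(payload); err != nil {
			return 0, err
		}
	}
	if err := tw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	const n, fileSize = 1000, 10
	size, err := archiveSize(n, fileSize)
	if err != nil {
		panic(err)
	}
	fmt.Printf("payload: %d bytes, tar archive: %d bytes\n", n*fileSize, size)
}
```

For tiny files the archive is far larger than the data it carries, and that is before counting the per-file system calls on both ends.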
User & Permissions (UID/GID)
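By default, docker cp does not preserve the source’s user and group information: ownership is set to the user and primary group at the destination. Files copied into a container are created with the UID:GID of the root user, and files copied to the host are created with the UID:GID of the user who invoked docker cp. To keep the source’s UID/GID instead, pass the -a (archive) option. Either way, check ownership after copying, since a manual chown may still be needed.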
Symbolic Links
docker cp behavior with symbolic links requires special attention:
- Source path: By default, in both directions, if the source is a symbolic link, docker cp copies the link itself, not the target the link points to. Pass -L (--follow-link) to copy the target instead.
- Paths inside the container: The daemon has to resolve symbolic links relative to the container’s file system rather than the host’s. Mistakes in this resolution have been the root cause of some security vulnerabilities.
Stopped Containers
docker cp needs a running Docker Daemon, because it goes through the daemon’s API, but the container itself does not have to be running. It’s a common misconception that it must be: you can copy data from a stopped container because its file system still exists.
5. In-Depth Security Risk Analysis
The biggest issue with docker cp lies in its security model - it’s a classic example of the “Confused Deputy” problem.
The “Confused Deputy” Problem
- You (the attacker): A low-privilege user who can only operate containers.
- The Deputy (Docker Daemon): A process with the highest system privileges (root).
- The Confused Action: You send what appears to be a harmless request to the high-privilege Docker Daemon through the docker cp command. But due to implementation flaws, this high-privilege “deputy” can be deceived into operating on host resources it shouldn’t access while executing your request, achieving container escape.
Critical Vulnerability Case: CVE-2019-14271
This vulnerability perfectly illustrates the above risks.
- Vulnerability Rating: Critical, CVSS v3.1 Score: 9.8
- Affected Versions: Docker 19.03.x before 19.03.1.
- Vulnerability Mechanism:
  - When Docker executes docker cp, the daemon starts a helper process called docker-tar. The helper chroots into the container’s root file system but is otherwise not containerized: it executes with the host’s root user identity.
  - An attacker who controls the container can replace normal system libraries (like libnss_*.so) with malicious library files inside the container.
  - When docker cp is triggered, the docker-tar process starts and, because it has already chrooted, loads these libnss libraries from the container’s file system.
  - Since it loads the attacker’s malicious libraries and runs with root privileges, the malicious code executes on the host with root permissions.
- Consequences: Attackers can completely control the host from what appears to be an isolated container, achieving complete container escape.
6. Best Alternative Solutions Comparison
Given the various issues with docker cp, we should use safer, more efficient alternatives in most situations.
Solution Comparison Table
Method | Advantages | Disadvantages | Best Use Cases |
---|---|---|---|
Dockerfile COPY/ADD | Declarative, immutable, secure | Only usable during build time | Embedding application code, static configurations into images. |
Data Volumes | High performance, persistence, decoupling | Requires advance planning and management | Database files, user uploads, application runtime state. |
Bind Mounts | Real-time sync, development convenience | Couples with host paths, permission risks | Local development environments, real-time code changes. |
docker exec + tar | Strong control, can bypass cp limitations | Complex commands, manual operation | Emergency manual packaging and transfer of complex data. |
Solution Details
- Dockerfile COPY/ADD: This is the preferred method for putting code and resources into images. It aligns with immutable infrastructure principles, creating self-contained and predictable images.
- Data Volumes: This is the standard approach for managing container runtime data. Managed by Docker, decoupled from the host file system, excellent performance, and easy to backup and migrate.
- Bind Mounts: Primarily used in development environments, directly mapping a host directory into the container. While very convenient, it breaks container isolation and may introduce permission issues. Not recommended for production environments.
- docker exec + tar: This is the manual version of docker cp, more flexible but more complex.
# Copy from container to host
docker exec my-container tar czf - /path/in/container > local-archive.tar.gz
# Copy from host to container
tar czf - ./local-file | docker exec -i my-container tar xzf - -C /path/in/container
7. Conclusion
docker cp is a tool designed for emergency diagnostics and manual intervention - it’s like a sharp but dangerous scalpel. In automated workflows, CI/CD pipelines, or any routine production environment operations, it should be strictly prohibited.
Best Practice Guidelines:
- Use COPY for build-time data: All static code and configurations deployed with the application should use the COPY command in Dockerfile.
- Use Volumes for runtime data: All data that needs persistence and is generated by applications at runtime (like database files, user uploads) must use data volumes.
- Use Bind Mounts for development environments: Only consider bind mounts for local development when convenient real-time coding is needed.
- Treat docker cp as a last resort: Only use it when you need to urgently extract logs or diagnostic files from a “black box” container without configured data volumes.
- Protect the Docker Socket: Never mount Docker’s Unix socket (/var/run/docker.sock) into untrusted containers - this is equivalent to handing over the host’s root privileges and opens up docker cp risk exposure.
By following these principles, you can build safer, more robust, and more maintainable containerized applications, fundamentally avoiding the potential risks that docker cp brings.