Volumes

We just saw that changes made inside a container do not carry over to the next one: when we created a directory and then started a new container, our directory was gone, because each docker run starts a fresh container from the unchanged image. This is actually a feature, not a bug! It ensures that every time someone runs your container, they start with the exact same environment. But what if we do need to save data or share files between our computer and the container?

Introducing Volumes

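Volumes are how we share files and folders between our computer (the "host") and the container. The form used in this section, where a host folder is mounted into the container with -v, is what Docker calls a bind mount. Think of it like a shared folder that both sides can see and modify.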

Let's try this:

# Create a folder for our work
mkdir myproject
cd myproject

# Create a simple Python script
echo "print('Hello from the host!')" > script.py

# Run Python container with current directory mounted
docker run -v $(pwd):/work python:3.9 python /work/script.py

Let's break down what happened:
- -v $(pwd):/work mounts a folder from the host into the container:
  - $(pwd) is our current directory on the host
  - :/work is where it appears in the container
- The container sees the same files, not a copy, so a change on either side shows up on the other!
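To check that the container really works on the host's files rather than a copy, you can edit the script on the host and run it again. A minimal sketch, assuming Docker is installed and you are inside myproject/; --rm only tells Docker to delete the finished container:

```
# Overwrite the script on the host (run inside myproject/)
echo "print('Edited on the host!')" > script.py

# Run it again: no rebuild or copy step, the container reads the edited file
# and should print: Edited on the host!
docker run --rm -v "$(pwd)":/work python:3.9 python /work/script.py
```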

Persistent Data

Now let's see how volumes solve our earlier problem:

# Run container with mounted volume
docker run -it -v $(pwd):/work python:3.9 bash

# Inside container:
cd /work
mkdir data
echo "This will persist!" > data/note.txt
exit

# Back on host, check the files:
ls data
cat data/note.txt

The files we created in the container are right there on our computer! This is crucial for:
- Saving analysis results
- Working with data files
- Developing code
- Storing configuration
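One thing to watch for on a Linux host: the python:3.9 image runs as root by default, so files the container creates in a bind mount, like data/note.txt above, can end up owned by root on your machine and be awkward to edit or delete. (Docker Desktop on macOS and Windows usually hides this.) One common workaround is Docker's --user flag. A sketch, assuming a Linux host with Docker installed; user_note.txt is just an example name:

```
# Run the container as your own user and group ID (Linux host assumed)
docker run --rm --user "$(id -u):$(id -g)" -v "$(pwd)":/work python:3.9 \
  bash -c 'echo "Written as my own user" > /work/user_note.txt'

# On the host, the owner column should show your username, not root
ls -l user_note.txt
```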

Common Volume Use Cases

Code Development

docker run -it -v $(pwd):/code python:3.9 bash

Data Analysis

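docker run -p 8888:8888 -v $(pwd)/data:/home/jovyan/data -v $(pwd)/notebooks:/home/jovyan/work jupyter/datascience-notebook

The Jupyter images run as the user jovyan and start in /home/jovyan, so mounting the folders there makes them visible in the notebook's file browser; -p 8888:8888 publishes the server at http://localhost:8888 (the login token is printed in the container's output).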

Results Output

docker run -v $(pwd)/results:/results myanalysis

Important Volume Tips

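Use Absolute Paths: The host side of -v should be an absolute path. $(pwd) expands to one, but its value depends on where you run the command; spelling out the full path removes that dependency:
```
docker run -v /Users/me/project:/work python:3.9
```
A bare name such as -v data:/data is not read as a folder: Docker creates a named volume called data instead. If a path may contain spaces, wrap it in quotes, as in -v "$(pwd)":/work.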
Read-Only Volumes: Add :ro to prevent the container from modifying host files:
```
docker run -v $(pwd):/work:ro python:3.9
```
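To confirm that :ro does its job, you can try to write through a read-only mount. A sketch, assuming Docker is installed; should_fail.txt is just an example name:

```
# Reading through a read-only mount works; writing does not
docker run --rm -v "$(pwd)":/work:ro python:3.9 bash -c '
  ls /work
  # touch fails with "Read-only file system", so the fallback message prints
  touch /work/should_fail.txt || echo "write blocked"
'
```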

Multiple Volumes: You can mount multiple volumes:

docker run \
  -v $(pwd)/data:/data \
  -v $(pwd)/config:/config \
  -v $(pwd)/results:/results \
  python:3.9

Best Practices for Research

Organize Your Mounts:

project/
  ├── data/         # Mount as /data
  ├── notebooks/    # Mount as /notebooks
  ├── scripts/      # Mount as /scripts
  └── results/      # Mount as /results
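The layout above can be created and mounted in one go. A sketch, assuming Docker is installed and that you run it from the folder that should contain project/; mounting data read-only is a suggestion, not a requirement:

```
# Where the project lives (assumption: ./project)
project_root="$PWD/project"

# Create the four folders (mkdir -p does nothing if they already exist)
for dir in data notebooks scripts results; do
  mkdir -p "$project_root/$dir"
done

# One -v per folder; data is mounted read-only so inputs stay untouched.
# ls -d just lists the four mount points to show they are there.
docker run --rm \
  -v "$project_root/data:/data:ro" \
  -v "$project_root/notebooks:/notebooks" \
  -v "$project_root/scripts:/scripts" \
  -v "$project_root/results:/results" \
  python:3.9 ls -d /data /notebooks /scripts /results
```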

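Document Your Volumes: The backslash that continues a command must be the last character on its line, so comments cannot sit at the end of those lines (the command would stop there). Put them above the command instead:

# Run analysis with required volumes:
#   /data     input data (read-only)
#   /results  analysis output
#   /config   configuration files (read-only)
docker run \
  -v $(pwd)/data:/data:ro \
  -v $(pwd)/results:/results \
  -v $(pwd)/config:/config:ro \
  myanalysis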

Consider Data Size:
- Very large datasets may be better kept in external storage and mounted from there, rather than copied into the project folder
- Use a small subset of the data while developing
- Document the analysis's data requirements clearly
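For the subset tip, one simple approach with a CSV file is to copy its header and first rows into a separate folder and mount that folder at the same container path while you develop. A sketch; data/example.csv, data_dev/ and the 1000-row cutoff are hypothetical, and the first lines only create a stand-in file if none exists:

```
# Stand-in dataset (hypothetical name), created only if it is missing
mkdir -p data
[ -f data/example.csv ] || seq 1 5000 | awk 'BEGIN { print "id" } { print }' > data/example.csv

# Keep the header plus the first 1000 rows for development
mkdir -p data_dev
head -n 1001 data/example.csv > data_dev/example.csv

# Mount the small copy at /data, so code that reads /data/example.csv
# runs unchanged; this counts its lines (1001 for the stand-in file)
docker run --rm -v "$(pwd)/data_dev:/data:ro" python:3.9 \
  python -c "print(sum(1 for _ in open('/data/example.csv')))"
```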

Next, we'll look at creating our own images with Dockerfiles, so we can package up our entire research environment for others to use.