Building your own images

Project overview

In this section we will build our own image from a Dockerfile. To demonstrate the process, we will use a small example project. The cancer-prediction directory contains the following:

├── cancer_prediction
│   ├── cancer_model.py
│   ├── data
│   │   ├── breast_cancer.csv
│   │   ├── breast_cancer_test.csv
│   │   └── breast_cancer_train.csv
│   ├── models
│   │   └── cancer_model.pkl
│   └── notebook.ipynb
└── requirements.txt

We might want our collaborators to be able to run all of the code in the cancer-prediction directory without having to install all of the dependencies. We can create a Dockerfile to build an image that contains all of the dependencies and code needed to run the project.

Fork this repository, including all branches and not just main, then create a new Codespace on the start branch. You should see the directory tree above, along with a few other files such as LICENSE and README.md.

With the project in place, we can look at how a Dockerfile packages its code and dependencies into an image.

Our Dockerfile

Here is the Dockerfile:

# Start from an official Python base image
FROM python:3.11-slim-bookworm

# Install git and clean up apt cache in the same layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set working directory in the container
WORKDIR /workspace

# Copy requirements first to leverage Docker cache
COPY requirements.txt .

# Install dependencies - combine commands to reduce layers
RUN pip install --no-cache-dir \
    jupyterlab \
    jupyterlab-git \
    httpx==0.27.2 \
    -r requirements.txt

# Copy the entire project
COPY . .

# Expose the port Jupyter will run on
EXPOSE 8888

# Start Jupyter Lab from the cancer_prediction directory
WORKDIR /workspace/cancer_prediction
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--LabApp.token=''", "--LabApp.password=''"]

Let's walk through the Dockerfile.

FROM python:3.11-slim-bookworm

  • Starts with official Python 3.11 image
  • 'slim-bookworm' means minimal Debian Bookworm-based image, reducing container size
  • Alternative to full image which includes many unnecessary packages

RUN apt-get update && ...

  • Installs git for version control
  • Cleans up apt cache to reduce image size
  • Combines commands to reduce layers

WORKDIR /workspace

  • Creates and sets the working directory to /workspace
  • All subsequent commands will run from this directory
  • Standard practice for development containers

COPY requirements.txt .

  • Copies only requirements.txt first
  • Helps with build caching - if requirements.txt doesn't change, Docker reuses the cached layers for this step and the pip install that follows
  • The '.' means copy to current WORKDIR

RUN pip install --no-cache-dir ...

  • Installs Python packages
  • --no-cache-dir reduces image size by not caching pip downloads
  • Combines installations in one RUN to create single layer
  • Backslashes allow multiple lines for readability

COPY . .

  • Copies all remaining project files
  • First '.' means everything in build context
  • Second '.' means copy to current WORKDIR
  • Done after requirements for better caching
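Because COPY . . takes everything in the build context, a .dockerignore file next to the Dockerfile can keep clutter out of the image. A minimal sketch follows; the patterns are illustrative assumptions rather than part of the project, so adjust them to what your checkout actually contains:

```shell
# Create a .dockerignore next to the Dockerfile.
# The patterns below are assumptions for illustration, not taken from the project.
# The leading **/ makes each pattern match at any depth of the build context.
cat > .dockerignore <<'EOF'
**/__pycache__
**/*.pyc
**/.ipynb_checkpoints
EOF
```

The .git directory is deliberately left out of the list, since the image installs git and jupyterlab-git, which presumably expect the repository history to be available.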

USER jupyter

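  • Switches to a non-root user named jupyter
  • All subsequent instructions, and the container's main process, run as this user
  • Not present in the listing above; to use it, the jupyter user must first be created in an earlier RUN step (for example with useradd)
  • Matters here because Jupyter refuses to start as root unless --allow-root is passed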
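One way to check which user a built image actually runs as is to override its default command with whoami. A small sketch, assuming the image has been built and tagged cancer-prediction as described later in this section; image_user is a hypothetical helper name:

```shell
# image_user is a hypothetical helper, not part of the project.
# Arguments after the image name replace the Dockerfile's CMD, so this
# runs whoami in a throwaway container instead of starting Jupyter.
image_user() {
    docker run --rm "$1" whoami
}
```

Calling image_user cancer-prediction prints jupyter when a USER jupyter instruction is in effect, and root when the image has no USER instruction.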

EXPOSE 8888

  • Documents that container uses port 8888
  • Doesn't actually open port - that's done at runtime
  • JupyterLab's default port
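Since EXPOSE only documents the port, the actual mapping is chosen when the container is started. A sketch, with run_on_host_port as a hypothetical helper name, that publishes the container's port 8888 on a host port of your choice:

```shell
# run_on_host_port is a hypothetical helper, not part of the project.
# -p HOST:CONTAINER publishes container port 8888 on the given host port.
run_on_host_port() {
    docker run --rm -p "$1":8888 cancer-prediction
}
```

For example, run_on_host_port 9999 would make Jupyter Lab reachable on port 9999 of the host, while Jupyter still listens on 8888 inside the container.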

WORKDIR /workspace/cancer_prediction

  • Changes working directory again
  • Ensures Jupyter starts in project directory

CMD ["jupyter", "lab", "--ip=0.0.0.0" ...]

  • Command to run when container starts
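  • --ip=0.0.0.0 makes Jupyter listen on all interfaces, so it is reachable from outside the container
  • --no-browser since there is no browser to open inside the container
  • Empty token/password disables authentication: convenient for the workshop, but unsafe anywhere the port is reachable by others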

Building the image

To build the image, run the following command in the terminal:

docker build -t cancer-prediction .

This command builds the image using the Dockerfile in the current directory and tags it with the name cancer-prediction.
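After the build finishes, it can be useful to confirm that the image exists and to see how the Dockerfile instructions contributed to its size. A minimal sketch, where show_image is a hypothetical helper name:

```shell
# show_image is a hypothetical helper, not part of the project.
show_image() {
    # List the image with its tag and total size.
    docker image ls "$1"
    # Show the image's history: the instructions that built it
    # and the size each one added.
    docker history "$1"
}
```

Running show_image cancer-prediction after a build gives a quick way to see the effect of choices such as the slim base image and --no-cache-dir.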

Running the container

To run the container, use the following command:

docker run -p 8888:8888 cancer-prediction

You should then be able to open the Jupyter Lab URL in the browser. If you run something in the notebook and save it, the changes are written only to the container's own filesystem: they do not reach the files on your host and are lost once the container is removed. To keep them, bind-mount the current directory over /workspace when running the container:

docker run -p 8888:8888 -v "$(pwd)":/workspace cancer-prediction