I want to prepare a custom image (based on the official Postgres image) with two tasks:
- Download data (e.g. fetch a CSV file with wget),
- Load the data into the database (create tables, run inserts).
I want to do both steps while building the image, not while running the container, because each of them takes a long time, and I want to build the image once and run many containers quickly.
I know how to do step 1 (download data) while building the image, but I don't know how to load the data into the database at build time instead of at container run time (step 2).
Example:
(download happens while building the image, loading happens while running the container)
Dockerfile:
FROM postgres:10.7
RUN apt-get update \
&& apt-get install -y wget \
&& rm -rf /var/lib/apt/lists/*
COPY download.sh /download.sh
RUN /download.sh
download.sh:
#!/bin/bash
cd /docker-entrypoint-initdb.d/
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/northwindextended/northwind.postgre.sql
To download the data I run the script myself. To load the data I rely on the "initialization scripts" mechanism of the official Postgres image.
Building image:
docker build -t mydbimage .
Running image:
docker run --name mydbcontainer -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d mydbimage
After the container starts, you can see how long loading the data takes:
docker logs mydbcontainer
This example dataset is small, but with a bigger one the long container startup becomes awkward.
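For reference, one commonly cited approach to baking the data in at build time (a sketch under stated assumptions, not a verified recipe): the official image declares a VOLUME at /var/lib/postgresql/data, so anything written there during the build is discarded; pointing PGDATA at a non-volume path and running the entrypoint once in a RUN step lets initdb and the init scripts execute during the build. The sed lines assume the entrypoint lives at /usr/local/bin/docker-entrypoint.sh and ends with `exec "$@"`, as in the 10.x images; they temporarily patch that line so the build step exits after initialization instead of starting a foreground server, then restore it for run time.

```dockerfile
FROM postgres:10.7
RUN apt-get update \
 && apt-get install -y wget \
 && rm -rf /var/lib/apt/lists/*
COPY download.sh /download.sh
RUN /download.sh
# Keep the cluster outside the declared VOLUME so it survives in the image layer.
ENV PGDATA=/var/lib/postgresql/pgdata
ENV POSTGRES_PASSWORD=postgres
# Patch the entrypoint so it runs initdb + the init scripts and then exits,
# run it once at build time, and restore the original behavior afterwards.
RUN sed -i 's/exec "$@"/echo "build-time init done"/' /usr/local/bin/docker-entrypoint.sh \
 && /usr/local/bin/docker-entrypoint.sh postgres \
 && sed -i 's/echo "build-time init done"/exec "$@"/' /usr/local/bin/docker-entrypoint.sh
```

At run time the entrypoint should find PGDATA already initialized, skip the init scripts, and start the server immediately, so containers come up without reloading the data.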