Status: 🔜 starting Week 1. This page fills in as I go — it’s a living note.
Mental model
- Docker = reproducible runtime. Ship the environment, not just the code.
- Terraform = reproducible infrastructure. Declare the desired state; let the provider reconcile.
- Together: nothing about my setup lives only in my head or my laptop.
Docker
# Build an image for the ingestion script
docker build -t taxi_ingest:v001 .
# Run Postgres locally
docker run -it \
  -e POSTGRES_USER=root \
  -e POSTGRES_PASSWORD=root \
  -e POSTGRES_DB=ny_taxi \
  -v "$(pwd)/ny_taxi_postgres_data:/var/lib/postgresql/data" \
  -p 5432:5432 \
  postgres:15
Gotcha (Windows): volume paths and line endings (\r\n) in entrypoint scripts will bite you — keep shell scripts LF.
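One way to catch the line-ending problem before it reaches an image is to scan shell scripts for carriage returns and strip them. This is a minimal sketch, assuming bash plus standard grep, tr and find; the function names has_crlf, to_lf and fix_scripts are made up for this note.

```shell
#!/usr/bin/env bash

# has_crlf FILE: succeed if FILE contains at least one carriage return (\r).
has_crlf() {
  grep -q $'\r' "$1"
}

# to_lf FILE: rewrite FILE in place with every carriage return removed.
# Writing back with cat (not mv) keeps the file's permissions, e.g. the exec bit.
to_lf() {
  local tmp
  tmp="$(mktemp)"
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# fix_scripts: convert every *.sh file under the current directory that has CRLF endings.
fix_scripts() {
  local f
  find . -name '*.sh' -type f -print0 | while IFS= read -r -d '' f; do
    if has_crlf "$f"; then
      to_lf "$f"
      echo "converted: $f"
    fi
  done
}
```

Source the file and run fix_scripts from the repo root before docker build. A .gitattributes entry such as `*.sh text eol=lf` should also stop Git from reintroducing CRLF on checkout.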
docker compose
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
    volumes:
      - ./ny_taxi_postgres_data:/var/lib/postgresql/data
    ports: ["5432:5432"]
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
    ports: ["8080:80"]
Why compose: services resolve each other by name on a shared network, so pgAdmin connects to host postgres, not localhost.
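To make the host difference concrete, here is a small sketch that prints the connection URL for each vantage point. The helper name pg_url is hypothetical; the user, password, port and database are the values from the compose file above.

```shell
#!/usr/bin/env bash

# pg_url [HOST]: print a connection URL for the Postgres service defined above.
# HOST defaults to localhost when omitted.
pg_url() {
  local host="${1:-localhost}"
  printf 'postgresql://root:root@%s:5432/ny_taxi\n' "$host"
}

# From the host machine: go through the published port on localhost.
pg_url localhost

# From another container on the compose network (e.g. pgAdmin): use the service name.
pg_url postgres
```

Inside the compose network the client talks to the container port (5432) directly; the ports mapping only matters for connections coming from the host.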
GCP setup
- Create a project; enable BigQuery + Cloud Storage APIs.
- Service account with least privilege (Storage Admin + BigQuery Admin for the course; tighten for real work).
- Download a key JSON → never commit it (see .gitignore).
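A minimal sketch of keeping the key out of Git. The helper name ignore_pattern, the keys/ directory and the key path under $HOME are all assumptions for illustration; GOOGLE_APPLICATION_CREDENTIALS is the standard variable Google client libraries read to locate a key file.

```shell
#!/usr/bin/env bash

# ignore_pattern PATTERN: append PATTERN to ./.gitignore unless that exact line is already there.
ignore_pattern() {
  local pattern="$1"
  touch .gitignore
  grep -qxF -- "$pattern" .gitignore || printf '%s\n' "$pattern" >> .gitignore
}

# Hypothetical layout: any key that does end up inside the repo lives under keys/.
ignore_pattern 'keys/'

# Safer still: keep the key outside the repo and point tools at it (path is an assumption).
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.gcp/ny-taxi-key.json"
```

Calling ignore_pattern twice with the same pattern leaves a single line in .gitignore, so it is safe to rerun.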
Terraform
terraform {
  required_providers {
    google = { source = "hashicorp/google" }
  }
}

provider "google" {
  project = var.project
  region  = var.region
}

resource "google_storage_bucket" "data_lake" {
  name          = "${var.project}-data-lake"
  location      = var.region
  force_destroy = true
}

resource "google_bigquery_dataset" "raw" {
  dataset_id = "ny_taxi_raw"
  location   = var.region
}
terraform init # download provider, set up backend
terraform plan # preview the diff against real state
terraform apply # reconcile
terraform destroy # tear it all down (cost control!)
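The configuration references var.project and var.region, so both need a declaration somewhere before plan will run. The sketch below writes a variables.tf only if none exists and wraps plan and apply so that what gets applied is exactly the plan that was reviewed. The default region, the descriptions and the tf_apply helper name are assumptions, not something from the course.

```shell
#!/usr/bin/env bash

# Declare the two input variables the configuration uses.
# Skipped when variables.tf already exists, so nothing is overwritten.
if [ ! -e variables.tf ]; then
cat > variables.tf <<'EOF'
variable "project" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "Region for the bucket and dataset"
  type        = string
  default     = "us-central1" # assumed default; pick your own
}
EOF
fi

# tf_apply PROJECT_ID: save the plan to a file, then apply that saved plan.
# TF_VAR_project supplies var.project through the environment.
tf_apply() {
  TF_VAR_project="$1" terraform plan -out=tfplan &&
    terraform apply tfplan
}
```

After terraform init, something like `tf_apply my-project-id` (a placeholder ID) would run the plan and apply steps; terraform destroy still needs the same variable, for example via `TF_VAR_project=my-project-id terraform destroy`.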
What’s new since I last did this
- TODO: fill in after Week 1.
Open questions
- TODO
Homework
- Link to my submission / repo