Status: 🔜 starting Week 1. This page fills in as I go — it’s a living note.

Mental model

  • Docker = reproducible runtime. Ship the environment, not just the code.
  • Terraform = reproducible infrastructure. Declare the desired state; let the provider reconcile.
  • Together: nothing about my setup lives only in my head or my laptop.

Docker

# Build an image for the ingestion script
docker build -t taxi_ingest:v001 .

# Run Postgres locally
docker run -it \
  -e POSTGRES_USER=root \
  -e POSTGRES_PASSWORD=root \
  -e POSTGRES_DB=ny_taxi \
  -v "$(pwd)/ny_taxi_postgres_data:/var/lib/postgresql/data" \
  -p 5432:5432 \
  postgres:15

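Gotcha (Windows): volume paths and CRLF line endings (\r\n) in entrypoint scripts will bite you. Keep shell scripts LF: with CRLF the shebang line ends in \r, so the kernel looks for an interpreter that doesn't exist and the container typically fails to start with a confusing "no such file or directory" error.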
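
A minimal sketch for catching the line-ending problem before it reaches a container, assuming a POSIX shell (Git Bash or WSL on Windows). The helper names has_crlf and to_lf are my own, and entrypoint.sh is a hypothetical file name.

```shell
# Returns 0 if the file contains at least one carriage return (CR) byte.
has_crlf() {
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# Rewrites the file in place with every CR removed.
# Writing back with cat keeps the original file's permissions (e.g. +x).
to_lf() {
  tmp="$(mktemp)" || return 1
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (entrypoint.sh is a hypothetical script name):
#   has_crlf entrypoint.sh && to_lf entrypoint.sh
```

Running the check before docker build is cheaper than debugging a container that refuses to start.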

docker compose

services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
    volumes:
      - ./ny_taxi_postgres_data:/var/lib/postgresql/data
    ports: ["5432:5432"]
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
    ports: ["8080:80"]

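Why compose: services resolve each other by service name on a shared default network. When registering the server in pgAdmin, use host postgres and port 5432 (the container port), not localhost, because inside the pgAdmin container localhost is pgAdmin itself.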
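
To make the naming rule concrete, here is a small sketch that picks the right host depending on where the client runs. The function name pg_url is hypothetical; the credentials, database, and port are the ones from the compose file above.

```shell
# Print a Postgres connection URL for the setup above.
#   $1 = "compose" -> client is another Compose service (use the service name)
#   anything else  -> client runs on the host machine (use the published port)
pg_url() {
  if [ "$1" = "compose" ]; then
    host="postgres"    # service name, resolved on the Compose network
  else
    host="localhost"   # reaches the container through -p / ports: 5432:5432
  fi
  printf 'postgresql://root:root@%s:5432/ny_taxi\n' "$host"
}

# Usage, assuming psql is installed on the host:
#   psql "$(pg_url host)"
```

The port happens to be 5432 in both cases only because the compose file maps 5432 to 5432. With a mapping like "5433:5432", the host side would use 5433 while other services would still use 5432.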

GCP setup

  • Create a project; enable BigQuery + Cloud Storage APIs.
  • Service account, ideally with least privilege. Storage Admin + BigQuery Admin are broad roles that are fine for the course; tighten for real work.
  • Download a key JSON → never commit it (see .gitignore).
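
A sketch of how I'd keep the key out of git, assuming the commands run from the repository root. The helper ignore_pattern and the key path are hypothetical; GOOGLE_APPLICATION_CREDENTIALS is the environment variable Google's client libraries read to locate a key file.

```shell
# Append a pattern to ./.gitignore unless that exact line is already there.
ignore_pattern() {
  touch .gitignore
  grep -qxF -- "$1" .gitignore || printf '%s\n' "$1" >> .gitignore
}

# Usage (the directory and file names are hypothetical):
#   ignore_pattern 'keys/'
#   export GOOGLE_APPLICATION_CREDENTIALS="$PWD/keys/de-course-sa.json"
```

Keeping the key in a directory that is ignored as a whole is harder to get wrong than ignoring individual file names.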

Terraform

terraform {
  required_providers {
    google = { source = "hashicorp/google" }
  }
}

provider "google" {
  project = var.project
  region  = var.region
}

resource "google_storage_bucket" "data_lake" {
  name          = "${var.project}-data-lake"
  location      = var.region
  force_destroy = true
}

resource "google_bigquery_dataset" "raw" {
  dataset_id = "ny_taxi_raw"
  location   = var.region
}

terraform init      # download provider, set up backend
terraform plan      # preview the diff against real state
terraform apply     # reconcile
terraform destroy   # tear it all down (cost control!)
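
The config above reads var.project and var.region, so it assumes matching variable blocks are declared somewhere (not shown in these notes). One way to supply the values is through environment variables, since Terraform treats TF_VAR_<name> as the value of variable <name>. This is a sketch: the project ID and region below are placeholders, and require_tf_vars is my own helper name.

```shell
# Placeholder values; replace with the real project ID and preferred region.
export TF_VAR_project="my-de-project"
export TF_VAR_region="us-central1"

# Fail early if either variable is missing, instead of letting terraform prompt.
require_tf_vars() {
  for name in project region; do
    eval "value=\${TF_VAR_$name:-}"
    if [ -z "$value" ]; then
      echo "TF_VAR_$name is not set" >&2
      return 1
    fi
  done
}

# Usage:
#   require_tf_vars && terraform plan -out=tfplan   # save the reviewed plan
#   terraform apply tfplan                          # apply exactly that plan
```

Saving the plan to a file and applying that file means apply does exactly what was reviewed, even if something changed in between.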

What’s new since I last did this

  • TODO: fill in after Week 1.

Open questions

  • TODO

Homework

  • Link to my submission / repo