[{"content":" Status: 🔜 starting Week 1. This page fills in as I go — it\u0026rsquo;s a living note.\nMental model Docker = reproducible runtime. Ship the environment, not just the code. Terraform = reproducible infrastructure. Declare the desired state; let the provider reconcile. Together: nothing about my setup lives only in my head or my laptop. Docker # Build an image for the ingestion script docker build -t taxi_ingest:v001 . # Run Postgres locally docker run -it \\ -e POSTGRES_USER=root \\ -e POSTGRES_PASSWORD=root \\ -e POSTGRES_DB=ny_taxi \\ -v \u0026#34;$(pwd)/ny_taxi_postgres_data:/var/lib/postgresql/data\u0026#34; \\ -p 5432:5432 \\ postgres:15 Gotcha (Windows): volume paths and line endings (\\r\\n) in entrypoint scripts will bite you — keep shell scripts LF.\ndocker compose services: postgres: image: postgres:15 environment: POSTGRES_USER: root POSTGRES_PASSWORD: root POSTGRES_DB: ny_taxi volumes: - ./ny_taxi_postgres_data:/var/lib/postgresql/data ports: [\u0026#34;5432:5432\u0026#34;] pgadmin: image: dpage/pgadmin4 environment: PGADMIN_DEFAULT_EMAIL: admin@admin.com PGADMIN_DEFAULT_PASSWORD: root ports: [\u0026#34;8080:80\u0026#34;] Why compose: services resolve each other by name on a shared network — pgAdmin connects to host postgres, not localhost.\nGCP setup Create a project; enable BigQuery + Cloud Storage APIs. Service account with least privilege (Storage Admin + BigQuery Admin for the course; tighten for real work). Download a key JSON → never commit it (see .gitignore). 
Terraform terraform { required_providers { google = { source = \u0026#34;hashicorp/google\u0026#34; } } } provider \u0026#34;google\u0026#34; { project = var.project region = var.region } resource \u0026#34;google_storage_bucket\u0026#34; \u0026#34;data_lake\u0026#34; { name = \u0026#34;${var.project}-data-lake\u0026#34; location = var.region force_destroy = true } resource \u0026#34;google_bigquery_dataset\u0026#34; \u0026#34;raw\u0026#34; { dataset_id = \u0026#34;ny_taxi_raw\u0026#34; location = var.region } terraform init # download provider, set up backend terraform plan # preview the diff against real state terraform apply # reconcile terraform destroy # tear it all down (cost control!) What\u0026rsquo;s new since I last did this TODO: fill in after Week 1. Open questions TODO Homework Link to my submission / repo ","permalink":"https://oroshimado.com/notes/module-01-docker-terraform-gcp/","summary":"Reproducible local data stack + cloud infra as code. Working notes.","title":"Module 1 — Docker, Terraform \u0026 GCP"},{"content":" The dare: rebuild data engineering from first principles in 8 weeks, ~10–15 hrs/week, with today\u0026rsquo;s tooling — and write up every step.\nStart date: 2026-06-01 · Target finish: 2026-07-26 · Cohort reference: DataTalksClub DE Zoomcamp\nProgress at a glance Wk Focus Tools Notes Write-up Status 1 Containers \u0026amp; IaC Docker, Compose, Postgres, Terraform, GCP notes — 🔜 2 Workflow orchestration Kestra, data lakes — — ⬜ 3 Data ingestion workshop dlt, REST APIs, incremental loads — — ⬜ 4 Data warehousing BigQuery, partitioning, clustering, BQML — — ⬜ 5 Analytics engineering dbt, DuckDB, BigQuery, tests, docs — — ⬜ 6 Data platform end-to-end Bruin, data quality — — ⬜ 7 Batch processing Apache Spark, DataFrames, SQL — — ⬜ 8 Streaming + capstone kickoff Kafka, Kafka Streams, KSQL, Avro — — ⬜ 9+ Capstone project everything above — — ⬜ Legend: ✅ done · 🟡 in progress · 🔜 next · ⬜ not started\nThe senior\u0026rsquo;s angle 
I\u0026rsquo;m not learning these concepts cold — I\u0026rsquo;m re-deriving them and stress-testing what I think I know. For each module the lens is:\nBuild it the course way — no shortcuts, do the homework. What\u0026rsquo;s genuinely new? — Kestra, dlt, DuckDB, and Bruin didn\u0026rsquo;t exist (or weren\u0026rsquo;t mainstream) when I learned this. Note what\u0026rsquo;s changed. Where would this break at scale? — connect each toy pipeline back to production reality. Teach it back — the weekly write-up is the test. If I can\u0026rsquo;t explain it simply, I didn\u0026rsquo;t relearn it. Week 1 — Containerization \u0026amp; Infrastructure as Code Goal: a reproducible local data stack and cloud infra defined as code.\nDockerize a Postgres + ingestion script; load NYC taxi data Compose the stack (Postgres + pgAdmin) with docker compose GCP project + service account + IAM (least privilege) Terraform: GCS bucket + BigQuery dataset, plan/apply/destroy Homework submitted Write-up: \u0026ldquo;What Terraform state actually buys you\u0026rdquo; Re-derive: image layers \u0026amp; caching, why Compose networks let containers resolve by service name, Terraform\u0026rsquo;s plan/apply/state lifecycle.\nWeek 2 — Workflow Orchestration (Kestra) Goal: schedule and orchestrate the ingestion pipeline; land data in a lake.\nKestra up via Docker; first declarative (YAML) flow Parameterized + scheduled flow; backfills Load taxi data → GCS (data lake) → BigQuery Homework submitted Write-up: \u0026ldquo;Kestra vs. the Airflow muscle memory\u0026rdquo; Re-derive: idempotency, scheduling vs. event triggers, DAG semantics, backfill correctness.\nWeek 3 — Data Ingestion Workshop (dlt) Goal: robust, scalable ingestion from APIs.\nConsume a paginated REST API with dlt Schema inference \u0026amp; normalization into nested tables Incremental / merge loads (only new rows) Homework submitted Write-up: \u0026ldquo;Incremental loading patterns, ranked\u0026rdquo; Re-derive: full vs. 
incremental vs. CDC, idempotent upserts, schema evolution.\nWeek 4 — Data Warehousing (BigQuery) Goal: model for cost and speed in a columnar warehouse.\nExternal vs. native tables Partitioning + clustering — measure bytes scanned before/after Query cost \u0026amp; performance tuning Touch BigQuery ML (CREATE MODEL) Homework submitted Write-up: \u0026ldquo;Partitioning vs. clustering: when each actually helps\u0026rdquo; Re-derive: columnar storage, why pruning beats indexing here, slot-based pricing.\nWeek 5 — Analytics Engineering (dbt) Goal: turn raw tables into tested, documented, deployable models.\ndbt project against DuckDB locally, then BigQuery Staging → marts layering; sources, refs, seeds Tests (generic + singular) and docs site Deployment / scheduled run Homework submitted Write-up: \u0026ldquo;dbt as the discipline I should\u0026rsquo;ve always had\u0026rdquo; Re-derive: ELT vs. ETL, dimensional modeling, DAG of refs, test-as-contract.\nWeek 6 — Data Platform End-to-End (Bruin) Goal: one tool, full pipeline — ingest, transform, quality, deploy.\nBuild an end-to-end Bruin pipeline to BigQuery Built-in data quality checks Cloud deployment Homework submitted Write-up: \u0026ldquo;Where an all-in-one platform helps vs. best-of-breed\u0026rdquo; Re-derive: data quality dimensions, contracts, the build-vs-buy line.\nWeek 7 — Batch Processing (Apache Spark) Goal: process data that doesn\u0026rsquo;t fit on one machine.\nSpark DataFrames + Spark SQL GroupBy and Join internals (shuffles, partitions) Run a job on the taxi dataset Homework submitted Write-up: \u0026ldquo;Reading a Spark execution plan without fear\u0026rdquo; Re-derive: lazy evaluation, narrow vs. 
wide transforms, shuffle cost, skew.\nWeek 8 — Streaming (Kafka) + Capstone kickoff Goal: move from batch to unbounded data; scope the capstone.\nKafka producers/consumers; topics \u0026amp; partitions Kafka Streams / KSQL Avro + schema registry Homework submitted Capstone proposal drafted Write-up: \u0026ldquo;Exactly-once is a lie I now understand\u0026rdquo; Re-derive: log-based messaging, partitions \u0026amp; ordering, delivery semantics, windowing.\nWeek 9+ — Capstone Project Goal: one end-to-end pipeline that uses the whole stack and gets peer-reviewed.\nPick a dataset + a real question Batch (or streaming) ingestion → lake → warehouse IaC + orchestration + dbt models + a dashboard README, diagram, reproducible setup Submit for peer review Write-up: \u0026ldquo;What 8 weeks of relearning changed about how I build\u0026rdquo; ","permalink":"https://oroshimado.com/roadmap/","summary":"The 8-week fast-track plan through the DE Zoomcamp, with live progress.","title":"Roadmap \u0026 Progress"},{"content":"I\u0026rsquo;ve been doing data engineering long enough, and I\u0026rsquo;ve always felt left out, incapable, and inferior.\nThe itch The majority of the roles I have taken only made me stupider, and paired with an ever-growing impostor syndrome — you can imagine how that\u0026rsquo;s a recipe for disaster. Recently I saw a job offer for a senior data engineer: not only am I not qualified, but I had never even heard of the concepts in it. Talk about humbling.\nA few years ago I read the most amazing article ever, in the best blog ever, and it changed how I think about learning technology. I had been convinced I was inferior — I\u0026rsquo;d even developed a mechanism to protect myself from failing: unfortunately, that mechanism was not trying.\nNever again. 
This blog is for me and for every other person stuck in limbo — that state of dumbification, lack of purpose, and lack of direction.\nThe purpose is to create new, amazing things, and to develop curiosity.\nSo I\u0026rsquo;m taking the DataTalksClub Data Engineering Zoomcamp — a course aimed at people newer than me — and doing every piece of it honestly. No skipping homework. The dare is to be a beginner again on purpose.\nAI helped me build this website, and I\u0026rsquo;ll be editing it as I go — sharing my progress through the course, blog ideas, etc.\nHow I\u0026rsquo;m pacing it Fast track: 8 weeks, ~10–15 hours a week, one module per week, capstone after. The full plan and live progress live on the Roadmap.\nFor each module:\nBuild it the course way. Do the homework, no shortcuts. Name what\u0026rsquo;s new. What changed since I learned this? Find the breaking point. Where does the toy version fall apart at scale? Teach it back. This write-up is the exam. If I can\u0026rsquo;t explain it plainly, I didn\u0026rsquo;t relearn it. What \u0026ldquo;in public\u0026rdquo; means Everything lands here: terse notes as reference, and a weekly write-up like this one as the narrative. Publishing forces a standard — vague understanding doesn\u0026rsquo;t survive contact with a blank page. If it helps one other person retracing these steps, even better.\nWeek 1 is containers and infrastructure as code. Let\u0026rsquo;s go back to the beginning.\n","permalink":"https://oroshimado.com/posts/week-00-why-im-relearning-the-basics/","summary":"Why I\u0026rsquo;m going back to the fundamentals with the DE Zoomcamp, how I\u0026rsquo;ll pace it, and what \u0026lsquo;in public\u0026rsquo; means here.","title":"Week 0 — Why a senior engineer is relearning the basics"},{"content":"I\u0026rsquo;m Ahmed Adnane Amil, a senior data engineer. 
This site documents a deliberate experiment: going back to the fundamentals of data engineering — in public — using the DataTalksClub Data Engineering Zoomcamp.\nThe goal isn\u0026rsquo;t a certificate. It\u0026rsquo;s to re-derive the things experience let me stop thinking about, re-learn to learn, and leave behind notes useful to anyone walking the same road.\n🗺️ Roadmap — the 8-week plan and live progress 📓 Notes — module-by-module reference ✍️ Write-ups — weekly reflections Want to follow along? Grab the RSS feed. Visit my Twitter. Want to work together? Contact me on Twitter :D Let\u0026rsquo;s build amazing things together.\nBuilt with Hugo + PaperMod.\n","permalink":"https://oroshimado.com/about/","summary":"\u003cp\u003eI\u0026rsquo;m \u003cstrong\u003eAhmed Adnane Amil\u003c/strong\u003e, a senior data engineer. This site documents a deliberate experiment: going back to the fundamentals of data engineering — in public — using the \u003cstrong\u003e\u003ca href=\"https://github.com/DataTalksClub/data-engineering-zoomcamp\"\u003eDataTalksClub Data Engineering Zoomcamp\u003c/a\u003e\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eThe goal isn\u0026rsquo;t a certificate. It\u0026rsquo;s to re-derive the things experience let me stop thinking about, re-learn to learn, and leave behind notes useful to anyone walking the same road.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e🗺️ \u003cstrong\u003e\u003ca href=\"/roadmap/\"\u003eRoadmap\u003c/a\u003e\u003c/strong\u003e — the 8-week plan and live progress\u003c/li\u003e\n\u003cli\u003e📓 \u003cstrong\u003e\u003ca href=\"/notes/\"\u003eNotes\u003c/a\u003e\u003c/strong\u003e — module-by-module reference\u003c/li\u003e\n\u003cli\u003e✍️ \u003cstrong\u003e\u003ca href=\"/posts/\"\u003eWrite-ups\u003c/a\u003e\u003c/strong\u003e — weekly reflections\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eWant to follow along? 
Grab the \u003ca href=\"/index.xml\"\u003eRSS feed\u003c/a\u003e.\nVisit my \u003cstrong\u003e\u003ca href=\"https://x.com/oroshimado/followers\"\u003eTwitter\u003c/a\u003e\u003c/strong\u003e.\nWant to work together? Contact me on Twitter :D Let\u0026rsquo;s build amazing things together.\u003c/p\u003e","title":"About"}]