Context

sheehan-workspace today is a GCP-based personal monorepo: OpenTofu IaC in infra/ provisioning a Hugo static site (site/) on Firebase Hosting, deployed via GitHub Actions with Workload Identity Federation. There is no Python, no database, and no scheduled work.

The goal is to add a place to write Python cron jobs that persist data to Postgres via an ORM, on the same GCP project, without breaking the existing site stack. The longer-term aim is a Python web backend (Django) fronted by Flutter — so this scaffold needs to be expandable into a web app, not a throwaway shell.

Key locked decisions:

  • Compute: Cloud Run Jobs, one per cron task, triggered by Cloud Scheduler.
  • Database: Cloud SQL Postgres (db-f1-micro), public IP, connected via the platform-managed unix socket (cloud_sql_instances on the Cloud Run Job). No VPC connector — saves ~$10/mo.
  • ORM: Django ORM in standalone mode — cron jobs run as Django management commands (python manage.py run_job <name>), so manage.py handles django.setup() for free. Same project gains web views later by adding urls.py entries and a second Cloud Run Service.
  • Secrets: Secret Manager, mounted as env into the Cloud Run Job.
  • Deploy: new .github/workflows/deploy-jobs.yml triggered by jobs-v* tags, reusing the existing WIF + site-deployer SA with extra IAM bindings.

Estimated cost at idle: ~$12–14/mo, ~95% of which is Cloud SQL.


1. OpenTofu additions under infra/

Modify

  • apis.tf — append to local.required_services: artifactregistry.googleapis.com, run.googleapis.com, cloudscheduler.googleapis.com, sqladmin.googleapis.com, secretmanager.googleapis.com.
  • variables.tf — add jobs_image_tag (default "latest"), db_tier (default "db-f1-micro"), db_name (default "jobsdb"), db_user (default "jobs").
  • cicd.tf — grant site-deployer SA: roles/artifactregistry.writer on the jobs repo, roles/run.developer project-wide, and roles/iam.serviceAccountUser on the new jobs-runtime SA (required to deploy a job that runs as that SA).
  • outputs.tf — add jobs_image_repo, jobs_runtime_sa, db_connection_name.

Create

  • artifact_registry.tf: google_artifact_registry_repository.jobs (Docker format, us-central1).
  • cloudsql.tf:
    • google_sql_database_instance.main: POSTGRES_15, var.db_tier, zonal, 10 GB HDD, public IP, no authorized networks, cloudsql.iam_authentication = on, deletion_protection = true.
    • google_sql_database.app, google_sql_user.app (password auth), random_password.db.
  • secrets.tf:
    • google_secret_manager_secret.db_password + version from random_password.db.
    • google_secret_manager_secret.django_secret_key + version from random_password.django (length 64).
    • roles/secretmanager.secretAccessor binding for the jobs-runtime SA on both.
  • cloudrun_jobs.tf:
    • google_service_account.jobs_runtime (account_id = "jobs-runtime").
    • roles/cloudsql.client on jobs_runtime.
    • local.jobs map: { migrate = { args = ["migrate"], schedule = null }, example_stats = { args = ["run_job", "example_stats"], schedule = "0 * * * *" } }.
    • google_cloud_run_v2_job.this with for_each = local.jobs:
      • template.template.service_account = jobs_runtime.email.
      • template.template.cloud_sql_instances = [google_sql_database_instance.main.connection_name] — provides /cloudsql/<conn> unix socket.
      • containers.image = "${region}-docker.pkg.dev/${project}/jobs/app:${var.jobs_image_tag}".
      • containers.command = ["python", "manage.py"], containers.args = each.value.args.
      • Env: DJANGO_SETTINGS_MODULE=config.settings.cloud, GCP_PROJECT_ID, DB_INSTANCE_CONNECTION_NAME, DB_NAME, DB_USER; DB_PASSWORD and DJANGO_SECRET_KEY via value_source.secret_key_ref.
      • lifecycle.ignore_changes = [template[0].template[0].containers[0].image] so gcloud run jobs update from CI doesn’t fight tofu.
  • scheduler.tf:
    • google_service_account.scheduler + google_cloud_run_v2_job_iam_member.scheduler_invoker (roles/run.invoker) per scheduled job.
    • google_cloud_scheduler_job.this with for_each = { for k, v in local.jobs : k => v if v.schedule != null }.
    • http_target.uri = "https://${region}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/${project}/jobs/jobs-${each.key}:run".
    • oauth_token (not oidc_token) since the target is *.googleapis.com — common pitfall.

2. Python project under jobs/

Tree

jobs/
├── Dockerfile
├── .dockerignore
├── pyproject.toml
├── manage.py
├── config/
│   ├── settings/{__init__,base,local,cloud}.py
│   ├── urls.py          # empty list today; ready for views later
│   ├── wsgi.py
│   └── asgi.py
├── core/
│   ├── apps.py
│   ├── models.py
│   ├── admin.py
│   ├── migrations/
│   └── management/commands/run_job.py
└── jobs_pkg/
    ├── registry.py      # name -> callable
    └── example_stats.py

Key points

  • Jobs are Django management commands, not standalone scripts. manage.py runs django.setup() automatically — no bare-script init needed.
  • run_job is a one-line dispatcher: JOBS[name]() (see the sketch after this list). Adding a new cron job = a new module in jobs_pkg/, a line in registry.py, a line in local.jobs in tofu.
  • config/urls.py exists but is empty today. Adding web views later = no restructure.
  • INSTALLED_APPS includes django.contrib.{contenttypes,auth} + core from day one, so future web migrations stay clean.
  • DATABASES["default"] reads env vars; on Cloud Run DB_HOST=/cloudsql/${DB_INSTANCE_CONNECTION_NAME} (unix socket via the platform proxy). Base settings are sketched after this list; the cloud override is sketched in section 3.
  • pyproject.toml deps: django>=5,<6, psycopg[binary]>=3.2, httpx. Dev: ruff, pytest, pytest-django.
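
A minimal sketch of the dispatcher pair (file locations from the tree above; argument handling and error reporting are assumptions, not locked decisions):

# jobs/jobs_pkg/registry.py (illustrative)
from jobs_pkg import example_stats

JOBS = {
    "example_stats": example_stats.run,
}

# jobs/core/management/commands/run_job.py (illustrative)
from django.core.management.base import BaseCommand, CommandError

from jobs_pkg.registry import JOBS


class Command(BaseCommand):
    help = "Run a registered cron job by name."

    def add_arguments(self, parser):
        parser.add_argument("name", choices=sorted(JOBS))

    def handle(self, *args, **options):
        name = options["name"]
        try:
            JOBS[name]()
        except Exception as exc:
            # Exit non-zero so the Cloud Run Job execution is marked failed.
            raise CommandError(f"job {name!r} failed: {exc}") from exc
        self.stdout.write(self.style.SUCCESS(f"job {name!r} completed"))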
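
A sketch of the env-driven settings in config/settings/base.py; the local-dev defaults (e.g. the "dev" password) are assumptions chosen to match the verification steps in section 5:

# jobs/config/settings/base.py (illustrative)
import os

SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "dev-only-insecure")
USE_TZ = True
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "core",
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "jobsdb"),
        "USER": os.environ.get("DB_USER", "jobs"),
        "PASSWORD": os.environ.get("DB_PASSWORD", "dev"),
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),  # cloud.py swaps this for the unix socket
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}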

Example job (jobs_pkg/example_stats.py)

Fetches top 30 Hacker News story IDs hourly, stores them in HnTopStory (fields: captured_at, rank, item_id, title, score, url). No API keys, idempotent per timestamp — a plausible “what was HN doing when I posted this” stat feed for the personal site. Demonstrates HTTP fetch + bulk ORM insert.
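
A sketch of the model and the job body, using the public Hacker News Firebase API (/v0/topstories.json and /v0/item/<id>.json); everything beyond the fields listed above (the run() entry point, the unique constraint) is an assumption:

# jobs/core/models.py (illustrative)
from django.db import models


class HnTopStory(models.Model):
    captured_at = models.DateTimeField(db_index=True)
    rank = models.PositiveSmallIntegerField()
    item_id = models.BigIntegerField()
    title = models.TextField()
    score = models.IntegerField(null=True)
    url = models.URLField(max_length=1000, blank=True, default="")

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=["captured_at", "rank"], name="uniq_capture_rank"),
        ]

# jobs/jobs_pkg/example_stats.py (illustrative)
import httpx
from django.utils import timezone

from core.models import HnTopStory

HN_API = "https://hacker-news.firebaseio.com/v0"


def run() -> None:
    captured_at = timezone.now()
    with httpx.Client(timeout=10) as client:
        ids = client.get(f"{HN_API}/topstories.json").json()[:30]
        rows = []
        for rank, item_id in enumerate(ids, start=1):
            item = client.get(f"{HN_API}/item/{item_id}.json").json() or {}
            rows.append(HnTopStory(
                captured_at=captured_at,
                rank=rank,
                item_id=item_id,
                title=item.get("title", ""),
                score=item.get("score"),
                url=item.get("url", ""),
            ))
    HnTopStory.objects.bulk_create(rows)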

Dockerfile

python:3.12-slim, non-root user, pip install --no-cache-dir ., ENTRYPOINT ["python", "manage.py"]. No Cloud SQL Auth Proxy in the image; the cloud_sql_instances setting on the job provides the socket.


3. Connectivity choice

Cloud SQL public IP + cloud_sql_instances unix socket on the Cloud Run Job. GCP’s managed proxy authenticates via IAM; no public ingress to Postgres even though the instance has a public IP. Saves ~$10/mo vs. a Serverless VPC Connector.

Django connects as the password-authenticated jobs user with the Secret Manager password; IAM DB auth requires a custom psycopg connection factory in Django and is not worth the complexity today. The cloudsql.iam_authentication flag is on so you can switch later.
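
The cloud settings override stays small; a sketch assuming the env var names from section 1:

# jobs/config/settings/cloud.py (illustrative)
import os

from .base import *  # noqa: F403

DEBUG = False

# cloud_sql_instances on the Cloud Run Job mounts the managed unix socket at
# /cloudsql/<connection_name>; psycopg connects through it, no proxy sidecar needed.
DATABASES["default"]["HOST"] = f"/cloudsql/{os.environ['DB_INSTANCE_CONNECTION_NAME']}"  # noqa: F405
DATABASES["default"]["PORT"] = ""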


4. GitHub Actions: .github/workflows/deploy-jobs.yml

Trigger: push.tags: ['jobs-v*'] + workflow_dispatch. Permissions: id-token: write, contents: read.

Steps:

  1. Checkout, auth via existing WIF (site-deployer SA), setup-gcloud.
  2. gcloud auth configure-docker us-central1-docker.pkg.dev.
  3. Compute IMAGE_TAG (tag → version, or ${GITHUB_SHA} for manual dispatch); tag image as both $TAG and latest.
  4. docker build ./jobs && docker push to Artifact Registry.
  5. gcloud run jobs update jobs-migrate --image $IMAGE:$TAG --region us-central1.
  6. gcloud run jobs execute jobs-migrate --region us-central1 --wait (the --wait is required to surface migration failures as workflow failures).
  7. Loop over cron jobs (jobs-example_stats, …) running the same update.

Do not run tofu apply from this workflow. Infra changes stay manual / separate.


5. Verification

Local

cd jobs
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=dev postgres:15
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py migrate
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py run_job example_stats
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py shell \
    -c "from core.models import HnTopStory; print(HnTopStory.objects.count())"   # expect 30

Cloud bring-up

  1. cd infra && tofu init && tofu apply — provisions DB, secrets, jobs, scheduler.
  2. git tag jobs-v0.1.0 && git push origin jobs-v0.1.0 — runs the workflow.
  3. gcloud run jobs executions list --job jobs-migrate --region us-central1 — confirm success.
  4. gcloud run jobs execute jobs-example_stats --region us-central1 --wait — manual smoke.
  5. gcloud scheduler jobs run run-example_stats --location us-central1 — verify Scheduler → Job wiring.
  6. After the first scheduled fire (or temporarily set */5 * * * *): connect via cloud-sql-proxy <conn_name> &, then psql "host=127.0.0.1 user=jobs dbname=jobsdb" and run SELECT count(*), max(captured_at) FROM core_hntopstory;

6. Critical files

Create:

  • infra/{artifact_registry,cloudsql,secrets,cloudrun_jobs,scheduler}.tf
  • jobs/Dockerfile
  • jobs/pyproject.toml
  • jobs/manage.py
  • jobs/config/settings/{base,local,cloud}.py
  • jobs/config/urls.py
  • jobs/core/models.py
  • jobs/core/management/commands/run_job.py
  • jobs/jobs_pkg/{registry,example_stats}.py
  • .github/workflows/deploy-jobs.yml

Modify:

  • infra/apis.tf — extend local.required_services.
  • infra/variables.tf — 4 new vars.
  • infra/cicd.tf — 3 IAM bindings for site-deployer.
  • infra/outputs.tf — 3 new outputs.
  • .gitignore — Python artifacts.

7. Gotchas

  • Cloud Scheduler → Cloud Run Jobs uses oauth_token, not oidc_token (target host is *.googleapis.com, not the run.app URL).
  • --wait on gcloud run jobs execute is what propagates the exit code; without it a failing migration silently passes CI.
  • lifecycle.ignore_changes on the job image lets tofu and gcloud run jobs update coexist without drift fights.
  • cloud_sql_instances on the Cloud Run Job provides /cloudsql/<connection_name> automatically; do not also embed the Cloud SQL Auth Proxy in the Dockerfile.
  • db-f1-micro is not available in every region/Postgres-version combo; db_tier is a variable so you can swap to db-g1-small without code changes.