Context

sheehan-workspace today is a GCP-based personal monorepo: OpenTofu IaC in infra/ provisioning a Hugo static site (site/) on Firebase Hosting, deployed via GitHub Actions with Workload Identity Federation. There is no Python, no database, and no scheduled work.

The goal is to add a place to write Python cron jobs that persist data to Postgres via an ORM, on the same GCP project, without breaking the existing site stack. The longer-term aim is a Python web backend (Django) fronted by Flutter — so this scaffold needs to be expandable into a web app, not a throwaway shell.

Key locked decisions:

  • Compute: Cloud Run Jobs, one per cron task, triggered by Cloud Scheduler.
  • Database: Cloud SQL Postgres (db-f1-micro), public IP, connected via the platform-managed unix socket (cloud_sql_instances on the Cloud Run Job). No VPC connector — saves ~$10/mo.
  • ORM: Django ORM in standalone mode — cron jobs run as Django management commands (python manage.py run_job <name>), so manage.py handles django.setup() for free. Same project gains web views later by adding urls.py entries and a second Cloud Run Service.
  • Secrets: Secret Manager, mounted as env into the Cloud Run Job.
  • Deploy: new .github/workflows/deploy-jobs.yml triggered by jobs-v* tags, reusing the existing WIF + site-deployer SA with extra IAM bindings.

Estimated cost at idle: ~$12–14/mo, ~95% of which is Cloud SQL.


1. OpenTofu additions under infra/

Modify

  • apis.tf — append to local.required_services: artifactregistry.googleapis.com, run.googleapis.com, cloudscheduler.googleapis.com, sqladmin.googleapis.com, secretmanager.googleapis.com.
  • variables.tf — add jobs_image_tag (default "latest"), db_tier (default "db-f1-micro"), db_name (default "jobsdb"), db_user (default "jobs").
  • cicd.tf — grant site-deployer SA: roles/artifactregistry.writer on the jobs repo, roles/run.developer project-wide, and roles/iam.serviceAccountUser on the new jobs-runtime SA (required to deploy a job that runs as that SA).
  • outputs.tf — add jobs_image_repo, jobs_runtime_sa, db_connection_name.

Create

  • artifact_registry.tf: google_artifact_registry_repository.jobs (Docker format, us-central1).
  • cloudsql.tf:
    • google_sql_database_instance.main: POSTGRES_15, var.db_tier, zonal, 10 GB HDD, public IP, no authorized networks, cloudsql.iam_authentication = on, deletion_protection = true.
    • google_sql_database.app, google_sql_user.app (password auth), random_password.db.
  • secrets.tf:
    • google_secret_manager_secret.db_password + version from random_password.db.
    • google_secret_manager_secret.django_secret_key + version from random_password.django (length 64).
    • roles/secretmanager.secretAccessor binding for the jobs-runtime SA on both.
  • cloudrun_jobs.tf:
    • google_service_account.jobs_runtime (account_id = "jobs-runtime").
    • roles/cloudsql.client on jobs_runtime.
    • local.jobs map: { migrate = { args = ["migrate"], schedule = null }, example_stats = { args = ["run_job", "example_stats"], schedule = "0 * * * *" } }.
    • google_cloud_run_v2_job.this with for_each = local.jobs:
      • template.template.service_account = jobs_runtime.email.
      • template.template.cloud_sql_instances = [google_sql_database_instance.main.connection_name] — provides /cloudsql/<conn> unix socket.
      • containers.image = "${region}-docker.pkg.dev/${project}/jobs/app:${var.jobs_image_tag}".
      • containers.command = ["python", "manage.py"], containers.args = each.value.args.
      • Env: DJANGO_SETTINGS_MODULE=config.settings.cloud, GCP_PROJECT_ID, DB_INSTANCE_CONNECTION_NAME, DB_NAME, DB_USER; DB_PASSWORD and DJANGO_SECRET_KEY via value_source.secret_key_ref.
      • lifecycle.ignore_changes = [template[0].template[0].containers[0].image] so gcloud run jobs update from CI doesn’t fight tofu.
  • scheduler.tf:
    • google_service_account.scheduler + google_cloud_run_v2_job_iam_member.scheduler_invoker (roles/run.invoker) per scheduled job.
    • google_cloud_scheduler_job.this with for_each = { for k, v in local.jobs : k => v if v.schedule != null }.
    • http_target.uri = "https://${region}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/${project}/jobs/jobs-${each.key}:run".
    • oauth_token (not oidc_token) since the target is *.googleapis.com — common pitfall.

2. Python project under jobs/

Tree

jobs/
├── Dockerfile
├── .dockerignore
├── pyproject.toml
├── manage.py
├── config/
│   ├── settings/{__init__,base,local,cloud}.py
│   ├── urls.py          # empty list today; ready for views later
│   ├── wsgi.py
│   └── asgi.py
├── core/
│   ├── apps.py
│   ├── models.py
│   ├── admin.py
│   ├── migrations/
│   └── management/commands/run_job.py
└── jobs_pkg/
    ├── registry.py      # name -> callable
    └── example_stats.py

Key points

  • Jobs are Django management commands, not standalone scripts. manage.py runs django.setup() automatically — no bare-script init needed.
  • run_job is a one-line dispatcher: JOBS[name]() (see the sketch after this list). Adding a new cron job = a new module in jobs_pkg/, a line in registry.py, a line in local.jobs in tofu.
  • config/urls.py exists but is empty today. Adding web views later = no restructure.
  • INSTALLED_APPS includes django.contrib.{contenttypes,auth} + core from day one, so future web migrations stay clean.
  • DATABASES["default"] reads env vars; on Cloud Run DB_HOST=/cloudsql/${DB_INSTANCE_CONNECTION_NAME} (unix socket via the platform proxy). Base settings are sketched after this list; the cloud override is sketched in section 3.
  • pyproject.toml deps: django>=5,<6, psycopg[binary]>=3.2, httpx. Dev: ruff, pytest, pytest-django.
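
A minimal sketch of the dispatcher pair (file locations from the tree above; argument handling and error reporting are assumptions, not locked decisions):

# jobs/jobs_pkg/registry.py (illustrative)
from jobs_pkg import example_stats

JOBS = {
    "example_stats": example_stats.run,
}

# jobs/core/management/commands/run_job.py (illustrative)
from django.core.management.base import BaseCommand, CommandError

from jobs_pkg.registry import JOBS


class Command(BaseCommand):
    help = "Run a registered cron job by name."

    def add_arguments(self, parser):
        parser.add_argument("name", choices=sorted(JOBS))

    def handle(self, *args, **options):
        name = options["name"]
        try:
            JOBS[name]()
        except Exception as exc:
            # Exit non-zero so the Cloud Run Job execution is marked failed.
            raise CommandError(f"job {name!r} failed: {exc}") from exc
        self.stdout.write(self.style.SUCCESS(f"job {name!r} completed"))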
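
A sketch of the env-driven settings in config/settings/base.py; the local-dev defaults (e.g. the "dev" password) are assumptions chosen to match the verification steps in section 5:

# jobs/config/settings/base.py (illustrative)
import os

SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "dev-only-insecure")
USE_TZ = True
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "core",
]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "jobsdb"),
        "USER": os.environ.get("DB_USER", "jobs"),
        "PASSWORD": os.environ.get("DB_PASSWORD", "dev"),
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),  # cloud.py swaps this for the unix socket
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}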

Example job (jobs_pkg/example_stats.py)

Fetches top 30 Hacker News story IDs hourly, stores them in HnTopStory (fields: captured_at, rank, item_id, title, score, url). No API keys, idempotent per timestamp — a plausible “what was HN doing when I posted this” stat feed for the personal site. Demonstrates HTTP fetch + bulk ORM insert.
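
A sketch of the model and the job body, using the public Hacker News Firebase API (/v0/topstories.json and /v0/item/<id>.json); everything beyond the fields listed above (the run() entry point, the unique constraint) is an assumption:

# jobs/core/models.py (illustrative)
from django.db import models


class HnTopStory(models.Model):
    captured_at = models.DateTimeField(db_index=True)
    rank = models.PositiveSmallIntegerField()
    item_id = models.BigIntegerField()
    title = models.TextField()
    score = models.IntegerField(null=True)
    url = models.URLField(max_length=1000, blank=True, default="")

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=["captured_at", "rank"], name="uniq_capture_rank"),
        ]

# jobs/jobs_pkg/example_stats.py (illustrative)
import httpx
from django.utils import timezone

from core.models import HnTopStory

HN_API = "https://hacker-news.firebaseio.com/v0"


def run() -> None:
    captured_at = timezone.now()
    with httpx.Client(timeout=10) as client:
        ids = client.get(f"{HN_API}/topstories.json").json()[:30]
        rows = []
        for rank, item_id in enumerate(ids, start=1):
            item = client.get(f"{HN_API}/item/{item_id}.json").json() or {}
            rows.append(HnTopStory(
                captured_at=captured_at,
                rank=rank,
                item_id=item_id,
                title=item.get("title", ""),
                score=item.get("score"),
                url=item.get("url", ""),
            ))
    HnTopStory.objects.bulk_create(rows)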

Dockerfile

python:3.12-slim, non-root user, pip install --no-cache-dir ., ENTRYPOINT ["python", "manage.py"]. No Cloud SQL Auth Proxy in the image; the cloud_sql_instances setting on the job provides the socket.


3. Connectivity choice

Cloud SQL public IP + cloud_sql_instances unix socket on the Cloud Run Job. GCP’s managed proxy authenticates via IAM; no public ingress to Postgres even though the instance has a public IP. Saves ~$10/mo vs. a Serverless VPC Connector.

Django connects as the password-authenticated jobs user with the Secret Manager password; IAM DB auth requires a custom psycopg connection factory in Django and is not worth the complexity today. The cloudsql.iam_authentication flag is on so you can switch later.
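
The cloud settings override stays small; a sketch assuming the env var names from section 1:

# jobs/config/settings/cloud.py (illustrative)
import os

from .base import *  # noqa: F403

DEBUG = False

# cloud_sql_instances on the Cloud Run Job mounts the managed unix socket at
# /cloudsql/<connection_name>; psycopg connects through it, no proxy sidecar needed.
DATABASES["default"]["HOST"] = f"/cloudsql/{os.environ['DB_INSTANCE_CONNECTION_NAME']}"  # noqa: F405
DATABASES["default"]["PORT"] = ""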


4. GitHub Actions: .github/workflows/deploy-jobs.yml

Trigger: push.tags: ['jobs-v*'] + workflow_dispatch. Permissions: id-token: write, contents: read.

Steps:

  1. Checkout, auth via existing WIF (site-deployer SA), setup-gcloud.
  2. gcloud auth configure-docker us-central1-docker.pkg.dev.
  3. Compute IMAGE_TAG (tag → version, or ${GITHUB_SHA} for manual dispatch); tag image as both $TAG and latest.
  4. docker build ./jobs && docker push to Artifact Registry.
  5. gcloud run jobs update jobs-migrate --image $IMAGE:$TAG --region us-central1.
  6. gcloud run jobs execute jobs-migrate --region us-central1 --wait (the --wait is required to surface migration failures as workflow failures).
  7. Loop over cron jobs (jobs-example_stats, …) running the same update.

Do not run tofu apply from this workflow. Infra changes stay manual / separate.


5. Verification

Local

cd jobs
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=dev postgres:15
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py migrate
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py run_job example_stats
DJANGO_SETTINGS_MODULE=config.settings.local python manage.py shell \
    -c "from core.models import HnTopStory; print(HnTopStory.objects.count())"   # expect 30

Cloud bring-up

  1. cd infra && tofu init && tofu apply — provisions DB, secrets, jobs, scheduler.
  2. git tag jobs-v0.1.0 && git push origin jobs-v0.1.0 — runs the workflow.
  3. gcloud run jobs executions list --job jobs-migrate --region us-central1 — confirm success.
  4. gcloud run jobs execute jobs-example_stats --region us-central1 --wait — manual smoke.
  5. gcloud scheduler jobs run run-example_stats --location us-central1 — verify Scheduler → Job wiring.
  6. After the first scheduled fire (or temporarily set */5 * * * *): connect via cloud-sql-proxy <conn_name> &, then psql "host=127.0.0.1 user=jobs dbname=jobsdb" and run SELECT count(*), max(captured_at) FROM core_hntopstory;

6. Critical files

Create:

  • infra/{artifact_registry,cloudsql,secrets,cloudrun_jobs,scheduler}.tf
  • jobs/Dockerfile
  • jobs/pyproject.toml
  • jobs/manage.py
  • jobs/config/settings/{base,local,cloud}.py
  • jobs/config/urls.py
  • jobs/core/models.py
  • jobs/core/management/commands/run_job.py
  • jobs/jobs_pkg/{registry,example_stats}.py
  • .github/workflows/deploy-jobs.yml

Modify:

  • infra/apis.tf — extend local.required_services.
  • infra/variables.tf — 4 new vars.
  • infra/cicd.tf — 3 IAM bindings for site-deployer.
  • infra/outputs.tf — 3 new outputs.
  • .gitignore — Python artifacts.

7. Gotchas

  • Cloud Scheduler → Cloud Run Jobs uses oauth_token, not oidc_token (target host is *.googleapis.com, not the run.app URL).
  • --wait on gcloud run jobs execute is what propagates the exit code; without it a failing migration silently passes CI.
  • lifecycle.ignore_changes on the job image lets tofu and gcloud run jobs update coexist without drift fights.
  • cloud_sql_instances on the Cloud Run Job provides /cloudsql/<connection_name> automatically; do not also embed the Cloud SQL Auth Proxy in the Dockerfile.
  • db-f1-micro is not available in every region/Postgres-version combo; db_tier is a variable so you can swap to db-g1-small without code changes.