Jamie Thomson

Thoughts, about stuff

Schedule a Dataflow job – full terraform example

First blog post in two and a half years; there's been a whole pandemic since then!!

I've been messing around with GCP Dataflow recently, using GCP Cloud Scheduler to run my Dataflow batch jobs. I had a particularly tricky IAM problem to solve to make this work (read more on Stack Overflow if you're interested), and to demonstrate it I built a repro using Terraform and made it available at https://github.com/jamiet-msm/dataflow-scheduler-permission-problem.

Now that I've solved the problem (read my answer on the Stack Overflow post if interested), the repo serves as a fully fledged example of how to create a working Dataflow job that is scheduled to run using Cloud Scheduler, all created using Terraform. It takes care of all the prerequisites, such as:

  • a service account
  • IAM grants
  • a BigQuery table that the Dataflow job loads data into
  • sample data
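
The pieces above can be sketched in Terraform roughly as follows. This is not the code from the repo: it is a minimal sketch assuming a configured `google` provider and a classic Dataflow template already staged in Cloud Storage, and every variable, resource and job name in it is hypothetical. In this sketch Cloud Scheduler starts the batch job by POSTing to the Dataflow `templates:launch` REST endpoint, authenticating with an OAuth token for the service account. Sample data is not covered.

```hcl
# Hypothetical inputs; the real repo's names and values may differ.
variable "project" {
  type = string
}

variable "region" {
  type = string
}

variable "template_gcs_path" {
  description = "gs:// path of an already-staged classic Dataflow template (assumed)"
  type        = string
}

variable "temp_location" {
  description = "gs:// path Dataflow can use for temporary files (assumed)"
  type        = string
}

# Service account the Dataflow workers run as. In this sketch the scheduler
# also authenticates with it, so it would typically also need permission to
# launch Dataflow jobs and to act as itself; those grants are omitted here.
resource "google_service_account" "dataflow" {
  project      = var.project
  account_id   = "dataflow-scheduled-job"
  display_name = "Scheduled Dataflow job"
}

# One illustrative IAM grant: a custom worker service account needs the Dataflow worker role.
resource "google_project_iam_member" "dataflow_worker" {
  project = var.project
  role    = "roles/dataflow.worker"
  member  = "serviceAccount:${google_service_account.dataflow.email}"
}

# BigQuery dataset and table that the job loads data into (schema is hypothetical).
resource "google_bigquery_dataset" "target" {
  project    = var.project
  dataset_id = "scheduled_load"
  location   = var.region
}

resource "google_bigquery_table" "target" {
  project             = var.project
  dataset_id          = google_bigquery_dataset.target.dataset_id
  table_id            = "loaded_rows"
  deletion_protection = false # lets terraform destroy remove the table
  schema = jsonencode([
    { name = "id", type = "INTEGER", mode = "REQUIRED" },
    { name = "value", type = "STRING", mode = "NULLABLE" },
  ])
}

# Scheduler job that launches the template once a day at 06:00 UTC (illustrative schedule).
resource "google_cloud_scheduler_job" "launch_dataflow" {
  project   = var.project
  region    = var.region
  name      = "launch-dataflow-batch"
  schedule  = "0 6 * * *"
  time_zone = "Etc/UTC"

  http_target {
    http_method = "POST"
    uri         = "https://dataflow.googleapis.com/v1b3/projects/${var.project}/locations/${var.region}/templates:launch?gcsPath=${urlencode(var.template_gcs_path)}"
    headers     = { "Content-Type" = "application/json" }

    # Cloud Scheduler expects the request body to be base64-encoded.
    body = base64encode(jsonencode({
      jobName = "scheduled-batch-load"
      parameters = {
        # Template-specific parameters go here (hypothetical, depends on the template).
      }
      environment = {
        serviceAccountEmail = google_service_account.dataflow.email
        tempLocation        = var.temp_location
      }
    }))

    oauth_token {
      service_account_email = google_service_account.dataflow.email
      scope                 = "https://www.googleapis.com/auth/cloud-platform"
    }
  }
}
```

The IAM grants the repo actually uses, and the one that fixed the permission problem, are in the repo and the Stack Overflow answer; treat the single grant above as illustrative only.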

The README explains how to deploy it and subsequently destroy it; you don't even need Terraform installed.

Head over to the repo if this sounds useful and check it out.

Written by Jamiet

December 19, 2021 at 8:52 pm

Posted in Uncategorized

2 Responses

  1. Maybe you need to re-badge yourself as GCPjunkie.

    Peter Hanlon

    December 19, 2021 at 9:04 pm

    • I think the SSIS junkie moniker died a long, long time ago.

      Jamiet

      December 19, 2021 at 9:18 pm

