Getting Started with GitHub, Fivetran, and Snowflake

2021.04.19

(The Japanese version of this article is available here.)

Background:

Fivetran, a robust tool in the modern data stack, offers a wide variety of native connectors for building data pipelines. In this post, we walk through the GitHub connector.

The goal is to connect Fivetran to GitHub and then load the data into Snowflake for warehousing and transformation. A graphic representation is provided below.

Data such as commits, users, and pull requests can be imported; the destination schema is described on this page.

How to Achieve:

First, make sure your Fivetran account is connected to Snowflake as a destination. The steps are shown here in detail.

Log in to your Fivetran instance and click the “Add Connector” button. In the list of connectors, search for “Github” and click the desired service, as shown below.

A connection menu will load, as shown below. Enter the destination schema name and click the “Authorize” button to proceed.

An authorization window then walks you through the authorization process and describes the data to be imported.

A password prompt will then ask for your GitHub credentials.

Once authorization succeeds, the connection is established and Fivetran asks for detailed settings for the new connection, as shown below. In these settings, you can choose to synchronize either all repositories or only specific ones.

Once the sync mode is confirmed, click on the “Save & Test” button to test the connection. Fivetran will then verify all the parameters and establish the connection.

On the next page, you will see the Fivetran connector window. Click the “Start Initial Sync” button to load the data from GitHub.

Once the synchronization completes in Fivetran, navigate to Snowflake and verify both the connection and the loaded data, as shown below.
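One simple way to check the load is to count the rows in a few of the synced tables. The sketch below only builds the SQL text for that check; it does not connect to Snowflake. The schema name `github` and the table names `commit`, `pull_request`, and `user` are assumptions for illustration, so replace them with the destination schema you entered during setup and the tables you actually see. It also assumes the tables were created with upper-case names, which is why each identifier is upper-cased and quoted.

```python
import re

# Hypothetical names: use the destination schema you entered in the
# connector setup and the tables that appear in your warehouse.
SCHEMA = "github"
TABLES = ["commit", "pull_request", "user"]

# Only plain identifiers are accepted, so nothing unexpected ends up in the SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name: str) -> str:
    """Return an upper-cased, double-quoted identifier (assumes upper-case table names)."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f'"{name.upper()}"'


def build_row_count_queries(schema: str, tables: list[str]) -> list[str]:
    """Build one COUNT(*) query per table in the given schema."""
    return [
        f"SELECT COUNT(*) AS row_count "
        f"FROM {quote_identifier(schema)}.{quote_identifier(table)};"
        for table in tables
    ]


if __name__ == "__main__":
    for query in build_row_count_queries(SCHEMA, TABLES):
        print(query)
```

The printed statements can be pasted into a Snowflake worksheet. A non-zero count for each table is a quick sign that the initial sync wrote data, though it does not prove that the data is complete.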

Downstream data processing can now be done efficiently in Snowflake (or another tool of your choice). Fivetran works as an ELT platform: it first Extracts and Loads the data into the warehouse, where it can later be Transformed.
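To make the ELT order concrete, here is a minimal sketch in which raw rows are loaded unchanged and only afterwards transformed with SQL inside the database. It uses Python's built-in `sqlite3` module as a stand-in for the warehouse, so it is not Snowflake or Fivetran code. The sample rows and the `raw_commit` and `commits_per_author` tables with their columns are made up for illustration and do not reflect the actual destination schema.

```python
import sqlite3

# Made-up sample rows standing in for what an extract step might return.
RAW_COMMITS = [
    ("a1", "alice", "2021-04-01"),
    ("b2", "bob", "2021-04-02"),
    ("c3", "alice", "2021-04-03"),
]


def load_raw(conn, rows):
    """Load step: store the extracted rows unchanged in a raw table."""
    conn.execute(
        "CREATE TABLE raw_commit (sha TEXT, author TEXT, committed_on TEXT)"
    )
    conn.executemany("INSERT INTO raw_commit VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform step: derive a per-author summary inside the database."""
    conn.execute(
        "CREATE TABLE commits_per_author AS "
        "SELECT author, COUNT(*) AS commit_count "
        "FROM raw_commit GROUP BY author"
    )
    return conn.execute(
        "SELECT author, commit_count FROM commits_per_author ORDER BY author"
    ).fetchall()


def run_elt(rows):
    """Run load, then transform, against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        load_raw(conn, rows)
        return transform(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_elt(RAW_COMMITS))  # [('alice', 2), ('bob', 1)]
```

The point of the ordering is that the raw table stays available after loading, so new transformations can be added later without extracting the data again.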

Summary:

In this post, we saw how GitHub data can be easily loaded into Snowflake using Fivetran.