Getting Started with Dropbox, Fivetran and Snowflake

2021.04.14

(The Japanese version of this article is available here)

Background:

Fivetran, a robust modern data stack tool, offers a wide variety of native connectors for building the data pipeline you need. In this post, we will go through the Dropbox connector.

The purpose here is to establish a connection between Fivetran and Dropbox, and then load the data into Snowflake for warehousing and transformations. A graphic representation is provided below.

The file types that can currently be synced from Dropbox are limited to: separated-value files (such as CSV and TSV), newline-delimited JSON text files, JSON arrays, Avro, and Parquet. For compatibility, these files should be encoded as UTF-8, UTF-16, or UTF-32, in either big- or little-endian byte order. If no byte-order mark (BOM) is present at the beginning of a file, UTF-8 encoding is assumed.
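To check a file's encoding before placing it in Dropbox, the BOM rule above can be sketched in Python. This is only an illustration of the rule as described, not Fivetran's actual implementation; the function name detect_encoding is hypothetical.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(head: bytes) -> str:
    """Guess the encoding from the first bytes of a file (hypothetical helper)."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM found: fall back to UTF-8, mirroring the rule described above.
    return "utf-8"
```

Passing the first four bytes of a file, for example open(path, "rb").read(4), is enough for this check.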

How to Achieve:

First, make sure you have connected your Fivetran account to Snowflake as a destination. It can be achieved as shown here in detail.

Log in to your Fivetran instance and click on the “Add Connector” button. From the list of connectors, search for “Dropbox” and click on the desired service as shown below.

A connection menu will be loaded as shown below. Enter the destination schema name and table name as desired, then click on the “Authorize” button.

An authorization window will open, where you sign in to Dropbox and grant Fivetran access.

After the authorization is validated, the connection will be established and Fivetran will ask for a detailed setup for the new connection as shown below. In these detailed settings, you can choose: the folder or subfolder path, the file name pattern, the file type, whether decompression is needed to process the files, how to handle files that fail to load, how to handle a modified file that contains new data, the file's delimiter, the escape character, whether to skip header or footer lines, how to handle a file with no headers, and so on.
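Before saving a file name pattern, it can help to try it locally against a few sample paths. The sketch below assumes the pattern is a regular expression that must match the whole path; Fivetran's exact matching rules may differ, and select_files and the sample paths are hypothetical.

```python
import re


def select_files(paths, pattern):
    """Return the paths that the regular expression matches in full."""
    compiled = re.compile(pattern)
    return [p for p in paths if compiled.fullmatch(p)]


# Hypothetical folder contents, used only for illustration.
files = ["sales/2021-04.csv", "sales/2021-04.csv.bak", "notes/readme.txt"]
print(select_files(files, r"sales/.*\.csv"))
```

With these sample paths, only sales/2021-04.csv is selected; the .bak copy is excluded because the pattern has to match through the end of the name.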

Once all the preferences are set, click on the “Save & Test” button to test the connection. Fivetran will then verify all the parameters and establish the connection.

On the next page, you can view the Fivetran connector window. Here you can click the “Start Initial Sync” button to load the necessary data from Dropbox.

Once the synchronization is completed in Fivetran, you can navigate to Snowflake and validate the connection and the data loaded into Snowflake as shown below.
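The same check can be scripted. The sketch below counts the rows in the destination table through any Python DB-API 2.0 cursor; count_rows is a hypothetical helper, and the schema and table names are whatever you entered in the connection menu.

```python
import re

# Accept only plain unquoted identifiers, since the names are interpolated
# into the SQL text rather than bound as parameters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def count_rows(cursor, schema: str, table: str) -> int:
    """Return the number of rows in schema.table (hypothetical helper)."""
    for name in (schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    cursor.execute(f"SELECT COUNT(*) FROM {schema}.{table}")
    return cursor.fetchone()[0]
```

With the Snowflake Connector for Python installed, a cursor obtained from snowflake.connector.connect(...).cursor() can be passed in, assuming the connection's current database is the Fivetran destination. A count that matches the number of rows in the source files is a quick sign that the initial sync loaded everything.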

Downstream data processing and reporting can now be done in Snowflake (or another tool of your choice). Fivetran works as an ELT platform: it Extracts and Loads the data into the warehouse, where it can later be Transformed.

Summary:

In this post, we saw how data from Dropbox can be easily loaded into Snowflake using Fivetran's Dropbox connector.