Contents

Basic ETL using Python, Big Query, Data Studio & Airflow

ETL Diagram of my first project

Idea:

Pull raw data from the Austin crime dataset, transform it, store the data in the cloud, and use a visualization tool to properly present the data.

The technologies used in this project are:

  • Visual Studio
  • Python Pandas
  • Big Query
  • Data Studio
  • Airflow

../etl_diagram.png

1. Library Imports

../Imports.png
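The import list likely looks something like this (a sketch; the exact imports are shown in the screenshot above):

```python
import pandas as pd   # transformations
import requests       # API extraction
```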

2. Extraction: API containing the crime data from Austin, Texas.

../api_extraction.png
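The extraction step can be sketched with `requests` and `pandas`. The endpoint URL, dataset id, and `fetch_crime_data` helper below are assumptions for illustration; the project's actual call is in the screenshot.

```python
import pandas as pd
import requests

# Hypothetical Socrata endpoint for Austin's open crime data.
API_URL = "https://data.austintexas.gov/resource/fdj4-gpfu.json"

def fetch_crime_data(url: str = API_URL, limit: int = 1000) -> pd.DataFrame:
    """Pull one page of records from the open-data API into a DataFrame."""
    response = requests.get(url, params={"$limit": limit}, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return pd.DataFrame(response.json())

# Usage: df = fetch_crime_data(limit=5000)
```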

3. Transformation: Used Visual Studio & Pandas.

Transformation: Rename of Columns

../transformation_rename.png
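The rename step amounts to a `DataFrame.rename` call. The raw field names below are hypothetical stand-ins for the API's actual fields:

```python
import pandas as pd

# Sample frame using hypothetical raw API field names.
df = pd.DataFrame({"crime_type": ["THEFT"], "occ_date_time": ["2021-05-01 17:30"]})

# Map raw field names to readable column names.
df = df.rename(columns={"crime_type": "Crime Type", "occ_date_time": "Date Occurred"})
print(list(df.columns))  # → ['Crime Type', 'Date Occurred']
```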

Transformation: Change format of “Date Occurred” from military time to standard time

../trasformation_date_occured.png

Transformation: Change format of “Date Reported” from military time to standard time

../transformation_date_reported.png
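Both date conversions follow the same pattern: parse the 24-hour timestamp, then re-render it in 12-hour (AM/PM) form. A minimal sketch with pandas, assuming the timestamps arrive as strings:

```python
import pandas as pd

df = pd.DataFrame({
    "Date Occurred": ["2021-05-01 17:30"],
    "Date Reported": ["2021-05-01 18:05"],
})

for col in ["Date Occurred", "Date Reported"]:
    # %I gives the 12-hour clock, %p the AM/PM marker.
    df[col] = pd.to_datetime(df[col]).dt.strftime("%Y-%m-%d %I:%M %p")

print(df["Date Occurred"].iloc[0])  # → 2021-05-01 05:30 PM
```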

4. Load: Upload data into Big Query.

../data_load.png
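The load step can be sketched with the `google-cloud-bigquery` client. The project id, dataset, and table names below are hypothetical; the real load code is in the screenshot, and running this requires GCP credentials.

```python
import pandas as pd

PROJECT_ID = "my-project"  # hypothetical GCP project
TABLE_ID = f"{PROJECT_ID}.austin_crime.crime_reports"  # hypothetical dataset.table

def load_to_bigquery(df: pd.DataFrame, table_id: str = TABLE_ID) -> None:
    """Load the transformed DataFrame into a BigQuery table."""
    from google.cloud import bigquery  # requires google-cloud-bigquery + credentials

    client = bigquery.Client(project=PROJECT_ID)
    job = client.load_table_from_dataframe(df, table_id)
    job.result()  # block until the load job finishes
```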

5. Airflow: Preferred scheduler (and wanted to learn the application)

Useful: Airflow documentation

Default arguments

../airflow_arguments.png
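Default arguments in Airflow are a plain dictionary passed to the DAG, applied to every task. The values below are typical placeholders; the project's actual settings are in the screenshot:

```python
from datetime import datetime, timedelta

# Hypothetical defaults applied to every task in the DAG.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2021, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}
```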

Declaring DAGs

../dags_setup.png

Setting up tasks: grouping tasks in a list ([]) makes them run in parallel

../task_setup.png

Setting up dependencies

../dependencies_setup.png
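The DAG declaration, task setup, and dependency wiring can be sketched together. The callables and task ids below are hypothetical stand-ins for the project's ETL functions; only the `etl_workflow` DAG name comes from the project. Putting the two transform tasks in a list makes them run in parallel after extraction:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical callables standing in for the project's ETL functions.
def extract(): ...
def rename_columns(): ...
def format_dates(): ...
def load(): ...

with DAG(
    dag_id="etl_workflow",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    rename_task = PythonOperator(task_id="rename_columns", python_callable=rename_columns)
    dates_task = PythonOperator(task_id="format_dates", python_callable=format_dates)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The list makes the two transform tasks run in parallel.
    extract_task >> [rename_task, dates_task] >> load_task
```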

Airflow DAG: etl_workflow Graph

../dag_workflow.png

6. Big Query Table: Our preferred data warehouse

Table Schema

../bigquery_schema.png

Table data (example)

../bigquery_data.png

7. Data Studio: Our preferred visualization tool

Dashboard Link

../visualization.png