Basic ETL using Python, BigQuery, Data Studio & Airflow
Contents
ETL Diagram of my first project
Idea:
Get raw data from the Austin crime API, transform it, store it in the cloud, and use a visualization tool to present the data.
Technologies used in this project:
- Visual Studio
- Python Pandas
- BigQuery
- Data Studio
- Airflow

1. Library Imports
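The write-up names pandas for the transformations; the HTTP client is not specified, so `requests` below is an assumption. A minimal import block for the pipeline might look like:

```python
import pandas as pd   # transformations (column renames, date reformatting)
import requests       # HTTP client for pulling the Austin crime API (assumed)
```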

2. Extraction: Pulled the crime data for Austin, Texas from its public API.
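A minimal extraction sketch, assuming the data is served through the City of Austin's Socrata open-data API; the endpoint URL, dataset ID, and `$limit` parameter are assumptions rather than details from this write-up.

```python
import pandas as pd
import requests

# Assumed Socrata endpoint for Austin's crime-reports dataset.
API_URL = "https://data.austintexas.gov/resource/fdj4-gpfu.json"

def extract_crime_data(limit: int = 1000) -> pd.DataFrame:
    """Pull one page of crime records and return them as a DataFrame."""
    response = requests.get(API_URL, params={"$limit": limit})
    response.raise_for_status()  # fail loudly on HTTP errors
    return pd.DataFrame(response.json())
```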

3. Transformation: Used Visual Studio & Pandas.
Transformation: Renaming columns
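A sketch of the rename step with `DataFrame.rename`; the raw column names (`occ_date_time`, `rep_date_time`) are assumptions about what the API returns, not names taken from the project.

```python
import pandas as pd

# Toy frame standing in for the extracted data.
df = pd.DataFrame({
    "occ_date_time": ["2021-01-05T14:30:00"],
    "rep_date_time": ["2021-01-05T15:00:00"],
})

# Rename raw API columns to the friendlier names used downstream.
df = df.rename(columns={
    "occ_date_time": "Date Occurred",
    "rep_date_time": "Date Reported",
})
```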

Transformation: Change the format of “Date Occurred” from military (24-hour) time to standard (12-hour) time

Transformation: Change the format of “Date Reported” from military (24-hour) time to standard (12-hour) time
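Both date columns can be reformatted the same way: parse with `pd.to_datetime`, then render with `strftime`. The output pattern below (`%I:%M %p` for 12-hour time with AM/PM) is an assumption about the exact "standard time" format the project used.

```python
import pandas as pd

df = pd.DataFrame({
    "Date Occurred": ["2021-01-05T14:30:00"],
    "Date Reported": ["2021-01-05T15:00:00"],
})

# Parse each timestamp, then render it in 12-hour time with an AM/PM suffix,
# e.g. "01/05/2021 02:30 PM".
for col in ["Date Occurred", "Date Reported"]:
    df[col] = pd.to_datetime(df[col]).dt.strftime("%m/%d/%Y %I:%M %p")
```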

4. Load: Upload the data into BigQuery.
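One way to sketch the load step is pandas' `to_gbq` (which requires the pandas-gbq package and configured GCP credentials); the project, dataset, and table names below are placeholders, not the project's real identifiers.

```python
import pandas as pd

def load_to_bigquery(df: pd.DataFrame) -> None:
    """Upload the transformed frame to a BigQuery table."""
    df.to_gbq(
        destination_table="austin_crime.crime_reports",  # dataset.table (placeholder)
        project_id="my-gcp-project",                     # placeholder GCP project ID
        if_exists="replace",                             # overwrite the table each run
    )
```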

5. Airflow: The preferred scheduler (and a chance to learn the tool)
Useful: Airflow documentation
Setting default arguments
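In Airflow these go in a `default_args` dict that every task in the DAG inherits unless a task overrides them; the specific values here are illustrative.

```python
from datetime import datetime, timedelta

# Defaults applied to every task in the DAG.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,            # a run does not wait on the previous run
    "start_date": datetime(2021, 1, 1),  # first schedulable date
    "retries": 1,                        # retry a failed task once
    "retry_delay": timedelta(minutes=5), # wait between retries
}
```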

Declaring DAGs
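A minimal declaration of the `etl_workflow` DAG (the name matching the graph shown later in this write-up); the daily schedule is an assumption. No test is attached since this fragment needs an Airflow installation to run.

```python
from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id="etl_workflow",               # name shown in the Airflow UI graph
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",          # assumed schedule
    catchup=False,                       # don't backfill past runs
)
```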

Setting up tasks: placing tasks in a list ([]) makes them run in parallel

Setting up dependencies
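The two steps above can be sketched together: tasks are declared as operators, and a bracketed list on one side of `>>` is what makes those tasks run in parallel. The `PythonOperator` tasks and their placeholder callables are assumptions about how the project wired things up. No test is attached since this fragment needs an Airflow installation to run.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables -- the real extract/transform/load functions
# live elsewhere in the project.
def extract(): ...
def rename_columns(): ...
def reformat_dates(): ...
def load(): ...

with DAG("etl_workflow", start_date=datetime(2021, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_rename = PythonOperator(task_id="rename_columns", python_callable=rename_columns)
    t_dates = PythonOperator(task_id="reformat_dates", python_callable=reformat_dates)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # The bracketed list runs both transform tasks in parallel,
    # after extract and before load.
    t_extract >> [t_rename, t_dates] >> t_load
```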

Airflow DAG: etl_workflow Graph

6. BigQuery Table: Our preferred data warehouse
Table Schema

Table data (example)

7. Data Studio: Our preferred visualization tool
