Site Impact - Big Data Pipeline
Customer Success Story
Role in Project
Site Impact needed a way to modernize and scale out their data management procedures. We were tasked with architecting a cloud-hosted platform built to scale their business operations by presenting normalized data that is easier to analyze.
The Challenge
Site Impact reached out to us to discuss how to tackle the uptime and scaling problems they faced with their existing data management platform. Their workflow for using the data was also piecemeal and relied on a lot of manual data manipulation.
The Solution
After assessing the project during a Discovery phase, we defined a minimum viable product (MVP) that delivered the client's core requirements. We focused heavily on data science and built a multi-functional data pipeline into which the client can feed data to be standardized and deduplicated, so that anyone in their organization can analyze the exported data wherever they choose.
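To make the standardizing and deduping steps more concrete, here is a minimal, hypothetical sketch in plain Python; it is not Site Impact's production code. The field names (`email`, `first_name`, `last_name`, `zip`), the normalization rules, and the rule that two records are duplicates when their normalized email matches are all assumptions chosen for illustration.

```python
import re
from typing import Dict, Iterable, List

Record = Dict[str, str]


def standardize(record: Record) -> Record:
    """Return a normalized copy of one record (hypothetical field names)."""
    return {
        # Trim whitespace and lower-case emails so case variants compare equal.
        "email": record.get("email", "").strip().lower(),
        # Collapse repeated internal spaces and title-case names.
        "first_name": " ".join(record.get("first_name", "").split()).title(),
        "last_name": " ".join(record.get("last_name", "").split()).title(),
        # Strip non-digits and keep the first five digits of a US ZIP code.
        "zip": re.sub(r"\D", "", record.get("zip", ""))[:5],
    }


def dedupe(records: Iterable[Record]) -> List[Record]:
    """Standardize records and keep only the first one seen for each email."""
    seen = set()
    unique: List[Record] = []
    for raw in records:
        rec = standardize(raw)
        key = rec["email"]
        if key in seen:
            continue  # duplicate of an earlier record
        if key:
            seen.add(key)  # records with no email are kept but never matched
        unique.append(rec)
    return unique


rows = [
    {"email": " Jane@Example.com ", "first_name": "jane", "last_name": "doe", "zip": "30301-1234"},
    {"email": "jane@example.com", "first_name": "JANE", "last_name": "DOE", "zip": "30301"},
    {"email": "bob@example.com", "first_name": "bob  ", "last_name": "smith", "zip": "10001"},
]
result = dedupe(rows)
print(len(result))  # → 2 (Jane's second row is dropped)
```

Standardizing before comparing is the key design choice: the first two rows only become recognizable as duplicates after case and whitespace are normalized.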
The Results
Key Technologies:
BigQuery – Efficient SQL queries on 400 GB tables with hundreds of millions of rows, some spanning more than 600 columns.
Composer – Cloud Composer's managed Apache Airflow provides a dependency-driven ETL pipeline that runs all required transformations and automatically loads the results into BigQuery.
Dataproc – PySpark code running on Dataproc's compute, built to handle petabytes of data.
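Composer runs workflows as Airflow DAGs, in which each task starts only after the tasks it depends on have finished. The sketch below imitates that ordering rule with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) rather than Airflow itself. The task names (`ingest_crm`, `ingest_web`, `standardize`, `dedupe`, `load_bigquery`) and their placeholder bodies are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "ingest_crm": set(),
    "ingest_web": set(),
    "standardize": {"ingest_crm", "ingest_web"},
    "dedupe": {"standardize"},
    "load_bigquery": {"dedupe"},
}


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]]) -> List[str]:
    """Run tasks serially in an order that respects every dependency."""
    order: List[str] = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # each callable stands in for a real ETL step
        order.append(name)
    return order


# Placeholder task bodies that only record that they ran.
log: List[str] = []
tasks = {name: (lambda n=name: log.append(n)) for name in DEPENDENCIES}
order = run_pipeline(tasks, DEPENDENCIES)
print(order)
```

Airflow adds scheduling and retries and can run independent tasks, such as the two ingest steps here, in parallel; this serial sketch shows only the dependency-ordering rule.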
The resulting pipeline standardizes and deduplicates client-supplied data in either real-time or batch mode.
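Running the same deduplication in both real-time and batch modes is easier when the logic keeps state between calls. One way to do that, sketched here as an assumption rather than a description of the delivered system, is to remember every key already seen, so a single large batch and a stream of one-record micro-batches produce the same output. The `IncrementalDeduper` class, the `email` key, and the in-memory set (standing in for a durable state store) are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]


class IncrementalDeduper:
    """Drops records whose key was already seen, across any number of calls."""

    def __init__(self, key_field: str = "email") -> None:
        self.key_field = key_field  # hypothetical dedup key
        self.seen: Set[str] = set()  # in-memory stand-in for a durable state store

    def process(self, batch: Iterable[Record]) -> List[Record]:
        """Return only the records in this batch whose keys were not seen before."""
        fresh: List[Record] = []
        for rec in batch:
            key = rec.get(self.key_field, "").strip().lower()
            if key in self.seen:
                continue  # already emitted in this or an earlier call
            if key:
                self.seen.add(key)
            fresh.append(rec)
        return fresh


full = [{"email": "a@x.com"}, {"email": "A@x.com"}, {"email": "b@x.com"}]

# Batch mode: one call over the whole dataset.
batch_result = IncrementalDeduper().process(full)

# Real-time mode: the same records arrive one at a time.
stream = IncrementalDeduper()
stream_result = [rec for item in full for rec in stream.process([item])]

print(len(batch_result), len(stream_result))  # → 2 2
```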
About Site Impact, LLC
Site Impact is one of the leading providers of data and marketing resources, specializing in multi-channel direct marketing services.
Industry: Advertising & Marketing
Primary project location: United States