Data stack
The data environment at pass Culture is built around a structured pipeline that carries data from collection, through processing and distribution, to real-world use.
🔄 Data Collection
Internal sources: our applications, users, and backend systems.
External sources: public data providers such as data.gouv.fr and INSEE.
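One way to picture the collection step is that records from both kinds of source land in a single staging format tagged with their origin, so later processing can treat them uniformly. The sketch below is a minimal stdlib illustration under that assumption. The `StagedRecord` type, the `internal.*`/`external.*` labels, the helper names and the sample rows are all hypothetical and do not describe pass Culture's actual ingestion code.

```python
"""Toy ingestion step: land internal and external records in one staging format.

All names, source labels and sample data here are hypothetical illustrations.
"""
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StagedRecord:
    source: str            # hypothetical label, e.g. "internal.app" or "external.insee"
    payload: dict          # raw fields, kept untouched for the transformation step
    ingested_at: datetime  # when the record entered staging (UTC)


def stage_internal(events: list[dict], system: str) -> list[StagedRecord]:
    """Wrap events emitted by an internal application or backend system."""
    now = datetime.now(timezone.utc)
    return [StagedRecord(f"internal.{system}", dict(e), now) for e in events]


def stage_external_csv(text: str, provider: str, delimiter: str = ",") -> list[StagedRecord]:
    """Parse a CSV export, such as a file downloaded from an open-data portal."""
    now = datetime.now(timezone.utc)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [StagedRecord(f"external.{provider}", dict(row), now) for row in reader]


# Illustrative usage with made-up sample data.
internal = stage_internal([{"user_id": 1, "event": "booking"}], system="app")
external = stage_external_csv(
    "code_commune;nom_commune\n75056;Paris\n", provider="insee", delimiter=";"
)
```

Keeping the raw `payload` untouched at this stage leaves all cleaning and modeling to the transformation layer.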
🧱 Data Processing & Transformation
At the heart of this setup is the Data Engineering team, responsible for processing (ETL) and orchestrating data flows using:
Google Cloud for infrastructure,
Airflow for orchestration,
dbt for data transformation and modeling.
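The core guarantee an orchestrator such as Airflow provides is that each task runs only after all of its upstream tasks have finished. The sketch below models that idea with Python's standard `graphlib.TopologicalSorter`. It is not Airflow's API, and the task names and dependency graph are hypothetical placeholders for extraction jobs and dbt model runs.

```python
"""Toy model of dependency-ordered orchestration (not Airflow's API).

Task names and the dependency graph are hypothetical.
"""
from graphlib import TopologicalSorter

# Each key maps a task to the set of upstream tasks it depends on.
PIPELINE = {
    "extract_internal": set(),
    "extract_external": set(),
    "dbt_staging": {"extract_internal", "extract_external"},
    "dbt_marts": {"dbt_staging"},
    "export_aggregates": {"dbt_marts"},
}


def run_pipeline(graph: dict[str, set[str]]) -> list[str]:
    """Execute tasks in a dependency-respecting order and return that order."""
    executed: list[str] = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        # Every task returned here has all its upstreams done; a real
        # orchestrator could run them in parallel, here they run in sequence.
        for task in sorter.get_ready():
            executed.append(task)  # placeholder for the task's actual work
            sorter.done(task)
    return executed


order = run_pipeline(PIPELINE)
```

The two extraction tasks have no dependencies on each other, so their relative order is not fixed. Only the upstream-before-downstream constraint is guaranteed.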
🧭 Data Delivery
Once processed, the data is made available to several downstream services:
Data Analytics: for analysis and reporting (via BigQuery),
Backend: to provide fast access to aggregated statistics (via ClickHouse),
Data Science: to train machine learning models and expose them through APIs (using TensorFlow and Python).
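Serving fast aggregated statistics usually means computing the summaries ahead of time so that a request becomes a lookup rather than a scan over row-level data. The stdlib sketch below shows that pre-aggregation idea in miniature. The `Booking` schema, the `(partner_id, month)` grain and the sample values are hypothetical. In the stack described here this role would be filled by aggregated tables served from ClickHouse, not by Python dictionaries.

```python
"""Toy pre-aggregation: summarise bookings once, then serve lookups.

The schema, grain and sample data are hypothetical illustrations.
"""
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Booking:
    partner_id: str
    booked_on: date
    amount_eur: float


def aggregate_by_partner_month(bookings):
    """Return {(partner_id, 'YYYY-MM'): {'count': n, 'revenue_eur': total}}."""
    stats = defaultdict(lambda: {"count": 0, "revenue_eur": 0.0})
    for b in bookings:
        key = (b.partner_id, b.booked_on.strftime("%Y-%m"))
        stats[key]["count"] += 1
        stats[key]["revenue_eur"] += b.amount_eur
    return dict(stats)


bookings = [
    Booking("partner-1", date(2024, 3, 2), 12.0),
    Booking("partner-1", date(2024, 3, 20), 8.5),
    Booking("partner-2", date(2024, 4, 1), 30.0),
]
stats = aggregate_by_partner_month(bookings)

# Serving side: a precomputed statistic is a single dictionary lookup.
print(stats[("partner-1", "2024-03")])  # {'count': 2, 'revenue_eur': 20.5}
```

The trade-off is freshness: the aggregates are only as current as the last pipeline run that rebuilt them.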
🧑‍💼 Final Use Cases
The refined data powers multiple concrete use cases:
Internal dashboards (via Metabase),
Partner-facing statistics (via our pro interface),
Personalized recommendations for users in the app.