Data Engineering Patterns (DEP)
Understanding Patterns: Generally repeated, identifiable designs or practices.
Welcome to the chapter on Data Engineering Patterns (DEP). This chapter builds on the bespoke convergent evolutions from previous chapters, as illustrated by the Overall Graph of convergent evolution to pattern and, ultimately, to design patterns. Here, we explore how to recognize DEPs, what their features are, and how they are generally implemented to fit common practices and procedures.
Some examples:
- How can data be effectively cached for BI: with dbt, materialized views, OLAP, or data virtualization?
- How do I orchestrate my decoupled data stack: with cron, Python, a framework, or something self-built?
- How do I integrate central master data (unique customers, products, etc.) into my data landscape instead of duplicating it? What about fuzzy joins?
- What is my data engineering tool stack? How do I effectively package my data stack: with Docker, Kubernetes, or cloud solutions?
- How do I effectively share data across the organization: by exporting and importing files, sharing with Delta Live Tables, or using open formats and standards?
- How do I integrate data assets into my data stack? Assets are stateful; my stack is stateless. What's the best combination?
These patterns can help mitigate the Challenges along the Data Engineering Lifecycle. They stay relevant for years and serve as a solid reference for data engineering problems.
In the chapter after this, Data Engineering Design Patterns (DEDP), we'll focus on best practices and higher-level design problems that are systematically repeated in data engineering. Follow the link for a Recap of the differences between these chapters.
Let's move on to the following chapters, where we'll delve deeper into these data engineering patterns.