Summary
The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many potential pitfalls. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects.
Announcements
• Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro. That’s three free boards at dataengineeringpodcast.com/miro.
• Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack.
• You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
• Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
• Your host is Tobias Macey, and today I'm interviewing Dustin Dorsey and Cameron Cyr about how to design your dbt projects.
Interview
• Introduction
• How did you get involved in the area of data management?
• What was your path to adoption of dbt?
• What did you use prior to its existence?
• When/why/how did you start using it?
• What are some of the common challenges that teams experience when getting started with dbt?
• How does prior experience in analytics and/or software engineering impact those outcomes?
• You recently wrote a book to give a crash course in best practices for dbt. What motivated you to invest that time and effort?
• What new lessons did you learn about dbt in the process of writing the book?
• The introduction of dbt is largely responsible for catalyzing the growth of "analytics engineering". As practitioners in the space, what do you see as the net result of that trend?
• What are the lessons that we all need to invest in independent of the tool?
• For some...