Durable Data Discovery: Making Exploratory Analysis Stick // James Campbell // MLOps Meetup #86

MLOps Community Meetup #86! Last Wednesday we talked to James Campbell, Chief Technology Officer of Superconductive.

// Abstract
Building an effective ML pipeline requires understanding the data available to you and how it's changing. Exploring a new dataset is often an iterative, interactive process that gives the engineer doing it tremendous insight into the underlying data generating processes and the pipelines that have touched it. Yet too often, those insights are lost when a system goes into production or after internal handoff between teams.

We'll talk about how to capture the Exploratory Data Analysis done when first working with a dataset. With a clear understanding of which data characteristics were important in crafting a dataset, it becomes possible to collaborate on and share clear expectations about the true differentiator in ML pipelines -- the data that fuels them.
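
To make that concrete, here is a minimal sketch of capturing EDA findings as Expectations with the Great Expectations pandas-flavored API; the file name, column names, and thresholds are hypothetical, and exact calls may differ across library versions.

import great_expectations as ge

# Load a dataset as a Great Expectations-wrapped DataFrame
# (file name and columns are hypothetical).
df = ge.read_csv("trips.csv")

# Record insights from exploratory analysis as explicit, testable Expectations.
df.expect_column_values_to_not_be_null("pickup_datetime")
df.expect_column_values_to_be_between("fare_amount", min_value=0, max_value=500)
df.expect_column_values_to_be_in_set("payment_type", ["card", "cash"])

# Persist the suite so the same checks can run against future batches of data.
df.save_expectation_suite("trips_expectation_suite.json")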

// Bio
James Campbell is the CTO at Superconductive, the company behind the open-source data quality project Great Expectations, which he co-founded in 2017. Prior to that, he spent nearly 15 years working across a variety of quantitative and qualitative analytic roles in the US intelligence community.

James studied Mathematics and Philosophy at Yale and is passionate about creating tools that help communicate uncertainty and build intuition about complex systems.

// Related links
Team slack - greatexpectations.io/slack
Job application - jobs.superconductive.com

----------- ✌️Connect With Us ✌️-------------
Join our Slack community: go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: go.mlops.community/register
Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: mlops.community/

Connect with Demetrios on LinkedIn: linkedin.com/in/dpbrinkm/
Connect with James on LinkedIn: linkedin.com/in/jpcampbell42/

Timestamps:
[00:00] Introduction to James Campbell
[02:57] Durable Data Discovery
[03:25] Superconductive is hiring!
[04:11] Agenda
[05:00] Feedback (and so workflows) is central to "ops"
[06:22] Great Expectations
[06:59] Great Expectations brings automated testing to data
[08:45] 4 Great Expectations characteristics
  1. Expectations can be deployed directly into existing infrastructure.
  2. Expectations compile directly to human-readable documentation.
  3. The language of Expectations can be extended into specific data domains.
  4. Automated Profilers embody checklists of questions, enabling new modes of collaboration.
[10:42] A Data Asset lives at an important intersection
[13:20] Data Assets carry heritable assumptions
[18:27] Attention makes batches
[21:29] Purpose drives process
[24:14] Demo: Configuring a Data Asset with Monthly Batches (see the sketch after the timestamps)
[32:28] We have a batch!
[32:41] Exploratory Data Analysis is a way of asking questions
[33:30] The Cutting Room Floor
[34:39] Rule-based profilers: Defining Custom Workflows
[37:56] Getting rid of comments
[42:03] Great Expectations
[50:10] Role of a data scientist using Great Expectations
[53:46] Differentiating factors of Great Expectations
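
For reference, a minimal sketch of the kind of monthly-batch setup shown in the [24:14] demo, assuming the circa-2021 BatchRequest API; the datasource, data connector, asset, partition key, and suite names are hypothetical and will vary by deployment.

from great_expectations.core.batch import BatchRequest
from great_expectations.data_context import DataContext

# Load the project's Data Context (assumes a configured great_expectations/ directory).
context = DataContext()

# Request a single monthly batch of a configured Data Asset
# (names and the partition key are hypothetical).
batch_request = BatchRequest(
    datasource_name="warehouse",
    data_connector_name="monthly",
    data_asset_name="trips",
    data_connector_query={"batch_filter_parameters": {"month": "2021-01"}},
)

# Get a Validator for that batch and run the saved Expectation Suite against it.
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="trips_expectation_suite",
)
results = validator.validate()
print(results.success)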
