The Best Library For Building Data Pipelines

Building data pipelines with #python is an important skill for data engineers and data scientists. But what's the best library to use? In this video we look at three options: pandas, Polars, and Spark (PySpark).

Timeline:
00:00 Data Pipelines
01:11 The Data
02:32 Pandas
04:34 Polars
06:15 PySpark
09:15 Spark SQL

Follow me on Twitch for live coding streams: twitch.tv/medallionstallion_

My other videos:

Speed Up Your Pandas Code: youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas: youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis: youtube.com/watch?v=xi0vhXFPegw

Working with Audio data in Python: youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: youtube.com/watch?v=u4_c2LDi4b8

* YouTube: youtube.com/@robmulla?sub_confirmation=1
* Discord: discord.gg/HZszek7DQc
* Twitch: twitch.tv/medallionstallion_
* Twitter: twitter.com/Rob_Mulla
* Kaggle: kaggle.com/robikscube

#python #polars #spark #dataengineering
