May 18, 2025

ParquetDiff: a lightweight tool to compare Parquet Schemas

ParquetDiff is a small utility that identifies differences between Parquet schemas.

May 16, 2023

Apache Iceberg with PySpark

In a previous post, I described how to use the Apache Iceberg table format with Apache Spark in Scala. This post covers how to do the same with PySpark.

January 24, 2023

Exploring Apache Iceberg with Spark

Apache Iceberg is a new table format for storing large, slow-moving tabular data on cloud data lakes such as S3 or Google Cloud Storage. It was developed at Netflix and later incubated at the Apache Software Foundation.

June 21, 2022

AWS Glue and S3A committers

I recently worked on an AWS Glue job written in Python. AWS Glue is a managed service built on top of the Apache Spark framework.

December 20, 2019

2019 in Review

2019 is almost over, and I wanted to write a short summary of a year that I found particularly full and in which I faced many challenges. Public speaking...