Apache Iceberg with PySpark

In a previous post I described how to use the Apache Iceberg table format with Apache Spark in Scala. This post describes how to do the same with PySpark.

Exploring Apache Iceberg with Spark

Apache Iceberg is a new table format for storing large, slow-moving tabular data on cloud data lakes such as S3 or Cloud Storage. It was developed at Netflix and then incubated at the Apache Software Foundation.

AWS Glue and S3A committers

I recently worked on an AWS Glue job written in Python. AWS Glue is a managed service built on top of the Apache Spark framework.

2019 in review

2019 is almost over, and I wanted to write a short summary of a year that I found particularly full and in which I faced many challenges. Speaking in public...

Scala 2 implicit conversions

Scala is a statically typed language, meaning that the Scala compiler knows the type of every variable we use in our programs at compile time. To play well with the compiler we...