AWS Glue and PySpark Guide

1 Comment. Posted in AWS, AWS Glue

In this post, I have collected the AWS Glue and PySpark functionality that is helpful when designing an AWS pipeline and writing AWS Glue PySpark scripts. AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. While […]

AWS Glue – Querying Nested JSON with Relationalize Transform

4 Comments. Posted in AWS, AWS Glue

AWS Glue provides a Relationalize transform that flattens nested JSON into columns, which you can then write to S3 or import into relational databases. For example, the initial schema:

>>> df.printSchema()
root
 |-- Id: string (nullable = true)
 |-- LastUpdated: long (nullable = true)
 |-- LastUpdatedBy: string (nullable = true)
 |-- Properties: struct (nullable […]
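To make the idea of flattening concrete, here is a minimal pure-Python sketch of turning a nested record into flat, dot-separated columns. It is not Glue's implementation and does not call the Glue API; it only illustrates the struct-flattening part of what Relationalize does (array columns, which Relationalize pivots out into separate tables, are not handled here). The helper name flatten_record and the fields inside Properties (Name, Owner, Team) are hypothetical, since the excerpt above does not show them.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with dots.

    Illustrative only: this mimics the flattened column naming for struct
    fields and ignores arrays entirely.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested structs, carrying the path so far.
            flat.update(flatten_record(value, column))
        else:
            flat[column] = value
    return flat


# Hypothetical record: top-level names follow the schema in the excerpt,
# the contents of "Properties" are invented for illustration.
record = {
    "Id": "abc-123",
    "LastUpdated": 1561000000,
    "LastUpdatedBy": "etl-user",
    "Properties": {"Name": "example", "Owner": {"Team": "data"}},
}

flat = flatten_record(record)
for column in sorted(flat):
    print(column, "=", flat[column])
```

Each nested field becomes its own top-level column named by its path, which is the shape a relational table or a flat file on S3 can hold directly.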