AWS Glue and PySpark Guide

1 Comment | Posted in AWS, AWS Glue

In this post, I have collected AWS Glue and PySpark functionality that can be helpful when designing an AWS pipeline and writing AWS Glue PySpark scripts. AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. While […]

Filtering Using Event Patterns – EventBridge

Posted in AWS, EventBridge

Amazon EventBridge, as the name suggests, is a serverless pub/sub service that lets applications connect via an “event bus”. It helps build loosely coupled, distributed, event-driven architectures. EventBridge was formerly called CloudWatch Events. In this blog, I will give an example of setting a filter-based event pattern in Amazon EventBridge to send an SNS notification. […]
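To make the idea of a filter-based event pattern concrete, here is a minimal sketch in plain Python. The pattern itself (EC2 state-change events for stopped or terminated instances) is only an illustrative assumption, not necessarily the one used in the post, and the `matches` helper is a hypothetical, simplified stand-in for exact-value matching; it does not reproduce EventBridge's full matching rules (prefix, numeric, anything-but, and so on).

```python
import json

# Illustrative pattern (an assumption, not taken from the post):
# match EC2 state-change events where the instance stopped or terminated.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Hypothetical, simplified matcher: exact-value matching only.

    Every key in the pattern must be present in the event. A list in the
    pattern means "any of these values"; a dict means "recurse into the
    nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # An array field in the event matches if any element is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in values):
                return False
    return True


# A made-up event with the same shape as an EC2 state-change notification.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

# Event patterns are supplied to EventBridge as a JSON string.
pattern_json = json.dumps(event_pattern)
is_match = matches(event_pattern, sample_event)
```

The JSON string in `pattern_json` is the kind of value a rule's event pattern takes, for example through the `EventPattern` parameter of boto3's `put_rule`; an SNS topic would then be attached to that rule as its target.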

AWS Glue – Querying Nested JSON with Relationalize Transform

4 Comments | Posted in AWS, AWS Glue

AWS Glue has a Relationalize transform that can convert nested JSON into columns that you can then write to S3 or import into relational databases. As an example, the initial schema:

>>> df.printSchema()
root
 |-- Id: string (nullable = true)
 |-- LastUpdated: long (nullable = true)
 |-- LastUpdatedBy: string (nullable = true)
 |-- Properties: struct (nullable […]
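The flattening idea behind Relationalize can be illustrated without a Glue job. The sketch below is plain Python, not the Glue API: `flatten` is a hypothetical helper, and the sub-fields under `Properties` are made-up placeholders, since the excerpt does not show them. It only demonstrates how nested struct fields turn into dotted column names; it does not cover arrays, which Relationalize pivots out into separate tables.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted key names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested structs, carrying the parent name along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Sample record: the top-level field names follow the schema above; every
# value and everything under "Properties" is a hypothetical placeholder.
record = {
    "Id": "abc-123",
    "LastUpdated": 1600000000000,
    "LastUpdatedBy": "someone",
    "Properties": {
        "example_key": "example_value",
        "nested": {"flag": True},
    },
}

flat = flatten(record)
for column, value in flat.items():
    print(column, "=", value)
```

In an actual Glue job the equivalent step would go through the `Relationalize` transform on a DynamicFrame rather than a hand-written helper like this one.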