AWS Glue and PySpark Guide

AWS

In this post, I have written down AWS Glue and PySpark features that can be helpful when designing an AWS data pipeline and writing AWS Glue PySpark scripts. AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. While […]


Filtering using Event Patterns – EventBridge

AWS

Amazon EventBridge, as the name suggests, is a serverless pub/sub service that lets applications connect through an “event bus”. It helps build loosely coupled, distributed, event-driven architectures. EventBridge was formerly called CloudWatch Events. In this blog, I will give an example of setting up an event-pattern filter in Amazon EventBridge to send an SNS notification. […]
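The excerpt does not show the pattern used in the full post, so this sketch uses a hypothetical pattern for EC2 instance state changes. It models only EventBridge's basic matching rule: every field named in the pattern must be present in the event, nested objects are matched recursively, and a leaf matches when the event's value is one of the values listed in the pattern's array (or, when the event value is itself an array, when any element is). Content filters such as prefix, numeric, and anything-but are deliberately left out; `matches` is an illustration of the semantics, not EventBridge's implementation.

```python
from typing import Any, Mapping


def matches(pattern: Mapping[str, Any], event: Mapping[str, Any]) -> bool:
    """Simplified EventBridge-style matching that supports exact values only."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must exist in the event
        actual = event[key]
        if isinstance(expected, Mapping):
            # Nested pattern object: the event must carry a matching nested object.
            if not isinstance(actual, Mapping) or not matches(expected, actual):
                return False
        else:
            # Leaf: the pattern holds a list of allowed values. An array value in
            # the event matches if any of its elements is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True  # event fields that the pattern does not mention are ignored


# Hypothetical pattern: notify when an EC2 instance stops or terminates.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Hypothetical events (the instance ID is a placeholder).
stopped_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
running_event = {**stopped_event, "detail": {**stopped_event["detail"], "state": "running"}}

if __name__ == "__main__":
    print(matches(PATTERN, stopped_event))  # True
    print(matches(PATTERN, running_event))  # False
```

Under this model, a rule with the pattern above would forward the stopped-instance event to its target (an SNS topic, in the post's scenario) and ignore the running one.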


AWS Athena – DML Queries

AWS

You can learn something new every day, and today I learned that AWS Athena supports INSERT INTO queries. Let's create a table based on marvel_superheroes using a CTAS command. Creating the table partitioned on “year” failed with: HIVE_COLUMN_ORDER_MISMATCH: Partition keys must be the last columns in the table and in the same order as the […]
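The rule behind that error can be sketched in a few lines of Python. The helper names `build_ctas` and `build_insert`, the target table name, and every column other than `year` are hypothetical, since the excerpt does not list the columns of marvel_superheroes. The sketch assumes Athena's CTAS `WITH (format = ..., partitioned_by = ARRAY[...])` properties and simply moves the partition columns to the end of the SELECT list, in `partitioned_by` order. The same ordering is assumed for `INSERT INTO ... SELECT` against the partitioned table, because the SELECT columns are mapped to the table by position.

```python
from typing import List, Sequence


def _ordered_columns(columns: Sequence[str], partition_by: Sequence[str]) -> List[str]:
    missing = [c for c in partition_by if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in SELECT list: {missing}")
    # Non-partition columns keep their order; partition columns go last,
    # in the same order as partitioned_by.
    return [c for c in columns if c not in partition_by] + list(partition_by)


def build_ctas(new_table: str, source_table: str, columns: Sequence[str],
               partition_by: Sequence[str], fmt: str = "PARQUET") -> str:
    select_list = ", ".join(_ordered_columns(columns, partition_by))
    part_list = ", ".join(f"'{c}'" for c in partition_by)
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = '{fmt}', partitioned_by = ARRAY[{part_list}])\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


def build_insert(target_table: str, source_table: str, columns: Sequence[str],
                 partition_by: Sequence[str]) -> str:
    select_list = ", ".join(_ordered_columns(columns, partition_by))
    return f"INSERT INTO {target_table}\nSELECT {select_list}\nFROM {source_table}"


if __name__ == "__main__":
    cols = ["year", "name", "alignment"]  # hypothetical columns; "year" listed first on purpose
    print(build_ctas("marvel_superheroes_by_year", "marvel_superheroes", cols, ["year"]))
    print(build_insert("marvel_superheroes_by_year", "marvel_superheroes", cols, ["year"]))
```

Even though `year` comes first in the input list, the generated SELECT ends with it, which is the ordering the error message asks for.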


Aurora MySQL – Export data to S3

AWS

Using SELECT INTO OUTFILE S3, you can query data from an Aurora MySQL DB cluster and save it directly into text files stored in an S3 bucket. 1. Create an IAM policy for S3. { "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:DeleteObject", "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:PutObject" ], "Resource": [ "arn:aws:s3:::bucket-name", […]
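Because the policy's Resource list is cut off in the excerpt, here is a hedged sketch that generates the policy document with `json.dumps` from the six actions shown and takes the resource ARNs as a parameter, so the output is always valid JSON with plain ASCII quotes. `build_export_query`, the table name, and the S3 prefix in the usage are hypothetical; the statement assumes Aurora MySQL's `SELECT ... INTO OUTFILE S3 's3-uri'` form with `FIELDS TERMINATED BY` and `LINES TERMINATED BY` clauses.

```python
import json
from typing import Sequence

# The six S3 actions listed in the excerpt's policy.
S3_EXPORT_ACTIONS = [
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:PutObject",
]


def build_export_policy(resources: Sequence[str]) -> str:
    """Return the IAM policy document as JSON text."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": S3_EXPORT_ACTIONS,
                "Resource": list(resources),  # caller supplies the ARNs
            }
        ],
    }
    return json.dumps(policy, indent=2)


def build_export_query(table: str, s3_uri: str) -> str:
    """Hypothetical helper: comma-separated text export via SELECT INTO OUTFILE S3."""
    return (
        f"SELECT * FROM {table}\n"
        f"INTO OUTFILE S3 '{s3_uri}'\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n';"
    )


if __name__ == "__main__":
    print(build_export_policy(["arn:aws:s3:::bucket-name"]))
    print(build_export_query("orders", "s3://bucket-name/exports/orders"))  # hypothetical names
```

Pass whatever ARNs your bucket and prefix need. Only `arn:aws:s3:::bucket-name` is visible in the excerpt, and object-level actions such as s3:GetObject and s3:PutObject are evaluated against object ARNs, so the bucket ARN alone is unlikely to be enough.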
