AWS Glue and PySpark Guide

Posted in AWS, AWS Glue. 1 Comment.

In this post, I have noted down AWS Glue and PySpark functionality that can be helpful when designing an AWS pipeline and writing AWS Glue PySpark scripts. AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. While […]

Expanding array to multiple rows – Athena

Posted in AWS, AWS Athena. 1 Comment.

A single row in the Athena table is stored as:

    select id, course, date from demo.course_tab where id='1234567892'

    id          course                                                                                date
    1234567892  [95c3c1bc5873, 2e345b2eb678, 027b02599f4a, 8695a580520b, 5d453355d415, cdcc7682070b]  2019-06-13

The datatype of the course column is array(string). Now, how can you get the output in the format below?

       id          course        date
    1  1234567892  95c3c1bc5873  2019-06-13
    2  […]
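
The excerpt stops before the post's own answer, so the following is only a sketch of the row expansion being asked for, written in plain Python rather than Athena SQL. In Athena this kind of expansion is usually done with CROSS JOIN UNNEST on the array column; whether the full post uses that approach is an assumption. The function name expand_array_rows and the dictionary layout are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict, Iterable, Iterator


def expand_array_rows(
    rows: Iterable[Dict[str, Any]], array_col: str = "course"
) -> Iterator[Dict[str, Any]]:
    """Yield one output row per element of the array column.

    Hypothetical helper: it mimics what an UNNEST-style expansion
    produces, copying the other columns onto every output row.
    """
    for row in rows:
        for value in row[array_col]:
            expanded = dict(row)          # copy id, date, etc.
            expanded[array_col] = value   # replace the array with one element
            yield expanded


# Sample row taken from the excerpt above.
row = {
    "id": "1234567892",
    "course": [
        "95c3c1bc5873",
        "2e345b2eb678",
        "027b02599f4a",
        "8695a580520b",
        "5d453355d415",
        "cdcc7682070b",
    ],
    "date": "2019-06-13",
}

expanded = list(expand_array_rows([row]))

# Print with a running row number, similar to the target layout.
for n, r in enumerate(expanded, start=1):
    print(n, r["id"], r["course"], r["date"])
```

Each element of the array becomes its own row, and the id and date values are repeated on every row, which is the shape the target output asks for.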