AWS Glue and PySpark Guide

Posted in AWS, AWS Glue | 1 Comment

In this post, I have noted down AWS Glue and PySpark features that can be helpful when designing an AWS pipeline and writing AWS Glue PySpark scripts. AWS Glue is a fully managed extract, transform, and load (ETL) service for processing large datasets from various sources for analytics and data processing. While […]

Pandas Scratchpad – I

Posted in DataScience, Pandas

This post is a scratchpad for day-to-day Pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 1. A few quick ways to create a Pandas DataFrame: DataFrame from Dict of List – DataFrame from List of List – DataFrame from List of Dict – DataFrame […]
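The excerpt names three ways of building a DataFrame but the snippets themselves did not survive in this listing. As a minimal sketch of what those three constructions might look like (the column names and values here are made up for illustration and are not taken from the original post):

```python
import pandas as pd

# Dict of lists: each key becomes a column name, each list a column.
df_from_dict_of_lists = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45]}
)

# List of lists: each inner list is one row; column names are given separately.
df_from_list_of_lists = pd.DataFrame(
    [["Ann", 31], ["Bob", 45]],
    columns=["name", "age"],
)

# List of dicts: each dict is one row; the keys become column names.
df_from_list_of_dicts = pd.DataFrame(
    [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}]
)

print(df_from_dict_of_lists)
```

All three calls should produce the same two-row table, so the choice mostly depends on the shape the data already has: column-oriented data fits the dict of lists, row-oriented data fits the other two.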