Redshift: Convert TEXT to Timestamp

AWS

How do you convert TEXT to timestamp in redshift? If the score column has data in given format, how can you display the timestamp. {“Choices”:null, “timestamp”:”1579650266955″, “scaledScore”:null} select cast(json_extract_path_text(score, ‘timestamp’) as timestamp) from schema.table limit 10; This sql will fail with — ERROR: Invalid data DETAIL: ———————————————– error: Invalid data code: 8001 context: Invalid format […]

Read more >

Usecase with RDS Snapshot Export to S3

AWS

AWS recently announced “Amazon RDS Snapshot Export to S3” feature wherein you can now export Amazon Relational Database Service (Amazon RDS) or Amazon Aurora snapshots to Amazon S3 as Apache Parquet, an efficient open columnar storage format for analytics. I had a use-case to refresh Athena tables daily with full data set in Account B(us-east-1) […]

Read more >

Athena: Extracting data from JSON

AWS

Suppose you have a table in Athena and its column contain JSON data. How can you extract the individual keys? In the example, the table has column “fixedproperties” which contain JSON data – How can you display the data is below format? select json_extract(fixedproperties, ‘$.objectId’) as object_id, json_extract(fixedproperties, ‘$.custId’) as cust_id, json_extract(fixedproperties, ‘$.score’) as score […]

Read more >

Athena – SQL to get date of next Monday

AWS

I was recently asked how to get date of next Monday irrespective of which day of the week sql is executed. So thought to share it, in-case someone else has such requirement. select date_add(‘day’, 8 – extract(day_of_week from current_date), current_date) Or, select date_trunc(‘week’, current_date) + interval ‘7’ day; Happy learning πŸ™‚

Read more >

Pandas Scratchpad – I

DataScience

This blog is scratchpad for day-to-day Pandas commands. pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 1. Few quick ways to create Pandas DataFrame DataFrame from Dict of List – DataFrame from List of List – DataFrame from List of Dict – DataFrame […]

Read more >

Merge json files using Pandas

Coding

Quick demo for merging multiple json files using Pandas – import pandas as pd import glob import json file_list = glob.glob(“*.json”) >>> file_list [‘b.json’, ‘c.json’, ‘a.json’] Use enumerate to assign counter to files. allFilesDict = {v:k for v, k in enumerate(file_list, 1)} >>> allFilesDict {1: ‘b.json’, 2: ‘c.json’, 3: ‘a.json’} Append the data into list […]

Read more >

Pandas – ValueError: If using all scalar values, you must pass an index

Pandas

Reading json file using Pandas read_json can fail with “ValueError: If using all scalar values, you must pass an index”. Let see with an example – cat a.json { “creator”: “CaptainAmerica”, “last_modifier”: “NickFury”, “title”: “Captain America: The First Avenger”, “view_count”: 12000 } >>> import pandas as pd >>> import glob >>> for f in glob.glob(‘*.json’): […]

Read more >

S3 – fatal error: An error occurred (404) when calling the HeadObject operation

AWS

Make sure to use –recursive parameter. [desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/ fatal error: An error occurred (404) when calling the HeadObject operation: Key “dw/user/dt=2019-07-30/” does not exist [desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/ –recursive download: s3://demo-beta/dw/user/dt=2019-07-30/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet to ../../../tmp/dw/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet From Documentation – –recursive (boolean) Command is performed on all files or objects under […]

Read more >

Python – sort() vs sorted(list)

Coding

You can compare list using sort() or sorted(list), but be careful with sort() – >>> c = [(‘d’,4), (‘c’,3), (‘a’,1), (‘b’, 2)] >>> a = [(‘a’,1), (‘b’, 2), (‘c’,3), (‘d’,4)] >>> a.sort() == c.sort() True >>> >>> a = [(‘a’,1), (‘b’, 2), (‘c’,3), (‘d’,4)] >>> b = [(‘b’,2), (‘c’, 3), (‘a’,1)] >>> >>> a.sort() == […]

Read more >