AWS Glue – Querying Nested JSON with Relationalize Transform

Posted 4 CommentsPosted in AWS, AWS Glue

AWS Glue has transform Relationalize that can convert nested JSON into columns that you can then write to S3 or import into relational databases. As an example – Initial Schema: >>> df.printSchema() root |– Id: string (nullable = true) |– LastUpdated: long (nullable = true) |– LastUpdatedBy: string (nullable = true) |– Properties: struct (nullable […]

Redshift: Convert TEXT to Timestamp

Posted Leave a commentPosted in AWS, Redshift

How do you convert TEXT to timestamp in redshift? If the score column has data in given format, how can you display the timestamp. {“Choices”:null, “timestamp”:”1579650266955″, “scaledScore”:null} select cast(json_extract_path_text(score, ‘timestamp’) as timestamp) from schema.table limit 10; This sql will fail with — ERROR: Invalid data DETAIL: ———————————————– error: Invalid data code: 8001 context: Invalid format […]

Athena: Extracting data from JSON

Posted Leave a commentPosted in AWS, AWS Athena

Suppose you have a table in Athena and its column contain JSON data. How can you extract the individual keys? In the example, the table has column “fixedproperties” which contain JSON data – How can you display the data is below format? select json_extract(fixedproperties, ‘$.objectId’) as object_id, json_extract(fixedproperties, ‘$.custId’) as cust_id, json_extract(fixedproperties, ‘$.score’) as score […]

S3 – fatal error: An error occurred (404) when calling the HeadObject operation

Posted Leave a commentPosted in AWS, S3

Make sure to use –recursive parameter. [desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/ fatal error: An error occurred (404) when calling the HeadObject operation: Key “dw/user/dt=2019-07-30/” does not exist [desktop: test]:${PWD}> aws s3 cp s3://demo-beta/dw/user/dt=2019-07-30/ /tmp/dw/ –recursive download: s3://demo-beta/dw/user/dt=2019-07-30/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet to ../../../tmp/dw/part-00002-fd866c-238-489-a44-739f1d04-c000.snappy.parquet From Documentation – –recursive (boolean) Command is performed on all files or objects under […]

Expanding array to multiple rows – Athena

Posted 1 CommentPosted in AWS, AWS Athena

A single row in Athena table is stored as — select id, course, date from demo.course_tab where id=’1234567892′ id course date 1234567892 [95c3c1bc5873, 2e345b2eb678, 027b02599f4a, 8695a580520b, 5d453355d415, cdcc7682070b] 2019-06-13 The datatype for course column is array(string). Now, how can you get the output in below format – id course date 1 1234567892 95c3c1bc5873 2019-06-13 2 […]