AWSAWS Athena

AWS Athena – DML Queries

You can learn something new everyday, and today I learned that AWS Athena supports INSERT INTO queries.

SELECT *
FROM   information_schema.columns
WHERE  table_schema = 'marvel'
       AND table_name = 'marvel_superheroes'

Lets create table based on marvel_superheroes using CTAS command –

# To create non-partitioned table based on where clause
CREATE TABLE "marvel"."marvel_1936"
WITH (
     format = 'JSON',
     external_location = 's3://mybucket/marvel_1936/')
AS SELECT * from "marvel"."marvel_superheroes" where year=1936
# To create partitioned table
CREATE TABLE "marvel"."universe_partitioned"
WITH (
     format = 'JSON',
     external_location = 's3://mybucket/marvel_universe_partitioned/',
     partitioned_by = ARRAY['universe'])
AS SELECT * from "marvel"."marvel_superheroes" 

Creating the table partition based on “year” failed with :

HIVE_COLUMN_ORDER_MISMATCH: Partition keys must be the last columns in the table and in the same order as the table properties: [year].

This can be addressed by specifying the column names and keeping year in the end, instead of writing SELECT * in sql.

# Check partition details
SELECT *
FROM   information_schema.__internal_partitions__
WHERE  table_schema = 'marvel'
       AND table_name = 'universe_partitioned'
ORDER  BY partition_number

The S3 bucket has AWS-KMS default encryption, so the new .gz JSON files created are also encrypted.

SELECT count(*) as row_count from "marvel"."marvel_1936"

INSERT INTO…SELECT

INSERT INTO "marvel"."marvel_1936" SELECT * from "marvel"."marvel_1936"
SELECT count(*) as row_count from "marvel"."marvel_1936"

INSERT INTO…VALUES

INSERT INTO "marvel"."marvel_1936" (id, name, eyecolor, haircolor, gender, year)
VALUES (1, 'Blue Merlin', 'Brown', 'Black', 'Female', 1936), (2, 'Brenda', 'Black', 'White', 'Female', 1936)
SELECT count(*) AS row_count from "marvel"."marvel_1936"
SELECT * FROM "marvel"."marvel_1936" WHERE id in (1,2)

The INSERT operation generated its own .gz files.

On the partitioned table, it works the same way. But, in case you miss to specify the partition column, Athena creates a new partition

INSERT INTO "marvel"."universe_partitioned" (id, name, eyecolor, haircolor, gender, year)
VALUES (2, 'Brenda', 'Black', 'White', 'Female', 1936)

Leave a Reply