AWSAWSDataWrangler

Rename Glue Tables using AWS Data Wrangler

I had a use case of renaming over 50 tables, adding “prod_” prefix to the existing Glue tables.

AWS Athena does not support native Hive DDL “ALTER TABLE table_name RENAME TO” command. So one of the option was to –

  1. “Generate Create Table DDL” in AWS Athena.
  2. Modify the table name.
  3. Execute the DDL.
  4. Preview the new table.
  5. Drop the old table

Following the above steps for over 50 tables would have been cumbersome. This is where AWS Data Wrangler comes to rescue πŸ™‚

Lets take a look on how I did it. All the commands were executed from AWS Sagemaker Notebook –

import awswrangler as wr

#get all the tables for database from glue catalog
tables = wr.catalog.get_tables(database='mydb')

#create list of tables
table_details = []
for table in tables:
    table_details.append(table)

#create the new table with "prod_" prefix and drop original table
for table_detail in table_details:
    db = table_detail['DatabaseName']
    orig_tab_name = table_detail['Name']
    new_table_name = 'prod_'+ orig_tab_name
    table_loc = table_detail['StorageDescriptor']['Location']
    table_columns = wr.catalog.get_table_types(database=db, table=orig_tab_name)
    parameters = table_detail['StorageDescriptor']['Parameters']
    print('\n')
    print(f'Create table: {new_table_name}, location: {table_loc}')
    wr.catalog.create_parquet_table(
        database=db,
        table=new_table_name,
        path=table_loc,
        columns_types=table_columns,
        parameters=parameters
    )
    status = wr.catalog.delete_table_if_exists(database=db, table=orig_tab_name) 
    print(f'Drop table: {orig_tab_name} from database: {db}, status: {status}')

Quick walkthrough of the above code –

  1. Import AWS Data Wrangler.
  2. Get all the table details using wr.catalog.get_tables
  3. Add the tables to the list.
  4. For each table in the table list –
    • Get the database name.
    • Get the original table name.
    • Get the table location.
    • Get the table columns as dictionary using wr.catalog.get_table_types.
    • Get the table properties.
    • Create the table using wr.catalog.create_parquet_table with “prod_” as prefix.
    • Drop the original table.

One thing to note is the original table has to be Parquet table in AWS Glue catalog.

Reference –

https://aws-data-wrangler.readthedocs.io/en/stable/stubs/awswrangler.catalog.create_parquet_table.html#awswrangler.catalog.create_parquet_table

One thought on “Rename Glue Tables using AWS Data Wrangler

Leave a Reply