I had a use case of renaming over 50 tables, adding “prod_” prefix to the existing Glue tables.
AWS Athena does not support native Hive DDL “ALTER TABLE table_name RENAME TO” command. So one of the option was to –
- “Generate Create Table DDL” in AWS Athena.
- Modify the table name.
- Execute the DDL.
- Preview the new table.
- Drop the old table
Following the above steps for over 50 tables would have been cumbersome. This is where AWS Data Wrangler comes to rescue 🙂
Lets take a look on how I did it. All the commands were executed from AWS Sagemaker Notebook –
import awswrangler as wr
#get all the tables for database from glue catalog
tables = wr.catalog.get_tables(database='mydb')
#create list of tables
table_details = []
for table in tables:
table_details.append(table)
#create the new table with "prod_" prefix and drop original table
for table_detail in table_details:
db = table_detail['DatabaseName']
orig_tab_name = table_detail['Name']
new_table_name = 'prod_'+ orig_tab_name
table_loc = table_detail['StorageDescriptor']['Location']
table_columns = wr.catalog.get_table_types(database=db, table=orig_tab_name)
parameters = table_detail['StorageDescriptor']['Parameters']
print('\n')
print(f'Create table: {new_table_name}, location: {table_loc}')
wr.catalog.create_parquet_table(
database=db,
table=new_table_name,
path=table_loc,
columns_types=table_columns,
parameters=parameters
)
status = wr.catalog.delete_table_if_exists(database=db, table=orig_tab_name)
print(f'Drop table: {orig_tab_name} from database: {db}, status: {status}')
Quick walkthrough of the above code –
- Import AWS Data Wrangler.
- Get all the table details using wr.catalog.get_tables
- Add the tables to the list.
- For each table in the table list –
- Get the database name.
- Get the original table name.
- Get the table location.
- Get the table columns as dictionary using wr.catalog.get_table_types.
- Get the table properties.
- Create the table using wr.catalog.create_parquet_table with “prod_” as prefix.
- Drop the original table.
One thing to note is the original table has to be Parquet table in AWS Glue catalog.
Reference –
One thought on “Rename Glue Tables using AWS Data Wrangler”
Great post, I only suggest to do a MSCK REPAIR TABLE as the final step if it the table has partitions.