To process data in an AWS Glue ETL job, you work with either a DataFrame or a DynamicFrame. A DataFrame is similar to a table and supports functional-style operations (map, reduce, filter, etc.) along with SQL operations. An AWS Glue DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Glue computes a schema on the fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.
A DynamicFrame can be created using one of the following options:
- create_dynamic_frame_from_rdd – created from an Apache Spark Resilient Distributed Dataset (RDD)
- create_dynamic_frame_from_catalog – created using a Glue catalog database and table name
- create_dynamic_frame_from_options – created with the specified connection type and format. Connection types include Amazon S3, Amazon Redshift, and JDBC
This post walks through the steps needed to access a cross-account AWS Glue catalog and create DynamicFrames using the create_dynamic_frame_from_catalog option.
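As a rough illustration of the third option, the sketch below reads files from S3 through `create_dynamic_frame_from_options`. The `glue_context` argument is assumed to be a `GlueContext` created in the job script, and the `raw/` prefix and `json` format are hypothetical values, not taken from this post.

```python
def read_from_s3(glue_context, s3_path, data_format="json"):
    """Create a DynamicFrame directly from S3, without a catalog table."""
    # For S3 sources, connection_options carries the list of paths to read.
    return glue_context.create_dynamic_frame_from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format=data_format,
    )

# Usage inside a Glue job, where glueContext already exists (path is hypothetical):
#   frame = read_from_s3(glueContext, "s3://marvel-glue-catalog-demo/raw/")
```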
Account A – AWS Glue ETL execution account.
Account B – Data stored in S3 and cataloged in AWS Glue.
1. In Account A
- Create an IAM role in Account A that can access the Account B catalog, and attach it to the Glue ETL job. If the job already exists, create a new policy and attach it to the role the job already uses.
- The policy below grants access to the “marvel” database and all of its tables in the AWS Glue catalog of Account B, and to the S3 bucket that holds the data.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetConnection",
        "glue:GetTable",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:ACCOUNT-B:catalog",
        "arn:aws:glue:us-east-1:ACCOUNT-B:database/marvel",
        "arn:aws:glue:us-east-1:ACCOUNT-B:table/marvel/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::marvel-glue-catalog-demo",
        "arn:aws:s3:::marvel-glue-catalog-demo/*"
      ]
    }
  ]
}
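When the same policy is needed for several databases or regions, it can be generated rather than edited by hand. The helper below is only a sketch and not part of the original setup: `build_glue_read_policy` is a hypothetical name, and `ACCOUNT-B` stands in for the real account ID as in the JSON above.

```python
import json

GLUE_READ_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetConnection",
    "glue:GetTable",
    "glue:GetPartition",
    "glue:GetPartitions",
]

def build_glue_read_policy(account_b_id, database, bucket, region="us-east-1"):
    """Return the Account A identity policy as a dict, mirroring the JSON above."""
    prefix = f"arn:aws:glue:{region}:{account_b_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": GLUE_READ_ACTIONS,
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/*",
                ],
            },
            {
                # Same broad S3 grant as the policy above.
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }

policy = build_glue_read_policy("ACCOUNT-B", "marvel", "marvel-glue-catalog-demo")
print(json.dumps(policy, indent=2))
```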
2. In Account B
- On the AWS Glue console, under Settings, add a Data Catalog resource policy that grants database and table access to the IAM identities from Account A created in step 1.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT-A:role/cross_account_glue_role",
          "arn:aws:iam::ACCOUNT-A:root"
        ]
      },
      "Action": [
        "glue:GetDatabase",
        "glue:GetConnection",
        "glue:GetTable",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:ACCOUNT-B:catalog",
        "arn:aws:glue:us-east-1:ACCOUNT-B:database/marvel",
        "arn:aws:glue:us-east-1:ACCOUNT-B:table/marvel/*"
      ]
    }
  ]
}
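The same catalog policy can also be applied from a script instead of the console. The sketch below assumes a boto3 Glue client created with Account B credentials; `apply_catalog_policy` is a hypothetical helper name. To my knowledge `put_resource_policy` sets the catalog's resource policy as a whole, so any statements already in place would need to be merged into the document first.

```python
import json

def apply_catalog_policy(glue_client, policy):
    """Set the Data Catalog resource policy for the account the client belongs to."""
    # PolicyInJson expects the policy document serialized as a JSON string.
    return glue_client.put_resource_policy(PolicyInJson=json.dumps(policy))

# Usage with Account B credentials (requires boto3; catalog_policy is the dict
# form of the JSON above):
#   import boto3
#   apply_catalog_policy(boto3.client("glue", region_name="us-east-1"), catalog_policy)
```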
- Apply a bucket policy to the S3 bucket, granting access to the role created in step 1.
{
  "Version": "2012-10-17",
  "Id": "Policy1587912647278",
  "Statement": [
    {
      "Sid": "Stmt1587912633716",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT-A:role/cross_account_glue_role",
          "arn:aws:iam::ACCOUNT-A:root"
        ]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::marvel-glue-catalog-demo",
        "arn:aws:s3:::marvel-glue-catalog-demo/*"
      ]
    }
  ]
}
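The bucket policy above grants `s3:*`. If the job only reads from the bucket, a narrower grant is likely sufficient. The sketch below builds such a policy, assuming `s3:ListBucket` on the bucket and `s3:GetObject` on its objects are all the job needs; `build_read_only_bucket_policy` is a hypothetical helper, and the result should be tested against the actual job before it replaces the broader policy.

```python
import json

def build_read_only_bucket_policy(bucket, principal_arns):
    """Return a bucket policy dict that allows listing and reading only."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arns},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arns},
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }

bucket_policy = build_read_only_bucket_policy(
    "marvel-glue-catalog-demo",
    ["arn:aws:iam::ACCOUNT-A:role/cross_account_glue_role"],
)
print(json.dumps(bucket_policy, indent=2))
```

The resulting document could then be applied with the boto3 S3 client's `put_bucket_policy(Bucket=..., Policy=json.dumps(bucket_policy))`, using Account B credentials.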
3. In Account A
- Create the DynamicFrame using the “from_catalog” option in Account A, reading the data from the Account B catalog.
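A minimal sketch of that call follows, wrapped in a function so it can be reused. It assumes `glue_context` is the `GlueContext` of the running job. The type check on `catalog_id` is a guard added here for illustration, not something Glue itself is claimed to do, and the account ID in the usage comment is a placeholder.

```python
def read_marvel_superheroes(glue_context, catalog_id):
    """Read marvel.marvel_superheroes from another account's Glue catalog."""
    # catalog_id is the AWS account ID of Account B, passed as a string.
    if not isinstance(catalog_id, str):
        raise TypeError("catalog_id must be a quoted string, e.g. '123456789012'")
    datasource0 = glue_context.create_dynamic_frame.from_catalog(
        database="marvel",
        table_name="marvel_superheroes",
        catalog_id=catalog_id,
        transformation_ctx="datasource0",
    )
    return datasource0

# Usage inside a Glue job script, where awsglue and pyspark are available:
#   from pyspark.context import SparkContext
#   from awsglue.context import GlueContext
#   glueContext = GlueContext(SparkContext.getOrCreate())
#   datasource0 = read_marvel_superheroes(glueContext, "123456789012")
```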

In the above code, datasource0 is the DynamicFrame created by reading the “marvel_superheroes” table in the “marvel” database from the AWS account identified by the “catalog_id” parameter. The value for catalog_id must be passed as a quoted string!
To conclude, DynamicFrames in AWS Glue ETL can be created by reading data from a cross-account Glue catalog, provided the IAM permissions and policies are defined correctly.
5 thoughts on “Cross-account AWS Glue Data Catalog access with Glue ETL”
In both policies, you are granting “Action”: “s3:*”.
Why do we need all Write permissions on the source catalog, DBs and S3? The Glue jobs in account A only need to read the data from the relevant S3 buckets of account B. So granting all Write permissions might lead to accidentally modifying the data in account B, which is undesirable if the job is reading from prod accounts/buckets. Just reflecting based on a real scenario I’m facing!
I guess only List and Read permissions would be enough.
Hi Avishek,
Thank you for visiting the blog.
Yes, list and get permissions should be enough to read the data for this purpose.
Regards,
Anand
Very nice detailed steps – made it super easy to understand. Thank you for the nice documentation.
Thank you for visiting the blog. Glad it helped!
This post really helped out in conjunction with the sister post on how to do cross-account Glue ETL without using a datacatalog (https://aprakash.wordpress.com/2020/03/28/get-dynamodb-data-into-s3-from-different-account/?unapproved=10026&moderation-hash=590edb18d994a5bda16f0cb889c37169#comment-10026).
The knowledge about the IAM assumption is really not obvious and is difficult to figure out without expert help.