A data lake is a centralized repository for storing structured and unstructured data at scale. Data lakes enable you to create dashboards, perform big data processing and real-time analytics, and create machine learning (ML) models on your data to drive business decisions.
Many customers are choosing AWS Lake Formation as their data lake management solution. Lake Formation is an integrated data lake service that makes it simple for you to ingest, clean, catalog, transform, and secure your data and make it available for analysis and ML.
However, some companies require account authentication and authorization to be managed through AWS IAM Identity Center (successor to AWS Single Sign-On), which doesn’t have a built-in integration with Lake Formation.
Integrating Lake Formation with IAM Identity Center can help you manage data access at the organization level, consolidating AWS account and data lake authentication and authorization.
In this post, we walk through the steps to integrate IAM Identity Center with Lake Formation.
We configure IAM Identity Center with permission sets for your data lake personas. These permission sets contain the permissions that allow your data lake users to access Lake Formation. When the permission sets are assigned to your data lake account, IAM Identity Center creates AWS Identity and Access Management (IAM) roles in that account. The IAM roles are prefixed with
In Lake Formation, you can grant data resource permissions to IAM users and roles. To integrate with IAM Identity Center, you will grant data resource access to the IAM roles created by IAM Identity Center.
Now, when users access the data lake account through the IAM Identity Center portal, they assume an IAM role that has access to Lake Formation resources.
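As a rough sketch of how you might locate the roles that IAM Identity Center creates in the data lake account, the following Python helper lists roles under the reserved IAM path that IAM Identity Center uses and filters them by permission set name. It assumes `iam_client` is a boto3 IAM client created by the caller (for example, `boto3.client("iam")`); the function name and the substring match on the role name are illustrative choices, not part of the original walkthrough.

```python
# IAM path under which IAM Identity Center provisions its roles in member accounts.
SSO_ROLE_PATH = "/aws-reserved/sso.amazonaws.com/"


def find_identity_center_roles(iam_client, permission_set_name):
    """Return ARNs of Identity Center roles whose name contains the permission set name.

    iam_client is assumed to behave like a boto3 IAM client.
    """
    matches = []
    paginator = iam_client.get_paginator("list_roles")
    for page in paginator.paginate(PathPrefix=SSO_ROLE_PATH):
        for role in page["Roles"]:
            # Substring match is an assumption made for illustration.
            if permission_set_name in role["RoleName"]:
                matches.append(role["Arn"])
    return matches
```

The returned ARNs are the principals you grant permissions to in Lake Formation later in this post.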
The following diagram illustrates this solution architecture.
To implement the solution, complete the following high-level steps:
- Create a permission set within IAM Identity Center
- Grant users or groups access to the data lake account in IAM Identity Center
- Assign an IAM Identity Center role as a data lake administrator
- Grant the IAM Identity Center-generated IAM role data lake permissions in Lake Formation
- Grant the IAM Identity Center-generated IAM role data location permissions in Lake Formation
For this walkthrough, you should have the following prerequisites:
Create a permission set with IAM Identity Center
To create your permission set, complete the following steps:
- Sign in to the AWS Management Console with your management account and go to the Region where IAM Identity Center is configured.
- On the IAM Identity Center console, choose Permission sets in the navigation pane.
- Choose Create permission set.
- Select Custom permission set, then choose Next.
- Next, you must specify policies. The first permission set you create should have data lake admin privileges.
AWS recommends granting data lake admins the following AWS managed policies:
CloudWatchLogsReadOnlyAccess. However, if these permissions are too permissive or not permissive enough, you may prefer using customer managed policies.
- Choose Next.
- Specify permission set details, then choose Next.
- Review your settings, then choose Create.
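If you prefer to script these console steps, a minimal sketch might look like the following. It assumes `sso_admin` is a boto3 `sso-admin` client created by the caller in the Region where IAM Identity Center is configured. The function name, the description text, and the example permission set name in the usage note are hypothetical.

```python
def create_permission_set_with_policies(sso_admin, instance_arn, name, managed_policy_arns):
    """Create a permission set and attach AWS managed policies to it.

    sso_admin is assumed to behave like a boto3 "sso-admin" client.
    Returns the ARN of the new permission set.
    """
    response = sso_admin.create_permission_set(
        InstanceArn=instance_arn,
        Name=name,
        Description="Created for Lake Formation access",  # illustrative text
    )
    permission_set_arn = response["PermissionSet"]["PermissionSetArn"]

    # Attach each managed policy in turn.
    for policy_arn in managed_policy_arns:
        sso_admin.attach_managed_policy_to_permission_set(
            InstanceArn=instance_arn,
            PermissionSetArn=permission_set_arn,
            ManagedPolicyArn=policy_arn,
        )
    return permission_set_arn
```

For example, you could call it with a hypothetical name such as `"LakeFormationAdmin"` and a policy list that includes `"arn:aws:iam::aws:policy/CloudWatchLogsReadOnlyAccess"`, plus whichever other managed policies you chose in the console.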
Repeat the steps to create a data analyst role to grant Lake Formation access. For this post, we created the role
LakeFormationDataAnalyst with the policy
Grant users or groups access to the data lake account in IAM Identity Center
To grant access to users and groups, complete the following steps:
- On the IAM Identity Center console, choose AWS accounts in the navigation pane.
- Choose Assign users or groups.
- Select the user or group you want to assign the data lake account permissions to (
- Choose Next.
- Select the permission set you created earlier.
- Choose Next.
- Review your settings, then choose Submit.
Verify your IAM Identity Center permissions have been successfully granted by visiting your IAM Identity Center Portal, choosing the data lake admin, and signing in to the console.
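The assignment steps above can also be sketched in code. The following example assumes `sso_admin` is a boto3 `sso-admin` client supplied by the caller, and that you already know the IAM Identity Center ID of the user or group. The function name and its default principal type are illustrative.

```python
def assign_permission_set(sso_admin, instance_arn, account_id,
                          permission_set_arn, principal_id, principal_type="GROUP"):
    """Assign a permission set to a user or group for one AWS account.

    sso_admin is assumed to behave like a boto3 "sso-admin" client.
    Returns the initial status of the asynchronous assignment request.
    """
    if principal_type not in ("USER", "GROUP"):
        raise ValueError("principal_type must be 'USER' or 'GROUP'")

    response = sso_admin.create_account_assignment(
        InstanceArn=instance_arn,
        TargetId=account_id,
        TargetType="AWS_ACCOUNT",
        PermissionSetArn=permission_set_arn,
        PrincipalType=principal_type,
        PrincipalId=principal_id,
    )
    return response["AccountAssignmentCreationStatus"]["Status"]
```

The assignment is created asynchronously, so the returned status may still be in progress when the call returns.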
Assign an IAM Identity Center role as a data lake administrator
The following steps set up a data lake administrator with the IAM role created by IAM Identity Center. Administrators have full access to the Lake Formation console, and control the initial data configuration and access permissions. For all users and groups that don’t need to be data lake administrators, skip to the next series of steps.
- Sign in to the console in the data lake account with admin access.
- Open the Lake Formation console. A pop-up window appears, prompting you to define your administrators.
- Select Add other AWS users or roles.
- Choose the permission set you created earlier (starting with
- Choose Get started.
- On the Administrative roles and tasks page, under Database creators, choose Grant.
- Choose your data lake admin role.
- Select Create database under Catalog permissions and Grantable permissions.
- Choose Grant.
You now have an IAM Identity Center-generated IAM principal that is assigned as the data lake administrator and database creator.
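A possible programmatic equivalent of these steps is sketched below. It assumes `lf_client` is a boto3 `lakeformation` client created by the caller in the data lake account, and that `role_arn` is the ARN of the IAM role that IAM Identity Center generated. The function names are hypothetical.

```python
def add_data_lake_admin(lf_client, role_arn):
    """Add role_arn to the data lake administrators, keeping existing settings.

    lf_client is assumed to behave like a boto3 "lakeformation" client.
    """
    # put_data_lake_settings replaces the whole settings object, so read it first.
    settings = lf_client.get_data_lake_settings()["DataLakeSettings"]
    admins = list(settings.get("DataLakeAdmins", []))
    if not any(a["DataLakePrincipalIdentifier"] == role_arn for a in admins):
        admins.append({"DataLakePrincipalIdentifier": role_arn})
    settings["DataLakeAdmins"] = admins
    lf_client.put_data_lake_settings(DataLakeSettings=settings)
    return admins


def grant_database_creator(lf_client, role_arn):
    """Grant CREATE_DATABASE on the catalog, including the grant option."""
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Catalog": {}},
        Permissions=["CREATE_DATABASE"],
        PermissionsWithGrantOption=["CREATE_DATABASE"],
    )
```

Reading the current settings before writing them back is a deliberate choice in this sketch: it avoids dropping administrators that were already configured.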
Grant the IAM Identity Center role data lake permissions in Lake Formation
You now manage data lake permissions. For more information, refer to Managing Lake Formation permissions.
- On the Lake Formation console, under Permissions in the navigation pane, choose Data lake permissions.
- Choose Grant.
- Select IAM users and roles.
- Choose the
- Grant access to database and table permissions as applicable, then choose Grant.
You now have an IAM Identity Center-generated IAM principal with data permissions.
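The same kind of grant can be expressed as an API request. The sketch below builds the request for a table-level grant and sends it through a boto3 `lakeformation` client passed in by the caller. The helper names and the default permission list (`SELECT` and `DESCRIBE`) are assumptions chosen for illustration; grant whatever permissions apply to your persona.

```python
def build_table_grant(role_arn, database, table, permissions=("SELECT", "DESCRIBE")):
    """Build grant_permissions arguments for a single table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def grant_table_access(lf_client, role_arn, database, table, permissions=("SELECT", "DESCRIBE")):
    """Grant table permissions to the Identity Center-generated role.

    lf_client is assumed to behave like a boto3 "lakeformation" client.
    """
    request = build_table_grant(role_arn, database, table, permissions)
    lf_client.grant_permissions(**request)
    return request
```

Separating the request builder from the call makes it easy to review or log the exact grant before it is applied.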
Grant the IAM Identity Center role data location permissions in Lake Formation
When granting access to data locations, the process remains the same.
- On the Lake Formation console, under Permissions in the navigation pane, choose Data locations.
- Choose Grant.
- Choose the
- Complete the remaining fields and choose Grant.
You now have an IAM Identity Center-generated IAM principal with data location access.
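For reference, a data location grant could be scripted along these lines. The sketch assumes `lf_client` is a boto3 `lakeformation` client supplied by the caller and that the Amazon S3 location is already registered with Lake Formation. The function name and the example ARN in the usage note are hypothetical.

```python
def grant_data_location(lf_client, role_arn, s3_resource_arn):
    """Grant DATA_LOCATION_ACCESS on a registered S3 location.

    lf_client is assumed to behave like a boto3 "lakeformation" client.
    """
    request = {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataLocation": {"ResourceArn": s3_resource_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }
    lf_client.grant_permissions(**request)
    return request
```

A call might pass an S3 ARN such as `"arn:aws:s3:::example-bucket/prefix"`, where the bucket name is a placeholder.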
Validate data access
We now validate data access for the IAM Identity Center principal.
- Sign in to the console through IAM Identity Center as the principal you granted access to. For this post, we’re logging in as the
To test data access, we run some queries in Amazon Athena.
- On the Athena console, choose Query editor.
- On the Settings tab, confirm that a query result location is set up.
- If you don’t have a query result location, choose Manage and configure your query result location and encryption.
- In the Athena query editor, on the Editor tab, choose the database that you granted access to. If the principal doesn’t have access to the Lake Formation table and data location, you won’t be able to view data in Athena.
- Choose the menu icon next to your table and choose Generate table DDL.
Confirm that the data appears on the Query results tab.
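You can run a similar check outside the console. The following sketch starts a small query, polls until it finishes, and returns the result rows. It assumes `athena` is a boto3 `athena` client created with credentials for the IAM Identity Center principal. The function name, the `LIMIT 10` query, and the polling parameters are illustrative choices rather than part of the original walkthrough.

```python
import time


def run_validation_query(athena, database, table, output_location,
                         poll_seconds=1.0, max_polls=60):
    """Run a small SELECT against the table and return the result rows.

    athena is assumed to behave like a boto3 "athena" client.
    """
    query = f'SELECT * FROM "{table}" LIMIT 10'
    execution_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            results = athena.get_query_results(QueryExecutionId=execution_id)
            return results["ResultSet"]["Rows"]
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Query {state}: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError("Query did not finish within the polling limit")
```

If the principal lacks Lake Formation permissions on the table or its data location, expect the query to fail rather than return rows.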
In this post, we demonstrated how to integrate IAM Identity Center with Lake Formation permissions. You can now grant IAM Identity Center identities administrator, database creation, database and table, and data location access in Lake Formation. Managing data lake permissions through IAM Identity Center allows you to control data access from your management account, helping to improve your scalability and security.
If you’re wondering how to adapt this solution to tag-based access control, read Easily manage your data lake at scale using AWS Lake Formation Tag-based access control and apply the techniques you learned from this post.
About the authors
Benon Boyadjian is a Private Equity Solutions Architect at AWS. He is passionate about helping customers understand the impact AWS can have on their businesses and guiding their AWS implementations. In his free time, he enjoys swimming, snowboarding, and playing with his cat Dirt.
Janakiraman Shanmugam is a Senior Data Architect at Amazon Web Services. He focuses on data and analytics and enjoys helping customers solve big data and machine learning problems. Outside of the office, he loves to be with his friends and family and spend time outdoors.