Amazon Redshift is a massively parallel processing (MPP), fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data using existing business intelligence tools.
When organizations modernize their data warehousing solutions with Amazon Redshift, implementing additional data protection mechanisms for sensitive data, such as personally identifiable information (PII) or protected health information (PHI), is a common requirement, especially for those in highly regulated industries with strict data security and privacy mandates. Amazon Redshift provides role-based access control, row-level security, column-level security, and dynamic data masking, along with other database security features, to enable organizations to implement fine-grained data protection.
Security-sensitive applications often require column-level (or field-level) encryption to enforce fine-grained protection of sensitive data on top of the default server-side encryption (namely, data encryption at rest). In other words, sensitive data should always be encrypted on disk and remain encrypted in memory until users with the proper permissions request to decrypt it. Column-level encryption provides an additional layer of protection for your sensitive data throughout system processing so that only certain users or applications can access it. This encryption ensures that only authorized principals that need the data, and have the required credentials to decrypt it, are able to do so.
In this post, we demonstrate how you can implement your own column-level encryption mechanism in Amazon Redshift using AWS Glue to encrypt sensitive data before loading it into Amazon Redshift, and using AWS Lambda as a user-defined function (UDF) in Amazon Redshift to decrypt the data using standard SQL statements. Lambda UDFs can be written in any of the programming languages supported by Lambda, such as Java, Go, PowerShell, Node.js, C#, Python, Ruby, or a custom runtime. You can use Lambda UDFs in any SQL statement such as SELECT, UPDATE, INSERT, or DELETE, and in any clause of those statements where scalar functions are allowed.
The following diagram illustrates the solution architecture.
To show how to set up this architecture, we walk you through the following steps:
- We upload a sample data file containing synthetic PII data to an Amazon Simple Storage Service (Amazon S3) bucket.
- A sample 256-bit data encryption key is generated and securely stored using AWS Secrets Manager.
- An AWS Glue job reads the data file from the S3 bucket, retrieves the data encryption key from Secrets Manager, performs data encryption for the PII columns, and loads the processed dataset into an Amazon Redshift table.
- We create a Lambda function to reference the same data encryption key from Secrets Manager, and implement data decryption logic for the received payload data.
- The Lambda function is registered as a Lambda UDF with a proper AWS Identity and Access Management (IAM) role that the Amazon Redshift cluster is authorized to assume.
- We can validate the data decryption functionality by issuing sample queries using Amazon Redshift Query Editor v2.0. You may optionally choose to test it with your own SQL client or business intelligence tools.
To deploy the solution, make sure to complete the following prerequisites:
- Have an AWS account. For this post, you configure the required AWS resources using AWS CloudFormation.
- Have an IAM user with permissions to manage AWS resources including Amazon S3, AWS Glue, Amazon Redshift, Secrets Manager, Lambda, and AWS Cloud9.
Deploy the solution using AWS CloudFormation
Provision the required AWS resources using a CloudFormation template by completing the following steps:
- Sign in to your AWS account.
- Choose Launch Stack:
- Navigate to the AWS Region in which you want to create the stack.
- For Stack name, enter a name for the stack or leave the default.
- For RedshiftMasterUsername, enter a user name for the admin user account of the Amazon Redshift cluster or leave the default.
- For RedshiftMasterUserPassword, enter a strong password for the admin user account of the Amazon Redshift cluster.
- Select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
The CloudFormation stack creation process takes around 5–10 minutes to complete.
- When the stack creation is complete, on the stack's Outputs tab, record the output values.
Upload the sample data file to Amazon S3
To test the column-level encryption capability, you can download the sample synthetic data generated by Mockaroo. The sample dataset contains synthetic PII and sensitive fields such as phone number, email address, and credit card number. In this post, we demonstrate how to encrypt the credit card number field, but you can apply the same method to other PII fields according to your own requirements.
An AWS Cloud9 instance is provisioned for you during the CloudFormation stack setup. You can access the instance from the AWS Cloud9 console, or by visiting the URL recorded from the CloudFormation stack outputs.
On the AWS Cloud9 terminal, copy the sample dataset to your S3 bucket by running the following command:
Generate a secret and secure it using Secrets Manager
We generate a 256-bit secret to be used as the data encryption key. Complete the following steps:
- Create a new file in the AWS Cloud9 environment.
- Enter the following code snippet. We use the cryptography package to create a secret, and use the AWS SDK for Python (Boto3) to securely store the secret value with Secrets Manager:
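As a sketch of what such a script can look like (the secret name and Region are illustrative placeholders, and for simplicity this sketch draws the random key from Python's standard library secrets module rather than the cryptography package):

```python
# generate_secret.py -- sketch: generate a random 256-bit data encryption
# key, base64-encode it, and store it in AWS Secrets Manager.
# The secret name and Region below are illustrative placeholders.
import base64
import secrets


def generate_data_key(num_bytes: int = 32) -> str:
    """Return a base64-encoded random key (32 bytes = 256 bits)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("utf-8")


def store_secret(secret_name: str, secret_value: str, region_name: str) -> None:
    """Store the key in Secrets Manager using the AWS SDK for Python (Boto3)."""
    import boto3

    client = boto3.client("secretsmanager", region_name=region_name)
    client.create_secret(Name=secret_name, SecretString=secret_value)


if __name__ == "__main__":
    store_secret("pii-data-encryption-key", generate_data_key(), "us-east-1")
```

The Secrets Manager call only runs when the script is executed directly, so the key-generation helper can be reused (for example, by the AWS Glue job) without side effects.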
- Save the file with the file name generate_secret.py (or any desired name ending with .py).
- Install the required packages by running the following pip install command in the terminal:
- Run the Python script via the following command to generate the secret:
Create a target table in Amazon Redshift
A single-node Amazon Redshift cluster is provisioned for you during the CloudFormation stack setup. To create the target table for storing the dataset with encrypted PII columns, complete the following steps:
- On the Amazon Redshift console, navigate to the list of provisioned clusters, and choose your cluster.
- To connect to the cluster, on the Query data drop-down menu, choose Query in query editor v2.
- If this is the first time you're using the Amazon Redshift Query Editor V2, accept the default setting by choosing Configure account.
- To connect to the cluster, choose the cluster name.
- For Database, enter the database name.
- For User name, enter the admin user name.
- For Password, enter your password.
You may need to change the user name and password according to your CloudFormation settings.
- Choose Create connection.
- In the query editor, run the following DDL command to create the target table:
We recommend using the smallest possible column size as a best practice, and you may need to modify these table definitions per your specific use case. Creating columns much larger than necessary will affect the size of data tables and query performance.
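For illustration, such a DDL can take the following shape. The table and column names here are assumptions based on the sample dataset fields, and the encrypted column is sized to hold the base64-encoded AES-SIV ciphertext rather than the raw 16-digit value:

```sql
CREATE TABLE pii_table (
  id                     BIGINT,
  full_name              VARCHAR(50),
  contact_phone_number   VARCHAR(20),
  email_address          VARCHAR(50),
  registered_credit_card VARCHAR(64)
);
```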
Create the source and destination Data Catalog tables in AWS Glue
The CloudFormation stack provisioned two AWS Glue data crawlers: one for the Amazon S3 data source and one for the Amazon Redshift data source. To run the crawlers, complete the following steps:
- On the AWS Glue console, choose Crawlers in the navigation pane.
- Select the crawler named glue-s3-crawler, then choose Run crawler to trigger the crawler job.
- Select the crawler named glue-redshift-crawler, then choose Run crawler.
When the crawlers are complete, navigate to the Tables page to verify your results. You should see two tables registered in the Data Catalog database.
Author an AWS Glue ETL job to perform data encryption
An AWS Glue job is provisioned for you as part of the CloudFormation stack setup, but the extract, transform, and load (ETL) script has not been created. We create and upload the ETL script to the /glue-script folder under the provisioned S3 bucket in order to run the AWS Glue job.
- Return to your AWS Cloud9 environment, either via the AWS Cloud9 console or by visiting the URL recorded from the CloudFormation stack outputs.
We use the Miscreant package to implement deterministic encryption with the AES-SIV encryption algorithm, which means that for any given plaintext value, the generated encrypted value is always the same. The benefit of this encryption approach is that it allows for point lookups, equality joins, grouping, and indexing on encrypted columns. However, you should also be aware of the potential security implications when applying deterministic encryption to low-cardinality data, such as gender, boolean values, and status flags.
- Create a new file in the AWS Cloud9 environment and enter the following code snippet:
- Save the script with your desired file name.
- Copy the script to the target S3 bucket location by running the following command:
- To verify the script is uploaded successfully, navigate to the Jobs page on the AWS Glue console. You should be able to find the provisioned AWS Glue job.
- Choose Run to trigger the AWS Glue job. The job first reads the source data from the S3 bucket registered in the AWS Glue Data Catalog, then applies column mappings to transform the data into the expected data types, performs encryption of the PII fields, and finally loads the encrypted data into the target Redshift table. The whole process should complete within 5 minutes for this sample dataset. You can switch to the Runs tab to monitor the job status.
Configure a Lambda function to perform data decryption
A Lambda function with the data decryption logic is deployed for you during the CloudFormation stack setup. You can find the function on the Lambda console.
The following is the Python code used in the Lambda function:
If you want to deploy the Lambda function on your own, make sure to include the Miscreant package in your deployment package.
Register a Lambda UDF in Amazon Redshift
You can create Lambda UDFs that use custom functions defined in Lambda as part of your SQL queries. Lambda UDFs are managed in Lambda, and you can control the access privileges to invoke these UDFs in Amazon Redshift.
- Navigate back to the Amazon Redshift Query Editor V2 to register the Lambda UDF.
- Use the CREATE EXTERNAL FUNCTION command and provide an IAM role that the Amazon Redshift cluster is authorized to assume and that is allowed to make calls to Lambda:
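The statement generally takes the following shape; the UDF name, Lambda function name, and role ARN below are placeholders to be replaced with the values from your stack outputs:

```sql
CREATE OR REPLACE EXTERNAL FUNCTION pii_decrypt (VARCHAR(MAX))
RETURNS VARCHAR(MAX) STABLE
LAMBDA 'my-pii-decrypt-function'
IAM_ROLE 'arn:aws:iam::111122223333:role/my-redshift-lambda-role';
```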
You can find the Lambda function name and Amazon Redshift IAM role on the CloudFormation stack Outputs tab:
Validate the column-level encryption functionality in Amazon Redshift
By default, permission to run new Lambda UDFs is granted to PUBLIC. To restrict usage of the newly created UDF, revoke the permission from PUBLIC and then grant the privilege to specific users or groups. To learn more about Lambda UDF security and privileges, see Managing Lambda UDF security and privileges.
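Assuming a UDF registered as pii_decrypt with a single VARCHAR(MAX) argument (substitute your own function signature and user name), the revoke-then-grant pattern looks like this:

```sql
REVOKE EXECUTE ON FUNCTION pii_decrypt(VARCHAR(MAX)) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pii_decrypt(VARCHAR(MAX)) TO privileged_user;
```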
You must be a superuser or have the sys:secadmin role to run the following SQL statements:
First, we run a SELECT statement to verify that our highly sensitive data field, in this case the registered_credit_card column, is now encrypted in the Amazon Redshift table:
Regular database users who have not been granted permission to use the Lambda UDF will see a permission denied error when they try to invoke it.
Privileged database users who have been granted permission to use the Lambda UDF for decrypting the data can issue a SQL statement using the UDF, and the registered_credit_card values are successfully retrieved in plain text.
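For example, assuming a UDF named pii_decrypt and a target table named pii_table (both placeholders; substitute your own names), a privileged user can decrypt the column inline:

```sql
SELECT full_name,
       registered_credit_card              AS encrypted_value,
       pii_decrypt(registered_credit_card) AS decrypted_value
FROM pii_table
LIMIT 10;
```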
To avoid incurring future charges, make sure to clean up all the AWS resources that you created as part of this post.
You can delete the CloudFormation stack on the AWS CloudFormation console or via the AWS Command Line Interface (AWS CLI).
In this post, we demonstrated how to implement a custom column-level encryption solution for Amazon Redshift, which provides an additional layer of protection for sensitive data stored in the cloud data warehouse. The CloudFormation template gives you an easy way to set up the data pipeline, which you can further customize for your specific business scenarios. You can also modify the AWS Glue ETL code to encrypt multiple data fields at the same time, and to use different data encryption keys for different columns for enhanced data security. With this solution, you can limit the occasions where human actors can access sensitive data stored in plain text on the data warehouse.
You can learn more about this solution and the source code by visiting the GitHub repository. To learn more about how to use Amazon Redshift UDFs to solve different business problems, refer to Example uses of user-defined functions (UDFs) and Amazon Redshift UDFs.
About the Author
Aaron Chong is an Enterprise Solutions Architect at Amazon Web Services Hong Kong. He specializes in the data analytics domain, and works with a wide range of customers to build big data analytics platforms, modernize data engineering practices, and advocate AI/ML democratization.