Managing data most often involves creating, storing, and processing data. However, as a data engineer, there is an equally important aspect to consider: deleting data. For privacy purposes and legal compliance, businesses are required to delete or anonymize certain data.
Businesses need to comply with a variety of data protection regulations, such as the General Data Protection Regulation (GDPR) in the European Union, which grants individuals the right to request the deletion of their personal data (the "right to erasure"). In addition, other legal requirements or internal policies may require that data be deleted after a specific period of time.
AWS provides a range of services that can be used to delete data properly while meeting business and legal requirements. The guide below walks through the most relevant ones and is aimed particularly at those preparing for the AWS Certified Data Engineer – Associate (DEA-C01) exam.
Amazon S3 (Simple Storage Service)
Amazon S3 is widely used for data storage. To delete data from an S3 bucket, you can use the AWS CLI (Command Line Interface), an AWS SDK, or the AWS Management Console.
With AWS CLI, you can use the following command:
aws s3 rm s3://mybucket/myobject
Here, ‘mybucket’ refers to the name of your bucket, and ‘myobject’ to the data file you want to delete.
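The same deletion can be issued from an AWS SDK. The following is a minimal sketch in Python, assuming the boto3 library is installed and credentials are configured. The helper name `delete_object` is illustrative, and the client is passed in as a parameter so the function can be exercised without touching a real bucket.

```python
def delete_object(s3_client, bucket, key):
    """Delete one object. s3_client is expected to be boto3.client("s3")."""
    if not bucket or not key:
        raise ValueError("bucket and key must be non-empty")
    # In a bucket that has never had versioning enabled, this removes the
    # object permanently. In a versioned bucket it only adds a delete marker.
    return s3_client.delete_object(Bucket=bucket, Key=key)


# Example usage (assumes boto3 and valid AWS credentials):
#   import boto3
#   delete_object(boto3.client("s3"), "mybucket", "myobject")
```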
You can also automate the deletion process. By setting rules in the Lifecycle configuration of your bucket, you can specify the age at which objects should be automatically expired (deleted). This ensures that data is not retained longer than necessary. Note that this is distinct from an S3 bucket policy, which controls who may access or delete objects rather than when they are deleted.
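As an illustration of an age-based rule, the sketch below builds a Lifecycle configuration in the shape that boto3's `put_bucket_lifecycle_configuration` call expects. The rule ID, prefix, and 365-day age are assumptions chosen for the example, not values from any real policy.

```python
def expiration_rule(rule_id, prefix, days):
    """Build one Lifecycle rule that expires objects under `prefix` after `days`."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(*rules):
    """Wrap rules in the structure S3 expects for a bucket Lifecycle configuration."""
    return {"Rules": list(rules)}


config = lifecycle_configuration(expiration_rule("expire-raw-logs", "logs/", 365))

# Applying it (assumes boto3 and valid AWS credentials); note that this call
# replaces any Lifecycle configuration already present on the bucket:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="mybucket", LifecycleConfiguration=config)
```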
Amazon RDS (Relational Database Service)
Amazon RDS offers a ‘Delete’ option for each database instance. Additionally, you can specify a retention period for automated backups, after which they are automatically deleted.
You can set this retention period (0 to 35 days) when you create a DB instance, and you can change it later by modifying the instance. Just navigate to ‘Backup’ > ‘Automated backup’ and specify your desired retention period.
It’s worth noting that deleting a DB instance also deletes its automated backups unless you choose to retain them at deletion time. Manual snapshots are not removed with the instance; they remain until you delete them explicitly.
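The backup and deletion settings above map onto parameters of the RDS API. The sketch below only builds the request parameters for boto3's `modify_db_instance` and `delete_db_instance` calls; the instance and snapshot identifiers are hypothetical.

```python
def retention_update(db_instance_id, days):
    """Parameters for rds.modify_db_instance to change automated backup retention."""
    if not 0 <= days <= 35:
        raise ValueError("retention period must be between 0 and 35 days")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "BackupRetentionPeriod": days,  # 0 disables automated backups
        "ApplyImmediately": True,
    }


def deletion_request(db_instance_id, final_snapshot_id=None, keep_automated_backups=False):
    """Parameters for rds.delete_db_instance."""
    params = {
        "DBInstanceIdentifier": db_instance_id,
        "DeleteAutomatedBackups": not keep_automated_backups,
    }
    if final_snapshot_id:
        params["SkipFinalSnapshot"] = False
        params["FinalDBSnapshotIdentifier"] = final_snapshot_id
    else:
        params["SkipFinalSnapshot"] = True
    return params


# Example usage (assumes boto3 and valid AWS credentials):
#   import boto3
#   rds = boto3.client("rds")
#   rds.modify_db_instance(**retention_update("orders-db", 7))
#   rds.delete_db_instance(**deletion_request("orders-db", "orders-db-final"))
```

When the legal requirement is that nothing survives, skipping the final snapshot and deleting automated backups is the relevant combination. Any manual snapshots still have to be removed separately.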
AWS Glue and Amazon Athena
AWS Glue and Amazon Athena can help manage and query data. However, actual data deletion must happen at the storage level. Using these services, you can rewrite data partitions or create new tables that exclude the records you want to delete.
While these services offer a logical layer over the data, the original files remain in storage until they are removed. Deleting the underlying data from S3 or RDS is therefore crucial to meet business and legal requirements.
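One way to create a table without the data you want to delete is an Athena CREATE TABLE AS SELECT (CTAS) query. The sketch below only builds the SQL text. The table names, the `user_id` column, and the S3 location are assumptions for illustration, and table and column names are taken as trusted input.

```python
def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def ctas_without_users(source_table, target_table, target_location, user_ids,
                       id_column="user_id"):
    """Build a CTAS statement that copies every row except those of user_ids."""
    if not user_ids:
        raise ValueError("at least one user ID is required")
    id_list = ", ".join(sql_literal(u) for u in user_ids)
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = {sql_literal(target_location)})\n"
        f"AS SELECT * FROM {source_table}\n"
        # Rows with a NULL ID are kept explicitly; NOT IN alone would drop them.
        f"WHERE {id_column} IS NULL OR {id_column} NOT IN ({id_list})"
    )


query = ctas_without_users(
    "sales.events", "sales.events_clean", "s3://mybucket/events_clean/", ["u-1001"]
)

# Running it (assumes boto3 and valid AWS credentials):
#   import boto3
#   boto3.client("athena").start_query_execution(
#       QueryString=query,
#       ResultConfiguration={"OutputLocation": "s3://mybucket/athena-results/"})
```

The query writes a clean copy only. The files behind the source table still contain the removed records until they are deleted from S3.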
Data Deletion and Security
AWS reinforces that once data is permanently deleted, it is unrecoverable. Therefore, be sure that the data is no longer needed before you delete it. Adopt a systematic approach to data management to avoid accidental deletion, such as using ‘Versioning’ for your S3 data, which keeps all versions of an object in the same bucket. Keep in mind that in a versioned bucket, meeting a deletion requirement means removing every version of the object, not just the current one.
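Because versioning keeps every version of an object, a permanent deletion in a versioned bucket has to remove each version and each delete marker. The following is a minimal sketch of that loop, assuming a boto3 S3 client is passed in. The function name `purge_object_versions` is illustrative.

```python
def purge_object_versions(s3_client, bucket, key):
    """Permanently delete every version and delete marker of one object.

    Returns the number of versions and delete markers removed.
    """
    paginator = s3_client.get_paginator("list_object_versions")
    removed = 0
    # Prefix matches any key that starts with `key`, so filter for exact matches.
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        for entry in entries:
            if entry["Key"] != key:
                continue
            s3_client.delete_object(
                Bucket=bucket, Key=key, VersionId=entry["VersionId"]
            )
            removed += 1
    return removed


# Example usage (assumes boto3 and valid AWS credentials):
#   import boto3
#   purge_object_versions(boto3.client("s3"), "mybucket", "myobject")
```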
AWS provides a variety of tools and services to handle data deletion as per legal and business requirements. An AWS Certified Data Engineer should be aware of these, as they form an integral part of data management and meeting compliance requirements.
Practice Test
True or False: The AWS Management Console is the only way to delete data from an AWS deployment.
- True
- False
Answer: False
Explanation: While the AWS Management Console is one way to manage and delete data, it is not the only way. You can also use AWS Command Line Interface (CLI), AWS SDKs, and direct API calls.
In AWS, which service is designed to discover and classify sensitive data, helping you identify what must be deleted to meet business and legal requirements?
- A. AWS Secrets Manager
- B. Amazon Macie
- C. Amazon Inspector
- D. Amazon S3 Glacier
Answer: B. Amazon Macie
Explanation: Amazon Macie is a security service that uses machine learning and pattern matching to automatically discover and classify sensitive data, such as PII, stored in Amazon S3. Macie does not delete data itself, but its findings show where sensitive data resides so that it can be deleted in a secure and targeted manner.
True or False: In a versioning-enabled bucket, a simple ‘delete’ operation in Amazon S3 permanently removes the data immediately.
- True
- False
Answer: False
Explanation: In a versioning-enabled bucket, a ‘delete’ request that does not specify a version ID only inserts a delete marker; the earlier versions remain stored and recoverable. To permanently delete the data, you must delete each version of the object by its version ID, or let a Lifecycle rule expire noncurrent versions. In a bucket that has never had versioning enabled, by contrast, a delete is permanent.
When you use the ‘delete’ command in Amazon RDS, you have the option to:
- A. Save a final DB snapshot
- B. Retain automated backups
- C. Both A and B
- D. Neither A nor B
Answer: C. Both A and B
Explanation: Before deleting Amazon RDS instances or clusters, you get the option to save a final DB snapshot for reference or later use, and retain automated backups for disaster recovery scenarios.
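AWS KMS allows you to delete customer managed keys (formerly called customer master keys, or CMKs).
- A. True
- B. False
Answer: A. True
Explanation: AWS Key Management Service (KMS) permits deletion of customer managed keys, but not immediately. You schedule the deletion, and the key enters a “pending deletion” state for a waiting period of 7 to 30 days (30 by default), during which the deletion can be cancelled. This prevents accidental loss of data encryption keys.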
Storage classes in Amazon S3, like S3 One Zone-IA and S3 Glacier, can be used for cost-effective long-term archiving and may help in meeting certain legal requirements.
- A. True
- B. False
Answer: A. True
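Explanation: These storage classes provide lower-cost storage for data that must be kept for long periods, which helps when regulations mandate retention. The storage class itself does not enforce a retention policy; enforcement comes from features such as S3 Object Lock or S3 Glacier Vault Lock.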
True or False: The AWS Artifact service can help with deleting data to meet certain regulatory standards.
- True
- False
Answer: False
Explanation: AWS Artifact provides on-demand access to AWS compliance reports and agreements but does not actually manage data deletion.
To securely erase data from an Amazon EC2 instance before decommissioning, you should:
- A. Delete all the files
- B. Terminate the instance
- C. Use an appropriate file shredding tool
- D. Transfer all data to Amazon S3
Answer: C. Use an appropriate file shredding tool
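Explanation: Simply deleting files removes only the file system references, so the underlying data may remain recoverable, and terminating the instance does not overwrite anything at the file level. Using a file shredding tool, which overwrites the data, ensures that it is securely erased.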
The S3 Object Lock feature can help in meeting regulatory requirements by preventing data deletion for a defined period of time.
- A. True
- B. False
Answer: A. True
Explanation: S3 Object Lock can enforce retention periods during which object versions cannot be deleted or overwritten, helping meet regulatory requirements for preserving certain types of data.
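To make the retention period concrete, the sketch below builds the `Retention` argument that boto3's `put_object_retention` call accepts. It assumes the bucket was created with Object Lock enabled; the mode and the number of days are illustrative.

```python
from datetime import datetime, timedelta, timezone


def retention_setting(mode, days, now=None):
    """Build an Object Lock retention setting that ends `days` from `now`."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days < 1:
        raise ValueError("days must be a positive integer")
    start = now or datetime.now(timezone.utc)
    return {"Mode": mode, "RetainUntilDate": start + timedelta(days=days)}


# Example usage (assumes boto3 and valid AWS credentials):
#   import boto3
#   boto3.client("s3").put_object_retention(
#       Bucket="mybucket", Key="myobject",
#       Retention=retention_setting("GOVERNANCE", 365))
```

In GOVERNANCE mode, users with a specific permission can still override the lock, whereas in COMPLIANCE mode nobody can shorten or remove the retention period until it expires.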
AWS CloudTrail service provides a way to delete logs from your AWS environment.
- A. True
- B. False
Answer: B. False
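Explanation: AWS CloudTrail captures and logs API activity and delivers the log files to an Amazon S3 bucket. CloudTrail itself is not a tool for deleting logs or other data; log files stored in S3 are removed through S3, for example with a Lifecycle rule.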
Interview Questions
What AWS service can be used to automate data deletion to meet business and legal requirements?
Several options exist. Amazon S3 Lifecycle policies delete objects automatically once they reach a configured age, and services such as AWS Glue and AWS Lake Formation can be used in pipelines that remove records from a data lake.
How can S3 Lifecycle policies help in meeting business and legal data deletion requirements?
S3 Lifecycle policies can be configured to automatically delete objects from a bucket after a designated time period, thus ensuring that data is not retained longer than necessary.
What is the benefit of using Amazon Macie for data deletion?
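Amazon Macie automatically provides an inventory of your Amazon S3 buckets and uses pattern matching and machine learning to identify sensitive data and alert you to it. Macie does not delete data itself; its findings tell you which objects contain sensitive data so that they can be deleted to meet business and legal requirements.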
How can AWS KMS be used to help meet data deletion requirements?
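AWS Key Management Service (KMS) can be used to delete customer managed keys (formerly called customer master keys, or CMKs), which renders the data encrypted under those keys unreadable. Deletion is scheduled with a waiting period of 7 to 30 days, and once the key is deleted the data cannot be decrypted.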
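Key deletion in KMS is a scheduled operation rather than an immediate one. As a sketch, the helper below builds and validates the parameters for boto3's `schedule_key_deletion` call; the key ID shown in the usage comment is a placeholder.

```python
def key_deletion_request(key_id, pending_window_days=30):
    """Parameters for kms.schedule_key_deletion with a validated waiting period."""
    if not 7 <= pending_window_days <= 30:
        raise ValueError("waiting period must be between 7 and 30 days")
    return {"KeyId": key_id, "PendingWindowInDays": pending_window_days}


# Example usage (assumes boto3 and valid AWS credentials):
#   import boto3
#   kms = boto3.client("kms")
#   kms.schedule_key_deletion(**key_deletion_request("<your-key-id>", 7))
#   # During the waiting period the deletion can still be undone:
#   kms.cancel_key_deletion(KeyId="<your-key-id>")
```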
How can AWS CloudTrail help organizations meet regulatory requirements for data deletion?
AWS CloudTrail provides a record of AWS API calls, including calls related to data deletion, which can be critical for audit and compliance purposes.
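For an audit, the deletion-related calls can be pulled out of a CloudTrail log file. The sketch below works on the JSON text of one already-decompressed log file and treats any event whose name starts with "Delete" as a deletion. That prefix match is a simplifying assumption, and it misses calls such as ScheduleKeyDeletion.

```python
import json


def deletion_events(log_text):
    """Return a summary of the delete-style API calls in one CloudTrail log file."""
    records = json.loads(log_text).get("Records", [])
    return [
        {
            "time": record.get("eventTime"),
            "source": record.get("eventSource"),
            "name": record.get("eventName"),
            "principal": record.get("userIdentity", {}).get("arn"),
        }
        for record in records
        if record.get("eventName", "").startswith("Delete")
    ]
```

Object-level S3 calls such as DeleteObject are data events, so they appear in the logs only if data event logging is enabled on the trail.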
Can Amazon RDS enable automatic data deletion?
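Yes, for backups. Amazon RDS supports automated backups with a retention period that you specify, and backups older than that period are deleted automatically. This applies to the backups, not to rows in your tables, which must be deleted by your own SQL or application logic.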
What is the purpose of Amazon Data Lifecycle Manager?
Amazon Data Lifecycle Manager automates the creation, retention, and deletion of Amazon EBS snapshots and EBS-backed AMIs.
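Can you change the retention period of an RDS snapshot after it has been created?
It depends on the type of backup. The retention period applies to automated backups, and you can change it at any time by modifying the DB instance. Manual snapshots have no retention period; they are kept until you delete them.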
Do Amazon S3 Glacier and Glacier Deep Archive support data deletion?
Yes, you can delete archives in both Amazon S3 Glacier and Glacier Deep Archive; however, you may incur a charge if an archive is deleted within the minimum storage duration period (90 days for S3 Glacier Flexible Retrieval and 180 days for S3 Glacier Deep Archive).
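The early-deletion arithmetic is simple enough to sketch. The helper below returns how many days of the minimum storage duration remain at deletion time, using 90 days for S3 Glacier Flexible Retrieval and 180 days for S3 Glacier Deep Archive. The function name is illustrative, and the actual charge depends on current pricing.

```python
from datetime import date

# Minimum storage duration in days, keyed by the S3 StorageClass value.
MINIMUM_STORAGE_DAYS = {"GLACIER": 90, "DEEP_ARCHIVE": 180}


def early_deletion_days(storage_class, stored_on, delete_on):
    """Days of the minimum storage duration still unserved at deletion time."""
    days_stored = (delete_on - stored_on).days
    if days_stored < 0:
        raise ValueError("delete_on must not be earlier than stored_on")
    minimum = MINIMUM_STORAGE_DAYS.get(storage_class, 0)
    return max(0, minimum - days_stored)


# An archive stored for 60 days in Deep Archive still has 120 days to run.
remaining = early_deletion_days("DEEP_ARCHIVE", date(2024, 1, 1), date(2024, 3, 1))
```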
Can AWS Config provide insights regarding data deletion activities?
Yes, AWS Config records the configuration of your AWS resources and how it changes over time, including when resources are deleted. It can also evaluate whether resources have the settings your deletion policy expects, which can aid in meeting business and legal requirements.
How can versioning in S3 benefit data deletion?
If versioning is enabled in Amazon S3, a delete operation that does not specify a version ID does not permanently remove the data; instead, a delete marker is created. This allows for restoration if needed. To remove the data permanently, you must delete the specific object versions.
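In a versioned bucket, a Lifecycle rule can clean up what delete markers leave behind. The sketch below builds a rule that permanently removes noncurrent versions after a number of days and removes delete markers that no longer have any versions behind them. The rule ID, prefix, and 30-day window are assumptions for the example.

```python
def version_cleanup_rule(rule_id, prefix, noncurrent_days):
    """Lifecycle rule that expires noncurrent versions and orphaned delete markers."""
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Permanently delete versions this many days after they become noncurrent.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Remove delete markers once no noncurrent versions remain.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


rule = version_cleanup_rule("purge-old-versions", "customers/", 30)

# Applying it (assumes boto3 and valid AWS credentials); this call replaces
# any Lifecycle configuration already present on the bucket:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="mybucket", LifecycleConfiguration={"Rules": [rule]})
```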
How does AWS Secrets Manager help in data deletion?
AWS Secrets Manager rotates, manages, and retrieves database credentials, API keys, and other secrets throughout their lifecycle. It manages secrets rather than business data, but by securing and rotating the secrets used to access resources, it ensures that credentials stay up to date and that old ones are retired.
Can AWS DMS (Database Migration Service) be used to delete data?
AWS DMS is used to migrate databases to AWS, not typically for data deletion. However, if part of your migration strategy involves deleting old or unnecessary data, DMS might be useful in this context.
How can data deletion policies in Amazon S3 buckets help in cost optimization?
Proper implementation of data deletion policies in Amazon S3 buckets can reduce storage costs by removing unneeded data.
Can AWS IAM (Identity and Access Management) aid in data deletion management?
Yes, AWS IAM can help manage access to AWS services and resources such as allowing specific users permissions to delete data on certain services, ensuring that only authorized individuals can perform deletions.
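As an illustration, the sketch below builds an identity-based policy that allows deletions only under one prefix of one bucket. The bucket name, prefix, and statement ID are hypothetical; the two S3 actions cover deleting the current object and deleting specific versions.

```python
import json


def delete_only_policy(bucket, prefix):
    """IAM policy document allowing object deletion under one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDeletesUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


policy_json = json.dumps(delete_only_policy("mybucket", "expired/"), indent=2)
```

Attaching a policy like this to a dedicated role keeps deletion rights away from everyday users and makes every deletion attributable to that role in the audit logs.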