Amazon S3
Amazon S3 (Simple Storage Service) is an object storage service. It stores every file as an object in a flat, non-hierarchical structure. A bucket is a container for storing objects. Each object consists of the data itself, metadata, and a unique key for easy access and retrieval. A single object can be up to 5 TB in size, but you can upload a large object in multiple parts. In this topic, we cover:
- Bucket Overview
- Storage classes in S3
- Object Lock in S3
- Features in S3
- Security
- Logging & Monitoring
1. Bucket Overview:
A bucket is a container for storing objects. By default, you can create up to 100 buckets per AWS account; if you need more, request a service quota increase. A bucket name must be globally unique across all AWS accounts and Regions. A bucket policy is a resource-based Identity and Access Management (IAM) policy containing one or more statements that grant access permissions on the bucket and its objects. The bucket policy is owned by the bucket owner, is written in JSON-based access policy language, and can be at most 20 KB in size.
After creating a bucket, you cannot change its name or the Region where you created it. You can upload objects once the bucket is created, and download one object at a time. You can empty a bucket by deleting all the objects in it. Once you delete a bucket, its name becomes available for reuse, so another AWS account may create a bucket with that name. If you want to keep the bucket name, empty the bucket instead of deleting it.
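The bucket policy described above is just a JSON document attached to the bucket. Below is a minimal sketch of one that grants public read access to objects; the bucket name `example-bucket` is a placeholder, and the commented `put_bucket_policy` call shows how it would be attached with boto3:

```python
import json

# A minimal bucket policy allowing anyone to read objects in one bucket.
# "example-bucket" is a placeholder name; the policy document is written in
# JSON-based access policy language and may be at most 20 KB.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

policy_document = json.dumps(bucket_policy)
# With boto3 this would be attached via (not executed here):
#   boto3.client("s3").put_bucket_policy(Bucket="example-bucket",
#                                        Policy=policy_document)
assert len(policy_document.encode("utf-8")) <= 20 * 1024  # under the 20 KB limit
```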
2. Storage Classes in S3:
Amazon S3 offers a range of storage classes for different use cases, designed to balance storage cost, archival needs, and retrieval needs without affecting performance or operations. All storage classes store multiple copies of your data across a minimum of three Availability Zones, except S3 One Zone-IA and S3 Express One Zone, which use a single zone.
2.1 Storage classes for frequently accessed objects
S3 Standard is the default storage class, providing high-availability, high-durability, high-performance object storage for frequently accessed data. Key features are 99.999999999% (11 nines) durability, 99.99% availability, SSL/TLS support for encrypting data both in transit and at rest, low latency, and high throughput.
2.2 Storage class for automatically optimizing data with changing or unknown access patterns
S3 Intelligent-Tiering automatically moves your data to cost-effective access tiers based on access frequency. Objects not accessed for 30 consecutive days move to the Infrequent Access tier, reducing storage costs by up to 40%. After 90 days without access, objects move to the Archive Instant Access tier, reducing costs by up to 68%. With the optional archive tiers enabled, objects not accessed for 180 days can move to the Deep Archive Access tier, reducing costs by up to 95%.
2.3 Storage classes for infrequently accessed objects
S3 Standard-Infrequent Access (Standard-IA) suits data that is accessed less frequently but requires rapid access when needed. Key features are 99.999999999% (11 nines) durability, 99.9% availability, SSL/TLS support for encrypting data both in transit and at rest, low latency, and high throughput.
S3 One Zone-Infrequent Access (One Zone-IA) also suits data that is accessed less frequently but requires rapid access when needed. Key features are 99.999999999% durability, 99.5% availability, SSL/TLS support for encrypting data both in transit and at rest, low latency, and high throughput. It stores the data in a single Availability Zone, and costs about 20% less than Standard-IA.
2.4 Storage classes for rarely accessed objects
S3 Glacier is designed to provide the highest performance, the most retrieval flexibility, and the lowest cost of archive storage in the cloud. It offers three retrieval options:
Instant Retrieval: data can be retrieved in milliseconds at low cost. Designed for 99.999999999% (11 nines) durability and 99.9% availability across multiple Availability Zones. Objects are billed at a minimum size of 128 KB. You can access the data in real time.
Flexible Retrieval (S3 Glacier): data can be retrieved within minutes (expedited: 1–5 minutes) to hours at a lower cost. Designed for 99.999999999% (11 nines) durability and 99.99% availability. Bulk retrievals complete within 5–12 hours. You cannot access the data in real time; you must restore it first.
Deep Archive Retrieval: data can be retrieved within 12 to 48 hours at the lowest cost; the default (standard) retrieval time is 12 hours. It stores the data across a minimum of three Availability Zones. You cannot access the data in real time.
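Retrieving an archived object means issuing a restore request that names one of the retrieval tiers above. A minimal sketch, with placeholder bucket and key names and the boto3 call shown as a comment:

```python
# Sketch of a restore request for an archived object (bucket/key names are
# placeholders). "Tier" selects the retrieval mode: "Expedited" (minutes),
# "Standard", or "Bulk" (cheapest, slowest); Glacier Instant Retrieval
# objects need no restore at all.
restore_request = {
    "Days": 7,  # how long the temporary restored copy stays available
    "GlacierJobParameters": {"Tier": "Bulk"},
}
# boto3 call (not executed here):
#   s3.restore_object(Bucket="example-bucket", Key="archive/report.csv",
#                     RestoreRequest=restore_request)
```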
3. Object Lock in S3
Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time (retention period) or indefinitely (Legal Hold). You can use Object Lock to help meet regulatory requirements that require WORM storage or simply add another layer of protection against object changes and deletion. It uses the Write Once Read Many (WORM) model. Object Lock works only in versioned buckets, and retention periods and legal holds apply to individual object versions. When you lock an object version, Amazon S3 stores the lock information in the metadata for that object version. Placing a retention period or legal hold on an object protects only the version specified in the request. It doesn’t prevent new versions of the object from being created.
In governance mode, users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions. In compliance mode, a protected object version can’t be overwritten or deleted by any user, including the root user in your AWS account.
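Applying a retention period to an object version boils down to choosing a mode and a retain-until date. A minimal sketch, assuming a hypothetical bucket and key; only the dictionary is built here, with the boto3 call shown as a comment:

```python
from datetime import datetime, timedelta, timezone

# Sketch: build an Object Lock retention setting for one object version.
# "GOVERNANCE" can be bypassed by users with special permissions;
# "COMPLIANCE" cannot be shortened by any user, including the root user.
retain_until = datetime.now(timezone.utc) + timedelta(days=365)
retention = {"Mode": "GOVERNANCE", "RetainUntilDate": retain_until}

# boto3 call (not executed here); the bucket must have versioning and
# Object Lock enabled, and the lock applies only to this version:
#   s3.put_object_retention(Bucket="example-bucket", Key="records.log",
#                           VersionId="...", Retention=retention)
```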
4. Features in S3
Block Public Access By default, block public access settings are turned on for new buckets and AWS accounts, so objects are not publicly accessible. Public access can otherwise be granted through an access control list (ACL), a bucket policy, or both. AWS recommends keeping Block Public Access turned on for all new buckets as well as existing ones. You can adjust these settings at the bucket or account level to control whether objects are kept public or private.
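The Block Public Access feature is four independent flags; turning all four on is the recommended (and default) state. A sketch with a placeholder bucket name:

```python
# Sketch of the four Block Public Access settings; enabling all four is the
# default for new buckets and the AWS-recommended configuration.
public_access_block = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore any existing public ACLs
    "BlockPublicPolicy": True,      # reject public bucket policies
    "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
}
# boto3 call (not executed here):
#   s3.put_public_access_block(Bucket="example-bucket",
#                              PublicAccessBlockConfiguration=public_access_block)
```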
Cost Optimization You can reduce costs by storing data in the appropriate storage class and by configuring lifecycle rules that transition or expire objects over time.
Versioning in Amazon S3 is a means of keeping multiple copies of an object in the same bucket. It helps you to recover objects from accidental deletion or overwrite. If you delete an object, instead of removing the object permanently, Amazon S3 inserts a delete marker, which becomes the current object version. You can then restore the previous version. If you overwrite an object, Amazon S3 adds a new object version to the bucket. The previous version remains in the bucket and becomes a noncurrent version.
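The versioning and delete-marker behavior described above can be sketched in a few lines; bucket and key names are placeholders, and the boto3 calls are shown as comments:

```python
# Sketch: enabling versioning on a bucket ("example-bucket" is a placeholder).
versioning_configuration = {"Status": "Enabled"}
# boto3 call (not executed here):
#   s3.put_bucket_versioning(Bucket="example-bucket",
#                            VersioningConfiguration=versioning_configuration)

# Once versioning is on, a DELETE without a version ID only inserts a delete
# marker, which becomes the current version:
#   s3.delete_object(Bucket="example-bucket", Key="report.csv")
# Deleting the marker itself (by its version ID) restores the previous version:
#   s3.delete_object(Bucket="example-bucket", Key="report.csv",
#                    VersionId=marker_version_id)
```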
A Multi-Region Access Point provides a single global endpoint for accessing data that is replicated across buckets in multiple Regions.
Cross-Region Replication (CRR) automatically copies objects from a source bucket to a destination bucket in a different Region; Same-Region Replication (SRR) does the same within a single Region. The source and destination can belong to the same or different AWS accounts. Both are used to improve availability in case of disaster.
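A replication setup is a rule document plus an IAM role that S3 assumes to copy objects. A minimal CRR sketch; the role ARN and bucket names are placeholders, and the source bucket must have versioning enabled:

```python
# Sketch of a replication rule copying new objects to a bucket in another
# Region (CRR). The role ARN, account ID, and bucket names are placeholders.
replication_configuration = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-all",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = replicate every new object
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::example-backup-bucket"},
        }
    ],
}
# boto3 call (not executed here):
#   s3.put_bucket_replication(
#       Bucket="example-source-bucket",
#       ReplicationConfiguration=replication_configuration)
```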
Batch Replication replicates existing objects to a destination bucket — objects that were already in the bucket before replication was configured, or that previously failed to replicate.
Multipart Upload supports uploading a large file by dividing it into several parts, and it supports pausing and resuming. If one part fails to upload, you re-upload only that failed part. You are charged for the storage, bandwidth, and requests used by a multipart upload. You can create a lifecycle rule to delete incomplete multipart uploads after a set number of days; this delete operation is free of charge.
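The part-splitting step can be sketched as a small helper that yields byte ranges, given S3's documented limits (at most 10,000 parts, each part except the last at least 5 MiB). The part size chosen here is an arbitrary example:

```python
# Sketch: split an upload into parts for a multipart upload. S3 allows at
# most 10,000 parts per upload, and every part except the last must be at
# least 5 MiB.
MIB = 1024 * 1024

def part_ranges(object_size, part_size=8 * MIB):
    """Return (part_number, start, end) byte ranges for a multipart upload."""
    if part_size < 5 * MIB:
        raise ValueError("parts (except the last) must be at least 5 MiB")
    ranges = []
    start, part_number = 0, 1
    while start < object_size:
        end = min(start + part_size, object_size)
        ranges.append((part_number, start, end))
        start, part_number = end, part_number + 1
    if len(ranges) > 10_000:
        raise ValueError("more than 10,000 parts; use a larger part size")
    return ranges

# A 20 MiB object in 8 MiB parts -> three parts: 8 + 8 + 4 MiB.
parts = part_ranges(20 * MIB)
assert len(parts) == 3
assert parts[-1] == (3, 16 * MIB, 20 * MIB)
```

Each range would then be uploaded with its part number (e.g. boto3's `upload_part`) and the upload completed once all parts succeed; only a failed part's range needs to be retried.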
S3 Transfer Acceleration is a bucket-level feature for transferring files, ranging from gigabytes to terabytes, quickly and securely across Regions. Instead of uploading files directly to S3, clients upload to the nearest edge location; when the data arrives at the edge location, it travels over an optimized network path to S3.
You can test and compare upload speeds with the Amazon S3 Transfer Acceleration speed comparison tool.
5. Security
For data protection, use SSL/TLS to protect data while it is in transit. For data at rest, use encryption: server-side encryption (SSE) or client-side encryption. Use AWS PrivateLink for S3 to communicate with S3 securely from within your VPC without traversing the public internet.
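Requesting server-side encryption is a single parameter on the upload. A sketch with placeholder bucket and key names; `AES256` selects SSE with S3-managed keys (SSE-S3), while `aws:kms` would select SSE-KMS:

```python
# Sketch: requesting server-side encryption when uploading an object.
# "AES256" = SSE-S3 (S3-managed keys); "aws:kms" = SSE-KMS (KMS-managed keys).
# Bucket, key, and body are placeholders.
put_object_args = {
    "Bucket": "example-bucket",
    "Key": "reports/q1.csv",
    "Body": b"col1,col2\n1,2\n",
    "ServerSideEncryption": "AES256",
}
# boto3 call (not executed here):
#   s3.put_object(**put_object_args)
```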
6. Logging & Monitoring
For monitoring, use Amazon S3 Event Notifications to receive notifications when certain events happen in your S3 bucket.
For logging, you can use AWS CloudTrail logs and server access logs; both maintain log records for auditing and compliance purposes. Server access logging provides detailed records of the requests made to an Amazon S3 bucket. When you enable logging, Amazon S3 delivers access logs for a source bucket to a target bucket. The target bucket must be in the same AWS Region and AWS account as the source bucket, and must not have a default retention period configured. For simpler log management, AWS recommends saving access logs to a different bucket than the source.
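Enabling server access logging amounts to pointing the source bucket at a target bucket and prefix. A minimal sketch with placeholder bucket names; as noted above, the target must be in the same Region and account:

```python
# Sketch: enabling server access logging. Source and target bucket names are
# placeholders; the target bucket must be in the same Region and account as
# the source, and using a separate bucket keeps log management simpler.
logging_status = {
    "LoggingEnabled": {
        "TargetBucket": "example-log-bucket",
        "TargetPrefix": "access-logs/example-source-bucket/",
    }
}
# boto3 call (not executed here):
#   s3.put_bucket_logging(Bucket="example-source-bucket",
#                         BucketLoggingStatus=logging_status)
```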
AWS recommends CloudTrail logs for capturing bucket-level and object-level actions. To store the logs, create a trail in the CloudTrail console. If you don't configure a trail, you can still view the most recent events in the CloudTrail console under Event history.
See the AWS documentation for a comparison between CloudTrail logs and server access logs.