S3 | The Missing SENG

S3 (Simple Storage Service)

Amazon S3 (Simple Storage Service) is an object storage service built for scalability, data availability, security, and performance. It’s designed to store and retrieve any amount of data, at any time, from anywhere on the web. Think of it as a highly scalable, durable, and secure key-value store for files in the cloud: unlike a hard drive, it is accessed through an HTTP API rather than mounted as a disk.

Key Concepts

Object Storage: S3 stores data as objects within buckets. Unlike file systems (which have a hierarchical directory structure), object storage is flat. Each object is identified by a key (like a filename) that is unique within its bucket, and consists of the data itself plus metadata (information about the data). If versioning is enabled, each version of an object also has a version ID.
Buckets: Containers for objects. Bucket names are globally unique across all AWS accounts. You create buckets in specific AWS regions to optimize for latency, minimize costs, or address regulatory requirements.
Keys: The unique identifier for an object within a bucket. The combination of a bucket name and a key uniquely identifies each object. Keys can include / characters to simulate a directory structure, but it’s still fundamentally flat storage. For example, in my-bucket/images/logo.png, my-bucket is the bucket and images/logo.png is the key.
Regions: S3 buckets are created in specific AWS regions. Choosing a region close to your users can reduce latency.
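Durability and Availability: S3 Standard is designed for 99.999999999% (11 9’s) durability and 99.99% availability of objects over a given year. Durability is achieved by storing each object redundantly across multiple devices in multiple Availability Zones within a Region (One Zone-IA is the exception). To put 11 9’s in perspective, AWS’s own illustration is that if you store 10,000,000 objects, you can on average expect to lose a single object once every 10,000 years.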
Storage Classes: S3 offers different storage classes optimized for different access patterns and cost requirements:
- S3 Standard: General-purpose storage for frequently accessed data.
- S3 Intelligent-Tiering: Automatically moves objects between frequent, infrequent, and archive access tiers based on access patterns, optimizing costs.
- S3 Standard-IA (Infrequent Access): For data that is accessed less frequently, but requires rapid access when needed. Lower storage cost than S3 Standard, but with a retrieval cost.
- S3 One Zone-IA: Similar to S3 Standard-IA, but stores data in a single Availability Zone, offering lower cost but less redundancy.
- S3 Glacier Instant Retrieval: Archive storage for rarely accessed data (roughly once a quarter) that still needs millisecond retrieval.
- S3 Glacier Flexible Retrieval: Low-cost archive storage with retrieval times ranging from minutes to hours.
- S3 Glacier Deep Archive: Lowest-cost storage class for long-term archiving, with retrieval typically completing within 12 hours (standard) or within 48 hours (bulk).
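As a rough sketch of how a storage class is chosen at upload time: aws s3 cp accepts a --storage-class option. The helper names storage_class_flag and upload_with_class below are hypothetical, introduced only for illustration, and the AWS call is wrapped in a function so nothing runs until you invoke it.

```shell
# Map the storage class names listed above to the values that the
# AWS CLI's --storage-class option accepts.
storage_class_flag() {
  case "$1" in
    "S3 Standard")                    echo STANDARD ;;
    "S3 Intelligent-Tiering")         echo INTELLIGENT_TIERING ;;
    "S3 Standard-IA")                 echo STANDARD_IA ;;
    "S3 One Zone-IA")                 echo ONEZONE_IA ;;
    "S3 Glacier Instant Retrieval")   echo GLACIER_IR ;;
    "S3 Glacier Flexible Retrieval")  echo GLACIER ;;
    "S3 Glacier Deep Archive")        echo DEEP_ARCHIVE ;;
    *) echo "unknown storage class: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: upload a local file straight into a storage class.
# Usage: upload_with_class FILE BUCKET "STORAGE CLASS NAME"
upload_with_class() {
  class="$(storage_class_flag "$3")" || return 1
  aws s3 cp "$1" "s3://$2/" --storage-class "$class"
}
```

For example, upload_with_class my_file.txt my-unique-bucket-name "S3 Standard-IA" would run aws s3 cp with --storage-class STANDARD_IA.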
Versioning: S3 can keep multiple versions of an object in the same bucket. This protects against accidental deletion or overwriting.
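A minimal sketch of working with versioning from the AWS CLI, assuming a bucket you own. The function names are hypothetical wrappers around real s3api subcommands; defining them runs nothing until you call them.

```shell
# Turn on versioning for a bucket. Usage: enable_versioning BUCKET
enable_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

# List stored versions of all keys that start with PREFIX.
# Usage: list_versions BUCKET PREFIX
list_versions() {
  aws s3api list-object-versions --bucket "$1" --prefix "$2"
}

# Download one specific version of an object to a local file.
# Usage: get_version BUCKET KEY VERSION_ID OUTFILE
get_version() {
  aws s3api get-object --bucket "$1" --key "$2" --version-id "$3" "$4"
}
```

After an accidental overwrite, list_versions shows the earlier version IDs, and get_version can fetch the one you want back.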
Lifecycle Policies: You can define rules to automatically transition objects between storage classes or delete them after a specified period. This helps optimize storage costs.
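To make the idea concrete, here is a sketch of one lifecycle rule and how it might be applied with the AWS CLI. The rule ID, the logs/ prefix, the day counts, and the function names are all illustrative assumptions, not recommendations.

```shell
# Write an example lifecycle configuration to the given path.
# Objects under logs/ move to Standard-IA after 30 days, to Glacier
# Flexible Retrieval after 90 days, and are deleted after 365 days.
# Usage: write_lifecycle_rules PATH
write_lifecycle_rules() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
}

# Apply a lifecycle configuration file to a bucket.
# Usage: apply_lifecycle_rules BUCKET PATH
apply_lifecycle_rules() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "file://$2"
}
```

Note that applying a lifecycle configuration replaces any existing one on the bucket, so the file should contain every rule you want to keep.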
Access Control: S3 offers granular access control using:
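- IAM Policies: Control what AWS users, groups, and roles are allowed to do.
- Bucket Policies: Resource-based policies attached to a bucket that control access to the bucket and the objects in it.
- Access Control Lists (ACLs): A legacy mechanism for granting basic read/write permissions on individual buckets and objects. ACLs are disabled by default on newly created buckets, and bucket policies are generally preferred.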
- Pre-signed URLs: Generate temporary URLs that grant time-limited access to specific objects, even to users who don’t have AWS credentials.
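As a sketch of the pre-signed URL option, the AWS CLI can generate one with aws s3 presign. The wrapper name presign_object is hypothetical; the expiry check reflects the 7-day upper limit on Signature Version 4 pre-signed URLs.

```shell
# Print a pre-signed GET URL for one object.
# Usage: presign_object BUCKET KEY [SECONDS]   (default: 3600, i.e. one hour)
presign_object() {
  expires="${3:-3600}"
  # Signature Version 4 pre-signed URLs are valid for at most 7 days.
  if [ "$expires" -gt 604800 ]; then
    echo "expiry must not exceed 604800 seconds (7 days)" >&2
    return 1
  fi
  aws s3 presign "s3://$1/$2" --expires-in "$expires"
}
```

Anyone holding the resulting URL can download the object until it expires, so treat it like a temporary password.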
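Static Website Hosting: S3 can host static websites (HTML, CSS, JavaScript, images) directly, without requiring a separate web server. The S3 website endpoint serves HTTP only, so HTTPS is typically added by placing Amazon CloudFront in front of the bucket.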
Event Notifications: S3 can trigger notifications when certain events occur (e.g., object creation, deletion), which can be used to integrate with other AWS services (like Lambda, SQS, SNS).
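For event notifications, a configuration along these lines could route object-creation events to a Lambda function. This is a sketch under assumptions: the function names are hypothetical, the Lambda ARN is supplied by you, and the Lambda function must separately grant S3 permission to invoke it.

```shell
# Write a notification configuration that sends every object-creation
# event to the given Lambda function.
# Usage: write_notification_config PATH LAMBDA_ARN
write_notification_config() {
  cat > "$1" <<EOF
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "$2",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
EOF
}

# Attach a notification configuration file to a bucket.
# Usage: apply_notification_config BUCKET PATH
apply_notification_config() {
  aws s3api put-bucket-notification-configuration \
    --bucket "$1" \
    --notification-configuration "file://$2"
}
```

SQS queues and SNS topics are configured the same way, using QueueConfigurations or TopicConfigurations entries in the same file.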

Example: Uploading a File to S3 using the AWS CLI

This example demonstrates uploading a file to an S3 bucket using the AWS CLI. You’ll need the AWS CLI installed and configured.

Create an S3 Bucket (if you don’t have one):

```
aws s3 mb s3://my-unique-bucket-name --region us-east-1  # Replace with a unique bucket name and your desired region
```

Note: Bucket names must be globally unique.

Upload a File:

```
aws s3 cp my_file.txt s3://my-unique-bucket-name/  # Upload 'my_file.txt' to the root of the bucket
aws s3 cp my_file.txt s3://my-unique-bucket-name/my_folder/my_file.txt  # Upload to a "folder" (key prefix)
```

List Objects in the Bucket:
```
aws s3 ls s3://my-unique-bucket-name/
```
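Because keys are flat, the "folders" you see in a listing are just common key prefixes grouped by a delimiter. The sketch below shows that grouping twice: once locally on a list of keys (no AWS call), and once as the equivalent s3api request. Both function names are hypothetical.

```shell
# Local illustration, no AWS call: read keys on stdin and print the
# distinct top-level "folders" that a "/" delimiter would group them into.
# Keys without a "/" are plain objects at the root and are skipped.
top_level_prefixes() {
  grep '/' | cut -d/ -f1 | sort -u | sed 's|$|/|'
}

# The equivalent request against a real bucket: list the "subfolders"
# directly under PREFIX (for example images/).
# Usage: list_prefixes BUCKET PREFIX
list_prefixes() {
  aws s3api list-objects-v2 \
    --bucket "$1" \
    --prefix "$2" \
    --delimiter '/' \
    --query 'CommonPrefixes[].Prefix' \
    --output text
}
```

For instance, piping the keys images/logo.png and readme.txt into top_level_prefixes prints images/ only, since readme.txt sits at the root.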

Download a File:

```
aws s3 cp s3://my-unique-bucket-name/my_file.txt .  # Download 'my_file.txt' to the current directory
```

Remove a File:

```
aws s3 rm s3://my-unique-bucket-name/my_file.txt
```

Remove the Bucket (the bucket must be empty):
```
aws s3 rb s3://my-unique-bucket-name
```