Simple Storage Service (S3)

S3 stands for Simple Storage Service. It is an object storage service with an interface for storing and retrieving any amount of data, at any time, from anywhere in the world. Pricing is pay-as-you-go: you pay only for what you use. It is loosely comparable to a remote file store such as an FTP server, except that objects are read and written through an HTTP API.

• You can keep your snapshots in S3. You can also encrypt your sensitive data in S3.

• S3 was one of the first services introduced by AWS (it launched in 2006).

Amazon S3 lets you store objects (files) in “buckets” (which act like top-level directories).
Buckets must have a globally unique name (across all regions and all accounts).
Buckets are defined at the regional level.
S3 looks like a global service, but each bucket is created in a specific region.
Anything you upload to S3 can be accessed over HTTP.
Because S3 is reachable from anywhere, it is a very good platform for hosting a static website.

Amazon S3 Use cases

• Backup and storage

• Disaster Recovery

• Archive

• Hybrid Cloud storage

• Application hosting

• Media hosting

• Data lakes & big data analytics

• Software delivery

• Static website

Advantages of S3

♻️Scalability: S3 can store an unlimited amount of data, and you can easily scale your storage needs up or down as required.

♻️Durability: S3 is designed for 99.999999999% (11 nines) durability and comes with comprehensive security and compliance capabilities.

♻️Availability: It provides a high level of availability, ensuring that your data is accessible when needed.

♻️Security: S3 offers various security features, including access control lists (ACLs), bucket policies, and encryption options, to keep your data secure.

♻️Cost-Effective: S3 offers cost-effective storage solutions with various storage classes, such as Standard, Intelligent-Tiering, and Glacier, allowing you to optimize costs based on your data access patterns

♻️Data Transfer Acceleration: S3 Transfer Acceleration speeds up transferring files to and from S3 by using Amazon CloudFront's globally distributed edge locations.

♻️Backup and Restore: It's an excellent solution for data backup and restore, ensuring data integrity and availability

❗️ Cons

Data Transfer Costs: Fees can be high for transferring data out of S3.

Learning Curve: S3 can be complex for new users to understand.

🎯Use Case: Airbnb uses S3 to store and analyze data to guide its real-estate price offerings and match customers with personalized experiences

Key Features of S3:

Amazon S3 is an object storage service designed to store and retrieve any amount of data, anywhere on the web. Key features include scalability, durability, security, and versatility. Whether you're a startup or a global enterprise, S3's capabilities can meet your storage needs.

Security:

Amazon takes security very seriously. Being in the cloud and serving the thousands of organizations that use these services requires an exceptional level of security. AWS provides multiple levels of security. Let’s go through them one by one to understand them in detail.

Let’s divide the overall security part into two: Data Access Security and Data Storage Security.

Identity and Access Management (IAM) policies:

IAM policies apply to specific principals such as a User, Group, or Role. The policy is a JSON document that states what the principal can or cannot do.

An example IAM policy looks like this. Any IAM entity (user, role, group) with the policy below can access the appgambit-s3access-test bucket and the objects inside it.

{
  "Version": "2012-10-17",
  "Statement":[{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::appgambit-s3access-test",
                 "arn:aws:s3:::appgambit-s3access-test/*"]
    }
  ]
}
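The `s3:*` action above grants every S3 operation on the bucket. As a hedged sketch of how the same document could be generated with a narrower action list, the Python below builds the identical JSON structure. The helper name `build_bucket_policy` and the read-only action list are illustrative assumptions, not part of the original example.

```python
import json


def build_bucket_policy(bucket, actions):
    """Return an IAM policy dict granting `actions` on a bucket and its objects.

    Hypothetical helper for illustration; the structure mirrors the JSON above.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself (for listing)
                    f"arn:aws:s3:::{bucket}/*",  # every object inside the bucket
                ],
            }
        ],
    }


# Read-only variant of the example policy (assumed action list).
policy = build_bucket_policy(
    "appgambit-s3access-test", ["s3:GetObject", "s3:ListBucket"]
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Listing only the actions a principal needs is usually preferable to `s3:*`, since a wildcard also allows deleting objects and changing bucket settings.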

Bucket policies.

A bucket policy uses a JSON-based access policy language to manage advanced permissions. Bucket policies apply only to S3 buckets. If you want to make all the objects inside a bucket publicly readable, the following simple JSON will do that.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "MakeBucketPublic",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::appgambit-s3access-test/*"
        }
    ]
}

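Bucket policies are very versatile and have a lot of configuration options. Let’s say the bucket is serving web assets and should only serve content to requests originating from a specific domain. Keep in mind that the Referer header is set by the client and can be spoofed, so this is a deterrent rather than strong access control.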

{
  "Version":"2012-10-17",
  "Id":"Allow Website Access",
  "Statement":[
    {
      "Sid":"Allow Access to only appgambit.com",
      "Effect":"Allow",
      "Principal":"*",
      "Action":"s3:GetObject",
      "Resource":"arn:aws:s3:::appgambit-s3access-test/*",
      "Condition":{
        "StringLike":{
            "aws:Referer":[
                "https://www.appgambit.com/*",
                "https://appgambit.com/*"
            ]
        }
      }
    }
  ]
}
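To make the `StringLike` condition concrete, here is a rough Python illustration of wildcard matching on the Referer value. It uses the standard library's `fnmatch` as a stand-in, which is an assumption for illustration only and not how AWS evaluates policies internally (for instance, `fnmatch` also treats `[...]` as a character class). The helper name `referer_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the aws:Referer condition in the policy above.
ALLOWED_REFERERS = [
    "https://www.appgambit.com/*",
    "https://appgambit.com/*",
]


def referer_allowed(referer):
    """Approximate StringLike: case-sensitive match where * matches any run of characters."""
    return any(fnmatchcase(referer, pattern) for pattern in ALLOWED_REFERERS)


print(referer_allowed("https://www.appgambit.com/blog/post.html"))  # matches the first pattern
print(referer_allowed("https://evil.example/page.html"))            # matches neither pattern
```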

Access Control Lists (ACLs)

An ACL is a legacy access policy option for granting basic read/write permissions to other AWS accounts.

Query String Authentication

Imagine you have private content that you want to share with authenticated users outside your application, for example by emailing a link that only that user can open.

AWS S3 allows you to create a specialized URL that contains information to access the object. This method is also known as a Pre-Signed URL.
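A minimal sketch of generating such a URL with boto3's `generate_presigned_url` follows. It assumes boto3 is installed and AWS credentials are configured; the helper names `clamp_expiry` and `presign_download` are hypothetical. The 7-day cap reflects the maximum lifetime of a Signature Version 4 pre-signed URL.

```python
MAX_EXPIRY_SECONDS = 7 * 24 * 3600  # SigV4 pre-signed URLs are limited to 7 days


def clamp_expiry(seconds):
    """Keep the requested lifetime between 1 second and 7 days."""
    return max(1, min(int(seconds), MAX_EXPIRY_SECONDS))


def presign_download(bucket, key, expires_in=3600):
    """Return a time-limited GET URL for a private object.

    Needs boto3 and configured AWS credentials; boto3 is imported lazily so
    the helper above can be used without it.
    """
    import boto3

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=clamp_expiry(expires_in),
    )
```

For example, `presign_download("appgambit-s3access-test", "reports/invoice.pdf", 900)` would return a link valid for 15 minutes (the object key here is made up). Anyone holding the URL can read the object until it expires, so short lifetimes are the safer default.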

Data Access Security

By default, when you create a new bucket, only the resource owner has access to the Amazon S3 resources they create. You can use access control mechanisms such as bucket policies and Access Control Lists (ACLs) to selectively grant permissions to users and groups of users. There are four mechanisms for controlling access to Amazon S3 resources: IAM policies, bucket policies, ACLs, and query string authentication.

Backup & Recovery

By default, AWS S3 provides the same level of durability and availability across all regions. Things can still go wrong, so most organizations that host their data in the cloud want backup and recovery in their plan.

AWS S3 provides a simple mechanism to create a backup of data: Cross-Region Replication (CRR). Cross-Region Replication enables automatic, asynchronous copying of objects across buckets in different AWS regions. This is useful when we want to access our data in different regions or create a general backup of the data. Cross-Region Replication requires versioning to be enabled, so it affects your AWS bill as well: CRR incurs the versioning (extra storage) cost as well as the data transfer cost.

Cross-Origin Resource Sharing (CORS):

Cross-Origin Resource Sharing (CORS) configuration in Amazon S3 allows you to control which web domains are permitted to access the resources (such as objects) in your S3 bucket from a web browser. This is essential for web applications that need to make cross-origin requests to S3.

Here's how you can configure CORS in an S3 bucket:
1. Sign in to AWS Management Console: Sign in to your AWS account and navigate to the S3 service.
2. Select Your Bucket: Click on the name of the S3 bucket for which you want to configure CORS.
3. Access Permissions Tab: In the bucket's properties, navigate to the "Permissions" tab.
4. CORS Configuration: Scroll down to the "Cross-origin resource sharing (CORS)" section, and click on "Edit" to configure CORS rules.
5. Create CORS Rules: You can add one or more CORS rules, which are defined in JSON format. CORS rules typically include the following components:
  - AllowedOrigins (Origins): This specifies the list of domains (origins) that are allowed to make cross-origin requests to your S3 bucket. For example:
```
[
    "http://example.com",
    "https://example.net"
]
```
  - AllowedMethods (HTTP methods): You can specify which HTTP methods (e.g., GET, PUT, POST) are allowed in cross-origin requests. For example:
```
[
    "GET",
    "HEAD",
    "PUT"
]
```
  - AllowedHeaders: You can specify which HTTP headers can be included in the actual request from the client. For example:
```
[
    "Authorization",
    "Content-Type"
]
```
  - ExposeHeaders: You can specify which response headers can be exposed to the client. For example:
```
[
    "x-amz-server-side-encryption"
]
```
  - MaxAgeSeconds: This defines how long, in seconds, the browser caches the result of the preflight request (an initial request sent before the actual request).
6. Save Your CORS Configuration: After specifying your desired CORS rules, click "Save" to save your CORS configuration.
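The components above combine into a single rule object. The sketch below assembles one in Python and prints it as JSON; the `MaxAgeSeconds` value of 3000 is an assumed placeholder, since the text gives no example for it, and `apply_cors` is a hypothetical helper that relies on boto3's `put_bucket_cors` (boto3 and credentials required).

```python
import json

# One CORS rule assembled from the component examples above.
cors_rules = [
    {
        "AllowedOrigins": ["http://example.com", "https://example.net"],
        "AllowedMethods": ["GET", "HEAD", "PUT"],
        "AllowedHeaders": ["Authorization", "Content-Type"],
        "ExposeHeaders": ["x-amz-server-side-encryption"],
        "MaxAgeSeconds": 3000,  # assumed value: cache preflight results for 50 minutes
    }
]

# JSON form of the rule list, as used in the console's CORS editor.
print(json.dumps(cors_rules, indent=4))


def apply_cors(bucket):
    """Apply the same rules programmatically (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_cors(
        Bucket=bucket, CORSConfiguration={"CORSRules": cors_rules}
    )
```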

What are the different storage classes in S3?

Amazon S3 offers several storage classes, each designed to address different use cases and optimize costs based on your data's access patterns and retrieval requirements. Here's an overview of the various storage classes available in Amazon S3:

S3 Standard🚫:
- Use Case: This is the default storage class, suitable for frequently accessed data that requires low-latency retrieval. It's ideal for data that is actively used and needs high availability.
- Durability: 99.999999999% (11 9's)
- Availability: 99.99%
S3 Intelligent-Tiering🚫:
- Use Case: Designed for data with unknown or changing access patterns. It automatically moves objects between access tiers (such as frequent and infrequent access) based on changing access patterns, optimizing costs without any performance impact.
- Durability: 99.999999999% (11 9's)
- Availability: 99.9%
S3 Standard-IA (Infrequent Access) 🚫:
- Use Case: Suitable for data that is accessed less frequently but still requires low-latency retrieval when needed. It's a cost-effective choice for data that can tolerate a per-GB retrieval fee in exchange for a lower storage price.
- Durability: 99.999999999% (11 9's)
- Availability: 99.9%
S3 One Zone-IA🚫:
- Use Case: Similar to Standard-IA but stores data in a single availability zone, offering lower costs. It's a good choice for data that can be easily recreated if lost, and where durability is less critical.
- Durability: 99.999999999% (11 9's) within a single availability zone
- Availability: 99.5%
S3 Glacier and Glacier Deep Archive🚫:
- Use Case: Designed for long-term archival of data that is accessed infrequently, such as compliance data, backups, and historical records. Data is archived and has longer retrieval times.
- Durability: 99.999999999% (11 9's)
- Availability: Archived objects are not immediately readable and must be restored first. Glacier retrievals take minutes to hours; Glacier Deep Archive retrievals take longer (on the order of hours).
S3 Glacier Storage Class🚫:
- Use Case: This is the archive class with configurable retrieval times, now named S3 Glacier Flexible Retrieval. You can choose between expedited, standard, or bulk retrieval options based on how quickly you need the data.
S3 Outposts🚫:
- Use Case: Intended for data stored on AWS Outposts, which are AWS-managed racks of infrastructure installed in your own on-premises data center. It lets you use the S3 APIs for data that must stay on premises.
S3 Reduced Redundancy Storage (RRS) (Deprecated)🚫:
- Use Case: Previously, RRS was an option for non-critical, reproducible data. However, AWS has deprecated this storage class, and users are encouraged to use other options, like S3 Standard or S3 Standard-IA.
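To put the availability percentages in perspective, the short calculation below converts them into the maximum downtime per year they permit. It is plain arithmetic on the figures listed above and assumes a 365-day year; the function name `max_downtime_hours` is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# Availability figures as listed in this section.
availability = {
    "S3 Standard": 0.9999,
    "S3 Intelligent-Tiering": 0.999,
    "S3 Standard-IA": 0.999,
    "S3 One Zone-IA": 0.995,
}


def max_downtime_hours(avail):
    """Hours per year a service may be unavailable at the given availability."""
    return (1 - avail) * HOURS_PER_YEAR


for name, avail in availability.items():
    print(f"{name}: {max_downtime_hours(avail):.2f} hours/year")
```

This works out to under an hour per year for S3 Standard, about 8.76 hours for the 99.9% classes, and about 43.8 hours for S3 One Zone-IA.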

Deploy a static site from AWS S3

Create an S3 bucket:

Log in to your AWS Management Console, navigate to the S3 service, and create a new bucket. Name it according to your preference, keeping in mind that bucket names must be globally unique.

Upload HTML file to bucket:

After creating the bucket, upload your website's static files (HTML, CSS, JavaScript, images, etc.) into the bucket. Make sure to set the appropriate permissions for these files so they can be publicly accessible.

Example HTML file to upload:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Serving from S3 Bucket</title>
</head>
<body>
<h1>Serving from S3 Bucket</h1>
<p>This is a simple HTML file served from an S3 bucket.</p>
</body>
</html>

Turn on static web hosting:

In the bucket properties, find the "Static website hosting" section and click on "Edit". Select the option to enable static website hosting, and specify the index document (e.g., index.html)

Allow public traffic:

To allow public access to your website files, you'll need to set a bucket policy. Go to the bucket's "Permissions" tab and turn off "Block public access" for the bucket (a public policy is rejected while it is on). Then edit the "Bucket policy" and add a policy allowing read access (s3:GetObject) to all users ("*") for the resources in your bucket.

Site should be live:

Once everything is set up, you can access your website using the endpoint provided in the "Static website hosting" section of your bucket properties.
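The console steps above can also be scripted. The following is a sketch under stated assumptions: boto3 is installed, credentials are configured, and the local files sit in the current directory so that each path doubles as the object key. The helper names are hypothetical; `upload_file` and `put_bucket_website` are boto3 S3 client methods.

```python
import mimetypes


def content_type_for(filename):
    """Guess the Content-Type so browsers render files instead of downloading them."""
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def website_configuration(index_document="index.html", error_document=None):
    """Build the WebsiteConfiguration structure passed to put_bucket_website."""
    config = {"IndexDocument": {"Suffix": index_document}}
    if error_document:
        config["ErrorDocument"] = {"Key": error_document}
    return config


def deploy(bucket, files):
    """Upload local files and enable website hosting (needs boto3 and credentials)."""
    import boto3

    s3 = boto3.client("s3")
    for path in files:
        # Assumes relative paths such as "index.html", reused as the object key.
        s3.upload_file(
            path, bucket, path, ExtraArgs={"ContentType": content_type_for(path)}
        )
    s3.put_bucket_website(
        Bucket=bucket, WebsiteConfiguration=website_configuration()
    )
```

Setting the content type explicitly matters because an HTML file stored with a generic binary type may be offered as a download rather than rendered. The public-read bucket policy still has to be applied separately.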

What is Glacier?

Glacier is the low-cost archival storage option in S3, used for backups and other data that you rarely need to read.

How will you use S3 with your EC2 instances?

Websites hosted on your EC2 instances can load their static contents directly from S3. It provides highly scalable, reliable, fast, inexpensive data storage infrastructure.

How can you send a request to Amazon S3?

You can send requests by using the REST API or the AWS SDK wrapper libraries that wrap the underlying Amazon S3 REST API.

What is the minimum and maximum size of individual objects that you can store in S3?

The minimum size of an individual object that you can store in S3 is 0 bytes, and the maximum size of an individual object is 50TB.

In S3 how many buckets can be created?

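The bucket quota applies per AWS account, not per region. The default soft limit has historically been 100 buckets per account, and it can be raised with a service quota increase.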

How can you secure access to your S3 bucket?

You can control access to your S3 buckets in the following ways,
• ACL – Access Control List
• Bucket policies
• IAM policies

How can you encrypt data in S3?

You can encrypt the data by using the below methods,
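• Server Side Encryption – S3 (SSE-S3, AES 256 encryption with keys managed by S3)
• Server Side Encryption – KMS (SSE-KMS, keys managed in Key Management Service)
• Server Side Encryption – C (SSE-C, customer-provided keys)
• Client Side Encryption (data is encrypted before it is uploaded)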

What is the relation between Amazon S3 and AWS KMS?

To encrypt Amazon S3 data at rest, you can use several variations of Server-Side Encryption (SSE). Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it. Both SSE with Amazon S3 managed keys and SSE with AWS Key Management Service (AWS KMS) keys use the 256-bit Advanced Encryption Standard (AES). With SSE-KMS, the keys are created and managed in AWS KMS.
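As a hedged illustration of how the encryption mode is selected per upload, the sketch below maps the two server-managed modes to the `ServerSideEncryption` and `SSEKMSKeyId` parameters of boto3's `put_object`. The helper names and the mode labels are assumptions made for this example; running `upload_encrypted` needs boto3 and credentials.

```python
def encryption_args(mode, kms_key_id=None):
    """Return put_object keyword arguments for the chosen server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:  # when omitted, the AWS managed key for S3 is used
            args["SSEKMSKeyId"] = kms_key_id
        return args
    raise ValueError(f"unsupported mode: {mode}")


def upload_encrypted(bucket, key, body, mode="SSE-S3", kms_key_id=None):
    """Upload bytes with server-side encryption (needs boto3 and credentials)."""
    import boto3

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=body, **encryption_args(mode, kms_key_id)
    )
```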

Permissions & Management.

● Access Control List: ACLs are used to grant read/write permission to another AWS account.

● Bucket Policy: It uses a JSON-based access policy to grant advanced permissions on your S3 resources.

● CORS: CORS stands for Cross-Origin Resource Sharing. It allows cross-origin access to your S3 resources.

Charges On S3:

The pricing model for S3 is as below,
• Storage used
• Number of requests you make
• Storage management
• Data transfer
• Transfer acceleration

If your application is hosted in S3 and users are in different locations, how can you reduce latency?

Use S3 Transfer Acceleration or CloudFront CDN to deliver content closer to global users.

What is the pre-requisite to work with Cross-region replication in S3?

You need to enable versioning on both the source and destination buckets to work with cross-region replication. Also, the source and destination buckets must be in different regions.

Amazon S3 - Versioning

• You can version your files in Amazon S3

• It is enabled at the bucket level

• Overwriting the same key creates a new “version”: 1, 2, 3….

• It is best practice to version your buckets

• Protect against unintended deletes (ability to restore a version)

• Easy rollback to the previous version

• Any file that is not versioned before enabling versioning will have version “null”

• Suspending versioning does not delete the previous versions
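The behaviour in the bullets above can be pictured with a toy in-memory model. This is only a conceptual sketch, not S3's implementation: real version IDs are opaque strings rather than counters, and the class and method names are invented for illustration.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not S3's implementation)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)  # simple counter; S3 uses opaque ID strings

    def put(self, key, body):
        """Store a new version instead of overwriting the previous one."""
        version_id = next(self._ids)
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker; older versions stay."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the latest body, or a specific version when version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            body = versions[-1][1] if versions else None
        else:
            body = next((b for v, b in versions if v == version_id), None)
        if body is None:
            raise KeyError(key)
        return body


bucket = VersionedBucket()
v1 = bucket.put("report.txt", "draft")
v2 = bucket.put("report.txt", "final")
bucket.delete("report.txt")
print(bucket.get("report.txt", version_id=v1))  # the older version is still retrievable
```

After the delete, a plain `get` fails because the newest entry is a delete marker, while both earlier versions remain readable by ID. That is what makes rollback and undelete possible.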

Explain Amazon S3 lifecycle rules.

With Amazon S3 lifecycle configuration rules, you can significantly reduce your storage costs by automatically transitioning data from one storage class to another, or even automatically deleting data after some time. For example:
• Store backup data initially in Amazon S3 Standard
• After 30 days, transition to Amazon S3 Standard-IA
• After 90 days, transition to Amazon S3 Glacier
• After 3 years, delete
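The schedule above could be expressed as a lifecycle configuration along these lines. It is a sketch assuming boto3's `put_bucket_lifecycle_configuration`; the rule ID and the `backups/` prefix are hypothetical, and three years is approximated as 1095 days.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "backup-tiering",            # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},  # assumed prefix; "" would match every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 3 * 365},   # delete after roughly 3 years
        }
    ]
}


def apply_lifecycle(bucket):
    """Attach the rule to a bucket (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_configuration
    )
```

Objects start in S3 Standard simply by being uploaded without a storage class, so only the later transitions and the expiration need to appear in the rule.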

What is the function of cross-region replication in Amazon S3?

Cross-region replication is a feature that allows you to asynchronously replicate all new objects in the source bucket in one AWS region to a target bucket in another region. To enable cross-region replication, versioning must be turned on for both source and destination buckets. Cross-region replication is commonly used to reduce the latency of accessing objects in Amazon S3 by keeping a copy closer to users in another region.
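A rough sketch of turning this on with boto3 is shown below. It assumes an IAM role that S3 can assume for replication already exists and that the client can reach both buckets (a per-region client may be needed in practice). The rule ID and helper names are hypothetical; `put_bucket_versioning` and `put_bucket_replication` are boto3 S3 client methods.

```python
def replication_configuration(role_arn, destination_bucket):
    """Build a one-rule replication configuration that copies all new objects."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},                  # empty filter: applies to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


def enable_replication(source_bucket, destination_bucket, role_arn):
    """Enable versioning on both buckets, then replication on the source.

    Needs boto3, credentials, and an existing replication role.
    """
    import boto3

    s3 = boto3.client("s3")
    for bucket in (source_bucket, destination_bucket):
        s3.put_bucket_versioning(
            Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
        )
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_configuration(
            role_arn, destination_bucket
        ),
    )
```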

Some Important Questions

What is the distinction between Amazon S3 and EBS?

Amazon S3: S3 is object storage designed for storing and retrieving large amounts of unstructured data, such as files, images, videos, and backups. It's highly scalable and suitable for web hosting, data archiving, and content distribution.
Amazon EBS: EBS stands for Elastic Block Store. EBS volumes are persistent volumes that you can attach to instances. With EBS volumes, your data is preserved even when you stop your instances, unlike instance store volumes, where the data is deleted when you stop the instances. AWS also offers a feature called Multi-Attach, which allows up to 16 EC2 instances to share a single EBS volume (Provisioned IOPS volumes only).

In summary, S3 is best for storing and managing files and objects, while EBS is better suited for block-level storage used by EC2 instances.

What is Amazon S3 and how does it ensure the durability of data?

Amazon S3 (Simple Storage Service) is a scalable object storage service that allows users to store and retrieve any amount of data. It ensures durability by redundantly storing data across multiple facilities and devices within a region. S3 automatically replicates data to at least three physically separated Availability Zones (AZs) to protect against data loss.

What are the key features of Amazon S3?

Amazon S3 offers features like data durability, high availability, security options, scalable storage, and the ability to store data in different storage classes based on access patterns.

What is an S3 bucket?

An S3 bucket is a container for storing objects, which can be files, images, videos, and more. Each object in S3 is identified by a unique key within a bucket.

How can you control access to objects in S3?

Access to S3 objects can be controlled using bucket policies, access control lists (ACLs), and IAM (Identity and Access Management) policies. You can define who can read, write, and delete objects.

What is the difference between S3 Standard, S3 Intelligent-Tiering, and S3 One Zone-IA storage classes?

S3 Standard: Offers high durability, availability, and performance.

S3 Intelligent-Tiering: Automatically moves objects between two access tiers based on changing access patterns.

S3 One Zone-IA: Stores objects in a single availability zone with lower storage costs, but without the multi-AZ resilience of S3 Standard.

How does S3 provide data durability?

S3 provides 99.999999999% (11 9's) durability by automatically replicating objects across multiple facilities within a region.

What is Amazon S3 Glacier used for?

Amazon S3 Glacier is a storage service designed for data archiving. It offers lower-cost storage with retrieval times ranging from minutes to hours.

How can you secure data in Amazon S3?

You can secure data in Amazon S3 by using access control mechanisms, like bucket policies and IAM policies, and by enabling encryption using server-side encryption or client-side encryption.

What is S3 versioning?

S3 versioning is a feature that allows you to preserve, retrieve, and restore every version of every object in a bucket. It helps protect against accidental deletion and overwrites.

What is a pre-signed URL in S3?

A pre-signed URL is a URL that grants temporary access to an S3 object. It can be generated using your AWS credentials and shared with others to provide temporary access.

How can you optimize costs in Amazon S3?

You can optimize costs by using storage classes that match your data access patterns, utilizing lifecycle policies to transition objects to less expensive storage tiers, and setting up cost allocation tags for billing visibility.

What is S3 Cross-Region Replication?

S3 Cross-Region Replication is a feature that automatically replicates objects from one S3 bucket in one AWS region to another bucket in a different region.

How can you automate the movement of objects between different storage classes?

You can use S3 Lifecycle policies to automate the transition of objects between storage classes based on predefined rules and time intervals.

What is the purpose of S3 event notifications?

S3 event notifications allow you to trigger AWS Lambda functions, or send messages to SQS queues or SNS topics, when certain events, like object creation or deletion, occur in an S3 bucket.
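On the receiving side, a Lambda function gets a JSON event describing the affected objects. The handler below is a minimal sketch that extracts the bucket and key from each record; the sample event is trimmed to the fields used here, and the object key in it is made up.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Lambda-style handler: list the (bucket, key) pairs named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (a space is sent as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Trimmed sample event; real notifications carry more fields.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "appgambit-s3access-test"},
                "object": {"key": "uploads/my+photo.jpg"},
            },
        }
    ]
}
print(handler(sample_event))
```

Decoding the key is easy to forget: without it, any object whose name contains spaces or special characters cannot be fetched with the key taken straight from the event.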

What is the AWS Snowball device?

The AWS Snowball is a physical data transport solution used for migrating large amounts of data into and out of AWS. It's ideal for scenarios where the network transfer speed is not sufficient.

What is Amazon S3 Select?

Amazon S3 Select is a feature that allows you to retrieve specific data from an object using SQL-like queries, without the need to retrieve the entire object.

What is the difference between Amazon S3 and Amazon EBS?

Amazon S3 is object storage used for storing files, while Amazon EBS (Elastic Block Store) is block storage used for attaching to EC2 instances as volumes.

How can you enable server access logging in Amazon S3?

You can enable server access logging to track all requests made to your bucket. The logs are stored in a target bucket and can help analyze access patterns.

What is S3 Transfer Acceleration?

S3 Transfer Acceleration is a feature that speeds up transferring files to and from Amazon S3 by utilizing Amazon CloudFront's globally distributed edge locations.

How can you replicate data between S3 buckets within the same region?

You can use S3 Same-Region Replication (SRR), which works like Cross-Region Replication except that the source and destination buckets are in the same region. Versioning must be enabled on both buckets.
