Wednesday, July 31, 2013

Amazon S3, Cloud Computing Storage for Files, Images, Videos
http://aws.amazon.com/s3/


Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

 




Amazon S3 is intentionally built with a minimal feature set.

  • Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The number of objects you can store is unlimited.
  • Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
  • A bucket can be stored in one of several Regions. You can choose a Region to optimize for latency, minimize costs, or address regulatory requirements. Amazon S3 is currently available in the US Standard, US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (Sao Paulo), and GovCloud (US) Regions. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps.
  • Objects stored in a Region never leave the Region unless you transfer them out. For example, objects stored in the EU (Ireland) Region never leave the EU.
  • Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
  • Options for secure data upload/download and encryption of data at rest are provided for additional data protection.
  • Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
  • Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP. A BitTorrent™ protocol interface is provided to lower costs for high-scale distribution.
  • Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options. These options can be easily administered from the Amazon S3 Management Console.
  • Reliability backed with the Amazon S3 Service Level Agreement.

Data stored in Amazon S3 is secure by default; only bucket and object owners have access to the Amazon S3 resources they create. Amazon S3 supports multiple access control mechanisms, as well as encryption for both secure transit and secure storage on disk. With Amazon S3's data protection features, you can protect your data from both logical and physical failures, guarding against data loss from unintended user actions, application errors, and infrastructure failures. For customers who must comply with regulatory standards such as PCI and HIPAA, Amazon S3's data protection features can be used as part of an overall strategy to achieve compliance. The various data security and reliability features offered by Amazon S3 are described in detail below.

Data Security Details

Amazon S3 supports several mechanisms that give you flexibility to control who can access your data as well as how, when, and where they can access it. Amazon S3 provides four different access control mechanisms: Identity and Access Management (IAM) policies, Access Control Lists (ACLs), bucket policies, and query string authentication. IAM enables organizations with multiple employees to create and manage multiple users under a single AWS account. With IAM policies, you can grant IAM users fine-grained control over your Amazon S3 buckets or objects. You can use ACLs to selectively add (grant) certain permissions on individual objects. Bucket policies can be used to add or deny permissions across some or all of the objects within a single bucket. With query string authentication, you can share Amazon S3 objects through URLs that are valid for a predefined expiration time.
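
As an illustration of query string authentication, here is a minimal sketch using boto3, the AWS SDK for Python (an assumption; this page itself only mentions the .NET and Java SDKs). The bucket and key names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Generate a URL that grants time-limited GET access to a private object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "reports/2013-07.pdf"},
    ExpiresIn=3600,  # the URL stops working one hour after generation
)
print(url)  # share this link; it expires after the predefined time
```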

You can securely upload/download your data to Amazon S3 via the SSL encrypted endpoints using the HTTPS protocol. Amazon S3 also provides multiple options for encryption of data at rest. If you prefer to manage your own encryption keys, you can use a client encryption library like the Amazon S3 Encryption Client to encrypt your data before uploading to Amazon S3. Alternatively, you can use Amazon S3 Server Side Encryption (SSE) if you prefer to have Amazon S3 manage encryption keys for you. With Amazon S3 SSE, you can encrypt data on upload simply by adding an additional request header when writing the object. Decryption happens automatically when data is retrieved.
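
A minimal sketch of Server Side Encryption on upload, again assuming boto3 and hypothetical names; in the raw REST API the same effect comes from the x-amz-server-side-encryption request header.

```python
import boto3

s3 = boto3.client("s3")

# Request AES-256 server-side encryption by setting one extra parameter
# (the additional request header mentioned above).
s3.put_object(
    Bucket="my-bucket",
    Key="records/customer-data.csv",
    Body=b"...sensitive bytes...",
    ServerSideEncryption="AES256",
)

# Decryption is transparent: an ordinary GET returns the plaintext.
data = s3.get_object(Bucket="my-bucket", Key="records/customer-data.csv")["Body"].read()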

Amazon S3 also supports logging of requests made against your Amazon S3 resources. You can configure your Amazon S3 bucket to create access log records for the requests made against it. These server access logs capture all requests made against a bucket or the objects in it and can be used for auditing purposes.
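
A sketch of enabling server access logging, under the same boto3 assumption with hypothetical bucket names:

```python
import boto3

s3 = boto3.client("s3")

# Deliver access logs for "my-bucket" into "my-log-bucket" under "logs/".
# The target bucket must grant write permission to the S3 log delivery group.
s3.put_bucket_logging(
    Bucket="my-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",
            "TargetPrefix": "logs/",
        }
    },
)
```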

For more information on the security features available in Amazon S3, please refer to Access Control and Using Data Encryption topics in the Amazon S3 Developer Guide. For an overview on security on AWS, including Amazon S3, please refer to Amazon Web Services: Overview of Security Processes document.

Data Durability and Reliability

Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon S3 synchronously stores your data across multiple facilities before returning SUCCESS. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data. Unlike traditional systems which can require laborious data verification and manual repair, Amazon S3 performs regular, systematic data integrity checks and is built to be automatically self-healing.

Amazon S3 provides further protection via Versioning. You can use Versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. This allows you to easily recover from both unintended user actions and application failures. By default, requests will retrieve the most recently written version. Older versions of an object can be retrieved by specifying a version in the request. Storage rates apply for every version stored.
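
A sketch of the Versioning workflow (boto3, hypothetical names):

```python
import boto3

s3 = boto3.client("s3")

# Turn on Versioning so every overwrite and delete is preserved.
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# A plain GET returns the most recently written version ...
latest = s3.get_object(Bucket="my-bucket", Key="report.txt")

# ... while older versions are retrieved by specifying a version ID,
# discoverable via list_object_versions.
history = s3.list_object_versions(Bucket="my-bucket", Prefix="report.txt")
oldest_id = history["Versions"][-1]["VersionId"]
old = s3.get_object(Bucket="my-bucket", Key="report.txt", VersionId=oldest_id)
```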

Amazon S3's standard storage is:

  • Backed with the Amazon S3 Service Level Agreement.
  • Designed for 99.999999999% durability and 99.99% availability of objects over a given year.
  • Designed to sustain the concurrent loss of data in two facilities.

Reduced Redundancy Storage (RRS) is a storage option within Amazon S3 that enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy than Amazon S3's standard storage. It provides a cost-effective, highly available solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced. The RRS option stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive, but does not replicate objects as many times as standard Amazon S3 storage, and thus is even more cost effective; a brief upload sketch follows the list below. Reduced Redundancy Storage is:

  • Backed with the Amazon S3 Service Level Agreement.
  • Designed to provide 99.99% durability and 99.99% availability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.01% of objects.
  • Designed to sustain the loss of data in a single facility.
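
A sketch of storing a reproducible derivative at reduced redundancy (boto3, hypothetical names):

```python
import boto3

s3 = boto3.client("s3")

# Store an easily reproduced derivative, such as a thumbnail, at reduced
# redundancy instead of standard storage.
with open("photo-small.jpg", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="thumbnails/photo-small.jpg",
        Body=f,
        StorageClass="REDUCED_REDUNDANCY",
    )
```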

Amazon Glacier

Amazon S3 enables you to utilize Amazon Glacier's extremely low-cost storage service as a storage option for data archival. Amazon Glacier stores data for as little as $0.01 per gigabyte per month, and is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. Examples include digital media archives, financial and healthcare records, raw genomic sequence data, long-term database backups, and data that must be retained for regulatory compliance.

Like Amazon S3's other storage options (Standard or Reduced Redundancy Storage), objects stored in Amazon Glacier using Amazon S3's APIs or Management Console have an associated user-defined name. You can get a real-time list of all of your Amazon S3 object names, including those stored using the Amazon Glacier option, using the Amazon S3 LIST API. Objects stored directly in Amazon Glacier using Amazon Glacier's APIs cannot be listed in real-time, and have a system-generated identifier rather than a user-defined name. Because Amazon S3 maintains the mapping between your user-defined object name and the Amazon Glacier system-defined identifier, Amazon S3 objects that are stored using the Amazon Glacier option are only accessible through Amazon S3's APIs or the Amazon S3 Management Console.

To restore Amazon S3 data that was stored in Amazon Glacier via the Amazon S3 APIs or Management Console, you first initiate a restore job using the Amazon S3 APIs or Management Console. Restore jobs typically complete in 3 to 5 hours. Once the job is complete, you can access your data through an Amazon S3 GET request.
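
A sketch of the two-step restore workflow just described (boto3, hypothetical names):

```python
import boto3

s3 = boto3.client("s3")

# Step 1: initiate a restore job for an object previously archived to the
# Amazon Glacier storage option, keeping the restored copy for 7 days.
s3.restore_object(
    Bucket="my-archive-bucket",
    Key="backups/2012-q4.tar.gz",
    RestoreRequest={"Days": 7},
)

# Step 2: after the job completes (typically 3 to 5 hours), the Restore
# header flips to ongoing-request="false" and a normal GET succeeds.
head = s3.head_object(Bucket="my-archive-bucket", Key="backups/2012-q4.tar.gz")
if 'ongoing-request="false"' in head.get("Restore", ""):
    obj = s3.get_object(Bucket="my-archive-bucket", Key="backups/2012-q4.tar.gz")
```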

The Amazon Glacier storage option is:

  • Backed with the Amazon S3 Service Level Agreement.
  • Designed for 99.999999999% durability and 99.99% availability of objects over a given year.
  • Designed to sustain the concurrent loss of data in two facilities.

Amazon S3 makes it easy to manage your data. With Amazon S3's data lifecycle management capabilities, you can automatically archive objects to even lower cost storage options or perform recurring deletions, enabling you to reduce your costs over an object's lifetime. Amazon S3 also allows you to monitor and control your costs across your different business functions. All of these management capabilities can be easily administered using the Amazon S3 APIs or Management Console. The various data management features offered by Amazon S3 are described in detail below.

Data Lifecycle Management

Lifecycle management of data refers to how your data is managed and stored from creation and initial storage to when it's no longer needed and deleted. Amazon S3 provides a number of capabilities to simplify the lifecycle management of your data, including management of capacity, automated archival to lower cost storage, and scheduled deletions.

When storing new data, Amazon S3 eliminates the need for capacity planning by enabling you to both scale on-demand and pay only for the capacity you use. With traditional storage systems, capacity planning can be an error-prone process, especially when storage growth is unpredictable, as it often is. Over provisioning capacity can result in under-utilization and higher costs, while under provisioning can trigger expensive hardware upgrades far earlier than planned.

As your data ages, Amazon S3 takes care of automatically and transparently migrating your data to new hardware as hardware fails or reaches its end of life. This eliminates the need for you to perform expensive, time-consuming, and risky hardware migrations. Amazon S3 also enables you to automatically archive your data to lower cost storage as your data ages. You can define rules to automatically archive sets of Amazon S3 objects to Amazon Glacier based on their lifetime. Data archival rules are supported for Amazon S3 objects in the US Standard, US West (Northern California), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions.
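
A sketch of such an archival rule (boto3, hypothetical bucket and prefix):

```python
import boto3

s3 = boto3.client("s3")

# Archive objects under the "logs/" prefix to the Glacier storage option
# 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```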

When your data reaches its end of life, Amazon S3 provides programmatic options for recurring and high volume deletions. For recurring deletions, rules can be defined to remove sets of objects after a pre-defined time period. For efficient one-time deletions, up to 1,000 objects can be deleted with a single request. These rules can be applied to standard objects, RRS objects, or objects that have been archived to Amazon Glacier.
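
Sketches of both deletion options (boto3, hypothetical names):

```python
import boto3

s3 = boto3.client("s3")

# Recurring deletion: expire objects under "tmp/" 30 days after creation.
# Note: this call replaces the bucket's entire lifecycle configuration, so
# in practice archive and expiration rules go in one combined Rules list.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-temp-files",
                "Filter": {"Prefix": "tmp/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)

# Efficient one-time deletion: up to 1,000 keys in a single request.
s3.delete_objects(
    Bucket="my-bucket",
    Delete={"Objects": [{"Key": "tmp/a.dat"}, {"Key": "tmp/b.dat"}]},
)
```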

Cost Monitoring and Controls

Amazon S3 offers several features for managing and controlling your costs. You can use the AWS Management Console or the Amazon S3 APIs to apply tags to your Amazon S3 buckets, enabling you to allocate your costs across multiple business dimensions, including cost centers, application names, or owners. You can then view breakdowns of these costs using Amazon Web Services' Cost Allocation Reports, which show your usage and costs aggregated by your tags. For more information on Cost Allocation and tagging, please visit About AWS Account Billing. For more information on tagging your S3 buckets, please see the Bucket Tagging topic in the Amazon S3 Developer Guide.
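
A sketch of bucket tagging for cost allocation (boto3; the tag keys and values are hypothetical examples):

```python
import boto3

s3 = boto3.client("s3")

# Tag a bucket so its charges can be broken out by cost center and
# application in the Cost Allocation Report.
s3.put_bucket_tagging(
    Bucket="my-bucket",
    Tagging={
        "TagSet": [
            {"Key": "CostCenter", "Value": "marketing"},
            {"Key": "Application", "Value": "media-archive"},
        ]
    },
)
```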

You can use Amazon CloudWatch to receive billing alerts that help you monitor the Amazon S3 charges on your bill. You can set up an alert to be notified automatically via e-mail when estimated charges reach a threshold that you choose. For additional information on billing alerts, you can visit the billing alerts page or see the Monitor Your Estimated Charges topic in the Amazon CloudWatch Developer Guide.
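
A sketch of such a billing alert (boto3; the threshold and SNS topic ARN are hypothetical, and billing metrics must first be enabled in your account's billing preferences):

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm when estimated monthly charges exceed $100. Billing metrics live
# in the AWS/Billing namespace in the us-east-1 Region.
cloudwatch.put_metric_alarm(
    AlarmName="monthly-bill-over-100-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,  # billing data is reported a few times per day
    EvaluationPeriods=1,
    Threshold=100.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
)
```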


Pay only for what you use. There is no minimum fee. Estimate your monthly bill using the AWS Simple Monthly Calculator. We charge less where our costs are less, and prices are based on the location of your Amazon S3 bucket.

AWS Free Usage Tier*

As part of the AWS Free Usage Tier, you can get started with Amazon S3 for free. Upon sign-up, new AWS customers receive 5 GB of Amazon S3 standard storage, 20,000 Get Requests, 2,000 Put Requests, and 15 GB of data transfer out each month for one year.

Storage Pricing

Request Pricing

Data Transfer Pricing

The pricing below is based on data transferred "in" to and "out" of Amazon S3.

Storage and bandwidth size includes all file overhead.

Rate tiers take into account your aggregate usage for Data Transfer Out to the Internet across Amazon EC2, Amazon S3, Amazon Glacier, Amazon RDS, Amazon SimpleDB, Amazon SQS, Amazon SNS, Amazon DynamoDB, and AWS Storage Gateway.

AWS GovCloud Region

AWS GovCloud is an AWS Region designed to allow U.S. government agencies and contractors to move more sensitive workloads into the cloud by addressing their specific regulatory and compliance requirements. For pricing and more information on the new AWS GovCloud Region, please visit the AWS GovCloud web page.

* Your usage for the free tier is calculated each month across all regions except the AWS GovCloud Region and automatically applied to your bill – unused monthly usage will not roll over. Restrictions apply; see the offer terms for more details.

(Amazon S3 is sold by Amazon Web Services, Inc.)


Using Amazon S3 is easy. To get started you:

  • Create a Bucket to store your data. You can choose a Region where your bucket and object(s) reside to optimize latency, minimize costs, or address regulatory requirements.
  • Upload Objects to your Bucket. Your data is durably stored and backed by the Amazon S3 Service Level Agreement.
  • Optionally, set access controls. You can grant others access to your data from anywhere in the world, as sketched after this list.
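
A minimal sketch of these three steps (boto3; the bucket name is hypothetical and must be globally unique):

```python
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# 1. Create a bucket in a Region of your choice.
s3.create_bucket(
    Bucket="my-example-bucket",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)

# 2. Upload an object into the bucket.
s3.put_object(Bucket="my-example-bucket", Key="hello.txt", Body=b"Hello, S3!")

# 3. Optionally, grant public read access to the object.
s3.put_object_acl(Bucket="my-example-bucket", Key="hello.txt", ACL="public-read")
```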

You can easily and securely create buckets, upload objects, and set access controls using the AWS Management Console. The console provides a point-and-click web-based interface for accessing and managing all of your Amazon S3 resources. The Amazon S3 Getting Started Guide shows you how to start using Amazon S3 from the console. Developers building applications can use the AWS SDK for .NET, the AWS SDK for Java, or a wide variety of 3rd party libraries for other platforms and languages.


AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers your data directly onto and off of storage devices using Amazon's high-speed internal network and bypassing the Internet. For significant data sets, AWS Import/Export is often faster than Internet transfer and more cost effective than upgrading your connectivity. You can use AWS Import/Export for migrating data into the cloud, distributing content to your customers, sending backups to AWS, and disaster recovery.

You can also use AWS Direct Connect to transfer large amounts of data to Amazon S3. AWS Direct Connect makes it easy to establish a dedicated network connection from your premises to AWS. Using AWS Direct Connect, you can establish private connectivity between AWS and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.


Amazon S3 can be used to support a wide variety of use cases, for example:

Content Storage and Distribution

Amazon S3 provides a highly durable and available store for a variety of content, ranging from web applications to media files. It allows you to offload your entire storage infrastructure onto the cloud, where you can take advantage of Amazon S3's scalability and pay-as-you-go pricing to handle your growing storage needs. You can distribute your content directly from Amazon S3 or use Amazon S3 as an origin store for pushing content to your Amazon CloudFront edge locations.

For sharing content that is either easily reproduced or where you're storing an original copy elsewhere, Amazon S3's Reduced Redundancy Storage (RRS) feature provides a compelling solution. For example, if you're storing media content in-house but you need to provide accessibility to your customers, channel partners, or employees, RRS is a low-cost solution for storing and sharing this content.

Storage for Data Analysis

Whether you're storing pharmaceutical data for analysis, financial data for computation and pricing, or photo images for resizing, Amazon S3 is an ideal location to store your original content. You can then send this content to Amazon EC2 for computation, resizing, or other large scale analytics – without incurring any data transfer charges for moving the data between the services. You can then choose to store the resulting, reproducible content using Amazon S3's Reduced Redundancy Storage feature (or, of course, you can store it using Amazon S3's standard storage as well).

Backup, Archiving and Disaster Recovery

Amazon S3 offers a highly durable, scalable, and secure solution for backing up and archiving your critical data. You can use Amazon S3's Versioning capability to provide even further protection for your stored data. If you have data sets of significant size, you can use AWS Import/Export to move large amounts of data into and out of AWS with physical storage devices. This is ideal for moving large quantities of data for periodic backups, or quickly retrieving data for disaster recovery scenarios. You can also define rules to archive sets of Amazon S3 objects to Amazon Glacier's extremely low-cost storage service based on object lifetimes. As your data ages, these rules enable you to ensure that it's automatically stored on the storage option that is most cost-effective for your needs.

Static Website Hosting

You can host your entire static website on Amazon S3 for an inexpensive, highly available hosting solution that scales automatically to meet traffic demands. Self-hosting a highly available website that can handle peak traffic loads can be challenging and costly. With Amazon S3, you can reliably serve your traffic and handle unexpected peaks without worrying about scaling your infrastructure. Amazon S3 is designed for 99.99% availability and 99.999999999% durability, and it gives you access to the same highly scalable, reliable, and fast infrastructure that Amazon uses to run its own global network of web sites. You also benefit from pay-as-you-go pricing. You pay only for the capacity you use. Amazon S3's website hosting solution is ideal for websites with static content, including html files, images, videos, and client-side scripts such as JavaScript. (Amazon EC2 is recommended for websites with server-side scripting and database interaction).
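
A sketch of turning a bucket into a static website host (boto3; the bucket name is hypothetical and the document names are the conventional defaults):

```python
import boto3

s3 = boto3.client("s3")

# Serve the bucket's contents as a static website with conventional
# index and error documents.
s3.put_bucket_website(
    Bucket="my-site-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
# The site is then served from the bucket's website endpoint, e.g.
# http://my-site-bucket.s3-website-us-east-1.amazonaws.com/
```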


Amazon S3 is based on the idea that quality Internet-based storage should be taken for granted. It helps free developers from worrying about how they will store their data, whether it will be safe and secure, or whether they will have enough storage available. It frees them from the upfront costs of setting up their own storage solution as well as the ongoing costs of maintaining and scaling their storage servers. The functionality of Amazon S3 is simple and robust: Store any amount of data inexpensively and securely, while ensuring that the data will always be available when you need it. Amazon S3 enables developers to focus on innovating with data, rather than figuring out how to store it.

Amazon S3 was built to fulfill the following design requirements:

  • Secure: Built to provide infrastructure that allows the customer to maintain full control over who has access to their data. Customers must also be able to easily secure their data in transit and at rest.
  • Reliable: Store data with up to 99.999999999% durability, with 99.99% availability. There can be no single points of failure. All failures must be tolerated or repaired by the system without any downtime.
  • Scalable: Amazon S3 can scale in terms of storage, request rate, and users to support an unlimited number of web-scale applications. It uses scale as an advantage: Adding nodes to the system increases, not decreases, its availability, speed, throughput, capacity, and robustness.
  • Fast: Amazon S3 must be fast enough to support high-performance applications. Server-side latency must be insignificant relative to Internet latency. Any performance bottlenecks can be fixed by simply adding nodes to the system.
  • Inexpensive: Amazon S3 is built from inexpensive commodity hardware components. All hardware will eventually fail and this must not affect the overall system. It must be hardware-agnostic, so that savings can be captured as Amazon continues to drive down infrastructure costs.
  • Simple: Building highly scalable, reliable, fast, and inexpensive storage is difficult. Doing so in a way that makes it easy to use for any application anywhere is more difficult. Amazon S3 must do both.

A forcing-function for the design was that a single Amazon S3 distributed system must support the needs of both internal Amazon applications and external developers of any application. This means that it must be fast and reliable enough to run Amazon.com's websites, while flexible enough that any developer can use it for any data storage need.


Your use of this service is subject to the Amazon Web Services Customer Agreement.


(via Instapaper)
