Posts EBS, EFS and Storage Gateway
Post
Cancel

EBS, EFS and Storage Gateway

EBS Volume

Elastic Block Store (EBS) provides persistent block storage volumes for usage with Amazon EC2 instances in the AWS Cloud.

Each Amazon EBS volume is automatically replicated with its AZ to protect you from component failure, offering high availability and durability.

EBS is like a virtual hard disk drive in the cloud

  • it uses the network to communicate the instance, which means there might be a bit of latency
  • it can be detached from an EC2 instance and attached to another one quickly, as long as in the same AZ
  • it can only attach to a single instance at one time
TypeDescriptionUse CasesAPI NameVolume SizeMax. IOPS / Volume
General Purpose (GP2 SSD)General purpose that balances price and performance - can be used as boot volumesMost work loadsgp21GB - 16TB16000
Provisioned IOPS (SSD)Highest-performance SSD for mission-critical applications, IOPS (I/O Ops Per Sec) - can be used as boot volumesDatabasesio14GB - 16TB64000
Throughput Optimised Hard Disk Drive (ST1 HDD)Low cost HDD for freq. accessed, throughput-intensive workloadsBig Data & Data warehousesst1500GB - 16TB500
Cold HDD (SC1 HDD)Lowest cost HDD for less freq. accessed workloadsFile Serverssc1500GB - 16TB250
EBS MagneticPrevious generation HDDWorkloads where data is infreq. accessedStandard1GB - 1TB40 - 200

EBS is locked in an AZ, which means if an EBS volume in us-east-1a cannot be attached to us-east-1b. To move a volume across, your first need to snapshot it

EBS volume default type is SSD, but come with a option of magnetic

Details of different EBS Volumes

gp2

GP2 is for General purpose that balances price and performance, this type of volume can be used as boot volumes

it:

  • recommended for most workloads
  • system boot volumes
  • virtual desktops
  • low-latency interactive apps
  • development and test environments

here are some stats of GP2

  • 1 GB - 16 TB
  • small gp2 volumes can burst IOPS to 3000
  • max IOPS is 16,000
  • 3 IOPS per GB, means at 5334GB we are at the max IOPS

io1

io1 is suitable for critical business applications that require sustained IOPS performance, or more than 16,000 IOPS per volume (gp2 limit), it is also suitable for large database workloads

stats of io1

  • 4 GB - 16 TB
  • IOPS is provisioned (PIOPS) - min 100 ~ max 64,000
  • the maximum ratio of provisioned IOPS to requested volume size (in GB) is 50:1

sc1

  • Throughput-oriented storage for large volumes of data that is infrequently accessed
  • scenarios where the lowest storage cost is important
  • cannot be a boot volume

Throughput refers to how much data can be transferred from one location to another in a given amount of time. It is used to measure the performance of hard drives and RAM, as well as Internet and network connections.

stats of sc1

  • 500 GB - 16 TB
  • max IOPS is 250
  • Max throughput of 250 MB/s - can burst

EBS Volume Type Summary

  • gp2: Genenral Purpose volumes (cheap)
    • 3 IOPS / GB, minimum 100 IOPS, can burst to 3000 IOPS, with max 16,000 IOPS
  • io1: Provisioned IOPS (expensive)
    • min 100 IOPS, max 64,000 IOPS (Nitro) or 32,000 (other)
  • st1: Throughput Optimized HDD
    • 500 MB/s throughput
  • sc1: Cold HDD, Infrequently accessed data
    • 250 MB/s throughput

EBS Snapshots

You need to make sure that the AZ of EC2 Instance is the same as the AZ of EBS Volume, otherwise, this will cause a huge latency time.

EBS Snapshots are backed up to S3 incrementally

What will happen if we terminate the EC2 Instance?

  • EBS Volume got removed automatically (Root Device)
  • Other volumes (by default) continue to persist, and the status of the volume is available
  • We can create EBS and check “delete on termination”

How to migrate data from AZ1 to another different AZ (EC2 / EBS)? Step by step:

  1. Actions -> Create Snapshots
  2. Turn the Snapshot to an AMI
  3. Use the AMI to launch in another AZ

EBS Encryption

  1. When you create an encrypted EBS volume, you get the following:
    • Data are rest encrypted inside the volume
    • All the data in light moving between the instance and the volume is encrypted
    • All snapshots are encrypted
    • All volumes created from snapshot are encrypted
  2. Encryption and decryption are handled transparently, which means you don’t need to do anything
  3. Encryption has a minimal impact on latency
  4. EBS Encryption leverages keys from KMS (AES-256)
  5. Copying an un-encrypted snapshot allows encryption
  6. Snapshots of encrypted volumes are encrypted

EBS vs Instance Store

Some instances do not come with Root EBS volumes. Instead, they come with "Instance Store", which is ephemeral storage. Instance Store is physically attached to the machine (EBS is a network drive)

There are Pros and Cons for using Instance Store

Pros

  • Better I/O performance
  • Good for buffer / cache / scratch data / temporary content
  • Data survives reboots

Cons

  • On stop or termination, the instance store is lost, (since ephemeral <-> temporary)
  • You can’t resize the instance store
  • Backups must be operated by the user

Local EC2 Instance Store is a physical disk attached to the physical server where your EC2 is

it has very high IOPS, but the size of it cannot be increased and the data will be lost if hardware fails to happen

EBS RAID Options

EBS is replicated within an AZ so it is already redundant storage. But if you want to increase the IOPS more or you want to mirror your EBS volumes, then you need to mount volumes in parallel in RAID settings. (RAID is possible as long as your OS supports it)

Normal RAID options:

  • RAID 0
  • RAID 1
  • RAID 5 - not recommended for EBS
  • RAID 6 - not recommended for EBS

RAID 0 - increasing performance

  • Combining 2 or more volumes and getting the total disk space and I/O
  • But one disk fails, then all the data is failed
  • Use cases:
    • application needs a lot of IOPS and doesn’t need fault-tolerance
    • a database that has replication already built-in
  • Using this, we can have a very big disk with a lot of IOPS

two 500G EBS io1 volumes with 4,000 provisioned IOPS each, will create a

1,000GB RAID 0 array with an available bandwidth of 8,000 IOPS and 1,000 MB/s of throughput

RAID 1 - increase fault tolerance

RAID 1 is to mirror a volume to another, which means if one disk fails, then our logical volume is still working (since there is our mirroring one)

Use case:

  • application that needs to increase volume fault tolerance
  • application that needs service disks

two 500 GB EBS io1 volumes with 4,000 provisioned IOPS each will create

500 GB RAID 1 array with an available bandwidth of 4000 IOPS and 500 MB/s throughput

EFS (Elastic File System)

EFS is a managed NFS (network file system) that can be mounted on many EC2, EFS can work with EC2 instances in multi-AZ.

EFS is a High Available, Scalable, and expensive service

Use cases: content management, web serving, data sharing, WordPress application

Summary: EBS or EFS ?

EBSEFS
can be attached to only one instance at a timecan be mounted to hundreds of instances
locked at the AZ levelcan share website media files
migrating an EBS volume across AZ means first backing it up and re-create it using snapshot in another AZ 
EBS backups use IO and you should avoid it while the application is handling a lot of traffic 
Root EBS volumes of instances get terminated by default if the EC2 instance gets terminated (you can disable it) 
if disk IO is high -> increase EBS volume size 

references for comparing EBS with EFS: https://medium.com/awesome-cloud/aws-difference-between-efs-and-ebs-8c0d72a348ad

Snowball

Briefly speaking:

  1. Snowball is a piece of equipment, moving large amounts of data into the AWS Cloud. It supports:
  2. Import to S3
  3. Export to S3
  4. Storage Gateway is a service enabling you to securely store data to the AWS Cloud for scalable and cost-effective storage

Snowball Edge : Snowball Edge adds the computational capability to the device, it supports a custom EC2 AMI so you can perform processing on the go

Storage Gateway

Hybrid Cloud for Storage

part of the infrastructure is on the cloud, another part is on-premise

A bridge between on-premises data and cloud data in S3, typical use cases are disaster recovery, backup & restore, and tiered storage

There are three types of Storage Gateway:

  1. File Gateway
  2. Volume Gateway
    1. Stored Volumes
    2. Cached Volumes
  3. Tape Gateway Virtual Tape Library (VTL)

File Gateway

  • configured S3 buckets are accessible using the NFS and SMB protocol
  • supports S3 standard, S3 IA, S3 One Zone IA
  • bucket access using IAM roles for each File Gateway
  • most recently used data is cached in the file gateway
  • this can be mounted on many servers

Files are stored as objects in your S3 Buckets, accessed through a Network File System (NFS) mounting point

Volume Gateway

  1. The Volume Gateway presents your application with disk volumes using the iSCSI block protocol, backed by S3
  2. Asynchronously back up as point-in-time snapshots, the snapshots are stored in the cloud as Amazon EBS Snapshots
  3. Snapshots are incremental backups that capture only changed blocks, but compressed to minimized charges

=> Storing Virtual Hard Disk Drive in the Cloud

Let’s summarize the differences between Stored Volumes and Cached Volumes

For Stored Volumes:

  1. Entire Dataset stored on site
  2. Asynchronously scheduled backed up to S3

For Cached Volumes:

  1. Entire Dataset is stored on S3
  2. Most Frequently Accessed data are cached on-site (low latency access to most recently used data)

Stored Volume will store the files on S3 and provide local cached copes, while Cached Volumes will store the files locally and push them to S3 as a backup

Tape Gateway

  • Physical tapes for the backup process, for example
  • Virtual Tape Library (VTL) backed by S3 and Glacier
  • Back up data using existing tape-based processes (and iSCSI interface)
  • Works with leading backup software vendors

Storage Gateway Summary

Exam Tip:

if the question is asking "On-premise data to the cloud", we want Storage Gateway

  • File Access / NFS -> File Gateway, backed by S3
  • Volumes / Block Storage / iSCSI -> Volume Gateway, backed by S3 with EBS Snapshots
  • VTL Tape solution / Backup with iSCSI -> Tape Gateway, backed by S3 and Glacier
TypeDescription
File Gateway (NFS)Files are stored as objects in your S3 buckets, accessed through an NFS mount point.
Volume Gateway (iSCSI)Same using virtual directories via iSCSI block protocol. Files are stored in the cloud as Amazon EBS snapshots. Two types: (1) Stored volumes and (2) Cached volumes.
Type Gateway (VTL)It offers a durable, cost-effective solution to archive your data in the AWS Cloud (same mechanism as Volume Gateway).

Amazon FSx

Amazon FSx for Windows

EFS is a shared POSIX system for Linux systems, not suitable for Windows machine

Amazon FSx for Windows is a fully managed Windows file system share drive

  • support SMB protocol and Windows NTFS
  • Microsoft Active Directory integration, ACLs, user quotas
  • it is built on SSD, High IOPS, High Throughput
  • can be accessed from your on-premise infrastructure
  • can be configured to be Multi-AZ (High Availability)
  • Data is backed-up daily to S3

Amazon FSx for Lustre

Amazon FSx for Lustre is a type of parallel distributed file system, for large-scale computing

Machine Learning, High-Performance Computing (HPC), Video Processing, Financial Modeling, Electronic Design Automation

Seamless integration with S3:

  • it can read S3 as a file system (through FSx)
  • it can write the output of the computations back to S3 (through FSx)

It can be used from on-premise servers

Storage Comparison

Storage TypeNote
S3Object Storage
GlacierObject Archival
EFSNetwork File System for Linux instances
FSx for WindowsNetwork File System for Windows
FSx for LustreHigh-Performance Computing
EBS volumesNetwork storage for one EC2 instance at a time
Instance StoragePhysical storage for your EC2 instance (high IOPS)
Storage GatewayFile Gateway; Volume Gateway (cache & stored); Tape Gateway
Snowball / Snowmobilemove a large amount of data to the cloud, physically
Databasespecific workloads, usually with indexing and querying
This post is licensed under CC BY 4.0 by the author.