EBS Volume
Elastic Block Store (EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud.
Each Amazon EBS volume is automatically replicated within its AZ to protect you from component failure, offering high availability and durability.
EBS is like a virtual hard disk drive in the cloud
- it uses the network to communicate with the instance, which means there might be a bit of latency
- it can be detached from an EC2 instance and attached to another one quickly, as long as both are in the same AZ
- it can only attach to a single instance at one time
Type | Description | Use Cases | API Name | Volume Size | Max. IOPS / Volume |
---|---|---|---|---|---|
General Purpose (GP2 SSD) | General purpose that balances price and performance - can be used as boot volumes | Most work loads | gp2 | 1GB - 16TB | 16000 |
Provisioned IOPS (SSD) | Highest-performance SSD for mission-critical applications, IOPS (I/O Ops Per Sec) - can be used as boot volumes | Databases | io1 | 4GB - 16TB | 64000 |
Throughput Optimised Hard Disk Drive (ST1 HDD) | Low cost HDD for freq. accessed, throughput-intensive workloads | Big Data & Data warehouses | st1 | 500GB - 16TB | 500 |
Cold HDD (SC1 HDD) | Lowest cost HDD for less freq. accessed workloads | File Servers | sc1 | 500GB - 16TB | 250 |
EBS Magnetic | Previous generation HDD | Workloads where data is infreq. accessed | standard | 1GB - 1TB | 40 - 200 |
An EBS volume is locked to an AZ, which means an EBS volume in us-east-1a cannot be attached to an instance in us-east-1b. To move a volume across AZs, you first need to snapshot it.
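The snapshot-based move can be sketched in Python as follows. This is a minimal sketch, not a definitive procedure: it assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2", region_name="us-east-1")`), and the function name `copy_volume_to_az` and the description text are made up for illustration.

```python
def copy_volume_to_az(ec2, volume_id, target_az):
    """Recreate an EBS volume in another AZ of the same Region via a snapshot.

    `ec2` is assumed to be a boto3 EC2 client; the function name is hypothetical.
    """
    # Step 1: take a point-in-time snapshot of the source volume.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Migration of {volume_id} to {target_az}",
    )
    snapshot_id = snapshot["SnapshotId"]

    # Step 2: block until the snapshot has completed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Step 3: create a new volume from the snapshot in the target AZ.
    new_volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=target_az,
    )
    return new_volume["VolumeId"]
```

The original volume is left untouched, so it still has to be detached and deleted separately once the new one is in use.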
The default EBS volume type is SSD, but magnetic volumes are also available as an option.
Details of different EBS Volumes
gp2
gp2 is the General Purpose SSD volume type, which balances price and performance and can be used as a boot volume.
It is recommended for:
- most workloads
- system boot volumes
- virtual desktops
- low-latency interactive apps
- development and test environments
stats of gp2
- 1 GB - 16 TB
- small gp2 volumes can burst IOPS to 3000
- max IOPS is 16,000
- 3 IOPS per GB, which means the max IOPS is reached at 5,334 GB
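The gp2 numbers above can be checked with a small calculation. This is a sketch that uses only the figures from these notes (3 IOPS per GB, a 100 IOPS minimum, a 16,000 IOPS cap); the function names are hypothetical and burst behaviour is not modelled.

```python
import math

GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GB, clamped to [100, 16,000]."""
    if size_gb < 1:
        raise ValueError("gp2 volumes start at 1 GB")
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GB * size_gb, GP2_MAX_IOPS))


def gp2_size_for_max_iops() -> int:
    """Smallest size in GB whose baseline reaches the 16,000 IOPS cap."""
    return math.ceil(GP2_MAX_IOPS / GP2_IOPS_PER_GB)
```

For instance, `gp2_baseline_iops(500)` evaluates to 1,500, and `gp2_size_for_max_iops()` gives the 5,334 GB figure quoted above.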
io1
io1 is suitable for critical business applications that require sustained IOPS performance, or more than 16,000 IOPS per volume (the gp2 limit). It is also suitable for large database workloads.
stats of io1
- 4 GB - 16 TB
- IOPS is provisioned (PIOPS) - min 100 ~ max 64,000
- the maximum ratio of provisioned IOPS to requested volume size (in GB) is 50:1
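The 50:1 ratio means small io1 volumes cannot be provisioned with the full 64,000 IOPS. The sketch below works this out from the limits listed above; the function names are hypothetical, and the Nitro versus non-Nitro distinction is ignored.

```python
IO1_MIN_IOPS = 100
IO1_MAX_IOPS = 64_000
IO1_MAX_IOPS_PER_GB = 50  # maximum ratio of provisioned IOPS to size in GB


def io1_max_provisionable_iops(size_gb: int) -> int:
    """Highest IOPS that can be provisioned for an io1 volume of this size."""
    if size_gb < 4:
        raise ValueError("io1 volumes start at 4 GB")
    return min(IO1_MAX_IOPS_PER_GB * size_gb, IO1_MAX_IOPS)


def is_valid_io1_request(size_gb: int, iops: int) -> bool:
    """Check a (size, IOPS) pair against the minimum, the cap, and the ratio."""
    return IO1_MIN_IOPS <= iops <= io1_max_provisionable_iops(size_gb)
```

Under these limits a 100 GB volume tops out at 5,000 IOPS, and the 64,000 cap is only reachable from 1,280 GB upward.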
sc1
- Throughput-oriented storage for large volumes of data that is infrequently accessed
- scenarios where the lowest storage cost is important
- cannot be a boot volume
Throughput refers to how much data can be transferred from one location to another in a given amount of time. It is used to measure the performance of hard drives and RAM, as well as Internet and network connections.
stats of sc1
- 500 GB - 16 TB
- max IOPS is 250
- Max throughput of 250 MB/s - can burst
EBS Volume Type Summary
- gp2: General Purpose volumes (cheap)
- 3 IOPS / GB, minimum 100 IOPS, can burst to 3000 IOPS, with max 16,000 IOPS
- io1: Provisioned IOPS (expensive)
- min 100 IOPS, max 64,000 IOPS (Nitro) or 32,000 (other)
- st1: Throughput Optimized HDD
- 500 MB/s throughput
- sc1: Cold HDD, Infrequently accessed data
- 250 MB/s throughput
EBS Snapshots
You need to make sure that the EBS volume is in the same AZ as the EC2 instance, because a volume cannot be attached to an instance in a different AZ.
EBS Snapshots are backed up to S3 incrementally
What will happen if we terminate the EC2 Instance?
- The root device EBS volume is deleted automatically (by default)
- Other volumes (by default) continue to persist, and the status of the volume becomes "available"
- When we create an EBS volume, we can check "Delete on Termination" to have it removed together with the instance
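The "Delete on Termination" checkbox corresponds to the `DeleteOnTermination` flag in the block device mappings passed at launch. The sketch below only builds that structure in the shape boto3's `run_instances` accepts for `BlockDeviceMappings`; the helper name, the device names, and the sizes are assumptions for illustration.

```python
def ebs_mapping(device_name, size_gb, delete_on_termination, volume_type="gp2"):
    """Build one block device mapping entry (hypothetical helper)."""
    return {
        "DeviceName": device_name,
        "Ebs": {
            "VolumeSize": size_gb,
            "VolumeType": volume_type,
            "DeleteOnTermination": delete_on_termination,
        },
    }


# Assumed layout: the root volume goes away with the instance,
# the data volume persists and ends up in the "available" state.
block_device_mappings = [
    ebs_mapping("/dev/xvda", 8, delete_on_termination=True),
    ebs_mapping("/dev/sdf", 100, delete_on_termination=False),
]
```

The list would then be passed as `BlockDeviceMappings=block_device_mappings` when launching the instance.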
How to migrate an EC2 instance and its EBS data from one AZ to another? Step by step:
- Find the root device volume of the instance, then choose "Actions" -> "Create Snapshot"
- Select the snapshot, then choose "Actions" -> "Create Image" to turn it into an AMI
- Use the AMI to launch an instance in another AZ
- For "Virtualization Type", we have "Paravirtual" (PV) and Hardware Virtual Machine (HVM)
- Volumes exist on EBS, think of EBS as Virtual Hard Disk
- Snapshots are on S3, think of Snapshot as photo of disk
- Snapshots are the point in time copies of volumes
- EBS Snapshots are incremental, only the blocks that have changed since your last snapshot are moved to S3
- To create a snapshot for Amazon EBS Volumes that serve as root devices, you should stop the instance before taking the snapshot
- You can create AMIs from Volumes and Snapshots
- You can change EBS Volume sizes on the fly, including changing the size and storage type
- A volume will ALWAYS be in the same AZ as the EC2 instance it is attached to, BUT you can copy snapshots across AZs or Regions
- EBS Backup will utilize IO so you should not enable it while handling a lot of traffic
- Recommended (but not required): detach the EBS volume before taking the backup
- EBS volumes restored from snapshots need to be pre-warmed (using the fio or dd command to read the entire volume)
- Snapshots can be automated using "Amazon Data Lifecycle Manager"
EBS Encryption
- When you create an encrypted EBS volume, you get the following:
  - Data at rest is encrypted inside the volume
  - All data in flight between the instance and the volume is encrypted
  - All snapshots are encrypted
  - All volumes created from the snapshots are encrypted
- Encryption and decryption are handled transparently, which means you don’t need to do anything
- Encryption has a minimal impact on latency
- EBS Encryption leverages keys from KMS (AES-256)
- Copying an un-encrypted snapshot allows encryption
- Snapshots of encrypted volumes are encrypted
Encryption : encrypt an un-encrypted EBS Volume
- Create an EBS snapshot of the volume
- Encrypt the EBS snapshot (using copy)
- Create a new EBS volume from the snapshot (the volume will also be encrypted)
- Now you can attach the encrypted volume to the original instance
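The four steps above can be sketched in Python. This is a sketch under stated assumptions rather than a definitive implementation: `ec2` is assumed to be a boto3 EC2 client for the Region that holds the volume, the function name and description strings are hypothetical, and the final attach step is left to the caller.

```python
def encrypt_volume(ec2, volume_id, region, availability_zone, kms_key_id=None):
    """Create an encrypted copy of an unencrypted EBS volume (hypothetical helper)."""
    waiter = ec2.get_waiter("snapshot_completed")

    # Step 1: snapshot the unencrypted volume.
    plain = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Unencrypted snapshot of {volume_id}",
    )
    waiter.wait(SnapshotIds=[plain["SnapshotId"]])

    # Step 2: copy the snapshot with encryption enabled.
    copy_args = {
        "SourceSnapshotId": plain["SnapshotId"],
        "SourceRegion": region,
        "Encrypted": True,
        "Description": f"Encrypted copy of {plain['SnapshotId']}",
    }
    if kms_key_id is not None:
        # Without a key ID, the account's default KMS key for EBS is used.
        copy_args["KmsKeyId"] = kms_key_id
    encrypted = ec2.copy_snapshot(**copy_args)
    waiter.wait(SnapshotIds=[encrypted["SnapshotId"]])

    # Step 3: a volume created from an encrypted snapshot is itself encrypted.
    volume = ec2.create_volume(
        SnapshotId=encrypted["SnapshotId"],
        AvailabilityZone=availability_zone,
    )
    return volume["VolumeId"]
```

The returned volume ID can then be attached to the original instance (step 4), for example with the client's `attach_volume` call, provided `availability_zone` matches the instance's AZ.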
EBS vs Instance Store
Some instances do not come with Root EBS volumes. Instead, they come with "Instance Store", which is ephemeral storage. Instance Store is physically attached to the machine (EBS is a network drive).
There are Pros and Cons for using Instance Store
Pros
- Better I/O performance
- Good for buffer / cache / scratch data / temporary content
- Data survives reboots
Cons
- On stop or termination, the data in the instance store is lost (ephemeral means temporary)
- You can’t resize the instance store
- Backups must be operated by the user
Local EC2 Instance Store is a physical disk attached to the physical server where your EC2 instance runs.
It has very high IOPS, but its size cannot be increased and the data will be lost if the hardware fails.
EBS RAID Options
EBS is replicated within an AZ so it is already redundant storage. But if you want to increase the IOPS more or you want to mirror your EBS volumes, then you need to mount volumes in parallel in RAID settings. (RAID is possible as long as your OS supports it)
Normal RAID options:
- RAID 0
- RAID 1
- RAID 5 - not recommended for EBS
- RAID 6 - not recommended for EBS
RAID 0 - increasing performance
- Combining 2 or more volumes and getting the total disk space and I/O
- But if one disk fails, all the data is lost
- Use cases:
- application needs a lot of IOPS and doesn’t need fault-tolerance
- a database that has replication already built-in
- Using this, we can have a very big disk with a lot of IOPS
For example, two 500 GB EBS io1 volumes with 4,000 provisioned IOPS each will create a 1,000 GB RAID 0 array with an available bandwidth of 8,000 IOPS and 1,000 MB/s of throughput.
RAID 1 - increase fault tolerance
RAID 1 is to mirror a volume to another, which means if one disk fails, then our logical volume is still working (since there is our mirroring one)
Use case:
- application that needs to increase volume fault tolerance
- application where you need to service disks
For example, two 500 GB EBS io1 volumes with 4,000 provisioned IOPS each will create a 500 GB RAID 1 array with an available bandwidth of 4,000 IOPS and 500 MB/s of throughput.
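The RAID 0 and RAID 1 examples follow a simple rule: striping adds the members up, mirroring yields the capacity and performance of a single member. The sketch below reproduces both sets of numbers. It is a simplified model with hypothetical names, and the 500 MB/s per-volume throughput is an assumption inferred from the examples rather than a stated volume limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    size_gb: int
    iops: int
    throughput_mbps: int


def raid0(volumes):
    """Striping: usable size, IOPS, and throughput are the sums of the members."""
    return Volume(
        size_gb=sum(v.size_gb for v in volumes),
        iops=sum(v.iops for v in volumes),
        throughput_mbps=sum(v.throughput_mbps for v in volumes),
    )


def raid1(volumes):
    """Mirroring: every member holds the same data, so the array is limited
    to the smallest member (a simplification that ignores read balancing)."""
    return Volume(
        size_gb=min(v.size_gb for v in volumes),
        iops=min(v.iops for v in volumes),
        throughput_mbps=min(v.throughput_mbps for v in volumes),
    )


# The io1 volume used in both examples (500 MB/s is an assumed figure).
io1_volume = Volume(size_gb=500, iops=4_000, throughput_mbps=500)
striped = raid0([io1_volume, io1_volume])
mirrored = raid1([io1_volume, io1_volume])
```

With these inputs, `striped` matches the 1,000 GB / 8,000 IOPS / 1,000 MB/s array and `mirrored` matches the 500 GB / 4,000 IOPS / 500 MB/s array described above.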
EFS (Elastic File System)
EFS is a managed NFS (network file system) that can be mounted on many EC2 instances. EFS can work with EC2 instances across multiple AZs.
EFS is a highly available, scalable, and expensive service.
Use cases: content management, web serving, data sharing, WordPress application
Summary: EBS or EFS ?
EBS | EFS |
---|---|
can be attached to only one instance at a time | can be mounted to hundreds of instances |
locked at the AZ level | can share website media files |
migrating an EBS volume across AZ means first backing it up and re-create it using snapshot in another AZ | |
EBS backups use IO and you should avoid it while the application is handling a lot of traffic | |
Root EBS volumes of instances get terminated by default if the EC2 instance gets terminated (you can disable it) | |
if disk IO is high -> increase EBS volume size | |
references for comparing EBS with EFS: https://medium.com/awesome-cloud/aws-difference-between-efs-and-ebs-8c0d72a348ad
Snowball
Briefly speaking:
- Snowball is a physical device for moving large amounts of data into and out of the AWS Cloud. It supports:
  - Import to S3
  - Export from S3
- Storage Gateway is a service enabling you to securely store data to the AWS Cloud for scalable and cost-effective storage
Snowball Edge: Snowball Edge adds computational capability to the device. It supports a custom EC2 AMI, so you can perform processing on the go.
Storage Gateway
Storage Gateway is a hybrid cloud solution for storage: part of the infrastructure is in the cloud, and another part is on-premises.
A bridge between on-premises data and cloud data in S3, typical use cases are disaster recovery, backup & restore, and tiered storage
There are three types of Storage Gateway:
- File Gateway
- Volume Gateway
  - Stored Volumes
  - Cached Volumes
- Tape Gateway: Virtual Tape Library (VTL)
File Gateway
- configured S3 buckets are accessible using the NFS and SMB protocols
- supports S3 Standard, S3 IA, S3 One Zone IA
- bucket access using IAM roles for each File Gateway
- most recently used data is cached in the file gateway
- this can be mounted on many servers
Files are stored as objects in your S3 Buckets, accessed through a Network File System (NFS) mounting point
Volume Gateway
- The Volume Gateway presents your application with disk volumes using the iSCSI block protocol, backed by S3
- Data is backed up asynchronously as point-in-time snapshots, which are stored in the cloud as Amazon EBS Snapshots
- Snapshots are incremental backups that capture only changed blocks, and they are compressed to minimize storage charges
=> Storing Virtual Hard Disk Drive in the Cloud
Let’s summarize the differences between Stored Volumes and Cached Volumes
For Stored Volumes:
- Entire Dataset stored on site
- Asynchronously scheduled backed up to S3
For Cached Volumes:
- Entire Dataset is stored on S3
- The most frequently accessed data is cached on-site (low-latency access to the most recently used data)
Cached Volumes store the data on S3 and keep local cached copies, while Stored Volumes store the data locally and push it to S3 as a backup.
Tape Gateway
- Intended for backup processes that currently rely on physical tapes
- Virtual Tape Library (VTL) backed by S3 and Glacier
- Back up data using existing tape-based processes (and iSCSI interface)
- Works with leading backup software vendors
Storage Gateway Summary
Exam Tip:
If the question is asking about "On-premise data to the cloud", we want Storage Gateway:
- File Access / NFS -> File Gateway, backed by S3
- Volumes / Block Storage / iSCSI -> Volume Gateway, backed by S3 with EBS Snapshots
- VTL Tape solution / Backup with iSCSI -> Tape Gateway, backed by S3 and Glacier
Type | Description |
---|---|
File Gateway (NFS) | Files are stored as objects in your S3 buckets, accessed through an NFS mount point. |
Volume Gateway (iSCSI) | Presents disk volumes to your applications via the iSCSI block protocol. Backups are stored in the cloud as Amazon EBS snapshots. Two types: (1) Stored volumes and (2) Cached volumes. |
Tape Gateway (VTL) | It offers a durable, cost-effective solution to archive your data in the AWS Cloud (same mechanism as Volume Gateway). |
Amazon FSx
Amazon FSx for Windows
EFS is a shared POSIX file system for Linux systems and is not suitable for Windows machines.
Amazon FSx for Windows is a fully managed Windows file system share drive
- support SMB protocol and Windows NTFS
- Microsoft Active Directory integration, ACLs, user quotas
- it is built on SSD, High IOPS, High Throughput
- can be accessed from your on-premise infrastructure
- can be configured to be Multi-AZ (High Availability)
- Data is backed-up daily to S3
Amazon FSx for Lustre
Amazon FSx for Lustre is a parallel distributed file system for large-scale computing.
Use cases: Machine Learning, High-Performance Computing (HPC), Video Processing, Financial Modeling, Electronic Design Automation
Seamless integration with S3:
- it can read S3 as a file system (through FSx)
- it can write the output of the computations back to S3 (through FSx)
It can be used from on-premise servers
Storage Comparison
Storage Type | Note |
---|---|
S3 | Object Storage |
Glacier | Object Archival |
EFS | Network File System for Linux instances |
FSx for Windows | Network File System for Windows |
FSx for Lustre | High-Performance Computing |
EBS volumes | Network storage for one EC2 instance at a time |
Instance Storage | Physical storage for your EC2 instance (high IOPS) |
Storage Gateway | File Gateway; Volume Gateway (cache & stored); Tape Gateway |
Snowball / Snowmobile | move a large amount of data to the cloud, physically |
Database | specific workloads, usually with indexing and querying |