Google Cloud Platform offers a vast array of services that developers can leverage to serve their needs and deliver the best products in the industry to their customers. While it is very convenient to have services that fit users' needs without much hassle, it can also be confusing to understand the purpose of each service and its best use cases. It might seem like knowing these services is a bigger mess than the one they were designed to solve, but in reality that is not the case. To make things easier for you, I have summarised all of the storage offerings from Google Cloud to help you choose better.
There are three types of non-volatile storage, namely Block Storage, Object Storage and File Storage. So we shall go through the services for each of these types.
Google Cloud Persistent Disks
This is a Block Storage service, the most traditional kind of storage; the hard disks or solid-state drives in our laptops are also block storage devices. Persistent Disks come in various drive types that can be attached to your Compute Engine instances. When you initialise an instance, it comes with a persistent disk attached which stores the OS the instance is running. You can also attach multiple such persistent disks to your instances to increase the types and amount of storage an instance has access to. These disks are not limited to Compute Engine instances; they also serve containers launched and managed by Google Kubernetes Engine.
While it may seem like your data is stored on a single persistent disk, in reality it is spread across multiple physical disks. In addition, the data stored is encrypted by default, which strengthens the security of these disks. As the name suggests, persistent disks are persistent: if an instance or a container shuts down, the data on a persistent disk is not lost; in fact, the disk can be attached to another instance and accessed from there, which makes these disks flexible to use (with the exception noted below). Speaking of flexibility, you can even resize a PD on the fly, which means you do not need to shut down an instance or detach a PD from it to make changes to the PD.
To save your PD's data for the future or to transfer it to another region, you can take snapshots of the disk, which can later be used to restore the data.
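As a sketch of these operations with the gcloud CLI — the disk, instance, snapshot and zone names below are hypothetical example values, not anything mandated by GCP:

```shell
# Attach an existing persistent disk to a running instance.
gcloud compute instances attach-disk my-instance \
    --disk=my-disk --zone=us-central1-a

# Resize the disk on the fly; no shutdown or detach needed
# (the filesystem inside the guest still has to be grown afterwards).
gcloud compute disks resize my-disk --size=200GB --zone=us-central1-a

# Snapshot the disk, then create a new disk from the snapshot,
# for example in a different zone or region.
gcloud compute disks snapshot my-disk --snapshot-names=my-snap \
    --zone=us-central1-a
gcloud compute disks create my-restored-disk \
    --source-snapshot=my-snap --zone=europe-west1-b
```

These commands require an authenticated gcloud session and an active project, so treat them as an operational sketch rather than something to paste verbatim.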
There are various types of persistent disk offerings by GCP, which are as follows:
- Zonal Persistent Disk : provides data replication within a single zone in a region.
- Regional Persistent Disk : provides data replication across two zones in a region.
- Local SSD : physically attached to your VM and suitable for high-throughput applications, but the data persists only while the instance is running, after which it is automatically deleted.
Google Cloud Storage
This is an Object Storage service. Cloud Storage stores data in the form of objects inside buckets. It is perfect for web applications and websites, since data can be stored, accessed, modified and deleted using simple REST-based APIs. It is also very useful for archival and disaster-recovery purposes, since the data's durability is very high. Data is stored inside buckets, and these buckets can be accessed by multiple services at once. The data is available worldwide, i.e. it can be accessed from anywhere in the world, and it is not limited by volume.
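To make the REST-based access concrete, here is a minimal Python sketch that builds the public URL form under which a Cloud Storage object can be fetched; the bucket and object names are hypothetical examples:

```python
from urllib.parse import quote

def public_object_url(bucket: str, obj: str) -> str:
    """Public (unauthenticated) URL form for a Cloud Storage object.

    Authenticated access normally goes through the JSON API or a client
    library instead; this only illustrates the addressing scheme.
    """
    # quote() keeps "/" intact by default, so "folder-like" object
    # names survive; other special characters get percent-encoded.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"

print(public_object_url("my-bucket", "logs/2024/app.log"))
# https://storage.googleapis.com/my-bucket/logs/2024/app.log
```

Fetching that URL with any HTTP client works only if the object is publicly readable; private objects require signed URLs or OAuth credentials.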
One major advantage is that you can import/export tables from and to BigQuery as well as Cloud SQL. You can even store App Engine logs, instance startup scripts, VM images and objects used by Compute Engine apps.
Whenever data is stored, it is not kept at a single physical location, and a URL is returned through which you can access your data. Objects are immutable, so whenever an object changes, the change is stored as a new version of the object. By default, data is encrypted at rest, and data in transit is encrypted as well, being transmitted over HTTPS. Moreover, access permissions to a bucket, and even to individual objects inside it, can be controlled at a fine-grained level: IAM roles and ACLs define which users or services have access to a bucket and to which particular objects, with ACLs controlling access at a much finer level than IAM roles.
Buckets belong to a particular project, which means they inherit their permissions from that project. While a bucket is accessible worldwide, it physically lives in a region, and its data can be replicated across regions.
Buckets offer a variety of storage classes:
- Regional : lets you store data in a particular region; it is cheaper but less redundant. It is used to keep data close to the Compute Engine VMs or Kubernetes Engine clusters that use it.
- Multi-Regional : is geo-redundant; a broad geographic storage location is chosen and GCP stores your data in at least two locations separated by at least 160 km within that area.
- Nearline : a low-cost storage class for data you expect to access or modify about once a month or less.
- Coldline : a highly durable storage class which is good for archiving data and backing it up, if you plan to access the data at most once per year.
Regional and Multi-Regional are high-performance object storage classes; the other two are backup and archival classes. All of them can be accessed through the Cloud Storage API. Both Nearline and Coldline charge a retrieval fee for accessing data, on a per-gigabyte basis.
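The access-frequency guidance above can be sketched as a small heuristic. The thresholds mirror the rough rules of thumb in this article (about once a month for Nearline, about once a year for Coldline); this is an illustrative function, not an official API:

```python
def suggest_storage_class(accesses_per_year: float,
                          multi_region: bool = False) -> str:
    """Illustrative heuristic mapping expected access frequency to a class."""
    if accesses_per_year <= 1:   # roughly once a year or less
        return "COLDLINE"
    if accesses_per_year <= 12:  # roughly once a month or less
        return "NEARLINE"
    # Frequently accessed ("hot") data.
    return "MULTI_REGIONAL" if multi_region else "REGIONAL"

print(suggest_storage_class(0.5))        # COLDLINE
print(suggest_storage_class(6))          # NEARLINE
print(suggest_storage_class(365, True))  # MULTI_REGIONAL
```

In practice the choice also depends on retrieval fees, latency needs and where your compute runs, so treat this purely as a starting point.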
There are lifecycle management rules which help you move your data between storage classes based on conditions such as an object's age.
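A lifecycle configuration is a small JSON document attached to the bucket. For example, the following illustrative rule set (the 30- and 365-day thresholds are arbitrary choices, not defaults) moves objects to Nearline after 30 days and to Coldline after a year:

```json
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 365}
      }
    ]
  }
}
```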
Google Cloud Filestore
This is a File Storage service. Filestore instances can be connected to VMs, Kubernetes containers and even on-premises infrastructure. If you are familiar with NAS (Network Attached Storage), then you know exactly what Filestore is. Under the hood it is built on block storage, but what sets it apart from a general block store is that the data is stored on a disk connected over the network, which means multiple services can access the same data simultaneously. It also offers a filesystem interface for applications that need one.
One added advantage is that multiple clusters or VMs can be connected to the same Filestore instance. These instances are very flexible and can be modified on the fly, and you can also take snapshots to deploy them in another region. Data is encrypted at rest and in transit, and Filestore backups provide point-in-time recovery of the state of the drives.
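From a client's point of view, a Filestore share mounts like any NFS export; the instance IP, share name and mount point below are hypothetical example values:

```shell
# Install an NFS client on the VM (Debian/Ubuntu shown),
# then mount the Filestore share.
# 10.0.0.2 and vol1 are example values from a hypothetical instance.
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/filestore
sudo mount 10.0.0.2:/vol1 /mnt/filestore
```

Every VM or container that runs this mount against the same instance sees the same files, which is what enables the shared access described above.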
Filestore offers multiple service tiers, which are as follows:
- Basic tiers : file sharing, software development, web hosting, basic AI workloads.
- Enterprise tier : mission-critical workloads requiring high availability.
- High Scale tier : high-throughput computing (HTC), batch compute, electronic design automation (EDA), media rendering and transcoding, advanced AI, large data sets.
The instances are also grouped into their types:
- Zonal instances : Basic and High Scale tier instances are zonal, providing data redundancy within a single zone.
- Regional instances : Enterprise tier instances are regional; they are costlier but provide redundancy across two zones.
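Creating an instance means picking one of these tiers up front. A hedged gcloud sketch, where the instance name, zone, tier, share name and capacity are all example values:

```shell
# Create a Basic-tier (HDD) Filestore instance with a 1 TB share.
gcloud filestore instances create my-share \
    --zone=us-central1-a \
    --tier=BASIC_HDD \
    --file-share=name=vol1,capacity=1TB \
    --network=name=default
```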
While Filestore has big upsides, you should keep in mind that it does not provide disk partitioning, which can lead to contention when data is accessed simultaneously and to access issues for services; this can also cause throughput problems, since the network bandwidth has certain limits.
Dividing these three storage services by the type of storage they provide has made it simpler for me to understand their pros and cons. I hope it has done the same for you.