Object storage for Cloud Foundry apps

One of the key tenets of Cloud Native Apps is that they should be stateless. This allows them to be ephemeral and to scale out massively. To achieve this, they must store their data somewhere outside the app, but where? The answer is that Cloud Native Apps love “Object storage”.


Object storage is in a way similar to file storage, but unlike file storage it has no directory tree. Data is organized in "buckets", which are only a single level deep. Each bucket must have a unique name (in AWS S3, a globally unique one). "Objects" are files stored under a unique name (the "key") within the bucket, and apps use this unique identifier to retrieve the object. You could say that it is a key-value store where the value portion is a file.

Two factors allow object storage to grow orders of magnitude larger than file storage:
  • The simple (almost non-existent) data structure
  • Metadata is stored along with the object
Together these mean you don't need to preallocate a table of pointers to files and directories, as well as their ACLs, the way a file system does. I bet you have heard of FAT32 and NTFS many times; both are file systems built around exactly such tables. Objects and buckets typically use a very simple set of permissions like the one in this picture.
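
To make the "key-value store where the value is a file" idea concrete, here is a toy, in-memory model in Python. It is only an illustrative sketch: the class and method names are hypothetical and are not the Boto API, and the two ACL values simply borrow the names of the S3 canned ACLs "private" and "public-read". Note how metadata lives right next to the data, and how "folders" are nothing more than a common prefix in the key.

```python
from dataclasses import dataclass, field

ALLOWED_ACLS = ("private", "public-read")  # hypothetical, simplified permission set


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)  # stored alongside the data itself
    acl: str = "private"


class ToyObjectStore:
    """Toy model (not a real client): bucket name -> {key: StoredObject}.

    There is no directory tree. A key such as "photos/2017/rods.jpg" is a plain
    string; the "/" characters mean nothing special to the store.
    """

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        # Bucket names must be unique within the store.
        if bucket in self._buckets:
            raise ValueError(f"bucket {bucket!r} already exists")
        self._buckets[bucket] = {}

    def put_object(self, bucket, key, data, metadata=None, acl="private"):
        if acl not in ALLOWED_ACLS:
            raise ValueError(f"unsupported acl {acl!r}")
        # Writing to an existing key simply replaces the whole object.
        self._buckets[bucket][key] = StoredObject(data, dict(metadata or {}), acl)

    def get_object(self, bucket, key):
        # Retrieval is a single lookup by bucket name and key.
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # "Folders" are emulated by filtering keys on a string prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))
```

For example, after store.put_object("album", "photos/2017/rods.jpg", jpeg_bytes, {"content-type": "image/jpeg"}, acl="public-read"), listing the bucket with prefix "photos/" returns that key, even though no "photos" directory was ever created.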


Object storage is very easy for applications to consume. They can use an API such as S3 or Swift or, in most cases where the app just needs to retrieve an object, a simple URL (provided the object is publicly readable). The image file selected in the previous screenshot could be referenced from an HTML page as follows:

<img src="https://a_ppiper_blog.s3.amazonaws.com/3Dprinter-rods.jpg">

AWS S3 is the prime example of object storage. It is possibly the largest storage entity in the world and is reportedly one of the biggest sources of revenue for AWS. It's no surprise, then, that its API (also known as S3) has been adopted by other object storage vendors.

During the Pied Piper program we also had participants experiment with ECS (Elastic Cloud Storage), a fast-growing storage product for DellEMC. It provides S3-compatible object storage in the customer's own datacenter, with all the security and governance benefits that come with that. Availability is another consideration, especially after the 5-hour outage AWS S3 experienced in February 2017.

ECS offers some additional features:
  • Namespaces. With multitenancy in mind, ECS implements "namespaces", each of which acts as a tenant. This essentially turns it into a two-level object store: bucket names must be unique only within a namespace.
  • Strong consistency. Most object stores provide only "eventual consistency", which means that two users accessing the same object from different locations might get different versions of it if the object has been modified recently. In contrast, ECS implements "strong global consistency": all locations know that the object has changed, even if they do not hold the latest version yet, so they can direct the app to fetch the latest version from a location where it is available. (AWS S3 itself has offered strong read-after-write consistency since December 2020.)
  • Erasure coding. ECS uses erasure coding to protect object data across geographically dispersed locations instead of storing a full copy in each location. This translates into $/GB figures below those of most cloud providers.
  • Durability and compliance. ECS offers eleven 9's (99.999999999%) of durability and is SEC Rule 17a-4 compliant.
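
The erasure-coding bullet is easier to picture with the simplest erasure code there is: split the data into k fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt from the survivors. This is a toy illustration of the idea only, not ECS's actual scheme; production systems use codes with several coding fragments so they can survive multiple simultaneous losses. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    # Byte-wise XOR of two equal-length fragments.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal data fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so every fragment has the same length
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def rebuild(fragments):
    """Recover at most one missing fragment (marked None) by XOR-ing the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can rebuild only one lost fragment")
    fragments = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


def decode(fragments, k, length):
    """Reassemble the original bytes from the first k (data) fragments."""
    return b"".join(rebuild(fragments)[:k])[:length]


def overhead(k, m):
    """Raw capacity consumed per byte of user data with k data + m coding fragments."""
    return (k + m) / k
```

With 4 data fragments and 1 parity fragment, overhead(4, 1) is 1.25, i.e. 25% extra capacity, whereas keeping a full copy in each of three locations costs 3x. That gap is where the lower $/GB comes from.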
Of course, the main focus of the program was learning how to use all these technologies. All our coding was done in Python, using the AWS Boto library. The following repo contains the "Boto-S3.py" script, a concise cookbook of the most common operations on S3-compatible object stores:

https://github.com/cermegno/Boto-S3

Once we had learned the basics, we moved on to a fun exercise: building a photo album. If you want to practice, you can find the repo here:

https://github.com/cermegno/Photo-album

We used Cloud Foundry on Pivotal Web Services for most of our exercises. This cloud offering has a marketplace from which services can be added to apps; however, object storage is not currently offered there. To stay "Cloud Native", we should pass the S3 or ECS credentials to the app using the "user-provided environment variables" capability of Cloud Foundry, rather than hard-coding them.
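
A minimal sketch of the app side of this, assuming the credentials were set as user-provided environment variables. The variable names (S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_ENDPOINT_URL), the app name "photo-album" and the helper s3_client_kwargs are all hypothetical; only the boto3.client keyword names (aws_access_key_id, aws_secret_access_key, endpoint_url) come from Boto itself.

```python
import os

# The (hypothetical) variables would be set once per app, for example:
#   cf set-env photo-album S3_ACCESS_KEY_ID <access key>
#   cf set-env photo-album S3_SECRET_ACCESS_KEY <secret key>
#   cf set-env photo-album S3_ENDPOINT_URL https://<your ECS endpoint>   (ECS only)
#   cf restage photo-album


def s3_client_kwargs(env=None):
    """Build keyword arguments for boto3.client("s3", ...) from environment variables."""
    env = os.environ if env is None else env
    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["S3_SECRET_ACCESS_KEY"],
    }
    # AWS S3 needs no explicit endpoint; an S3-compatible store such as ECS does.
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

In the app, the result would be passed straight to Boto with boto3.client("s3", **s3_client_kwargs()). Leaving S3_ENDPOINT_URL unset targets AWS S3, so the same code can talk to either store without any credentials in the source.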

The Boto code itself does not change when dealing with ECS, apart from pointing the client at the ECS endpoint instead of AWS. However, there is a small difference in how objects are referenced from a URL. The difference stems from the way the "namespace" mentioned earlier is implemented.

This is the AWS S3 format:

http://bucket.s3.amazonaws.com/key

ECS offers two possibilities:

http://namespace.public.myecs.com/bucket/key
http://bucket.namespace.public.myecs.com/key
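
The three URL formats above can be generated with a couple of small helpers. This is a sketch under assumptions: the function names are hypothetical, base_domain stands in for the "public.myecs.com" part (which depends on how your ECS is set up), and keys are percent-encoded with urllib.parse.quote so that characters such as spaces survive in a URL while "/" is kept as-is.

```python
from urllib.parse import quote


def s3_url(bucket, key):
    # AWS S3: the bucket is part of the host name.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


def ecs_url(namespace, bucket, key, base_domain, bucket_in_host=False):
    """Build either of the two ECS URL styles: namespace-only host, or bucket.namespace host."""
    if bucket_in_host:
        # http://bucket.namespace.<base_domain>/key
        return f"http://{bucket}.{namespace}.{base_domain}/{quote(key)}"
    # http://namespace.<base_domain>/bucket/key
    return f"http://{namespace}.{base_domain}/{bucket}/{quote(key)}"
```

For example, ecs_url("namespace", "bucket", "key", "public.myecs.com") yields http://namespace.public.myecs.com/bucket/key, the first ECS form listed above.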
