Cloud Foundry High Availability features

Cloud Foundry has several High Availability features, which are essential for an enterprise application platform. We can classify them into two categories depending on the level at which they are implemented.


  • Elastic Runtime features. These are features that are implemented at the application instance level
    • Availability Zones. Administrators can define multiple Availability Zones (AZs), which typically correspond to chunks of the infrastructure that are independent from each other, such as physical servers sitting on separate racks with different power sources. When you define an application with multiple instances, Cloud Foundry automatically distributes the instances across the available Availability Zones. This ensures that if an AZ goes down, either accidentally or during maintenance, the app can keep running
    • App instance fails. Diego sends "heartbeat" messages to the "Health Manager" component of the platform. If the "actual state" differs from the "desired state", the Health Manager instructs the "Cloud Controller" to restart the instance


  • BOSH features. BOSH is a major differentiator of Cloud Foundry. It has the ability to interface downwards with the IaaS on which the platform sits. This ability to "abstract" the underlying IaaS is what makes Cloud Foundry a multi-cloud platform. There are multiple choices of IaaS (VMware, OpenStack, ...). BOSH also has the notion of a "Desired State". To clarify, this is the desired state of the VMs that make up the platform
    • Elastic Runtime process goes down. BOSH is aware of the many processes that Elastic Runtime needs to do its job. Some examples are the processes I mentioned above, such as "Health Manager", "Cloud Controller" or "Diego". If any of these processes dies, BOSH will attempt to restart it and will notify the Ops team
    • VM goes down. VMs have a BOSH agent running on them. If the agent or the VM goes down, BOSH restarts the VM through a conveniently named process called "the resurrector" ... you gotta love CF terminology!
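To make the "desired state" versus "actual state" idea more concrete, here is a minimal Python sketch of a reconciliation step that also spreads instances across Availability Zones. It is only an illustration under my own assumptions, not the platform's actual scheduler: the function name `reconcile`, the zone names and the "least loaded zone first" rule are all hypothetical.

```python
def reconcile(desired_count, zones, running):
    """Return the zone for each instance that still has to be started.

    desired_count: how many instances the app should have (desired state).
    zones: names of the available Availability Zones (hypothetical names).
    running: the zone of each healthy instance (actual state).
    """
    # Count the healthy instances per zone.
    load = {zone: 0 for zone in zones}
    for zone in running:
        load[zone] += 1

    # Anything missing from the desired state has to be started.
    missing = max(0, desired_count - len(running))

    starts = []
    for _ in range(missing):
        # Assumption: place each new instance in the least loaded zone,
        # breaking ties by the order in which the zones were listed.
        zone = min(zones, key=lambda z: load[z])
        load[zone] += 1
        starts.append(zone)
    return starts
```

With two zones and four desired instances, an empty platform would get two instances per zone. If one of three running instances is lost, the sketch starts a single replacement in whichever zone has fewer instances.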
In the video below I demonstrate how Elastic Runtime restarts a failed application instance. The demonstration is a bit convoluted, as it is not easy to crash an app instance. The way I simulate it is by killing the main process in the instance. For example, a Python app requires a "Procfile" file, which defines what needs to be run as soon as the instance is up. Such a "Procfile" could read:


web: python app.py

This means that as soon as the container is up, the instance will attempt to run "app.py". From that point on, the health of "app.py" is monitored, as there is no point in keeping the instance up if that process is not running. In the video I leverage the ability to SSH into an instance to run the "kill" command and show how the instance gets restarted immediately.
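The "monitor the main process and restart it when it dies" behaviour can be imitated on a laptop with a few lines of Python. This is a rough sketch of the general idea and not how Cloud Foundry implements it: the `Supervisor` class and the `SLEEPER` command (a stand-in for "python app.py") are hypothetical names I made up for the example.

```python
import subprocess
import sys

# Stand-in for "python app.py": a child process that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


class Supervisor:
    """Keeps one child process running, restarting it when it exits."""

    def __init__(self, command):
        self.command = command
        self.process = None
        self.restarts = 0

    def start(self):
        # Launch the monitored process (the "web" entry of the Procfile).
        self.process = subprocess.Popen(self.command)

    def check(self):
        """Restart the child if it has exited; return True if it was restarted."""
        if self.process.poll() is None:
            return False  # still running, nothing to do
        self.restarts += 1
        self.start()
        return True

    def stop(self):
        # Clean up the child when the experiment is over.
        self.process.kill()
        self.process.wait()
```

To try it, create `Supervisor(SLEEPER)`, call `start()`, kill the child (for example with the "kill" command from another terminal, or with `process.kill()`), and then call `check()`: it should report a restart and leave a fresh child process running.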

