Cloud First

Abstract

UIS has adopted a "Cloud First" strategy which requires those building, planning or deploying services within UIS to consider and fully evaluate potential cloud products before considering any other option. This approach is mandatory for new services in UIS.

A cloud product has the following essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity or expansion, and measured service.

A cloud product can be hosted on UIS-owned hardware but is usually hosted by a third party in return for a fee. This is termed the "public cloud" model and is the preferred model for UIS.

Introduction

The University of Cambridge has always prided itself on being at the forefront of best practice in multiple fields. The Technical Design Authority (TDA) see no reason why the provision and development of digital services should be any different.

Members of University Information Services (UIS) will undoubtedly have seen the term "cloud" bandied around internally. This document sets out the TDA's position on what the UIS cloud strategy means for those engineering, procuring and co-ordinating new services.

This document provides guidance for UIS members on what the cloud is, what it is not, how they can help UIS become a Cloud First organisation and how to get started with cloud products.

The TDA has received a great deal of feedback surrounding the procurement, development, integration and co-ordination of cloud-hosted services and this document aims to address common questions and concerns.

Who is this document for?

This document will be of interest to the following people within UIS:

  • those involved in the procurement of new services,
  • those integrating, designing and developing new services, and
  • those co-ordinating service delivery.

Those procuring services from third-party vendors may be particularly interested in sections below indicating how they can help UIS enact a Cloud First strategy.

Those involved with the integration, design or development of new services may be particularly interested in sections discussing what the cloud is and what it is not along with sections on how to get started and how to help.

Those involved with the co-ordination of service delivery may be particularly interested in sections indicating how they can help and in sections which discuss common questions or concerns about the use of cloud products.

What "the cloud" is

The term "cloud" is very popular in technology circles but is often loosely defined. The US National Institute of Standards and Technology (NIST) provide five essential characteristics of a cloud product:

  • On-demand self-service. Those implementing a service in UIS can provision and make use of the cloud product without having to go through a third party. Provisioning of the cloud product should be near-instantaneous.
  • Broad network access. The cloud product is not limited to a particular set of networks and can be used irrespective of where the remainder of the service is hosted. This is related to zero trust architectures where services are designed so that the security of the service does not depend on trusting the networks it is deployed on.
  • Resource pooling. The cost of using the cloud product does not increase linearly with use as one may use the same product with multiple services. Put another way: the more you use it, the cheaper each use gets.
  • Rapid elasticity. If rising demand for a UIS service results in rising demand for a cloud product, the cloud product can rapidly be scaled to meet the demand. Conversely, when demand falls a cloud product can scale back down. This helps ensure that just the right amount of resource is being used at any one time.
  • Measured service. Related to elasticity, the use of the cloud product can be monitored in real time allowing the product to be automatically scaled as appropriate.
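Measured service and rapid elasticity are usually combined in a feedback loop: a usage metric is read and capacity is adjusted towards a target. The following is a minimal sketch that assumes a horizontally scaled product and a proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed ÷ target)). The 60% target and the bounds of 1 to 20 instances are illustrative assumptions.

```python
def desired_instances(current: int, observed_pct: int, target_pct: int = 60,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the number of instances needed to bring utilisation back to target.

    Utilisation is a whole-number percentage so the arithmetic stays in
    integers. The default target and bounds are hypothetical.
    """
    # Integer ceiling of current * observed / target.
    wanted = (current * observed_pct + target_pct - 1) // target_pct
    # Clamp to the allowed range so scaling never drops below min_instances.
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(4, 90))  # demand has risen, scale out: 6
print(desired_instances(4, 30))  # demand has fallen, scale in: 2
```

Real autoscalers add safeguards such as cool-down periods so that capacity does not oscillate. Those safeguards are left out here.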

Let's take the example of one cloud product, the Relational Database Service from Amazon Web Services (AWS), and see how it matches these characteristics. The Relational Database Service allows those developing UIS services to request PostgreSQL, Oracle, SQL Server or MySQL database instances for storing and querying structured data.

  • A new instance may be created on demand by a user with appropriate permissions, or by an automated process as part of a service deployment. The time from requesting an instance to it being ready to use is typically a matter of minutes. It is on-demand and self-service.
  • A database instance can be accessed from any network which can route packets to and from the Internet. If necessary, access can be limited by IP address, but the service does not rely on this restriction for its security. It has broad network access.
  • Instances may be "reserved" with a commitment to a certain level of usage. In return the per-instance price is reduced. This is an example of resource pooling.
  • A given database instance may have its memory, CPU or available storage increased or decreased at any time as demand changes. This can be done via an automated process or manually and it can take effect within a few minutes of the request being made. The product is elastic.
  • An automated process can query the current database load and automatically increase or decrease the CPU allocated as required. This allows the service to weather unexpected peaks in demand. The product can be measured and, via elasticity, scaled.
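The last point can be sketched for RDS. This is a minimal, hypothetical example. It assumes the CPU figure has already been read (RDS publishes a `CPUUtilization` metric to CloudWatch). The size ladder, thresholds and instance identifier are illustrative choices, not recommendations. The function only builds the arguments for boto3's `modify_db_instance` call, so it can run without AWS credentials.

```python
from typing import Optional

# Hypothetical ladder of instance classes, smallest first.
SIZE_LADDER = ["db.t3.medium", "db.t3.large", "db.t3.xlarge", "db.t3.2xlarge"]


def next_instance_class(current: str, cpu_pct: float,
                        high: float = 80.0, low: float = 20.0) -> str:
    """Step one size up when CPU is high and one size down when it is low."""
    i = SIZE_LADDER.index(current)
    if cpu_pct > high and i + 1 < len(SIZE_LADDER):
        return SIZE_LADDER[i + 1]
    if cpu_pct < low and i > 0:
        return SIZE_LADDER[i - 1]
    return current


def plan_resize(identifier: str, current: str, cpu_pct: float) -> Optional[dict]:
    """Return keyword arguments for an RDS modify request, or None if no change is needed."""
    target = next_instance_class(current, cpu_pct)
    if target == current:
        return None
    return {
        "DBInstanceIdentifier": identifier,
        "DBInstanceClass": target,
        # Without this, RDS defers the change to the next maintenance window.
        "ApplyImmediately": True,
    }


print(plan_resize("example-db", "db.t3.large", 92.0))  # hypothetical identifier
```

If a request is returned, it could be passed on as `boto3.client("rds").modify_db_instance(**request)`. Changing the instance class can involve a brief period of unavailability, which is one reason a real policy would step cautiously.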

For engineers

This section describes the cloud in terms likely to be of interest to those involved in development, design, integration or deployment of services in UIS.

The cloud provides production-ready products which can be connected together to build services. Consider a simple database-driven web application. The following products can be used to implement the service:

  • A managed database engine will provide an endpoint and credentials which allow the web application to store and query structured data.
  • An object store will provide the ability for the web application to store large unstructured "blobs" of data such as user uploads.
  • A container-hosting service will host the packaged web application itself and automatically scale the number of instances on demand.
  • A secret store will provide locations where individual parts of the service can retrieve credentials required to talk to other parts.
  • An email-sending service will send emails to users, monitor bounces and alert if your application appears on popular blacklists.
  • A monitoring and alerting service will allow operations dashboards to be constructed and alerting policies to be set based on compute resource, storage space, elevated failure rates, etc.
  • A scheduled event service will trigger processes which happen on a regular schedule such as emailing reports to operators or performing asynchronous tasks.
  • A cloud orchestration tool will use cloud APIs to automatically provision all these resources and configure them correctly.
  • A "software forge" application will host source code for the cloud orchestration tool's configuration and any web application source.
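One consequence of building from connected products is that the web application should not hard-code where its database, object store or email service live. A common approach is to read everything at start-up and fail fast if anything is missing. The sketch below assumes that the container-hosting service supplies settings as environment variables, with secrets fetched from the secret store at deploy time. The variable names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the orchestration tool would be configured to set them.
REQUIRED = ("DATABASE_URL", "UPLOAD_BUCKET", "EMAIL_SENDER")


@dataclass(frozen=True)
class ServiceConfig:
    database_url: str   # endpoint and credentials for the managed database
    upload_bucket: str  # object store location for user uploads
    email_sender: str   # address used with the email-sending service


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Build the configuration from the environment, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")
    return ServiceConfig(
        database_url=env["DATABASE_URL"],
        upload_bucket=env["UPLOAD_BUCKET"],
        email_sender=env["EMAIL_SENDER"],
    )
```

The application only reads names. As a result, the same packaged application can be deployed for testing and production, with the orchestration tool supplying different values for each.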

Public clouds will usually have offerings for each of these products and cloud-hosted tools can be used for, e.g., the software forge. UIS provides a cloud-hosted GitLab instance which is used internally for many products.

For procurers

This section describes the cloud in terms likely to be of interest to those involved in the procurement, selection or evaluation of third-party products.

For a third-party product there are generally two ways in which it may be considered "cloud-hosted": it can be hosted by the vendor, with the University paying a licence fee to use it, or it may be hosted by UIS on cloud infrastructure which UIS is responsible for.

Generally, vendor-hosted solutions will involve less operational work from UIS but may offer less customisability. In such cases the initial integration work may be higher but on-going maintenance will be reduced. For lightly used services, UIS benefits from the hosting cost being amortised across the vendor's customers.

UIS-hosted solutions will involve more up-front and on-going work but may offer greater customisability or easier integration with other UIS services. UIS-hosted products may still make use of cloud products to provide the hosting but the cost of the hosting will need to be taken into account when evaluating the product.

For co-ordinators

Cloud-hosted development tools allow for a greater degree of asynchronous working by teams. Tooling is available from any team member's machine and as such it is no longer as important that teams be physically present to be effective at working together.

By selecting a small number of cloud products with well-known interfaces, expertise can be re-used across UIS. This allows teams to progress rapidly in implementing solutions by re-using existing deployment and design work.

Where "the cloud" is

None of the characteristics of a cloud product require the product to be hosted outside of UIS. The characteristics of a cloud product relate to the way it works and not where it is.

Cloud products are often hosted outside of the organisations which use them as a consequence of the deployment model:

  • Private clouds are collections of cloud products which are for the exclusive use of a single organisation. They may be hosted on physical infrastructure under the control of the organisation or be hosted on infrastructure rented from a third party. Similarly, the cloud products may be managed and operated by the organisation itself or by some third party.
  • Community clouds are essentially private clouds owned, managed and operated by a group of organisations instead of a single organisation.
  • Public clouds are cloud products which are open for use by the general public. The infrastructure is usually owned and operated by a third-party.
  • Hybrid clouds are where the set of cloud products is formed from a mixture of public and private cloud products. Often this is because a given cloud product is cheaper to run as a private cloud but the remainder can be taken from some public cloud.

UIS offers some private cloud products for virtual machine hosting and storage.

UIS also has an agreement with AWS to allow invoice-based billing for cloud products. As well as supporting Amazon as a cloud provider internally, UIS resells AWS to the rest of the University.

Important

UIS will also be seeking similar agreements with other prominent public cloud providers such as Google and Microsoft.

Unless a service has particularly esoteric requirements, it is usually cheaper and more robust to host it using public cloud products. Using public cloud products has the following advantages:

  • Scale to zero costing. There is no fixed overhead associated with having access to a public cloud; if one uses none of the products it doesn't cost anything. A private cloud has significant fixed overheads even if none of the products are used.
  • Commonality. This is also known as the "Stack Overflow effect". A public cloud will have many thousands of users and so there will be a great deal of support, examples and documentation available in the public domain.
  • Certification and compliance. Public clouds have as their core business model a trust relationship with their customers. As such they will usually have very good certification and compliance guarantees, especially with regard to personal data and sensitive information.
  • Testing. With many thousands of users, a cloud product from a public cloud will have had a great deal of real-world testing as part of production services. It is hard to test products from private clouds with a small handful of users.
  • Tooling. A cloud product will be on-demand, elastic and measurable and so it is not surprising that tooling exists to manage cloud products. Terraform, for example, is a tool which allows multiple cloud products to be managed and aids with "wiring them together" into services. Tooling will often support public clouds as first-class citizens by virtue of them having many thousands of users.
  • Resourcing. The cost of utilising any one cloud product will rarely come close to the cost of one Full-time Equivalent (FTE) position. As such even if a private cloud product requires very little ongoing maintenance, the staff time along with any initial setup can easily outweigh the lifetime cost of a public cloud resource.
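The scale-to-zero and resourcing points can be made concrete with a simple cost model. This is a sketch only, and every figure in it is a hypothetical placeholder. Real estimates should come from a public cloud's pricing calculator and from UIS staff costs. What it illustrates is structural: pay-per-use cost falls to zero when nothing is used, whereas a private cloud carries fixed hardware and staff costs.

```python
def public_cloud_cost(hours_used: float, price_per_hour: float) -> float:
    """Pay-per-use: cost is proportional to use and is zero when unused."""
    return hours_used * price_per_hour


def private_cloud_cost(hardware_per_year: float, fte_fraction: float,
                       fte_cost_per_year: float) -> float:
    """Fixed overhead: hardware and staff time are paid for whether or not it is used."""
    return hardware_per_year + fte_fraction * fte_cost_per_year


def break_even_hours(private_cost: float, price_per_hour: float) -> float:
    """Hours of public cloud use per year that would cost as much as the private option."""
    return private_cost / price_per_hour


# Hypothetical figures: £4,000/year of hardware plus a quarter of a £40,000 FTE.
private = private_cloud_cost(4000.0, 0.25, 40000.0)
print(private)                         # 14000.0
print(public_cloud_cost(0, 0.5))       # 0.0
print(break_even_hours(private, 0.5))  # 28000.0
```

With these placeholder numbers, a £0.50/hour resource running continuously for a year (8,760 hours) would cost £4,380, well below the private option.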

For these reasons, new services from UIS should prefer public cloud products to self-hosted private cloud products.

What "the cloud" is not

Because it is so often poorly defined, many misconceptions have arisen about "the cloud".

Only for impersonal data

The Data Protection Act 2018 (DPA) incorporated into UK law the European Union General Data Protection Regulation (GDPR). It requires special treatment for "personal data" which, loosely, is data which can be used to directly or indirectly identify a living individual.

Warning

This section elides a great deal of detail surrounding data protection legislation. Specific queries around processing of personal data can be sent to the University Information Compliance Office.

There is a perception that cloud products cannot be used by the University to store personal data. This perception often stems from a confusion between two terms: data controller and data processor.

The data controller is the entity which determines why and how personal data is processed. It is responsible in law for managing the personal data appropriately, keeping the subject of any personal data informed where necessary and responding to any requests from the subject. For UIS, this is almost always the University itself.

A data processor is any entity which processes personal data on behalf of the data controller. The data controller must be able to demonstrate that all data processors comply with the provisions of the DPA. Any large public cloud will have a dedicated set of compliance resources which data controllers may use to demonstrate compliance.

Tip

There are dedicated Google, Microsoft and AWS compliance websites which let you evaluate each cloud's compliance with DPA requirements.

UIS services are usually able to process the personal data of University members as it will be in the legitimate interest of the University. If this is unclear for a given service, the Information Compliance Office may be able to help determine if the collection of personal data is legitimate.

Cloud products become unsuitable for storing personal data when the data controller is not the University. This is why, for example, not all Google applications are enabled for University members by default; UIS can only enable applications by default if the data controller is the University. Similarly, care should be taken when adding personal data to online productivity tools that the data controller remains the University.

Insecure

Security concerns around the cloud can broadly be split into three categories:

When they have been presented to the TDA, concerns around confidentiality often stem from confusion as to whether sensitive data is appropriate to pass to a cloud product. In this case "sensitive" includes personal data, medical data and data which could adversely affect the business of the University if made public.

UIS provide a data classification scale which considers data confidentiality in the context of how restricted access must be. Public cloud providers have strong guarantees surrounding when and why their administrators may see data that you pass through cloud products. With that in mind, it is appropriate to pass data classified as Level 2 and below through a cloud product. If Level 3 data is required by a UIS service, seek guidance from the Information Compliance Office to check whether a given cloud product's compliance documentation allows its use.

The same security principles which are used to protect non-cloud based services from unauthorised access and denial of service should be used to protect cloud-based ones. This document will not attempt to exhaustively list each and every item of security best practice but the following transfer directly to the cloud:

  • Use dedicated software such as 1Password, LastPass or KeePass to control access to credentials.
  • When credentials are generated by UIS, ensure that they are unique and high-entropy.
  • Sandbox the resources which implement a service so that a denial-of-service attack will not also disrupt unrelated services.
  • Use the principle of least privilege; ensure that any credentials used to implement the service confer the minimum capability required.
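For the second point, Python's standard `secrets` module draws from the operating system's cryptographically secure random source. Below is a minimal sketch of issuing a unique, high-entropy credential to each service component. The component names, the 32-byte default and the 16-byte floor are assumptions for illustration. In practice the values would be written straight into a secret store rather than printed or logged.

```python
import secrets
from typing import Dict, Iterable


def new_credential(nbytes: int = 32) -> str:
    """Return a URL-safe random string carrying nbytes of randomness."""
    if nbytes < 16:
        # Hypothetical floor: refuse anything weaker than 128 bits.
        raise ValueError("credential too short")
    return secrets.token_urlsafe(nbytes)


def credentials_for(components: Iterable[str]) -> Dict[str, str]:
    """Issue a separate credential to each component so that none are shared."""
    return {name: new_credential() for name in components}


creds = credentials_for(["web-app", "report-job"])  # hypothetical component names
```

Giving each component its own credential also supports least privilege, since each one can then be granted only the capabilities that component needs.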

A public cloud has a degree of inbuilt advantage when it comes to security. Since the business of the cloud provider is to protect each customer's data and services from other, potentially malicious, customers, public clouds are designed from the ground up with partitioning and sandboxing in mind; the products offered by a public cloud are designed to make doing the "right" thing easy and the "wrong" thing hard.

It is possible to deploy a cloud-hosted service in an insecure manner just as it is possible to deploy a non-cloud-hosted service in an insecure manner. Deploying to the cloud will not magically make a service more secure but there will be a wealth of public domain documentation providing examples of best practice.

Expensive

Many ad hoc costing calculations undervalue or ignore the cost of staff members' time. In UIS the provision of virtual machines is often perceived as "free", neglecting both the time and expertise required to operate and upgrade virtual machine hosting infrastructure and the up-front purchase cost of that infrastructure.

Recall that cloud products are also elastic; they can scale in response to demand. Coupled with scale to zero, the cost of a cloud product can be made very small for services which see "peaky" use.

Many UIS services will be loaded more during work hours and less during the night. Some UIS services may only be loaded at particular times of year.

Example

In 2020, the summer pool process was supported in part by a web application hosted by the Cloud Run product. Outside of the pool month, Cloud Run hosting costs never averaged above £3/month. Elasticity allowed the product to automatically scale with demand so that during the pool period hosting averaged £25/month before falling back down.

No administrator intervention was required for this autoscaling to happen.
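The figures in the example can be turned into a rough upper bound on annual hosting cost. This sketch takes £3/month as the ceiling for the eleven quiet months and £25 for the pool month. The comparison line rests on a hypothetical assumption: that capacity sized for the pool period would have cost the same £25 every month had it been provisioned all year.

```python
OFF_PEAK_CEILING = 3  # £/month, upper bound outside the pool month
POOL_MONTH = 25       # £/month, average during the pool period

# Elastic hosting: the peak rate is paid only in the month it is needed.
elastic = [OFF_PEAK_CEILING] * 11 + [POOL_MONTH]
# Hypothetical alternative: capacity sized for the peak and paid for all year.
sized_for_peak = [POOL_MONTH] * 12

print(sum(elastic))         # 58
print(sum(sized_for_peak))  # 300
```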

In addition to the cost savings which come with elasticity, public cloud providers have "committed use" discounts which can provide significant savings if your service commits to a minimum usage level over a period of time.
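Whether a commitment pays off depends on how much of it is actually used. The sketch below assumes a simplified model: a committed number of hours is billed at a discount whether or not it is used, and any usage beyond it is billed at the on-demand rate. Real committed-use and reserved-instance terms vary by provider and product. The 30% discount and £10/hour rate used in the tests are hypothetical.

```python
def on_demand_cost(hours: float, rate: float) -> float:
    """Cost with no commitment: every hour at the on-demand rate."""
    return hours * rate


def committed_cost(hours: float, committed_hours: float, rate: float,
                   discount_pct: float) -> float:
    """Committed hours are paid for at the discounted rate even if unused."""
    committed = committed_hours * rate * (100 - discount_pct) / 100
    overflow = max(0.0, hours - committed_hours) * rate
    return committed + overflow


def break_even_usage(committed_hours: float, discount_pct: float) -> float:
    """Usage at which committing costs the same as paying on demand."""
    return committed_hours * (100 - discount_pct) / 100
```

With these numbers, committing to 1,000 hours costs £7,000 whatever the usage. It is cheaper than paying on demand only once usage exceeds 700 hours.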

Rented Virtual Machines

When the cloud was first taking off, the first sets of products revolved around mirroring virtual machine (VM) setups being used in industry. One could request a machine, configure the CPU, memory, etc, attach a disk and treat it like a normal VM. This class of cloud product is termed Infrastructure as a Service (IaaS).

The perception that the cloud is just "other people's VMs" does not take into account the rise of Platform as a Service (PaaS) products. These products abstract the management of virtual machines away and present an interface where one requests components of a service directly. For example, a managed database service will offer the ability to request a certain database engine, CPU, memory and storage capability but will not require that any VMs are managed directly. The cloud provider provisions, upgrades and patches the VMs and provides the database as a fully managed service.

Cloud products have evolved so far beyond "VMs as a service" that now there are cloud products which require that one brings one's own VMs. The Anthos service from Google is one example; this service allows you to bring your own VM hosting infrastructure and have Google create and manage VMs for you in order to offer cloud products such as managed databases or container hosting which are physically hosted on UIS infrastructure.

Slow

When the TDA has heard concerns over "slowness" of cloud services, they generally fall into one of three categories:

  • high latency,
  • low bandwidth, and
  • slow processing.

Concerns about latency (time from making a request to it being served) and bandwidth (how much data can be sent to a cloud product per unit time) usually arise when hybrid cloud models are in use.

Consider an on-premises web-server which uses a cloud product as a backing file store; each request will involve both the latency of communicating with the web server and, subsequently, the latency of communication with the backing file store. If the latency between the web-server and file store is large, it will negatively impact all requests.

Bandwidth concerns are less prevalent when considering on-premises to cloud communications but may be a concern when a given solution involves public cloud to public cloud data transfer.

In the vast majority of examples seen by the TDA, the latency introduced by a hybrid deployment has been dwarfed by other latencies. Good engineering practice discourages premature optimisation: usually the parts of a system one expects to be slow are not actually where the slowness originates.
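Simple arithmetic shows how sequential latencies compound and why a hybrid hop is often a small share of the total. The timings below are hypothetical placeholders, not UIS measurements. In a real investigation they would come from profiling or tracing the service.

```python
def request_latency_ms(client_to_web: int, web_to_store: int,
                       store_round_trips: int, processing: int) -> int:
    """Sequential latencies add, and each round trip to the store pays the hop again."""
    return client_to_web + store_round_trips * web_to_store + processing


# Hypothetical timings in milliseconds.
total = request_latency_ms(client_to_web=40, web_to_store=8,
                           store_round_trips=3, processing=150)
hybrid_share = 3 * 8 / total
print(total)                  # 214
print(f"{hybrid_share:.0%}")  # 11%
```

Here the on-premises-to-cloud hop accounts for roughly a tenth of the request time and processing dominates. Reducing the number of round trips would help more than moving the store.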

The TDA recommends that services which do suffer significant latency between cloud and on-premises components be re-engineered by moving the on-premises components to the cloud, not by moving cloud-hosted components on-premises.

A defining characteristic of cloud products is that they are elastic. As such, slow processing can usually be mitigated by configuring the product to use more resource. While the "pay more for it to go faster" model will clearly have some breaking point, it is a reasonable solution for the short to medium term.
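Amdahl's law is one standard way to see where "pay more for it to go faster" breaks down. It is used here as an illustration, not as something the TDA prescribes. If only a fraction p of a job benefits from extra resource, the speed-up from n times the resource is 1 / ((1 - p) + p / n). That speed-up can never exceed 1 / (1 - p). The 75% figure below is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, resource_multiple: float) -> float:
    """Speed-up when only parallel_fraction of the work scales with resource."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if resource_multiple < 1:
        raise ValueError("resource_multiple must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / resource_multiple)


for n in (1, 3, 15, 1000):
    print(n, round(amdahl_speedup(0.75, n), 2))
```

With 75% of the work scalable, tripling the resource halves the run time. No amount of extra resource makes the job more than four times faster.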

Complicated

The TDA has seen concerns about complexity primarily from those involved in the implementation of services. In these cases the root cause has turned out to be unfamiliarity. This is a valid concern as those involved in implementing a service need to be able to reason about it in a technical manner and it is hard to do this when the underlying technology is unfamiliar.

Concerns around complexity also arise because cloud technologies appear to revel in opaque or obscure naming. It is not immediately obvious, for example, that "Fargate" refers to a cloud product which allows you to host containerised web servers.

This is an industry-wide challenge. For example, the Google Cloud Developer's Cheat Sheet is a list of all the Google Cloud products described in fewer than five words. The fact that it is available as a desktop background suggests that few developers have the full list in their head.

Obscurity and unfamiliarity can be combatted with education, training, practice and experimentation. In particular those co-ordinating teams implementing new services should ensure that the team be given the time, space and resource required to experiment with cloud products.

Outside the CUDN

Colloquially "inside the CUDN" has a number of meanings but in the context of UIS services it usually refers to services being hosted on a network which is not directly accessible on the public Internet. A service may require that its clients also be on the private network or the service may have a presence on the private network so that it can in turn talk to other services "inside the CUDN".

None of the above requires that the service be hosted on machines connected to the University Data Network (UDN) which is a physical network of high-bandwidth links which connect the majority of University institutions and departments.

If you are developing a service which has to talk to services which are only available on a private network, consider the following:

  • Does the service being talked to need to be on a private network or is it security theatre?
  • Will service-to-service communication always be initiated by your service? If so, a Network Address Translation (NAT) based Virtual Private Network (VPN) may be appropriate.
  • If necessary, public cloud providers usually provide bridging products which will allow a private cloud-side network to be configured like a private "CUDN network" along with "CUDN-private" IP ranges.

Where "inside the CUDN" means "physically present on University-owned land", the requirement usually arises from specific needs surrounding the ability to restrict physical access to hosting infrastructure to a set of people. UIS provides guidance surrounding data security classifications and it is possible that only physical control over hosting infrastructure will satisfy the requirements of Level 3 data. Physical control over hosting does not imply compliance with Level 3 requirements. Contractual relationships with a public cloud vendor and related compliance certification may suffice when protecting Level 3 data. The Information Compliance Office should be able to help with specific queries about Level 3 data.

Bad for the Environment

The TDA has seen concerns that cloud-hosted services have a detrimental effect on the environment. The concerns generally arise from a belief that smaller on-premises servers are more energy efficient than larger data centre based servers or that large data centres require disproportionate energy usage for cooling, support infrastructure, etc.

Public cloud vendors see environmental responsibility and sustainability as key differentiators. As of writing:

  • The Azure public cloud, as part of Microsoft, has been carbon neutral since 2012. Microsoft has pledged to be carbon negative by 2030 and to have retroactively "removed" the carbon it has emitted since its foundation by 2050.
  • Amazon Web Services (AWS) has exceeded 50% renewable energy usage and has committed to move to 100% renewable energy over time.
  • Google Cloud Platform uses 100% renewable energy and maintains on-going carbon neutrality.

A recent Wired article explored in depth what each public cloud provider is doing to reduce its environmental impact. In terms of "overall greenness", Google Cloud was better than Azure which was in turn better than AWS. Rankings were different for other metrics and so UIS members interested in using environmental impact when choosing public clouds are encouraged to read the entire article.

What can I do to help?

If you want to help UIS become a Cloud First organisation, the TDA recommends the following ways in which you can help.

Procurers

If you are involved with the costing or procurement of a service from a third-party vendor, this section provides some guidance on how you may evaluate products for use in a Cloud First organisation.

Requests for information (RFI) should ask the following questions:

  • Is a vendor-hosted solution available?
  • If no vendor-hosted solution is offered, are there vendor-recommended third-party hosting providers?
  • If the solution must be hosted by UIS:
      • Are there examples of other customers who have deployed the product into the cloud?
      • Is the product available packaged in a cloud native format such as a docker image or helm chart?
      • Can all the product state be stored in vendor-independent services such as managed SQL databases or an object store?
      • Are there technical restrictions on how far the product can be scaled?
      • Does any part of the product impose large CPU, memory, storage or bandwidth requirements? If so, what are they?

Prefer vendors which provide vendor-hosted solutions or which are packaged in a way which is compatible with cloud hosting products. Even if a vendor does not offer a vendor-hosted product, the "self-hosted" option should still be evaluated assuming it is to be hosted on a public cloud.

If a vendor-hosted solution is not possible, make sure that requests for proposals require clear indications of the CPU, memory, disk and bandwidth requirements of a self-hosted solution. Hosting costs should be estimated using public cloud vendors' costing tools.

Ensure that the project has a technical lead and that they are present during vendor presentations. The technical lead should confirm that a Request for Proposals (RFP) contains questions suitable for estimating hosting costs if UIS are to be responsible for hosting the product.

For self-hosted products, get the technical lead to provide an estimate of staff resource required to deploy the product, to integrate it with existing systems and to provide on-going maintenance. These costs should be listed separately.
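The following minimal sketch shows how the technical lead's figures might be combined into an estimate that keeps hosting and staff costs on separate lines. All unit prices, the day rate and the effort figures are hypothetical placeholders. Real hosting prices should come from the relevant public cloud's costing tools.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    vcpus: int
    memory_gb: float
    storage_gb: float
    egress_gb_per_month: float


# Hypothetical monthly unit prices in £, standing in for a cloud costing tool.
UNIT_PRICES = {"vcpu": 20.0, "memory_gb": 2.5, "storage_gb": 0.125, "egress_gb": 0.0625}


def monthly_hosting(req: Requirements) -> float:
    """Sum the hypothetical per-unit prices for one month of hosting."""
    return (req.vcpus * UNIT_PRICES["vcpu"]
            + req.memory_gb * UNIT_PRICES["memory_gb"]
            + req.storage_gb * UNIT_PRICES["storage_gb"]
            + req.egress_gb_per_month * UNIT_PRICES["egress_gb"])


def estimate(req: Requirements, deploy_days: float, integrate_days: float,
             maintain_days_per_year: float, day_rate: float) -> dict:
    """Return hosting cost and staff costs as separate lines."""
    return {
        "hosting_per_year": 12 * monthly_hosting(req),
        "staff_one_off": (deploy_days + integrate_days) * day_rate,
        "staff_per_year": maintain_days_per_year * day_rate,
    }


print(estimate(Requirements(4, 16, 200, 100), deploy_days=10, integrate_days=5,
               maintain_days_per_year=12, day_rate=300.0))
```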

The TDA can provide informal guidance early on in vendor selection or as part of the RFI or RFP process.

Engineers

If you are involved in the technical design, implementation, integration or operation of a service inside UIS, this section provides some guidance on how you may implement services in a Cloud First organisation.

Get some first-hand experience developing for and/or deploying to the cloud. The TDA has provided a technical getting started guide.

Ensure that those co-ordinating the service development know that you require time to familiarise yourself with any cloud products you may be using.

Take the time to understand what a cloud native architecture is and how you might make use of it for the next service you develop.

Make sure that your deployment is automated and configures cloud resources on demand. A key performance metric is how long it takes the service management team to stand up a parallel instance of the service from zero.
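Tools such as Terraform make this possible by comparing a declared desired state with what currently exists and planning the changes needed. The toy function below illustrates only that idea. It is not Terraform's algorithm or API, and the resource names and settings are hypothetical. Standing up a parallel instance from zero is simply the case where nothing exists yet.

```python
from typing import Dict, List, Tuple

Settings = Dict[str, object]


def plan(desired: Dict[str, Settings], actual: Dict[str, Settings]) -> List[Tuple[str, str]]:
    """List the create/update/delete steps that turn `actual` into `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


desired = {
    "database": {"engine": "postgres", "cpus": 2},
    "uploads-bucket": {"versioning": True},
    "web-app": {"image": "registry.example/app:1.2", "max_instances": 10},
}
print(plan(desired, {}))  # a parallel instance from zero: every resource is created
```

Running the same plan against an up-to-date deployment yields no steps. That repeatability is what makes the deployment safe to automate.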

The TDA can provide technical contacts if you have specific questions you'd like to ask or would like someone to rubber duck your service design to.

Co-ordinators

If you are involved with co-ordinating others to deliver services, this section describes how you can help UIS become a Cloud First organisation.

As cloud technology is relatively new, not everyone will be familiar with designing services which make use of cloud products. The project's technical lead should be able to advise on cloud products and technologies which are required for a service. Make sure to build in time for those implementing the service to gain familiarity with any products which are new to them.

Ensure that there is a budget to pay for any cloud hosting costs. The technical lead should be able to provide a rough indication of cloud resource required which may be costed using a public cloud's costing calculator.

As part of the initial service discovery, data owners for the service should have been determined. Make sure that a data security classification has been performed. If Level 3 data is to be handled by the service, make sure that the Information Compliance Office are consulted where necessary.

The TDA can help by providing contacts who have co-ordinated the delivery of other cloud-hosted services.

How to get started

The TDA has written a dedicated getting started guide for using cloud products in UIS. It is aimed at an engineering audience.

Important

The TDA is interested in case studies from those who have co-ordinated or procured services which make use of cloud products so that it can produce guidance aimed at non-engineers.

This section recommends some specific tooling which can be used when making use of cloud products within UIS. This list is not intended to be exhaustive.

  • UIS runs the Developers' Hub, which provides project management, agile delivery and cloud native development tooling. This tool is useful for those implementing services and those co-ordinating service delivery.
  • Docker is the de facto packaging format used with cloud-hosting products. Play with Docker provides short-lived VMs for experimenting with docker containers. This tool is useful for those experimenting with third-party solutions packaged for cloud-hosting.
  • Terraform is a tool which can be used to connect different cloud products together to form an entire service. It supports all major public clouds. This tool is useful for those implementing or integrating services.

External resources