Cloud Run allows you to specify which revision should receive traffic. You can send all traffic to the latest revision or split it by percentage between different revisions. You can also use tags for testing, traffic migration, and rollbacks.
To manage the traffic in a service, we need to navigate to Service and click REVISIONS. Once we’re in the Revisions section of the service, we can click the MANAGE TRAFFIC button:
Figure 7.18 – Overview of a service with multiple revisions
We will be presented with the Manage traffic window, where we can decide how network traffic flows:
Figure 7.19 – Overview of a service with multiple revisions
We can decide how to distribute traffic between different revisions in this window.
For example, we can direct 50% of it to the latest healthy revision and the remaining 50% to another revision:
Figure 7.20 – Network traffic split between two revisions
After a moment, the internal load balancer will distribute network traffic as desired:
Figure 7.21 – Network traffic split between two revisions in place
Similarly, we can roll back the changes or distribute traffic further across revisions of our application.
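To make the effect of a percentage split more concrete, here is a minimal Python sketch that simulates it locally. It is not how Cloud Run's load balancer is implemented. It only assumes that percentages are whole numbers that add up to 100 and that each request is assigned to a revision at random in proportion to its share. The revision names and the function names `validate_split` and `route_requests` are hypothetical and were chosen for illustration:

```python
import random
from collections import Counter


def validate_split(split):
    """Check that every share is a whole number in 0..100 and that all shares sum to 100."""
    for revision, percent in split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {revision}: {percent}")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"percentages must sum to 100, got {total}")


def route_requests(split, count, seed=0):
    """Simulate `count` requests and return how many each revision received."""
    validate_split(split)
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    revisions = list(split)
    weights = [split[name] for name in revisions]
    # Weighted random choice: each request lands on one revision.
    return Counter(rng.choices(revisions, weights=weights, k=count))


if __name__ == "__main__":
    # Hypothetical revision names, mirroring the 50/50 split described above.
    split = {"hello-00001-abc": 50, "hello-00002-xyz": 50}
    for revision, hits in route_requests(split, 10_000).items():
        print(f"{revision}: {hits} requests")
```

Over a large number of requests, each revision receives roughly its configured share, while any single request may land on either revision.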
In the next section, we will focus on autoscaling and concurrent requests.
Before we start with the autoscaling concept in Cloud Run, we need to determine the limits of the Cloud Run instances. By default, Cloud Run services are configured to a maximum of 100 instances, and the default values for capacity are as follows:
- CPU: 1
- Memory: 512 MiB
- Request timeout: 300 seconds
- Maximum requests per container: 80
- Container instances: 30
At the time of writing this book, the following maximums apply to Cloud Run:
- CPU: 8
- Memory: 32 GiB
- Request timeout: 3,600 seconds
- Maximum requests per container: 1,000
- Maximum container instances (quota increase needed): 1,000
Cloud Run allows us to control the maximum number of concurrent requests per instance. You may want to lower the maximum concurrency to 1 if your code cannot process parallel requests or if each request uses most of the available CPU and memory. However, setting the maximum concurrency to 1 will likely hurt scaling performance, because many container instances must start before they can handle incoming requests.
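The relationship between concurrency and instance count can be sketched with some back-of-the-envelope arithmetic. The following Python snippet is a simplification and not Cloud Run's actual autoscaling algorithm, which also takes other signals, such as CPU utilization, into account. It assumes that the number of instances is the number of in-flight requests divided by the per-instance concurrency, rounded up and capped at the configured maximum. The function name `instances_needed` is hypothetical; the default values come from the limits listed earlier:

```python
import math

DEFAULT_CONCURRENCY = 80     # default maximum requests per container
MAX_CONCURRENCY = 1000       # highest concurrency setting listed earlier
DEFAULT_MAX_INSTANCES = 100  # default instance cap for a service


def instances_needed(concurrent_requests,
                     concurrency=DEFAULT_CONCURRENCY,
                     max_instances=DEFAULT_MAX_INSTANCES):
    """Estimate how many instances are needed to serve the given number of in-flight requests."""
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must not be negative")
    if not 1 <= concurrency <= MAX_CONCURRENCY:
        raise ValueError(f"concurrency must be between 1 and {MAX_CONCURRENCY}")
    wanted = math.ceil(concurrent_requests / concurrency)
    # Requests beyond the cap would have to wait or be rejected.
    return min(wanted, max_instances)


if __name__ == "__main__":
    print(instances_needed(400))                 # 5 instances at the default concurrency
    print(instances_needed(400, concurrency=1))  # 100: the default instance cap is reached
    print(instances_needed(400, concurrency=1, max_instances=1000))  # 400
```

With the default concurrency of 80, 400 simultaneous requests fit on 5 instances. With a concurrency of 1, the same load needs 400 instances, which exceeds the default cap of 100 and requires far more container starts.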
To learn more about Cloud Run concurrency, go to https://cloud.google.com/run/docs/about-concurrency.
Cloud Run is a fantastic service, and we highly encourage you to try it out, explore its options, and have fun with it. To learn more about Cloud Run development tips, visit https://cloud.google.com/run/docs/tips/general.
The next section of this chapter will focus on another serverless product: Cloud Functions.
Cloud Functions, which falls under the category of Function-as-a-Service (FaaS), is a fully managed, serverless execution environment where we can run code without provisioning infrastructure or managing servers. Functions are triggered when a watched event occurs.