AKS upgrades can be broken into 3 parts: the Kubernetes version upgrade, the node image version upgrade (security patching), and Azure-associated service upgrades.
Kubernetes Version Upgrade
AKS Supported Versions
Kubernetes uses the standard Semantic Versioning scheme, which means it follows the format of
[major].[minor].[patch]
For example, in version 1.17.9:
- 1 is the major version (changes only when incompatible API changes break backward compatibility)
- 17 is the minor version (a new minor version is released roughly every three to four months, adding functionality in a backward-compatible manner)
- 9 is the patch version (released the most frequently, as often as weekly, for backward-compatible bug fixes)
AKS supports the latest generally available (GA) minor version (N) and the 2 minor versions prior to that (N-2). For example, if version 1.17.x is the latest GA minor version, AKS would support:
- 1.17.x
- 1.16.x
- 1.15.x
Within each minor version, AKS also supports the 2 latest stable patch versions:
- 1.17.a
- 1.17.b
- 1.16.c
- 1.16.d
- 1.15.e
- 1.15.f
In short, [major].[N-2].[P-2] through [major].[N].[P] would be supported by the Azure platform. We could use
az aks get-versions -l <location> -o table
to get the available versions in each region. For more about AKS supported versions and availability, please check the official documentation.
Get Available AKS Upgrade Versions
Now, let’s take a look at how we could confirm the running AKS K8s version.
az aks show -g <resource group name> -n <AKS cluster name> | grep -E "orchestratorVersion|kubernetesVersion"
** kubernetesVersion is the master nodes’ version (control plane version)
** orchestratorVersion, listed per node pool, is the worker nodes’ version
kubectl get nodes -o wide
** This command shows only the worker nodes’ versions, but it covers all nodes across all node pools. A per-pool CLI check is sketched below.
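If you want to check a single node pool’s version from the CLI instead, a query like this should work (a sketch using az aks nodepool show; substitute your own node pool name):
az aks nodepool show \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <node pool name> \
--query orchestratorVersion -o tsv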
Check available upgrades with
az aks get-upgrades --resource-group <RG name> --name <Cluster name> --output table
** Minor version upgrades need to be performed sequentially. For example, going from 1.18.x directly to 1.20.x is not allowed; it needs to be 1.18.x to 1.19.x, then 1.19.x to 1.20.x, as in the sketch below.
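As a sketch, a two-hop upgrade would look like the following; the minor versions here are illustrative, and the exact patch versions should come from the az aks get-upgrades command above.
# Hop 1: 1.18.x -> 1.19.x
az aks upgrade -g <RG name> -n <AKS cluster name> --kubernetes-version <1.19 patch version>
# Hop 2: 1.19.x -> 1.20.x
az aks upgrade -g <RG name> -n <AKS cluster name> --kubernetes-version <1.20 patch version>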
AKS Cluster Upgrade
Before really getting into the cluster upgrade, we could set the node surge percentage. With a higher value set, the nodes are upgraded faster, but the services hosted on the cluster also run a higher risk of interruption. For more details, please check this section. If administrators would like to use automatic upgrades instead of manual ones, this could also be done by setting the auto-upgrade channel; please follow this section. Both settings are sketched below.
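As a minimal sketch, both settings can be applied from the CLI; the 33% surge value and the stable channel below are illustrative choices, not recommendations:
# Allow up to 33% of the node pool to be surged during an upgrade
az aks nodepool update \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <node pool name> \
--max-surge 33%
# Opt the cluster in to automatic upgrades from the stable channel
az aks update \
--resource-group <RG name> \
--name <AKS cluster name> \
--auto-upgrade-channel stable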
There are 3 important prerequisite checks before heading into the operation.
** Please make sure there are sufficient private IP addresses within the AKS-associated subnet if the cluster is running with Azure CNI. The rule of thumb, given
- N: the total number of nodes across node pools
- P: the max number of Pods on each node
is that the private IP addresses required are (N+1) + (N+1)*P, counting the one extra node surged in during the upgrade.
For example, with a total of 30 nodes across 3 node pools and each node hosting at most 30 Pods, the total private IP addresses required would be
31 + (31*30) = 961
Represented as a CIDR, this calls for at least an x.x.x.x/22 subnet (1,024 addresses). You could use any online subnet calculator to verify this, or do the arithmetic in the shell as sketched below.
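For a quick check without an online tool, the math can also be done directly in the shell (a sketch using the example values above):
# N = nodes across node pools, P = max Pods per node
N=30; P=30
echo $(( (N + 1) + (N + 1) * P ))
# prints 961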
** Please make sure there is no Pod Disruption Budget (PDB) that would potentially block the AKS cluster upgrade process. Essentially, “maxUnavailable” and “minAvailable” need to be set according to your workload. For example, if you have only 1 Pod hosting the application, the options are either not to set a PDB at all, or to set “maxUnavailable” to “1” and plan the application downtime with the application team. For more information, please check this section of the official documentation.
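To see which PDBs exist before the upgrade, a quick listing helps; an ALLOWED DISRUPTIONS value of 0 is the red flag that could block node drains:
kubectl get pdb --all-namespaces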
** Administrators could decide whether to upgrade only the control plane (master nodes) or both the control plane and the worker nodes.
The command to upgrade only the AKS control plane:
az aks upgrade \
--resource-group <RG name> \
--name <AKS cluster name> \
--kubernetes-version <target K8s version> \
--control-plane-only
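If you went the control-plane-only route, each worker node pool can be brought to the same version afterwards with az aks nodepool upgrade:
az aks nodepool upgrade \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <node pool name> \
--kubernetes-version <target K8s version>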
Finally, we could upgrade the whole AKS cluster with
az aks upgrade \
--resource-group <RG name> \
--name <AKS cluster name> \
--kubernetes-version <target K8s version>
After executing the command, head to the AKS-associated VMSS resource group; it is usually named in the format of “MC_<RG name>_<AKS cluster name>_<region>”. You could see one additional node being created, running the latest node image version (the node image version is paired with the selected K8s version), assuming you are going with the default, i.e., the node surge percentage has not been set to a different value. What the platform is doing at this point is draining the workload from one of the existing nodes onto the newly created node, making the drained node available for the upgrade to the target K8s version.
If we use the K8s CLI to check the node versions, we would see the first node successfully running the target K8s version; in this example cluster, 5 more node upgrades to go. At the end of the process, the newly created surge node is deleted, as it is no longer needed.
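To follow the rollout live, the node list can be watched with the same command as before plus the watch flag:
kubectl get nodes -o wide -w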
Node Image Version Upgrade
Security Concerns
As we all know, containers essentially run as processes within their hosts, so keeping the host a secure environment is crucial. Therefore, OS security patches are applied to AKS Linux nodes on a nightly basis. However, this is not the case for AKS Windows nodes. That said, some security patches require the host to reboot before they take effect, and performing that reboot is the responsibility of AKS administrators. Basically, there are 2 options for this operation: one is to manually reboot each node, and the other is to deploy Kured as a DaemonSet (a dedicated Pod on each node). For more information, please check the official documentation.
To get more information on how to deploy Kured in AKS, please check this official documentation.
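As a minimal sketch, Kured is commonly installed with Helm; the chart repository URL and chart name below come from the Kured project and may change between releases, so please verify them against the project’s documentation:
# Add the Kured chart repository and install the DaemonSet into kube-system
helm repo add kubereboot https://kubereboot.github.io/charts
helm repo update
helm install kured kubereboot/kured --namespace kube-system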
Check AKS Node Image Version
az aks nodepool show \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <target node pool name> \
--query nodeImageVersion
Get AKS Node Image Available Upgrades
az aks nodepool get-upgrades \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--nodepool-name <target node pool name>
Upgrade AKS Node Image Version to the Latest
az aks nodepool upgrade \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <target node pool name> \
--node-image-only
After executing the command, head to the AKS-associated VMSS resource group, again usually named in the format of “MC_<RG name>_<AKS cluster name>_<region>”. You could see the nodes being upgraded one by one.
After a while, click the “Refresh” button at the top; if the “Provisioning state” shows “Succeeded” and “Latest model” shows “Yes”, we can confirm each node is using the latest node image version for the currently running AKS version.
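If you prefer the CLI over the portal, roughly the same check can be run against the underlying scale set; this is a sketch, with latestModelApplied being the property behind the “Latest model” column:
az vmss list-instances \
--resource-group <MC_... resource group name> \
--name <VMSS name> \
--query "[].{instance:name, latestModel:latestModelApplied}" \
-o table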
Azure-Associated Service Upgrade
Uptime SLA
AKS core components, mainly the services provided in the control plane, could be upgraded to a higher SLA. From the official documentation, we know that AKS could get 99.9% uptime without Availability Zones and 99.95% with Availability Zones.
To upgrade an existing AKS cluster to use the Uptime SLA, please follow the steps below.
az aks show -g <RG name> -n <AKS cluster name> | grep sku -A5 -B5
At the moment, we could see “sku” showing the tier as “Free”. After executing the command below, the “sku” of this cluster would show “Paid”.
az aks update -g <RG name> -n <AKS cluster name> --uptime-sla
Proximity Placement Group
Adding a node pool with a proximity placement group (PPG) basically means the AKS nodes are logically placed in close physical proximity to reduce network latency, so that latency-sensitive and compute-intensive applications, such as gaming, could have better performance. For more details, please check this official documentation.
To create a PPG:
az ppg create -g <RG name> -n <PPG name> -l <location> -t <type>
To create a node pool inside the newly created PPG:
az aks nodepool add \
--resource-group <RG name> \
--cluster-name <AKS cluster name> \
--name <node pool name> \
--node-count <amount> \
--ppg <PPG resource ID from above command>
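If you did not capture the resource ID from the PPG create output, it can be retrieved afterwards with az ppg show:
az ppg show -g <RG name> -n <PPG name> --query id -o tsv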