Upgrading kubernetes ¶

NOTE for Kubernetes >1.31 ¶

Kops' upgrade procedure has historically risked violating the Kubelet version skew policy. After kops update cluster --yes completes and before every kube-apiserver is replaced with kops rolling-update cluster --yes, newly launched nodes running newer kubelet versions could be connecting to older kube-apiserver nodes.

Violating this policy when upgrading to Kubernetes 1.31 can cause newer kubelets to crash. This kubernetes issue provides details though it was not addressed because the change does not actually violate the version skew policy, it merely breaks tooling that was already violating the policy.

To upgrade a cluster to Kubernetes 1.31 or newer, use the new kops reconcile cluster command introduced in Kops 1.31. This replaces both kops update cluster --yes and kops rolling-update cluster --yes.

kops reconcile cluster will interleave the cloud provider updates of kops update cluster --yes with the node rotations of kops rolling-update cluster --yes.

It is comparable to the following sequence: 1. kops update cluster --instance-group-roles=control-plane,apiserver --yes 2. kops rolling-update cluster --instance-group-roles=control-plane,apiserver --yes 3. kops update cluster --yes 4. kops rolling-update cluster --yes 5. kops update cluster --prune --yes

Terraform users will need to use a targeted terraform apply with the normal kops rolling-update cluster --yes:

$ kops update cluster --target terraform ...

# Get the terraform resource IDs of the instance groups with a spec.role of `ControlPlane`, `Master`, or `APIServer`
# The exact output may vary.
$ terraform state list | grep -E 'aws_autoscaling_group|google_compute_instance_group_manager|hcloud_server|digitalocean_droplet|scaleway_instance_server'
aws_autoscaling_group.controlplane-us-east-1a-example-com
aws_autoscaling_group.controlplane-us-east-1b-example-com
aws_autoscaling_group.controlplane-us-east-1c-example-com
aws_autoscaling_group.nodes-example-com
aws_autoscaling_group.bastion-example-com

# Apply the changes to all control plane instance groups
$ terraform apply -target 'aws_autoscaling_group.controlplane-us-east-1a-example-com' -target 'aws_autoscaling_group.controlplane-us-east-1b-example-com' -target 'aws_autoscaling_group.controlplane-us-east-1c-example-com'

# Roll the apiserver nodes
$ kops rolling-update cluster --yes --instance-group-roles control-plane,apiserver

# Apply everything else
$ terraform apply

# Roll the remaining nodes
$ kops rolling-update cluster --yes

Upgrades before Kops 1.31 ¶

Upgrading kubernetes is very easy with kOps, as long as you are using a compatible version of kOps. The kOps 1.30.x series (for example) supports the kubernetes 1.28, 1.29, and 1.30 series, as per the kubernetes deprecation policy. Older versions of kubernetes will likely still work, but these are on a best-effort basis and will have little if any testing. kOps 1.30 will not support the kubernetes 1.31 series, and for full support of kubernetes 1.31 it is best to wait for the kOps 1.31 series release. We aim to release the next major version of kOps within a few weeks of the equivalent major release of kubernetes. We try to ensure that a pre-release (alpha or beta) is available at the kubernetes release date, for early adopters.

Upgrading kubernetes is similar to changing the image on an InstanceGroup, the kubernetes version is controlled at the cluster level. So instead of kops edit ig <name>, we kops edit cluster, and change the kubernetesVersion field. kops edit cluster will open your editor with the cluster, similar to:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: simple.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: gce
  configBase: gs://kubernetes-clusters/simple.k8s.local
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-central1-a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-us-central1-a
      name: a
    name: events
  iam:
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.17.2
  masterPublicName: api.simple.k8s.local
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  project: gce-project
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - name: us-central1
    region: us-central1
    type: Public
  topology:
    dns:
      type: Public

Edit kubernetesVersion, changing it to 1.17.7 for example.

Apply the changes to the cloud infrastructure using kops update cluster and kops update cluster --yes:

Will create resources:
  InstanceTemplate/master-us-central1-a-simple-k8s-local
    Network                 name:default id:default
    Tags                    [simple-k8s-local-k8s-io-role-master]
    Preemptible             false
    BootDiskImage           cos-cloud/cos-stable-57-9202-64-0
    BootDiskSizeGB          64
    BootDiskType            pd-standard
    CanIPForward            true
    Scopes                  [compute-rw, monitoring, logging-write, storage-ro, https://www.googleapis.com/auth/ndev.clouddns.readwrite]
    Metadata                {cluster-name: <resource>, startup-script: <resource>}
    MachineType             n1-standard-1

  InstanceTemplate/nodes-simple-k8s-local
    Network                 name:default id:default
    Tags                    [simple-k8s-local-k8s-io-role-node]
    Preemptible             false
    BootDiskImage           debian-cloud/debian-9-stretch-v20170918
    BootDiskSizeGB          128
    BootDiskType            pd-standard
    CanIPForward            true
    Scopes                  [compute-rw, monitoring, logging-write, storage-ro]
    Metadata                {startup-script: <resource>, cluster-name: <resource>}
    MachineType             n1-standard-2

Will modify resources:
  InstanceGroupManager/us-central1-a-master-us-central1-a-simple-k8s-local
    InstanceTemplate         id:master-us-central1-a-simple-k8s-local-1507089163 -> name:master-us-central1-a-simple-k8s-local

  InstanceGroupManager/us-central1-a-nodes-simple-k8s-local
    InstanceTemplate         id:nodes-simple-k8s-local-1507089694 -> name:nodes-simple-k8s-local

kops rolling-update cluster will show that all nodes need to be restarted.

NAME            STATUS      NEEDUPDATE  READY   MIN MAX NODES
master-us-central1-a    NeedsUpdate 1       0   1   1   1
nodes           NeedsUpdate 3       0   3   3   3

Restart the instances with kops rolling-update cluster --yes.

> kubectl get nodes -owide
NAME                        STATUS    AGE       VERSION   EXTERNAL-IP     OS-IMAGE                             KERNEL-VERSION
master-us-central1-a-8fcc   Ready     26m       v1.17.7   35.194.56.129   Container-Optimized OS from Google   4.4.35+
nodes-9cml                  Ready     16m       v1.17.7   35.193.12.73    Ubuntu 16.04.3 LTS                   4.10.0-35-generic
nodes-km98                  Ready     10m       v1.17.7   35.194.25.144   Ubuntu 16.04.3 LTS                   4.10.0-35-generic
nodes-wbb2                  Ready     2m        v1.17.7   35.188.177.16   Ubuntu 16.04.3 LTS                   4.10.0-35-generic