Release notes for kOps 1.36 series
⚠ kOps 1.36 has not been released yet! ⚠
This is a document to gather the release notes prior to the release.
kOps 1.36 adds Kubernetes 1.36 support, completes the move away from the in-tree cloud providers, refreshes the bundled components (containerd v2.3, etcd-manager, AWS Load Balancer Controller, EBS CSI driver, Cilium options, CoreDNS, etc.), reworks how the kops-channels addon manager is built and deployed, introduces a hybrid bootstrap path for gossip clusters, expands Azure and Hetzner support, and lays the groundwork for Linode (Akamai) as a new cloud provider.
Significant changes
Kubernetes
- Kubernetes 1.36 support, including integration tests (#18267, #18202)
- `kube-apiserver`: align `deleteCollectionWorkers`, `MaxRequestInflight` and `compactionInterval` with kube-up defaults for large clusters (#18081, #18086)
- `kubelet`: migrate deprecated CLI flags into the kubelet configuration file passed via `--config` (#18280)
- `kubelet`: add new tunables (`kubeAPIQPS`/`kubeAPIBurst`, `nodeAllocatableUpdatePeriodSeconds` and friends) and only set `nodeAllocatableUpdatePeriodSeconds` on Kubernetes 1.35+ (#18085, #18153, #18305)
- `kube-proxy`: bind-mount the kubeconfig directory instead of the file so kubeconfig rotations survive (#18344)
- `kube-scheduler`: documentation clarification for `KubeSchedulerConfiguration` (#18151)
- `kops-controller`: validate instance group names (#18391)
- `kops-controller`: delete the legacy node reconciler and route DigitalOcean through the non-legacy Identifier (#18227)
- Remove the unused in-tree cloud config plumbing (#18347)
Container Runtime
- Default containerd to v2.3.0 (with runc 1.4.2) for Kubernetes 1.36+; v2.2.3 remains the default for 1.32–1.35, and v1.7.31 for older clusters (#18364)
- Support containerd config TOML v3, with clearer config-file version behavior and dead-code cleanup from 1.6 (#18291). Users who set `spec.containerd.configAdditions` on a cluster or instance group must update those entries to match the TOML v3 schema before upgrading.
- Map the containerd registry mirror wildcard to the `_default` directory (#18291)
- Stream verified container image bytes directly into `ctr import` from nodeup and drop the containerized-mounter Archive task (#18278, #18277)
- Dump containerd config files in `kops toolbox dump` for troubleshooting (#18313)
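The version split in the first item above can be read as a simple lookup on the Kubernetes minor version. The following is a minimal sketch of that mapping, not the kOps implementation; the function names `parse_major_minor` and `default_containerd_version` are hypothetical and exist only for illustration.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version such as 'v1.36.1' or '1.35'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def default_containerd_version(kubernetes_version: str) -> str:
    """Map a Kubernetes version to the containerd default listed in these notes.

    Hypothetical helper: it only encodes the three ranges stated above
    (1.36+, 1.32 through 1.35, and anything older).
    """
    major_minor = parse_major_minor(kubernetes_version)
    if major_minor >= (1, 36):
        return "2.3.0"
    if major_minor >= (1, 32):
        return "2.2.3"
    return "1.7.31"


if __name__ == "__main__":
    for k8s in ("1.31.9", "1.34.2", "v1.36.0"):
        print(k8s, "->", default_containerd_version(k8s))
```

A cluster that pins its own containerd version would not follow this table; the sketch covers only the defaults.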
Networking
- Update Cilium with several new tunables:
  - Schedule `cilium-operator` on control-plane nodes (#18375)
  - Allow setting arbitrary `cilium-config` entries via `Cilium.ExtraConfig` (#18285)
  - Add `EnableHostFirewall` field to `CiliumNetworkingSpec` (#18152)
  - Add `bpf-lb-sock` and `bpf-lb-sock-hostns-only` flags (#18375)
  - Require `k8s-connectivity` in the liveness probe (#18237)
- Calico: disable kube-proxy when running in eBPF mode (#18334)
- Update AWS VPC CNI to v1.21.2 (#18410)
- CoreDNS: update to v1.14.3 then pin to v1.14.2 to stay on the supported branch, and bump CoreDNS memory on large clusters (#18368, #18369, #18361)
- dns-controller: make `priorityClassName` configurable and default `Provider` when `ExternalDNS` is partially set (#18298, #18302)
- Drop deprecated GCS-based CNI plugin mirrors (#17987, #17976)
AWS
- Drop the in-tree `cloud-provider-aws` dependency from kOps (#18336)
- AWS Load Balancer Controller refresh:
  - Upgrade to v3.3.0, switching the manifest to a Helm + Kustomize pipeline (#18221, #18276)
  - Prune the bundled Deployment and drop the `ALBTargetControlConfig` CRD (#18222, #18233)
  - Bypass the LBC webhook for cert-manager so circular bootstrap issues are avoided (#17999)
  - Add `elasticloadbalancing:SetRulePriorities` and `ec2:DescribeSubnets` permissions (#17999)
- Update EBS CSI driver to v1.58.0, refresh the upstream policy, and gate `MutableCSINodeAllocatableCount` for Kubernetes 1.35+ (#18220)
- `kops-controller` is now served over HTTPS with a `/healthz` target on the API NLB target group (#18236, #18174)
- Abort rolling updates when load-balancer deregister fails instead of marching on with degraded targets (#18338)
- `NodeTerminationHandler`:
  - Add `EnableOutOfServiceTaint` field (#18140)
  - Allow disabling `enableScheduledEventDraining` in Queue Processor mode (#18339)
- Mixed instances policy: apply `onDemandAllocationStrategy` to the ASG and propagate taints without value as ASG tags (#18342, #18343)
- `instanceRequirements`: add `excludedInstanceTypes` and fix a memory-assignment bug (#18113, #18123)
- NLB: add a security-group mode option (#18211)
- Also consider private subnets that already have an IPv6 CIDR as having a CIDR assigned (#18089)
- Disable `nm-cloud-setup` on RHEL 9 for AWS VPC CNI (#18264)
- Tighten and trim instance-role IAM permissions to match what each component actually uses (#18251, #18355, #18362, #18363, #18372)
- Use `HeadBucket` to resolve the S3 bucket region, pass the VFS scheme and provider-specific options to the S3 client, and silence warnings when the S3 provider has no supported checksum (#18335, #18129, #18132, #18128)
Azure
- Deploy `cloud-controller-manager` for node lifecycle and load-balancer support (#18197)
- Add experimental Terraform target support (#18149)
- Add support for the Azure Disk CSI Driver (#18141)
- Use HTTPS for the kops-controller probe and load-balancer health check, and move probe / rule configuration into the model using SDK types (#18182, #18183, #18190)
- Use `/etc/kubernetes/azure.json` for the cloud config, load CCM/CSI config from a Secret via a new `azure-cloud-config` addon and stop writing the cloud-config file on nodes (#18345)
- Set the provider ID when starting kubelet (#18155)
- Enable CCM cloud routes for kubenet and Kindnet (#18262)
- Encode storage account in `azureblob://` URLs (#18260)
- List VMSS NICs in protokube gossip seed discovery and match VMSS VM/NIC ARM IDs case-insensitively in the dumper (#18319, #18315)
- Cluster deletion robustness: block disk deletion on the parent VMSS, delete `RoleAssignment` after the VM scale set, handle missing resource groups in disk listing, fix nil pointer dereferences and ordering issues (#18196, #18186, #18191, #18184, #18185)
- Retry tasks with failed provisioning state, fix Terraform `LoadBalancer` task dependencies on `PublicIP`, and use a larger default VM SKU (Standard_D4ls_v6) in tests (#18157, #18154, #18194)
- Restrict VMSS role assignments to the control plane and fix a control-plane role tag spelling (#18357, #18353)
- Add `regenerate.sh` for addons (#18225)
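The storage-account encoding from #18260 changes the shape of Azure state-store URLs: per the breaking-changes entry for the same PR, the legacy form is `azureblob://{container}/{key}` (with the account taken from `AZURE_STORAGE_ACCOUNT`) and the new form is `azureblob://{account}/{container}/{key}`. A minimal sketch of the rewrite follows; the helper name `migrate_azureblob_url` is hypothetical and is not part of kOps.

```python
from urllib.parse import urlsplit


def migrate_azureblob_url(legacy_url: str, storage_account: str) -> str:
    """Rewrite azureblob://{container}/{key} as azureblob://{account}/{container}/{key}.

    Hypothetical helper for illustration. It assumes the input really is in
    the legacy form; it cannot detect a URL that already includes the account.
    """
    parts = urlsplit(legacy_url)
    if parts.scheme != "azureblob":
        raise ValueError(f"not an azureblob URL: {legacy_url!r}")
    container = parts.netloc          # legacy form: first segment is the container
    key = parts.path.lstrip("/")      # everything after the container
    if not container:
        raise ValueError(f"missing container in {legacy_url!r}")
    new_url = f"azureblob://{storage_account}/{container}"
    if key:
        new_url += "/" + key
    return new_url


if __name__ == "__main__":
    # The account name would previously have come from AZURE_STORAGE_ACCOUNT.
    print(migrate_azureblob_url("azureblob://kops-state/clusters/demo", "mystorageacct"))
```

Because a legacy URL and a new-form URL are syntactically indistinguishable, a helper like this should only be run on values known to predate the change.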
GCP
- Add `kops-controller` to the GCE internal load balancer and expose it on the internal LB for gossip clusters (#18169, #18307)
- Use SSL health check for `kops-controller` on GCE (#18171)
- Support `role=apiserver` on GCE with `dns=none` and fix instance tags for `role=apiserver` (#18159, #18175)
- Allow BGP from nodes to the control plane for Calico (#18351)
- Don't request live migration on instance types that don't support it (#18004)
- Wait for `InstanceManagers/InstanceTemplates` deletion to complete and include MIG scaling errors when instances are not found (#18013, #18247)
- Allow `N4A` instance type (#18330)
- Drop the `cloud-provider-gcp` dependency and switch `clouddns` to a forked `gcetokensource` (#18274)
- Allow pods to reach metric ports running on control-plane nodes when using GCE alias IPs (#18052)
- Use Kubernetes 1.36 for the `apiserver` role e2e template, move it to `dns=none`, add e2e templates that combine internal load balancer + Cilium etcd, and reject `role=apiserver` + `dns=none` only on GCE when not supported (#18147, #18146, #18162, #18214)
- GCE: shrink the etcd-cluster disk label to fit the 63-character limit (#18292)
- GCE: support control-plane volume type configuration (#17955)
- GCE: fix instance-group deletion (#18148) and nil-panic during deletion (#18195)
- Migrate the GCS bucket discovery store handling and reject GCS VFS paths without buckets (#18360)
Hetzner
- Enable Cluster Autoscaler (#18226, #18135)
- Upgrade `hcloud-cloud-controller-manager` to v1.31.0 (#18281, #18317)
- Upgrade `hcloud-csi-driver` to v2.20.2 and reorder the CSI driver Deployment before the DaemonSet (#18318)
- Split the hcloud Secret into its own addon and let the CSI driver consume the CCM-provided secret (#18317, #18318)
DigitalOcean
- Default machine type to `s-2vcpu-4gb-amd` (#18227)
- Migrate node identifier to the non-legacy `Identifier` interface and tag droplets with the instance group role (#18227)
OpenStack
- Resolve the `floatingip` TODO from kOps 1.21 (#18314)
- Enable hybrid bootstrap mode for gossip clusters (#18245)
Linode (experimental)
- Initial Linode (Akamai) cloud provider support, including:
  - Cloud provider API registration (#18166)
  - VFS object storage schema (#18138)
  - VPC cloudup task (#18316)
  - nodeup configuration and node identity (#18177)
- The Linode provider is not yet ready for production use.
Bare metal
- Don't try to use protobuf for bare-metal tooling (#18068)
- Unpin Kubernetes version for the metal provider (#17944)
Etcd
- Update etcd-manager to v3.0.20260512 (#18323)
- Bump etcd to latest patches (3.5.30, 3.6.11) and drop support for etcd 3.4 (#18290)
- Generate etcd-manager patch symlinks dynamically (#18290)
- Decouple `EtcdEvents` HTTP from main etcd cluster traffic in scalability tests (#18370)
Channels and addons
- The `kops-channels` addon manager is now built as a kops-managed image and rendered as a static pod on control-plane nodes; protokube no longer applies channels or labels control-plane nodes (#18215, #18328, #18373, #18374)
- `kops-channels` gains `--node-name`, `--interval`, multi-URL apply support, and quicker retries until the first reconcile succeeds (#18328)
- `kops-channels`: move the node labeler from protokube (#18215)
- `kops-channels`: fix region detection and the discovery cache permission noise (#18390)
- Drop the standalone `channels` binary from the kOps release artifacts (#18374)
- `addons`: render addons as fi-tasks so addon templates can reference the finalized task graph, and drop the legacy `9.99.0` version shim and the deprecated master `nodeAffinity` term (#18215, #18257)
Gossip
- Introduce hybrid worker bootstrap for gossip clusters on AWS, Azure, GCE, and OpenStack: control-plane nodes keep using gossip while workers bootstrap directly against the API NLB / internal LB, so workers no longer need protokube (#18245, #18307)
- Restrict gossip seed discovery to control-plane nodes and stop exporting unused cloud credentials to worker nodes (#18352, #18354)
- New cluster creation now defaults to `dns=none` and logs a deprecation warning if gossip is requested (#18245)
- Validation: enforce supported DNS topology per cloud provider (#18255)
- Migrate protokube mesh gossip protobuf from gogo (#18230)
- Add minimal gossip create/update integration tests and an upgrade e2e test (#18256, #18296)
Operating System Support
- Drop support for Amazon Linux 2 (#17943)
- Drop support for Ubuntu 20.04 (#18235)
- Drop support for Debian 10 (#18235)
- Add experimental support for Ubuntu 26.04 (#18232)
- Load the `nf_tables` module and install `iptables-nft` on RHEL 10+ (#18179)
- Enable the `nf_conntrack` kernel module on Rocky 9 (#17968)
- Skip `ImageVolume` tests on Debian 11 and prevent `cloud-ifupdown-helper` from hijacking CNI veths on Debian 11 (#18261)
- Set E2E `--node-os-arch=arm64` for Rocky 10 (#18192)
Other components
- Update cluster-autoscaler to v1.35.0 (#18110)
- Run metrics-server in insecure mode for AI Conformance tests (#18067)
- Update Go to v1.25.7 / v1.25.8 / v1.25.9 / v1.26.2 / v1.26.3 (#17956, #18058, #18173, #18267, #18395)
- Build kOps binaries with `gcr.io/distroless/static` as the base image and strip release binaries by default (#18403, #18263)
- Drop the in-tree helm dependency from `kops toolbox`; switch to a forked `helm` `strvals` (#18272)
- Switch `structured-merge-diff` from v4 to v6 (#18273)
- Upgrade `hashicorp/memberlist` to v0.5.4 (#18230)
Other changes of note
- `kops reconcile cluster` accepts `--use-kubeconfig` to reuse an existing kubeconfig instead of regenerating it (#18126)
- `kops` accepts `--node-volume-type` flags on `create cluster` (#18145)
- `kops create instancegroup` aligns the node label across `create cluster` and `create instancegroup` (#18341)
- `kops get assets`: fix lookup when `spec.dnsZone` is a DNS name (#18384)
- `kops update cluster`: reject non-`http(s)` URLs for `assets.fileRepository` (#18340)
- `kops update cluster`: validate `assets.fileRepository` (#18340)
- `kops upgrade-ab`: allow kOps downgrades for upgrade-AB scenarios (#18219)
- `kops toolbox dump`: time out per-node log dumping after one minute, improve reliability and skip not-found nodes on GCE (#18349, #18311, #18049)
- `nodeup`: add experimental hybrid-bootstrap workers, skip protokube/channels assets on workers, populate `DefaultMachineType` for Cilium-ENI clusters, and use shared system-component env vars for `kops-channels` (#18245, #18358, #18365)
- `nodeup`: fix protokube skip on hybrid-bootstrap workers (#18378)
- `kops-controller`/`nodeup`: ensure files have the desired permissions before close and rename, and fix `PrivateKey.WriteTo` returning zero length (#18379)
- `dns-controller`: honor `klog -stderrthreshold` even when `-logtostderr` is true (#18231)
- Fix `HasHighlyAvailableControlPlane` to use `AllInstanceGroups` when an instance-group filter is in use (#17740)
- Fix `kops` panic on send to a closed results channel (#18326)
- `AssetBuilder` is now concurrency-safe (#18181)
- Side-loading uses the `KOPS_BASE_URL` image version (#18200)
- Verify the config server IPs with a DNS name (#18241)
- Remove the explicit `fs.inotify.max_user_watches` sysctl setting (#17556)
- Pull `actions/upload-artifact`, `actions/setup-go` and `actions/dependency-review-action` to their latest releases, pinned by commit SHA (#18114)
- Replace `shipbot` with a `gh`-based script for promoting binaries (#18095)
- Build: add a `kops-channels` image build and CI push, and run `make apimachinery` updates as needed (#18328)
- `gomod`: tidy and verify all modules dynamically (#18401)
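The `assets.fileRepository` change in #18340 rejects URLs whose scheme is not `http` or `https`. The sketch below shows what such a check could look like; it is an illustration only, not the kOps validation code. The function name `is_http_file_repository` is hypothetical, and requiring a non-empty host is an extra assumption not stated in these notes.

```python
from urllib.parse import urlsplit


def is_http_file_repository(url: str) -> bool:
    """Return True if url uses the http or https scheme and names a host.

    Hypothetical check for illustration; the real validation in kOps may
    accept or reject additional cases.
    """
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in (
        "https://mirror.example.com/kops",  # accepted
        "s3://my-bucket/kops",              # rejected: non-http(s) scheme
        "mirror.example.com/kops",          # rejected: no scheme at all
    ):
        print(candidate, "->", is_http_file_repository(candidate))
```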
AI Conformance (experimental)
- New experimental test suite under `tests/ai-conformance` covering accelerator metrics, gang scheduling with Kueue, observability and AI-service metrics, pod and cluster autoscaling, GPU operator integration with Cilium gateway API, and more. The suite is intended for kOps CI/E2E and is not yet exposed as a user-facing feature (#18056, #18075, #18055, #18053, #18100, #18112, #18063, #18064, #18071, #18074, #18073, #17992, #18001, #18005, #18006, #18008, #18009, #18018, #18066, #18067, #18076, #18077, #18084, #18087, #18101, #18108)
Breaking changes
- Support for Amazon Linux 2, Ubuntu 20.04 and Debian 10 has been removed; existing clusters running those distributions must be migrated to a supported OS before upgrading to kOps 1.36 (#17943, #18235)
- Support for Kubernetes 1.30 has been removed.
- Support for etcd 3.4 has been removed; clusters must be running etcd 3.5 or 3.6 (#18290)
- The standalone `channels` binary is no longer distributed; the `kops-channels` addon manager now runs as a static pod on control-plane nodes (#18374)
- The legacy `9.99.0` addons-version shim has been removed; addons set up by kOps versions prior to 1.22 must be re-applied before upgrading (#18257)
- The in-tree `cloud-provider-aws` and `cloud-provider-gcp` dependencies have been dropped from kOps; external cloud providers are required (already mandatory for Kubernetes 1.33+) (#18336, #18274)
- The legacy `azureblob://{container}/{key}` URL form (paired with the `AZURE_STORAGE_ACCOUNT` environment variable) is no longer accepted; state-store paths must use the new `azureblob://{account}/{container}/{key}` form (#18260)
- Users who set `spec.containerd.configAdditions` must update those entries to the containerd config TOML v3 schema before upgrading to kOps 1.36 (#18291)
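Several of the removals above can be checked mechanically before an upgrade. The sketch below collects the Kubernetes, etcd, and OS removals into one pre-flight function. It is a hypothetical helper, not a kOps command: the name `upgrade_blockers`, the free-form OS labels, and the minimum of Kubernetes 1.31 (inferred from the removal of 1.30) are all assumptions made for illustration.

```python
REMOVED_OS = {"Amazon Linux 2", "Ubuntu 20.04", "Debian 10"}  # labels are illustrative
SUPPORTED_ETCD = {(3, 5), (3, 6)}
MIN_KUBERNETES = (1, 31)  # assumption: 1.30 is removed, so 1.31 is the floor


def _major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version such as 'v1.34.1' or '3.5.30'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def upgrade_blockers(kubernetes_version: str, etcd_version: str, os_name: str) -> list[str]:
    """List the breaking changes from these notes that apply to a cluster."""
    blockers = []
    if _major_minor(kubernetes_version) < MIN_KUBERNETES:
        blockers.append(f"Kubernetes {kubernetes_version} is no longer supported")
    if _major_minor(etcd_version) not in SUPPORTED_ETCD:
        blockers.append(f"etcd {etcd_version} must be upgraded to 3.5 or 3.6")
    if os_name in REMOVED_OS:
        blockers.append(f"{os_name} must be migrated to a supported OS")
    return blockers


if __name__ == "__main__":
    for problem in upgrade_blockers("1.30.4", "3.4.13", "Debian 10"):
        print("blocker:", problem)
```

The sketch does not cover the `channels` binary, addon re-application, cloud-provider, `azureblob://`, or containerd TOML items, which need manual review.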
Known Issues
- None at this time
Deprecations
- Support for Kubernetes version 1.30 is removed in kOps 1.36.
- Support for Kubernetes version 1.31 is deprecated and will be removed in kOps 1.37.
- Support for Amazon Linux 2, Ubuntu 20.04 and Debian 10 is removed in kOps 1.36.
- Support for etcd 3.4 is removed in kOps 1.36.
- The standalone `channels` binary is no longer distributed in kOps 1.36; `kops-channels` runs as a static pod on the control plane.
- Support for AWS Classic Load Balancer (CLB) for the API, deprecated since kOps 1.26, will be rejected for new clusters in kOps 1.37 and fully removed (existing clusters must migrate) in kOps 1.38. See the CLB to NLB migration guide for the upgrade procedure.
- Support for gossip DNS is deprecated. kOps 1.37 will reject new gossip DNS clusters, and kOps 1.38 will require existing gossip DNS clusters to migrate before upgrading. This affects clusters whose name ends in `.k8s.local` and that were not created with `--dns=none`; clusters using `--dns=none`, even with a `.k8s.local` name, are not affected. Existing gossip DNS clusters should migrate to `--dns=none` or a hosted DNS zone. kOps 1.36 introduces hybrid bootstrap to make that migration easier.
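The gossip DNS rule in the last item reduces to two conditions: the cluster name ends in `.k8s.local`, and the cluster was not created with `--dns=none`. A minimal sketch of that predicate follows; the function name and its boolean parameter are hypothetical, and how to determine whether a given cluster was created with `--dns=none` is left to the caller.

```python
def is_affected_gossip_cluster(cluster_name: str, created_with_dns_none: bool) -> bool:
    """Return True if the gossip DNS deprecation described above applies.

    Hypothetical helper: a cluster is affected only when its name ends in
    .k8s.local AND it was not created with --dns=none.
    """
    return cluster_name.endswith(".k8s.local") and not created_with_dns_none


if __name__ == "__main__":
    print(is_affected_gossip_cluster("demo.k8s.local", created_with_dns_none=False))
    print(is_affected_gossip_cluster("demo.k8s.local", created_with_dns_none=True))
    print(is_affected_gossip_cluster("demo.example.com", created_with_dns_none=False))
```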