Last year, Kubernetes updated its feature lifecycle policy to prevent the existence of “permanent beta” APIs. The new policy gives beta REST APIs three releases to either reach GA (and deprecate the beta) or create a new beta version (and deprecate the previous beta). Kubernetes 1.22 is the first release to remove deprecated beta APIs since the policy was adopted.
While testing Pixie on Kubernetes 1.22, we discovered that the removal of CustomResourceDefinition from apiextensions.k8s.io/v1beta1 broke the nats-operator and etcd-operator. In this post, we share how we adapted our NATS and etcd deployments to be compatible with Kubernetes 1.22 and the challenges we faced while updating active Pixie deployments to the new architecture.
Pixie's on-prem data collector, Vizier, relies on NATS and etcd. We opted to use the nats-operator and etcd-operator when we added these dependencies. These operators expose a simple interface via a CustomResourceDefinition (CRD) and translate requested resources into the necessary set of configurations, deployments, and services, which simplifies the deployment.
Unfortunately, both operators define their CRD in their code (rather than in a separate YAML file), and their latest releases[1] still used the beta version of the API. As a result, our Kubernetes 1.22 cluster rejected the CRD creation requests from these operators, because the v1beta1 API no longer existed. Since NATS no longer recommends using nats-operator and CoreOS has archived etcd-operator, we needed a new deployment model.
We decided to replace the CustomResources with StatefulSet equivalents for etcd and NATS. Forking the operator codebases was an option, but we did not need all of the features the operators provide, nor did we want to maintain a forked project. The main operator feature we needed was dynamic configuration: the operators knew the names of the pods, and could write them into the configuration, before the pods were deployed. We replicated this with StatefulSets, which give pods predictable names, combined with environment variable substitution.
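To illustrate why predictable names are enough, here is a minimal sketch (not Pixie's code) of how an etcd member could compute its cluster configuration before any pod exists. It assumes a StatefulSet and a headless Service both named `pl-etcd` in a namespace `pl` (hypothetical names), with `POD_NAME` injected into each container through the downward API (`fieldRef: metadata.name`). The `--name`, `--initial-advertise-peer-urls`, and `--initial-cluster` flags and peer port 2380 are standard etcd settings.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// memberName returns the name of the StatefulSet pod with the given ordinal.
// StatefulSet pods are always named <statefulset-name>-<ordinal>.
func memberName(statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%d", statefulSet, ordinal)
}

// peerURL returns a stable peer URL for a pod. This assumes a headless
// Service, which gives each pod the DNS name <pod>.<service>.<namespace>.svc.
func peerURL(pod, service, namespace string, port int) string {
	return fmt.Sprintf("http://%s.%s.%s.svc:%d", pod, service, namespace, port)
}

// initialCluster renders the value of etcd's --initial-cluster flag:
// comma-separated name=peerURL pairs, one per replica.
func initialCluster(statefulSet, service, namespace string, replicas, port int) string {
	members := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		pod := memberName(statefulSet, i)
		members = append(members, pod+"="+peerURL(pod, service, namespace, port))
	}
	return strings.Join(members, ",")
}

func main() {
	// Hypothetical: POD_NAME is set by the downward API; fall back to the
	// first ordinal so the sketch also runs outside a cluster.
	self := os.Getenv("POD_NAME")
	if self == "" {
		self = memberName("pl-etcd", 0)
	}
	fmt.Println("--name=" + self)
	fmt.Println("--initial-advertise-peer-urls=" + peerURL(self, "pl-etcd", "pl", 2380))
	fmt.Println("--initial-cluster=" + initialCluster("pl-etcd", "pl-etcd", "pl", 3, 2380))
}
```

In a manifest, the same strings can be assembled declaratively with Kubernetes' `$(VAR_NAME)` expansion in a container's `args`, referencing `POD_NAME` from the downward API; the Go version just makes the naming rule explicit.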
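Although StatefulSet NATS and etcd deployments are only required on Kubernetes 1.22+, we wanted to reduce our operational burden by migrating all running Viziers to them. We encountered two issues: the Vizier operator had no update path for dependencies, and our version of client-go conflicted with the one required by the operators' clients.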
Pixie's Vizier is deployed and updated using the Kubernetes operator pattern. The Vizier operator already had an update path for our core product but lacked an equivalent for our dependencies. We had originally put off building one because we didn't want to interrupt connected Viziers for longer than necessary, and our dependencies rarely changed. This would also be the first update with destructive changes: we would need to remove the actively running etcd and NATS instances.
We decided to special-case this in the Vizier operator code. Whenever we detect the etcd-cluster and nats-cluster CustomResources in the Vizier namespace, we remove them and deploy the new StatefulSet versions. This is not a general update path for dependencies; because these destructive changes were a one-off, we decided to postpone building one.
We wanted to use the etcd-cluster and nats-cluster clients to detect the old NATS and etcd dependencies, but we hit a strange compilation error when we tried to include them. It turned out that the version of client-go we depended on was incompatible with the version the clients depended on. The client-go developers had introduced a breaking interface change, and Go modules allow only one version of a given module path in a build, so the two incompatible client-go versions could not coexist.
Fortunately, we had recently used client-gen to generate client code for the Vizier CustomResources. We decided to do the same for nats-cluster and etcd-cluster, vendoring the generated clients into our operator code. Once you have a hammer, everything starts to look like a nail.
Now, during updates, the Vizier operator uses the vendored clients to check for etcd-cluster or nats-cluster resources deployed by older versions of Vizier and, if any exist, replaces them with the new StatefulSet versions. Future Viziers will deploy only the StatefulSet versions.
The removal of the beta CRD API from Kubernetes 1.22 posed a unique challenge for our team. Switching to StatefulSet NATS and etcd removed the deprecated third-party operator dependencies. Updating all of our running instances allowed us to avoid bifurcating our deployments.
If you're looking to use StatefulSet NATS and etcd, check out our NATS and etcd YAMLs as references. You should also check out our blog posts on how etcd works and why we chose NATS as our message bus.
Let us know if you found this type of post interesting! We’re trying to not only open-source our codebase, but also openly discuss the challenges we’ve faced and lessons that we’ve learned along the way.
Find us on Slack, GitHub, or Twitter at @pixie_run.
[1] The nats-operator code for managing CRDs has since been updated to the GA API, but a new version has not been released.
We are a Cloud Native Computing Foundation sandbox project.
Pixie was originally created and contributed by New Relic, Inc.
Copyright © 2018 - The Pixie Authors. All Rights Reserved. | Content distributed under CC BY 4.0.