
Unexpected Challenges Supporting Kubernetes 1.22 in Pixie

Phillip Kuznetsov

October 05, 2021

Last year, Kubernetes updated its feature lifecycle policy to prevent the existence of “permanent beta” APIs. The new policy gives beta REST APIs three releases to either reach GA (and deprecate the beta) or create a new beta version (and deprecate the previous beta). Kubernetes 1.22 is the first release to remove deprecated beta APIs since the policy was adopted.

While testing Pixie on Kubernetes 1.22, we discovered that the removal of the apiextensions.k8s.io/v1beta1 CustomResourceDefinition API broke both nats-operator and etcd-operator. In this post, we share how we adapted our NATS and etcd deployments to be compatible with Kubernetes 1.22 and the challenges we faced while updating active Pixie deployments to the new architecture.

The Problem: Deprecated Third-Party Operators

Pixie’s on-prem data collector, Vizier, relies on NATS and etcd. We opted to use nats-operator and etcd-operator when we added these dependencies. These operators expose a simple interface through a CustomResourceDefinition (CRD) and translate the requested resources into the necessary configurations, deployments, and services, which simplifies deployment.
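
As a rough illustration of what "translating requested resources" means, the sketch below expands one custom-resource spec into the lower-level objects an operator would create on the user's behalf. The `NatsClusterSpec` and `Object` types, the object kinds, and the naming scheme are hypothetical simplifications, not nats-operator's actual types or behavior.

```go
package main

import "fmt"

// NatsClusterSpec is a hypothetical, simplified stand-in for the spec a user
// writes in a custom resource; the real nats-operator CRD has more fields.
type NatsClusterSpec struct {
	Name string
	Size int
}

// Object names a Kubernetes object the operator would create.
type Object struct {
	Kind string
	Name string
}

// translate expands one high-level custom resource into the lower-level
// objects an operator manages for the user (illustrative naming only).
func translate(spec NatsClusterSpec) []Object {
	objs := []Object{
		{Kind: "ConfigMap", Name: spec.Name + "-config"},
		{Kind: "Service", Name: spec.Name},
	}
	for i := 0; i < spec.Size; i++ {
		objs = append(objs, Object{Kind: "Pod", Name: fmt.Sprintf("%s-%d", spec.Name, i)})
	}
	return objs
}

func main() {
	for _, o := range translate(NatsClusterSpec{Name: "pl-nats", Size: 3}) {
		fmt.Printf("%s/%s\n", o.Kind, o.Name)
	}
}
```

The user only declares a name and a size; the operator owns the rest, which is why losing the operator also means losing that translation layer.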

Unfortunately, both operators defined their CRDs in code (rather than in a separate YAML manifest), and their latest releases[1] still used the beta version of the API. As a result, our Kubernetes 1.22 cluster rejected the operators' CRD creation requests because that API no longer existed. Since NATS no longer recommends nats-operator and CoreOS has archived etcd-operator, we needed a new deployment model.
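
The failure can be pictured as a lookup against the group/versions the API server serves. On a live cluster, client-go's discovery client (for example `ServerResourcesForGroupVersion`) answers that question; the sketch below substitutes a hard-coded, hypothetical list of served group/versions and a one-entry table of 1.22 removals, so it illustrates the check rather than reproducing what the operators or the API server do.

```go
package main

import "fmt"

// removedIn122 maps "group/version/Kind" for an API removed in Kubernetes
// 1.22 to its GA replacement. Only the entry relevant to this post is
// listed; it is not a complete table of 1.22 removals.
var removedIn122 = map[string]string{
	"apiextensions.k8s.io/v1beta1/CustomResourceDefinition": "apiextensions.k8s.io/v1",
}

// checkAPIVersion returns nil if the cluster serves groupVersion, otherwise
// an error that names the replacement when one is known.
func checkAPIVersion(served []string, groupVersion, kind string) error {
	for _, gv := range served {
		if gv == groupVersion {
			return nil
		}
	}
	if repl, ok := removedIn122[groupVersion+"/"+kind]; ok {
		return fmt.Errorf("%s %s is not served; use %s", groupVersion, kind, repl)
	}
	return fmt.Errorf("%s is not served by this cluster", groupVersion)
}

func main() {
	// Hypothetical subset of the group/versions a 1.22 API server advertises.
	served := []string{"v1", "apps/v1", "apiextensions.k8s.io/v1"}
	fmt.Println(checkAPIVersion(served, "apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"))
	fmt.Println(checkAPIVersion(served, "apiextensions.k8s.io/v1", "CustomResourceDefinition"))
}
```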

The Solution: StatefulSets

We decided to replace the custom resources with StatefulSet equivalents for etcd and NATS. Forking the operator codebases was an option, but we did not need all of the operators' features, and we did not want to maintain a forked project. The main operator feature we needed was dynamic configuration: the operators could place the pods' names in the configuration before the pods were deployed. We replicated this with StatefulSets, which give pods predictable names, combined with environment variable substitution.
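
StatefulSet pods get stable names of the form `<statefulset-name>-<ordinal>` and, when the StatefulSet is governed by a headless Service, stable DNS names of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. Kubernetes also expands `$(VAR_NAME)` references in a container's `command` and `args` using that container's environment variables. The sketch below builds an etcd `--initial-cluster` value from those predictable names and mimics the expansion. It assumes a StatefulSet and headless Service both named `pl-etcd` in namespace `pl` and the default `cluster.local` domain; these names are hypothetical, and this is not Pixie's actual manifest.

```go
package main

import (
	"fmt"
	"strings"
)

// peerURL returns the stable peer URL of StatefulSet pod `ordinal`, assuming
// a headless Service with the same name as the StatefulSet.
func peerURL(set, namespace string, ordinal int) string {
	return fmt.Sprintf("http://%s-%d.%s.%s.svc.cluster.local:2380", set, ordinal, set, namespace)
}

// initialCluster builds etcd's --initial-cluster value ("name=url,...") for
// every replica, which is possible before any pod exists.
func initialCluster(set, namespace string, replicas int) string {
	members := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		members = append(members, fmt.Sprintf("%s-%d=%s", set, i, peerURL(set, namespace, i)))
	}
	return strings.Join(members, ",")
}

// expandK8sVars is a simplified imitation of how Kubernetes expands $(VAR)
// references in command/args: defined variables are substituted and unknown
// references are left unchanged. The real rules also treat $$ as an escape,
// which this sketch omits.
func expandK8sVars(s string, env map[string]string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		if strings.HasPrefix(s[i:], "$(") {
			if end := strings.IndexByte(s[i+2:], ')'); end >= 0 {
				if v, ok := env[s[i+2:i+2+end]]; ok {
					b.WriteString(v)
					i += end + 3 // skip "$(", the name, and ")"
					continue
				}
			}
		}
		b.WriteByte(s[i])
		i++
	}
	return b.String()
}

func main() {
	// In a real manifest, POD_NAME would typically come from the downward API
	// (fieldRef: metadata.name); here it is hard-coded.
	env := map[string]string{
		"POD_NAME":        "pl-etcd-1",
		"INITIAL_CLUSTER": initialCluster("pl-etcd", "pl", 3),
	}
	for _, arg := range []string{"--name=$(POD_NAME)", "--initial-cluster=$(INITIAL_CLUSTER)", "--data-dir=$(DATA_DIR)"} {
		fmt.Println(expandK8sVars(arg, env))
	}
}
```

Because every replica's name is known from the StatefulSet name and replica count alone, the full member list can be written into configuration up front, which is the property the operators previously provided.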

Another Challenge: Updating Active Deployments

Although updating to StatefulSet NATS/etcd deployments is only necessary for Kubernetes 1.22+, we wanted to reduce our operational burden by updating all running Viziers. We encountered two issues:

  1. We did not have a path to update our dependencies (etcd/NATS).
  2. We had trouble including the recommended clients for the etcd and NATS custom resources.

Updating the dependencies

Pixie’s Vizier is deployed and updated using the Kubernetes operator pattern. The Vizier operator already had an update path for our core product but lacked an equivalent for our dependencies. We had originally delayed building that update path because we didn’t want to interrupt connected Viziers for longer than necessary, and our dependencies rarely changed. This would also be the first update with destructive changes: we would need to remove the running etcd and NATS instances.

We decided to special-case this in the Vizier operator code. Whenever we detect the etcd-cluster and nats-cluster custom resources in the Vizier namespace, we remove them and deploy the new StatefulSet versions. This is not a general update path for dependencies, but because these destructive changes were a special case, we decided to defer building one.
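
The special case might look roughly like the sketch below. The `cluster` interface, the `fakeCluster` test double, the `pl` namespace, and the object names are hypothetical stand-ins for the Vizier operator's real clients; the `apiVersion`/`kind` pairs are the ones used by the upstream nats-operator (`nats.io/v1alpha2`, `NatsCluster`) and etcd-operator (`etcd.database.coreos.com/v1beta2`, `EtcdCluster`).

```go
package main

import "fmt"

// legacyRef identifies a custom resource created by one of the old operators.
type legacyRef struct {
	APIVersion, Kind, Name string
}

// Group/versions and kinds match the upstream operators; names are hypothetical.
var legacyDeps = []legacyRef{
	{"nats.io/v1alpha2", "NatsCluster", "pl-nats"},
	{"etcd.database.coreos.com/v1beta2", "EtcdCluster", "pl-etcd"},
}

// cluster is a hypothetical interface standing in for real Kubernetes clients.
type cluster interface {
	Exists(ref legacyRef, namespace string) bool
	Delete(ref legacyRef, namespace string) error
	ApplyStatefulSets(namespace string) error
}

// migrateDeps deletes any legacy operator-managed resources and, only if some
// were found, deploys the StatefulSet replacements. It reports whether a
// migration happened.
func migrateDeps(c cluster, namespace string) (bool, error) {
	found := false
	for _, ref := range legacyDeps {
		if !c.Exists(ref, namespace) {
			continue
		}
		found = true
		if err := c.Delete(ref, namespace); err != nil {
			return false, fmt.Errorf("deleting %s/%s: %w", ref.Kind, ref.Name, err)
		}
	}
	if !found {
		return false, nil
	}
	return true, c.ApplyStatefulSets(namespace)
}

// fakeCluster is an in-memory stand-in used to exercise migrateDeps.
type fakeCluster struct {
	objects map[string]bool
	applied bool
}

func key(ref legacyRef, ns string) string { return ns + "/" + ref.Kind + "/" + ref.Name }

func (f *fakeCluster) Exists(ref legacyRef, ns string) bool { return f.objects[key(ref, ns)] }

func (f *fakeCluster) Delete(ref legacyRef, ns string) error {
	delete(f.objects, key(ref, ns))
	return nil
}

func (f *fakeCluster) ApplyStatefulSets(ns string) error {
	f.applied = true
	return nil
}

func main() {
	f := &fakeCluster{objects: map[string]bool{"pl/NatsCluster/pl-nats": true, "pl/EtcdCluster/pl-etcd": true}}
	migrated, err := migrateDeps(f, "pl")
	fmt.Println(migrated, err, len(f.objects), f.applied)
}
```

Gating the StatefulSet deployment on finding legacy resources keeps the destructive path confined to the one-time migration; clusters without the old resources are left alone.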

Including the etcd and NATS clients

We wanted to use the etcd-cluster and nats-cluster clients to detect the old NATS and etcd dependencies, but we hit a strange compilation error when we tried to include them. It turned out that the version of client-go we depended on was incompatible with the version the clients were built against. The client-go developers had introduced a breaking interface change, and Go’s dependency management allows only one version of a module per import path in a build, so the two incompatible versions could not coexist.
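
Go modules pick one version per module path using minimal version selection (MVS): roughly, the build uses the highest version of each module that any reachable requirement asks for. Modules at major version 2 or higher get a distinct path (such as `/v2`), but client-go is versioned as v0.x, so every import of `k8s.io/client-go` shares one selected version, and client code written against an older release must compile against it. (For example, client-go v0.18 added a `context.Context` parameter to generated clientset methods; the post does not say which change Pixie hit.) The sketch below is a simplified MVS with no graph pruning, exclusions, or replacements, run over a hypothetical graph whose module names and versions are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parse splits "vMAJOR.MINOR.PATCH" into integers. Pre-release and build
// suffixes are not handled in this sketch.
func parse(v string) [3]int {
	var out [3]int
	for i, part := range strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3) {
		out[i], _ = strconv.Atoi(part)
	}
	return out
}

// semverLess compares versions numerically, so v0.9.0 < v0.17.0.
func semverLess(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := 0; i < 3; i++ {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// mvs walks the requirement graph (keyed by "path@version") from root and
// keeps, for each module path, the highest version required anywhere.
func mvs(graph map[string][]string, root string) map[string]string {
	selected := map[string]string{}
	seen := map[string]bool{}
	queue := []string{root}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if seen[node] {
			continue
		}
		seen[node] = true
		path, version, _ := strings.Cut(node, "@")
		if cur, ok := selected[path]; !ok || semverLess(cur, version) {
			selected[path] = version
		}
		queue = append(queue, graph[node]...)
	}
	return selected
}

func main() {
	// Hypothetical graph: the operator needs a newer client-go than the
	// third-party client was built against.
	graph := map[string][]string{
		"example.com/vizier-operator@v0.0.0": {"k8s.io/client-go@v0.21.0", "example.com/nats-client@v1.0.0"},
		"example.com/nats-client@v1.0.0":     {"k8s.io/client-go@v0.17.0"},
	}
	sel := mvs(graph, "example.com/vizier-operator@v0.0.0")
	paths := make([]string, 0, len(sel))
	for p := range sel {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p, sel[p])
	}
}
```

Only `k8s.io/client-go v0.21.0` ends up in the build, so the third-party client's code must compile against v0.21.0 even though it was written for v0.17.0.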

Fortunately, we had recently used client-gen to generate client code for the Vizier custom resources. We did the same for nats-cluster and etcd-cluster, vendoring the generated clients into our operator code. Once you have a hammer, everything starts to look like a nail.

During updates, the Vizier operator now uses the vendored clients to check for etcd-cluster or nats-cluster resources deployed by older versions of Vizier and, if any exist, replaces them with the new StatefulSet versions. New Viziers deploy only the StatefulSet versions.

Conclusion

The removal of the beta CRD API from Kubernetes 1.22 posed a unique challenge for our team. Switching to StatefulSet NATS and etcd removed the deprecated third-party operator dependencies. Updating all of our running instances allowed us to avoid bifurcating our deployments.

If you’re looking to use StatefulSet NATS and etcd, check out our NATS and etcd YAML files as references. You should also check out our blog posts on how etcd works and on why we chose NATS as our message bus.

Let us know if you found this type of post interesting! We’re trying to not only open-source our codebase, but also openly discuss the challenges we’ve faced and lessons that we’ve learned along the way.

Find us on Slack, GitHub, or Twitter at @pixie_run.

Footnotes


  1. The nats-operator code for managing CRDs has since been updated to the GA API, but the project has not published a new release.

Phillip Kuznetsov

Lead SWE @ New Relic, Founding Engineer @ Pixie Labs