3 Reasons to Use Kubernetes Operators (and 2 Reasons Not To)

Michelle Nguyen
November 10, 2021 • 2 minutes read
Principal Engineer @ New Relic, Founding Engineer @ Pixie Labs

We recently switched Pixie to an operator-based deployment. Before making this decision, we compiled a list of reasons why you should and shouldn’t build an operator for your application.

What is a Kubernetes Operator?

A Kubernetes operator is a controller for packaging, managing, and deploying applications on Kubernetes. In this model, the controller watches a custom resource (CR) that represents the configuration and state of a Kubernetes application. The controller is then responsible for reconciling the actual state of the application with its expected state. This control loop principle can help automate scaling, upgrades, and recovery for the application.

Operators allow you to manage complex applications by extending the Kubernetes control loop principle to an application defined in a custom resource definition (CRD).
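The reconciliation idea described above can be sketched in a few lines. This is a toy model under stated assumptions, not Pixie’s operator or any real Kubernetes client: the Cluster class, the reconcile function, and the resource names are all hypothetical, and cluster state is just an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for cluster state: resource name -> spec (hypothetical)."""
    resources: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Move the cluster toward the desired state and return the actions taken."""
    actions = []
    # Create anything that is missing and update anything that differs.
    for name, spec in desired.items():
        if name not in cluster.resources:
            cluster.resources[name] = dict(spec)
            actions.append(("create", name))
        elif cluster.resources[name] != spec:
            cluster.resources[name] = dict(spec)
            actions.append(("update", name))
    # Delete anything the desired state no longer mentions.
    for name in list(cluster.resources):
        if name not in desired:
            del cluster.resources[name]
            actions.append(("delete", name))
    return actions


# Illustrative values only; these specs are made up for the example.
desired = {"nats": {"replicas": 1}, "etcd": {"replicas": 3}}
cluster = Cluster({"etcd": {"replicas": 2}, "stale": {"replicas": 1}})

first_pass = reconcile(desired, cluster)   # create, update, and delete actions
second_pass = reconcile(desired, cluster)  # nothing left to do
```

A second pass over an already-reconciled cluster takes no action, which is the property that makes it safe to run the loop repeatedly.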

Should you add an Operator?

Whether or not you should use an operator depends on the specifics of your application, but here are some general points to consider:

🟢 Simplification for the end user

Abstracting your application into a single CRD helps users view your application as a single component rather than as a collection of separate parts (Deployments, StatefulSets, ConfigMaps, etc.). The operator can surface an overall system state, which reduces the cognitive load on users.

In Pixie’s case, users previously checked Pixie’s deploy status by viewing the pods:

kubectl get pods -n pl
NAME                                      READY   STATUS    RESTARTS   AGE
kelvin-5b7c8c4c5b-n7v5x                   1/1     Running   0          2d20h
pl-etcd-0                                 1/1     Running   0          2d20h
pl-etcd-1                                 1/1     Running   0          2d20h
pl-etcd-2                                 1/1     Running   0          2d20h
pl-nats-0                                 1/1     Running   0          2d20h
vizier-certmgr-7bcbf9d4bd-r8h5s           1/1     Running   0          2d20h
vizier-cloud-connector-854f8bb487-d69kk   1/1     Running   0          2d20h
vizier-metadata-79f8764589-hmz59          1/1     Running   0          2d20h
vizier-pem-crg62                          1/1     Running   12         2d20h
vizier-pem-r2xsn                          1/1     Running   4          2d20h
vizier-proxy-f584dc9c8-4gb72              1/1     Running   0          2d20h
vizier-query-broker-ddbc89b-wftbz         1/1     Running   0          2d20h

After the addition of the CRD, the entire state of the application can be summarized with one kubectl describe vizier command:

kubectl describe vizier
Status:
  Last Reconciliation Phase Time:  2021-11-05T22:30:56Z
  Reconciliation Phase:            Ready
  Version:                         0.9.11
  Vizier Phase:                    Healthy
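To illustrate how an operator might collapse many per-pod statuses into one overall phase, here is a minimal sketch. The rule it uses (every pod Running and fully ready) is an assumption made for illustration, as are the summarize_phase function and the "Unhealthy" and "Unknown" labels; only "Healthy" appears in the output above, and Pixie’s actual health logic may differ.

```python
def summarize_phase(pods: list) -> str:
    """Collapse per-pod status into a single phase (hypothetical rule)."""
    if not pods:
        return "Unknown"
    for pod in pods:
        # READY is reported as "<ready containers>/<total containers>".
        ready, total = (int(n) for n in pod["ready"].split("/"))
        if pod["status"] != "Running" or ready < total:
            return "Unhealthy"
    return "Healthy"


pods = [
    {"name": "pl-nats-0", "ready": "1/1", "status": "Running"},
    {"name": "vizier-pem-crg62", "ready": "1/1", "status": "Running"},
]
phase = summarize_phase(pods)

# A single pod that is not ready flips the overall phase.
degraded = summarize_phase(
    pods + [{"name": "pl-etcd-0", "ready": "0/1", "status": "Pending"}]
)
```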

🟢 Cleanliness and consistency

Configuration options live in one place (the CRD) rather than being spread out across many ConfigMaps. The values in the CRD can be treated as the source of truth, and the CRD is the single place where users need to make changes when adjusting the config.

Pixie originally had four ConfigMaps:

pl-cloud-config
pl-cloud-connector-tls-config
pl-cluster-config
pl-tls-config

These ConfigMaps are now represented by a single CRD.

🟢 Auto-recovery

The operator can monitor the overall state of the application and apply whatever changes are necessary to get the application into a healthy state. This is especially beneficial for persistent systems, or applications that need to be highly available.

NATS is a major dependency of Pixie and enables most of Pixie’s pod-to-pod communication. Occasionally, we have seen the NATS instance fail and require a redeploy to recover. The operator can monitor NATS’s status and redeploy when necessary without any action from the user.
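A monitor-and-redeploy loop of this kind could look roughly like the sketch below. It is not Pixie’s implementation: the watch_and_recover function, the failure threshold of three consecutive failed checks, and the simulated probe results are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable


def watch_and_recover(check_healthy: Callable[[], bool],
                      redeploy: Callable[[], None],
                      polls: int,
                      failure_threshold: int = 3) -> int:
    """Poll a health check and redeploy after repeated consecutive failures.

    Returns the number of redeploys triggered.
    """
    consecutive_failures = 0
    redeploys = 0
    for _ in range(polls):
        if check_healthy():
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            redeploy()
            redeploys += 1
            consecutive_failures = 0
    return redeploys


# Simulated probe results: one transient blip, then a sustained outage.
probe_results = iter([True, False, True, False, False, False, True])
log = []
redeploy_count = watch_and_recover(
    check_healthy=lambda: next(probe_results),
    redeploy=lambda: log.append("redeploy nats"),
    polls=7,
)
```

Requiring several consecutive failures before acting is one way to avoid redeploying on a transient blip; a real operator would tune that threshold to the dependency being watched.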

🔴 Loss of user control

The operator is responsible for deploying K8s resources that are abstracted away from the user. However, many users prefer to know exactly what is deployed on their system. Since it is the operator’s responsibility to manage these resources, the operator may also unknowingly overwrite any manual user changes.

In Pixie’s case, the user deploys only the operator’s YAMLs to get started. Pixie’s operator then deploys additional resources to the user’s cluster, and these resources are not included in the initial YAMLs.
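The overwrite risk can be made concrete with a small sketch. The find_drift function, the field names, and the image name are hypothetical; the point is only that an operator which reapplies its desired spec will revert any managed field a user has edited by hand.

```python
def find_drift(desired: dict, live: dict) -> dict:
    """Return managed fields whose live value differs from the desired value."""
    return {
        key: {"live": live.get(key), "desired": value}
        for key, value in desired.items()
        if live.get(key) != value
    }


# Illustrative values: the user manually scaled replicas from 1 to 3.
desired = {"image": "example/app:1.0", "replicas": 1}
live = {"image": "example/app:1.0", "replicas": 3, "note": "manual"}

drift = find_drift(desired, live)

# Reapplying the desired spec silently reverts the manual change.
live.update(desired)
remaining_drift = find_drift(desired, live)
```

Reporting drift before overwriting it, for example in the custom resource’s status, is one way an operator could give some of that visibility back to users.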

🔴 Maintenance burden

The operator is an additional piece of code that needs to be maintained and updated alongside the application itself. The more powerful the operator, the more complex its logic. The operator may be responsible for keeping the application up-to-date, but what happens when the operator itself needs to be updated?

Although the operator’s logic is not nearly as complicated as Pixie’s actual application, it is still over 1,000 lines of code.

Conclusion

Since Pixie is a complex application performance monitoring tool, we believed the benefits of running an operator-based deployment heavily outweighed the downsides. Feel free to check out the implementation of our operator as an example.



We are a Cloud Native Computing Foundation sandbox project.

Pixie was originally created and contributed by New Relic, Inc.

Copyright © 2018 - The Pixie Authors. All Rights Reserved. | Content distributed under CC BY 4.0.