Dial C* for Operator: Unlocking Advanced Cassandra Configurations

By: John Sanda

DataStax
Building Real-World, Real-Time AI


Cass Operator is a Kubernetes operator for Apache Cassandra® and part of the K8ssandra project. Many of its useful configuration options aren’t obvious at first, so this post walks you through a few examples that showcase them.

Cass Operator, the DataStax Kubernetes Operator for Apache Cassandra, manages Cassandra clusters in Kubernetes. It handles configuration management, scaling (adding and removing nodes), and error handling. As part of K8ssandra, it also makes the configuration changes needed for common operations such as repair, backup, and restore.

This post focuses on some of the configuration management capabilities of Cass Operator through a series of examples. For some general background on Cass Operator, check out the DataStax documentation.

The first set of examples does not require any advanced understanding of Cass Operator or of Kubernetes in general. The advanced examples assume a deeper understanding of Kubernetes, as they cover topics like init containers and StatefulSets.

These examples have been tested against Cass Operator 1.10.1, the latest version as of this writing. The full source of the examples can be found in the cassandradatacenter-examples repository on GitHub. Let’s begin.

Basic examples

These examples demonstrate how to do basic configuration of Cassandra and of Kubernetes resources.

Cassandra and JVM

The config property of a CassandraDatacenter manifest lets you configure cassandra.yaml and jvm-server-options, here for a Cassandra 4.0.3 deployment.

The properties under cassandra-yaml map directly to properties in cassandra.yaml. The properties under jvm-server-options and jvm11-server-options configure heap and garbage collector settings.
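A minimal sketch of such a manifest follows. The datacenter and cluster names, the storage class, and the specific cassandra.yaml and heap values are illustrative assumptions, not the exact values from the examples repository. Garbage collector settings under jvm11-server-options are left out, because the accepted keys depend on the configuration definitions for your Cassandra version.

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1                        # hypothetical datacenter name
spec:
  clusterName: cluster1            # hypothetical cluster name
  serverType: cassandra
  serverVersion: "4.0.3"
  size: 3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard   # assumption: use a StorageClass that exists in your cluster
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
  config:
    # Keys here map directly to properties in cassandra.yaml.
    cassandra-yaml:
      num_tokens: 16
      authenticator: PasswordAuthenticator
      authorizer: CassandraAuthorizer
      role_manager: CassandraRoleManager
    # Heap sizing for the Cassandra JVM (illustrative values).
    jvm-server-options:
      initial_heap_size: "1G"
      max_heap_size: "1G"
```

Setting the initial and maximum heap to the same value is a common choice for Cassandra, since it avoids heap resizing at runtime.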

Cluster topology

Cass Operator configures the cluster to use racks and the GossipingPropertyFileSnitch. The following example shows the configuration for a multi-rack cluster spread across three availability zones.

The CassandraDatacenter declares a three-node cluster with three racks, and Cass Operator makes a best effort to distribute nodes evenly across racks. In this example, the racks will be balanced with one node each.

Cass Operator uses node affinity to pin each rack to a different availability zone. It also uses pod anti-affinity by default to isolate Cassandra pods from one another, so Kubernetes will not schedule more than one Cassandra pod on the same worker node.
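The following spec fragment sketches a three-rack layout. It assumes the per-rack nodeAffinityLabels field of the CassandraDatacenter API; the zone names are hypothetical and must match the topology.kubernetes.io/zone values on your worker nodes. Other required fields, such as storageConfig, are omitted for brevity.

```yaml
spec:
  size: 3                    # three nodes, which Cass Operator spreads one per rack
  racks:
    - name: rack1
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east1-b   # hypothetical zone
    - name: rack2
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east1-c   # hypothetical zone
    - name: rack3
      nodeAffinityLabels:
        topology.kubernetes.io/zone: us-east1-d   # hypothetical zone
```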

Note: topology.kubernetes.io/zone is a well-known Kubernetes label that cloud providers typically apply to every worker node.

Also note that your Kubernetes cluster must have at least three worker nodes with the appropriate labels for this example to work.

Cassandra pod resources

It is generally good practice to specify resource requirements for Kubernetes applications and services, and Cassandra is no exception. A pod can have multiple containers; the Cassandra pod includes one init container and two main containers.

The server-config-init container generates all of the configuration files in /etc/cassandra.

The cassandra container runs Cassandra, but PID 1 in that container is the management-api, which manages Cassandra’s lifecycle. Because the management-api is the primary process, its logs go to stdout. This means that kubectl logs <cassandra-pod> -c cassandra returns the management-api’s logs, not Cassandra’s.

The server-system-logger container is a lightweight busybox container that tails /var/log/cassandra/system.log. You can view Cassandra’s logs with kubectl logs <cassandra-pod> -c server-system-logger.

You can specify CPU and memory requirements for each of these containers.
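Here is a sketch of a spec fragment that sets requests and limits for all three containers. It assumes the resources, systemLoggerResources, and configBuilderResources fields of the CassandraDatacenter API (you can confirm them with kubectl explain cassandradatacenter.spec); the quantities are illustrative only.

```yaml
spec:
  # cassandra container (management-api plus Cassandra). Memory must cover the heap and off-heap usage.
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
  # server-system-logger container (tails system.log)
  systemLoggerResources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 100m
      memory: 64Mi
  # server-config-init init container (generates files for /etc/cassandra)
  configBuilderResources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 250m
      memory: 256Mi
```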

Kubernetes will only schedule the Cassandra pods on worker nodes with enough resources to satisfy the requests.

Advanced examples

The following examples are advanced for two reasons. First, they require a deeper understanding of Kubernetes types and concepts like StatefulSets, init containers, and volumes. Second, they touch on implementation details of Cass Operator.

Cass Operator creates a StatefulSet for each rack. The template property of a StatefulSet fully describes the pods that will be created.

Custom pod labels

Suppose you want to add custom labels to the Cassandra pods. The CassandraDatacenter object has no dedicated property for this, such as podLabels. The podTemplateSpec property, however, makes it possible.
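A minimal sketch of the relevant spec fragment follows; the label keys and values are hypothetical.

```yaml
spec:
  podTemplateSpec:
    metadata:
      labels:
        env: dev               # hypothetical label
        team: data-platform    # hypothetical label
    spec:
      # Omitting containers produces a validation error about a null value.
      containers: []
```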

Cass Operator will add each of these labels to the template of the StatefulSet. This, in turn, means that they will be added to each of the pods.

Note: Omitting the containers property produces a validation error about a null value, so we set it to an empty array.

Environment variable

This next example demonstrates how to add an environment variable to the cassandra container.
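A sketch of the spec fragment follows. The variable name and value are hypothetical; the container name cassandra matches the container that Cass Operator already defines, which is what lets the two declarations merge.

```yaml
spec:
  podTemplateSpec:
    spec:
      containers:
        - name: cassandra          # same name as Cass Operator's default container
          env:
            - name: MY_ENV_VAR     # hypothetical variable
              value: "hello"
```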

On its own, this may not seem particularly useful, but it highlights something interesting: Cass Operator applies merge semantics to podTemplateSpec.

Cass Operator already defines the cassandra container, along with several default environment variables for it. Declaring the cassandra container in podTemplateSpec replaces neither the default container nor its default environment variables; instead, the new environment variable is added to the list alongside the defaults.

Init container

Now let’s take a look at how to add an init container to the Cassandra pod.
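A sketch of the spec fragment follows. The hello container, the busybox image, and the echoed message are illustrative assumptions; any small image with a shell would do.

```yaml
spec:
  podTemplateSpec:
    spec:
      initContainers:
        - name: hello
          image: busybox
          args:
            - /bin/sh
            - -c
            - echo "Hello from an init container"
      containers: []   # required to avoid the null-value validation error
```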

Init containers run in the order they are declared. Here, the hello container runs before the server-config-init container. To run server-config-init first, list it by name, with no other settings, ahead of hello in initContainers.
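A sketch of the reordered fragment follows, reusing the same hypothetical hello container. Only the name of server-config-init is given, so its image, arguments, and mounts come from Cass Operator.

```yaml
spec:
  podTemplateSpec:
    spec:
      initContainers:
        - name: server-config-init   # name only: Cass Operator supplies the rest
        - name: hello
          image: busybox
          args:
            - /bin/sh
            - -c
            - echo "Hello from an init container"
      containers: []
```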

Cass Operator will run server-config-init first using its default configuration.

Remote JMX

This final example builds on the previous ones and is borrowed from the K8ssandra project. K8ssandra deploys Cass Operator along with additional components, one of which is Reaper.

Reaper manages repair operations for Cassandra and relies on JMX to perform them. Cassandra disables remote JMX by default, so it must be enabled for Reaper to function properly. JMX credentials must be configured as well, because enabling remote JMX also enables JMX authentication in Cassandra.

JMX access is configured in the /etc/cassandra/cassandra-env.sh script, which checks whether the LOCAL_JMX environment variable is set. The script runs in the cassandra container, so that is where the variable must be set. JMX credentials are stored in /etc/cassandra/jmxremote.password, and we need an init container to add a set of credentials to that file. One complication is that Cass Operator does not create a volume mount for /etc/cassandra by default.

Cass Operator creates an emptyDir volume named server-config. The server-config-init init container and the cassandra container both mount this volume at /config. server-config-init generates the configuration files and writes them into that directory, and when the cassandra container starts, it copies everything from /config to /etc/cassandra. This gives us what we need to build the init container: if it mounts server-config at /config and writes jmxremote.password there, the file ends up in /etc/cassandra.
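The spec fragment below is a sketch modeled on that approach, not K8ssandra’s exact manifest. The jmx-credentials container name, the busybox image, and the literal cassandra/cassandra credentials (chosen to match the nodetool command used to verify the setup) are assumptions for a demo; a real deployment should read credentials from a Secret.

```yaml
spec:
  podTemplateSpec:
    spec:
      initContainers:
        # Let Cass Operator generate the configuration files first.
        - name: server-config-init
        - name: jmx-credentials        # hypothetical name
          image: busybox
          args:
            - /bin/sh
            - -c
            # Write "<username> <password>" into the shared volume; the cassandra
            # container later copies /config into /etc/cassandra.
            - echo -n "cassandra cassandra" > /config/jmxremote.password
          volumeMounts:
            - name: server-config      # emptyDir volume created by Cass Operator
              mountPath: /config
      containers:
        - name: cassandra
          env:
            # Tells cassandra-env.sh to enable remote JMX instead of local-only access.
            - name: LOCAL_JMX
              value: "no"
```

Placing jmx-credentials after server-config-init means the password file is written once the generated configuration is already in /config.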

Try to run nodetool without credentials using kubectl exec -it <cassandra-pod> -c cassandra -- nodetool status.

It should fail with a SecurityException that says credentials are required.

It should succeed if you run kubectl exec -it <cassandra-pod> -c cassandra -- nodetool -u cassandra -pw cassandra status.

Summing up

Cass Operator provides a variety of options for configuring Cassandra, the JVM, and the StatefulSets that it generates. When you need to configure something that Cass Operator does not expose, or something more advanced such as an init container with additional volumes, podTemplateSpec offers tremendous flexibility.

That flexibility comes with risks, though. The remote JMX example depends entirely on implementation details of Cass Operator, such as the server-config volume, its /config mount path, and the name of the server-config-init container. It would be a nice enhancement for Cass Operator to let you configure things like init containers and sidecar containers without being tightly coupled to those details.

Follow DataStax on Medium for exclusive posts on all things open source, including Cassandra, Pulsar, Kubernetes, and more. To join a buzzing community of developers from around the world and stay in the data loop, follow DataStaxDevs on Twitter and LinkedIn.

This post was originally published on the DataStax Blog.

Resources

  1. What is Cass Operator? | DataStax
  2. cass-operator: The Kubernetes Operator for Apache Cassandra | GitHub
  3. CassandraDatacenter examples | GitHub
  4. K8ssandra: Cloud-native distribution of Apache Cassandra on Kubernetes
  5. Resource Management for Pods and Containers | Kubernetes
