Multi-Tenancy Systems: Apache Pulsar vs. Kafka
Author: Yabin Meng
In this post, we will evaluate the multi-tenancy features available in both Apache Pulsar and Kafka. Our assessment shows not only that Pulsar is the more robust multi-tenancy system, but also that, by comparison, Kafka is actually closer to a single-tenancy system. Read on to find out why.
Software multi-tenancy (or simply multi-tenancy) refers to the capability of a single instance of a software system to serve multiple tenants. In a multi-tenancy software system, the usage of the system instance is logically divided among multiple tenants but physically shared on the same underlying infrastructure. In contrast, with a single-tenancy system, the usage of the system instance is dedicated to only one tenant both logically and physically.
Compared with a single-tenancy system, the benefits of a multi-tenancy system are obvious: 1) simplified system setup, configuration, maintenance, and application deployment; 2) ongoing cost savings; 3) easy on-boarding of new customers; and more.
For a software system to be considered as having multi-tenancy support, it must satisfy two fundamental requirements within one single system instance:
- Security Context – Each tenant should be only able to access its own data. Access to each tenant’s data should always be protected and authorized via necessary security measures such as authentication, authorization, and encryption.
- Resource Segregation – When multiple tenants share the same set of infrastructure resources, each tenant, no matter how heavy or light the workload is, should have a fair opportunity to use the shared infrastructure resources. Otherwise, some tenants may use up all available resources and therefore starve others.
Apache Pulsar is a true multi-tenancy system that has many built-in mechanisms to support it.
In this article, we’ll compare the multi-tenancy features that are available in both Pulsar and Kafka. Through this comparison, it will become clear that Pulsar is a true, robust multi-tenancy system while Kafka has very limited multi-tenancy capability such that it is practically a single-tenancy system.
Note that our discussion about Apache Pulsar and Apache Kafka in this article focuses on their open-source software (OSS) releases (Pulsar and Kafka). The vendor-specific releases and/or products such as the software-as-a-service (SaaS) offerings are beyond the scope of this article.
Multi-tenancy in Pulsar
Pulsar tenant overview
In Pulsar, a tenant is a first-class citizen. By design, messages in Pulsar are published to topics, and topics are organized in a three-level hierarchy of tenant, namespace, and topic.
With this structure:
- One tenant represents a specific business unit or a product line. Topics created under a tenant share the same business context, which is distinct from that of other tenants.
- Within one tenant, topics having similar behavioral characteristics can be further grouped into a smaller administrative unit called a namespace. Different policies, such as the message retention or expiry policy, can be set either at the namespace level or at an individual topic level (more on this later). Policies set at the namespace level apply to all topics under the namespace.
In Pulsar, the fully qualified name of a topic actually reflects this hierarchy:
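A sketch of the format, where tenant, namespace, and topic stand in for the actual names:

```
{persistent|non-persistent}://tenant/namespace/topic
```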
When a client application (e.g. a producer or a consumer) connects to a Pulsar topic, it must specify the full string that contains the tenant and the namespace names. Otherwise, an error will be reported about an invalid topic name.
Please note that in Pulsar a topic can be persistent (the topic name starts with the persistent:// prefix) or non-persistent (the topic name starts with the non-persistent:// prefix). In a non-persistent topic, messages are only stored in memory and are not persisted to hard drives. Because non-persistent topics are used only rarely in some edge situations, we will not discuss them in this article. For the remainder of this article, I will refer to the persistent topic in Pulsar as simply “topic”.
Pulsar security feature overview
The establishment of the security context for a Pulsar tenant relies on different categories of security features that are available in Pulsar. In this section, I will briefly go through these features; in the next section, we’ll see how a tenant’s security context can be properly implemented in Pulsar.
Like many system software applications, the security features in Pulsar fall into three major categories: authentication, authorization, and data encryption.
Authentication
Pulsar supports authentication with pluggable providers. Currently, the following authentication providers are supported out of the box:
- Transport Layer Security (TLS) authentication
- Athenz
- Kerberos
- JSON Web Token (JWT) authentication
- OAuth 2.0 access token
Customized authentication providers are also possible by extending Pulsar’s authentication API.
In Pulsar, authentication is responsible for properly identifying a client and associating it with a Role Token, which is simply a string name that can be associated with a set of permissions by Pulsar authorization.
Using a JWT authentication example, the following Pulsar CLI command creates a JWT token that expires in one year:
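A sketch of this command, assuming a symmetric secret key stored at a placeholder path:

```
bin/pulsar tokens create \
  --secret-key file:///path/to/my-secret.key \
  --subject my-test-role \
  --expiry-time 1y
```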
The generated token (displayed as the CLI output) is associated with a Role Token named my-test-role. Any client that has the generated JWT token can successfully connect to Pulsar.
Authorization
The main purpose of the Role Token that is associated with an authenticated client is proper access control over Pulsar resources (e.g. a topic) when Pulsar authorization is enabled.
The Pulsar admin CLI command below shows an example of granting the message publishing privilege (but not the message consuming privilege) to the my-test-role Role Token for all topics within the namespace my-namespace of the tenant my-tenant:
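A sketch of this command (the role name matches the token example above; tenant and namespace names are placeholders):

```
bin/pulsar-admin namespaces grant-permission my-tenant/my-namespace \
  --role my-test-role \
  --actions produce
```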
In Pulsar, authorization is also pluggable. Out of the box, Pulsar has a built-in authorization provider. However, customized authorization providers can be added by extending the Pulsar authorization API.
Encryption
In Pulsar, data encryption is available as both transport encryption and end-to-end message encryption.
Transport encryption protects the network transmission between a client and a Pulsar broker (and among Pulsar servers) from eavesdropping, so that sensitive information such as JWT tokens and credit card numbers is secured.
For even stronger protection of the message exchange between a producer and a consumer, Pulsar supports end-to-end message encryption (currently supported in the Java client). Simply speaking, the end-to-end message encryption mechanism uses a dynamically generated symmetric AES key (the data key) to encrypt and decrypt the messages. The client applications use an asymmetric key pair (ECDSA or RSA) to encrypt and decrypt the AES data key.
Security context setup for a Pulsar tenant
Establishing a security context for one Pulsar tenant will protect all the message data within this tenant from being accessed by unauthorized users, including those from other tenants. In order to achieve this, we first need to enable the security features in a Pulsar cluster.
Once Pulsar security is enabled, the security features are applied the same way to all tenants within the cluster; this is especially true for encryption. From a multi-tenancy perspective, Pulsar supports enabling multiple authentication providers at the same time, so each tenant can use the authentication method of its choice if needed. The part of the Pulsar security context that actually differentiates tenants, however, comes from Pulsar’s authorization mechanism.
Cluster admin role
When enabling authorization in Pulsar, the following configuration parameters need to be set in broker.conf:
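A minimal sketch of the relevant broker.conf settings (the role names are placeholders):

```
authenticationEnabled=true
authorizationEnabled=true
authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
superUserRoles=cluster-admin-role-1,cluster-admin-role-2
```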
Through the superUserRoles configuration parameter, a comma-separated list of role names can be specified as the Pulsar cluster administrators, or in other words, super-users.
A cluster administrator is able to do all administrative operations such as managing tenants, namespaces, and topics; as well as publishing messages to and consuming messages from all topics under all tenants.
Tenant admin role
A cluster administrator can create a tenant with a list of assigned tenant administrators through the following Pulsar admin CLI command:
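A sketch of this command (tenant and role names are placeholders):

```
bin/pulsar-admin tenants create my-tenant \
  --admin-roles tenant-admin-role-1,tenant-admin-role-2
```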
A tenant administrator is able to do all administrative operations within the specific tenant, such as managing the namespaces and the topics within the tenant. It is also able to publish messages to and consume messages from all topics within the tenant.
Allowed clusters for geo-replication
It’s worth mentioning that when multiple Pulsar clusters form a geo-replication setup, Pulsar also has the ability to specify which clusters a particular tenant can access. This is achieved with the --allowed-clusters option when creating the tenant.
- If this option is not provided, the messages within the tenant are able to be replicated among all available clusters.
- Otherwise, the messages within the tenant can only be replicated within the specified clusters.
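For example, a sketch of creating a tenant that is restricted to two specific clusters (all names are placeholders):

```
bin/pulsar-admin tenants create my-tenant \
  --admin-roles tenant-admin-role \
  --allowed-clusters us-east,us-west
```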
Access control within a tenant
A tenant administrator can define any of the following access control privileges to a role within the tenant:
- Only publishing messages
- Only consuming messages
- Both publishing and consuming messages
The above privileges can be applied at both the namespace level and the individual topic level. When applied at the namespace level, a user (with the associated role) gets the privileges on all topics within the namespace. Otherwise, the user only gets the privilege on one specific topic.
The Pulsar admin CLI commands to grant the access control privileges at the namespace level and the topic level are as below:
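Both sketches below use placeholder tenant, namespace, topic, and role names:

```
# Namespace level: applies to all topics under the namespace
bin/pulsar-admin namespaces grant-permission my-tenant/my-namespace \
  --role my-role --actions produce,consume

# Topic level: applies to one specific topic only
bin/pulsar-admin topics grant-permission persistent://my-tenant/my-namespace/my-topic \
  --role my-role --actions produce,consume
```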
At the namespace level, the tenant administrator can also grant permissions on a specific subscription to a list of roles, as below:
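A sketch of this command (subscription and role names are placeholders):

```
bin/pulsar-admin namespaces grant-subscription-permission my-tenant/my-namespace my-subscription \
  --roles role-1,role-2
```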
The above access control privileges can also be revoked using the corresponding Pulsar admin CLI commands (or REST API calls).
Access control with wildcard role matching
When enabling Pulsar authorization, if the authorizationAllowWildcardsMatching parameter is set to true, the access control privileges can be granted to a set of roles matching a specific pattern. To be specific, in the above CLI commands, the --role option can be specified in one of the following two forms (as examples):
*.my.role
my.role.*
Note: At the moment, wildcard matching is only applicable if the wildcard character (*) appears at the first or the last position.
Pulsar tenant resource utilization and segregation
In Pulsar, there is an abundance of mechanisms through which resource utilization can be controlled and segregated among different tenants.
Resource utilization policies
When messages are processed and stored in Pulsar, they consume hardware resources: CPU, memory, hard drive, and network bandwidth. Out of the box, Pulsar provides close to 40 different policies that can be used to control resource utilization.
The majority of these policies can be set at both the namespace level and the individual topic level, while a few of them are only relevant at the namespace level or at the individual topic level. When a policy is set at both levels, the topic level setting takes precedence over the one at the namespace level.
These policies can be set by using either Pulsar admin CLI commands or the Pulsar REST APIs. A few of these policies are listed below as examples:
set-publish-rate
set-retention
set-offload-threshold
set-max-message-size
For a complete list of these policies and their application levels, refer to Apache Pulsar documentation.
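As an illustration, here is a sketch of setting one such policy (set-retention) at both levels; names and values are placeholders, and topic-level policies require a reasonably recent Pulsar version:

```
# Namespace-level retention policy
bin/pulsar-admin namespaces set-retention my-tenant/my-namespace --size 10G --time 3d

# Topic-level override (takes precedence for this topic)
bin/pulsar-admin topics set-retention persistent://my-tenant/my-namespace/my-topic --size 1G --time 1d
```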
Resource quota
In Pulsar, a more direct resource utilization control mechanism is to set the resource quota, which can be achieved in different ways:
- message rate on an individual topic
- message rate on a namespace bundle
- message rate on a namespace
- producer and consumer number per topic
Individual topic level message rate
In Pulsar, we can set the rate (either as a number of messages or a number of bytes) for message publishing, dispatching, or subscription on an individual topic. An example of setting the message publishing rate for a topic using the Pulsar admin CLI command is:
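A sketch with placeholder names and rate values (topic-level rate policies require a reasonably recent Pulsar version):

```
# Limit publishing on a single topic to 1,000 messages/second and 1 MB/second
bin/pulsar-admin topics set-publish-rate persistent://my-tenant/my-namespace/my-topic \
  --msg-publish-rate 1000 \
  --byte-publish-rate 1048576
```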
Namespace bundle quota
In Pulsar, when topics are assigned to brokers, it is not done on an individual topic basis. Instead, each broker takes ownership of a subset of the topics of a namespace. This subset is called a namespace bundle. All bundles within a namespace will be assigned to the available brokers as evenly as possible by Pulsar.
Each namespace bundle is a relatively independent unit, and we can set the following resource quotas on it:
- Inbound bandwidth (bytes/second)
- Outbound bandwidth (bytes/second)
- Memory usage (Mbytes)
- Inbound message rate (message/second)
- Outbound message rate (message/second)
The Pulsar admin CLI command that sets the namespace bundle quota is as below:
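A sketch for one bundle of a namespace (the bundle range and quota values are placeholders):

```
bin/pulsar-admin resource-quotas set \
  --namespace my-tenant/my-namespace \
  --bundle 0x00000000_0xffffffff \
  --msgRateIn 1000 --msgRateOut 2000 \
  --bandwidthIn 1048576 --bandwidthOut 2097152 \
  --memory 100
```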
Namespace level quota
Similar to the individual topic level quota, we can also set a collective rate (either as a number of messages or a number of bytes) for message publishing, dispatching, or subscription for all topics within a namespace. An example of setting the collective message publishing rate for a namespace using the Pulsar admin CLI command is as below:
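A sketch with placeholder names and values:

```
# Set the message publishing rate policy at the namespace level
bin/pulsar-admin namespaces set-publish-rate my-tenant/my-namespace \
  --msg-publish-rate 5000 \
  --byte-publish-rate 10485760
```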
Producer and consumer number per topic
In Pulsar, we can limit the maximum number of producers and consumers that are allowed to publish to or consume from a single topic. For consumers, we can further limit the maximum number of consumers per subscription and the maximum number of unacknowledged messages per consumer. These settings directly impact the CPU and memory utilization of Pulsar brokers. Here are examples of how to set these limits in the Pulsar admin CLI:
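The sketches below use placeholder names and values; the exact option names can differ slightly between Pulsar versions, so check the pulsar-admin reference for your release:

```
bin/pulsar-admin namespaces set-max-producers-per-topic my-tenant/my-namespace \
  --max-producers-per-topic 10
bin/pulsar-admin namespaces set-max-consumers-per-topic my-tenant/my-namespace \
  --max-consumers-per-topic 50
bin/pulsar-admin namespaces set-max-consumers-per-subscription my-tenant/my-namespace \
  --max-consumers-per-subscription 5
bin/pulsar-admin namespaces set-max-unacked-messages-per-consumer my-tenant/my-namespace \
  --max-unacked-messages-per-consumer 10000
```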
Server isolation
In Pulsar, resource utilization segregation can even be achieved at the server host (broker or bookie) level through broker isolation policy and bookie affinity group.
Broker isolation policy
When assigning topics to brokers, we can use the broker isolation policy to limit the set of brokers that can be used for topic assignment within a set of specified namespaces. The Pulsar admin CLI command for specifying the broker isolation policy is as follows:
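A sketch that maps all namespaces of one tenant to a primary and a secondary set of brokers; the cluster name, policy name, and regular expressions are placeholders:

```
bin/pulsar-admin ns-isolation-policy set my-cluster tenant-a-isolation-policy \
  --namespaces "my-tenant/.*" \
  --primary "broker-[1-3].*" \
  --secondary "broker-[4-5].*" \
  --auto-failover-policy-type min_available \
  --auto-failover-policy-params min_limit=1,usage_threshold=80
```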
We can set the namespace isolation policy with a primary and a secondary regular expression (regex) to select the desired brokers. Since a fully qualified namespace name includes the tenant name, we can craft the namespace regex string so that all topics from a specific tenant are mapped to a separate set of brokers.
If no broker matches the specified regex, we cannot create a topic. If there are not enough primary brokers, topics are assigned to secondary brokers. If there are not enough secondary brokers, topics are assigned to other brokers which do not have any isolation policies.
Bookie affinity group
Similarly, namespaces can also be limited to a subset of bookies for message persistence. This is achieved by assigning bookies to an affinity group as shown in the Pulsar admin CLI command below:
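A sketch of this command (group names are placeholders):

```
bin/pulsar-admin namespaces set-bookie-affinity-group my-tenant/my-namespace \
  --primary-group group-bookie-1 \
  --secondary-group group-bookie-2
```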
When the bookie affinity group is set for a namespace, messages written to the topics that belong to the namespace will be persisted to the bookies in the specified primary group. If there are not enough bookies in the primary group, messages will be written to the bookies in the secondary group.
You can establish the bookie-group relationship when setting up the bookie rack-aware placement policy, as shown below:
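A sketch that assigns a bookie to a group while setting its rack placement (the bookie address, group, and rack names are placeholders):

```
bin/pulsar-admin bookies set-bookie-rack \
  --bookie bookie-1.example.com:3181 \
  --group group-bookie-1 \
  --rack rack-1
```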
Comparing with Kafka multi-tenancy
According to the Kafka documentation, multi-tenancy support is also possible in Kafka. But when we delve into the details, we can see that the multi-tenancy capability in Kafka is fairly limited and primitive.
Logical topic grouping
In Kafka, the topic structure is flat. There is no built-in hierarchical structure that we can rely on to organize the topics into groups, which is required by multi-tenancy. In order to address this issue, a common practice in Kafka is to define topic names with some sort of logical, prefix-based structure, such as:
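The scheme below is purely illustrative; the individual name components carry no meaning to Kafka itself:

```
com.<company>.<application>.<topic>

# For example:
com.companyA.billing.transactions
com.companyB.marketing.click-events
```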
Such a structure is simply a naming convention and nothing more. Any logical component within this structure has no meaning by itself in Kafka, and you cannot do anything with it, such as applying security or resource-related policies.
From the multi-tenancy perspective, a big risk associated with such a practice is that anyone with topic creation privileges can create a topic with whatever name structure they like. This is risky because this person may create a topic that looks like it belongs to a different tenant (assuming tenancy is marked by the topic prefix).
In order to address the above security concern, Kafka recommends several best practices:
- Disable the topic auto-creation feature at the broker level (in the broker configuration).
- Deny the topic creation privilege to normal users and only allow dedicated administration users or processes to create topics on behalf of normal users.
- Use prefixed ACLs (KIP-290) to enforce user access control to Kafka resources, such as topics whose names start with a certain prefix pattern (more on this later).
But these practices are not enough to fully remedy the risk associated with Kafka’s flat topic structure.
Kafka security feature recap
Kafka’s built-in security features also fall into similar categories as in Pulsar: authentication, authorization, and transport data encryption.
Authentication
In Kafka, users can be authenticated using either TLS-based authentication or Simple Authentication and Security Layer (SASL) based authentication. At the moment, Kafka supports the following SASL-based authentication mechanisms:
- GSSAPI (Kerberos)
- PLAIN
- SCRAM-SHA-256
- SCRAM-SHA-512
- OAUTHBEARER
SASL-based authentication can be used with either PLAINTEXT (SASL_PLAINTEXT) or SSL (SASL_SSL) as the transport layer protocol.
Authorization
When a Kafka broker authenticates a client, it associates the connection with a KafkaPrincipal that represents the client’s identity. Kafka then authorizes the principal associated with the connection and determines which operations are allowed.
Kafka supports a pluggable Authorizer with an out-of-the-box authorizer implementation, AclAuthorizer (since Kafka 2.4). Older Kafka versions (since 0.9) shipped a built-in authorizer, SimpleAclAuthorizer, which has since been deprecated.
AclAuthorizer supports fine-grained access control for Kafka resources using access control lists (ACLs). Each ACL definition consists of the following information:
- Resource type: Cluster|Topic|Group|TransactionalId|DelegationToken
- Pattern type: Literal|Prefixed
- Resource name: the name of the resource, a prefix, or the wildcard *
- Operation: Describe|Create|Delete|Alter|Read|Write|DescribeConfigs|AlterConfigs
- Permission type: Allow|Deny (Deny has higher precedence)
- Principal: a Kafka principal represented as <principalType>:<principalName>, for example User:Bob or Group:Sales. ACLs may use User:* to grant access to all users.
- Host: the source IP address of the client connection, or * if all hosts are authorized.
Encryption
Transport encryption is a standard procedure to protect in-transit communications between a client and a Kafka broker, between Kafka brokers, and so on. There is no difference between Kafka and Pulsar in terms of the general procedure used. End-to-end message encryption is also possible in Kafka.
Set up security context for a Kafka “tenant”
Traditionally in Kafka, before KIP-290 was introduced, the supported semantics of the resource name in an ACL definition were either the full resource name or the special wildcard * that matches everything.
Based on these semantics, the only way to separate the security contexts of different tenants is to define the same set of ACLs repeatedly on every single topic that logically belongs to one tenant. For topics belonging to another tenant, a different set of ACLs needs to be repeated again. Practically speaking, this approach is too cumbersome to serve as the multi-tenancy security context for a Kafka tenant.
With KIP-290 (implemented via KAFKA-6841), Kafka introduced prefixed ACLs (since version 2.0). This allows Kafka to define bulk ACLs that match all topic names starting with a certain prefix. For example, an ACL definition might state that principal user1 has access to all topics that start with the prefix com.companyA.
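A sketch of such a prefixed ACL using the kafka-acls.sh CLI (the bootstrap server, client config file, principal, and prefix are placeholders):

```
# Allow user1 to read from and write to every topic whose name starts with "com.companyA"
bin/kafka-acls.sh --bootstrap-server localhost:9092 \
  --command-config adminclient.properties \
  --add --allow-principal User:user1 \
  --operation Read --operation Write \
  --topic com.companyA \
  --resource-pattern-type prefixed
```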
Despite being an improvement, this method for separating security contexts for different tenants is still very coarse and primitive.
Kafka tenant resource utilization and segregation
Resource utilization configuration
In Kafka, there are a few topic level configuration items that may indirectly impact resource utilization, such as max.message.bytes, retention.bytes, and retention.ms. You can specify these configurations either when creating a topic (using the kafka-topics.sh --config option) or by adding/updating them later (using the kafka-configs.sh --add-config option).
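A sketch of both approaches (the server address, topic name, and values are placeholders):

```
# At topic creation time
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic com.companyA.billing.transactions \
  --partitions 3 --replication-factor 3 \
  --config retention.bytes=1073741824 --config retention.ms=604800000

# Adding or updating the configuration later
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name com.companyA.billing.transactions \
  --add-config max.message.bytes=1048576
```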
The problem here is that there are not many of these resource utilization-related configuration items, and it is very hard to relate them to a particular tenant.
Resource quota
In Kafka, the main resource utilization control mechanism is the resource quota, which falls into two main categories: server quota and client quota.
The server quota is applied at the broker level and collectively impacts all clients that connect to the broker. An administrator can set these quotas as:
- Rate at which the broker can accept new connections (max.connection.creation.rate)
- Maximum number of connections per broker (max.connections)
- Maximum number of connections allowed from a specific IP (max.connections.per.ip)
The client quota is more relevant to tenant-based resource control and segregation within Kafka. Client quotas can be applied to a user principal, a client-id group, or a combination of both. The client-id is a logical grouping of clients with a meaningful name chosen by the client application.
There are two types of client quotas. One type is network bandwidth quotas, which define byte-rate (bytes/sec) thresholds (since version 0.9). The other is request rate quotas, which define CPU utilization thresholds as a percentage of network and I/O threads (since version 0.11).
The client quota is unrelated to topics; all topics from a user or client group share the resource quota limits set on that group.
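A sketch of setting both quota types for one user principal (the server address, principal name, and values are placeholders):

```
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type users --entity-name tenant-a-user \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152,request_percentage=50'
```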
In Kafka, there is also no way to limit the number of producers and consumers that are allowed to connect to a topic for message publishing and consuming.
Conclusion
Multi-tenancy is an architecture choice desired by many software systems because of its obvious benefits. However, it is also hard to implement, especially in a robust and complete manner.
In this article, we explored and compared the multi-tenancy features of two mainstream messaging and streaming technologies – Apache Pulsar and Apache Kafka. From our comparison, we can see that between the two technologies, the multi-tenancy support in Kafka is rather limited from both security context and resource segregation perspectives.
In contrast, Pulsar is, by design, a true, robust multi-tenancy system. The tenant is a first-class-citizen concept in Pulsar and sits at the very core of Pulsar’s message processing and management hierarchy. For each tenant, there is an abundance of built-in policies that can be applied to properly protect the tenant’s data integrity and ensure fair resource utilization.
Follow the DataStax Tech Blog for more developer stories, check out our YouTube channel for tutorials, and follow DataStax Developers on Twitter for the latest news about our developer community.
Resources
- Apache Pulsar
- Apache Kafka
- Pulsar Documentation: Retention Policies
- Pulsar Documentation: Time to live (TTL)
- Pulsar Documentation: Non-persistent messaging
- Transport Layer Security (TLS)
- Athenz
- Kerberos
- JSON Web Token (JWT)
- OAuth 2.0
- Elliptic Curve Digital Signature Algorithm (ECDSA)
- RSA (Cryptosystem)
- Apache Pulsar Documentation: Set Up a Standalone Pulsar Locally
- Kafka Documentation
- Kafka Improvement Proposal (KIP) 290: Support for Prefixed ACLs
- KAFKA-6841: Add Support for Prefixed ACLs