The Guerrilla Guide to Building E-commerce Product Services with DataStax Astra DB

Author: Aaron Ploetz

DataStax
Feb 15, 2022

Many of the world’s top retailers rely on distributed databases like Apache Cassandra and DataStax Astra DB. In this post, we demonstrate building scalable, distributed data models and E-commerce product services using Java, Spring Boot, and Astra DB.

Distributed databases are often the “silent partner” of many large-scale E-commerce platforms. This is especially true during holiday or pandemic shopping when retail enterprises tend to see sharp increases across multiple digital sales channels.

Whether you work for a start-up or a well-established enterprise, this post will advise you on industry practices when building distributed data models and E-commerce product services. We’ll discuss techniques, strategies, and multiple considerations on building or improving an E-commerce product backend.

Let’s get started.

Focus on the database

All web and mobile applications demand speed and throughput at a tremendous scale. Efficient code and proper image handling are now staples of web development, and most development teams focus on achieving them. However, proper engineering of the backend database model is where many teams tend to stumble.

This post focuses on getting that part right.

To start, you’ll need to register for a free Astra DB account. If you need help with this, follow the steps in our DataStax Devs E-commerce workshop on GitHub.

Data Model design considerations

Modern E-commerce websites are made up of several subsystems, and building a complete, large-scale E-commerce site is beyond the scope of any single blog post. So, this post will focus on the data modeling and service architecture for product services in particular.

To start, we’ll break down “product-service” into three smaller components:

  • Category Navigation
  • Pricing
  • Product

You can see the physical data model behind these components in Figure 1 below.

Figure 1. Data model for E-commerce product services, abridged for space. Partition keys (PK) and clustering keys (CK) are clearly noted, along with the logical many-to-one relationships between the tables.

Typically, we would go through a process of data model refinement, progressing between conceptual, logical, and physical models. Artem Chebotko expertly demonstrates this process in his Shopping Cart Data Modeling Example.

Key Selection

The data models shown in Figure 1 have partition keys that are designed to support high-cardinality values. That is, the columns chosen as the partition key are intended to have many, many potential values. This helps to ensure even data distribution with Astra DB.

On a read or write operation, the partition key value is put through a consistent hash, returning a numeric token between -2⁶³ and +2⁶³–1. Each node in the database cluster is responsible for certain token ranges, helping to distribute the data. So the higher the number of different partition key values used, the better the data will distribute around the database cluster.
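To make the token idea concrete, here is a minimal sketch of a toy token ring in plain Java. It is not how Astra DB is implemented: it uses the first 8 bytes of an MD5 digest as a stand-in hash (Cassandra's default partitioner uses Murmur3), it splits the token range into equal slices rather than using virtual nodes, and the class, node, and key names are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: hashes partition key values to 64-bit tokens and maps tokens to nodes. */
final class TokenRingSketch {

    // Maps the highest token each node owns to that node's (hypothetical) name.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    TokenRingSketch(String... nodes) {
        if (nodes.length == 0) {
            throw new IllegalArgumentException("at least one node is required");
        }
        // Roughly 2^64 / nodes.length: the width of each node's token range.
        long step = nodes.length > 1 ? (Long.MAX_VALUE / nodes.length) * 2 : 0;
        long upperBound = Long.MIN_VALUE;
        for (int i = 0; i < nodes.length - 1; i++) {
            upperBound += step;
            ring.put(upperBound, nodes[i]);
        }
        // The last node owns everything up to the largest possible token.
        ring.put(Long.MAX_VALUE, nodes[nodes.length - 1]);
    }

    /** Stand-in hash (assumption): the first 8 bytes of MD5, read as a signed 64-bit token. */
    static long token(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** The owner is the node whose upper bound is the first one at or above the token. */
    String ownerOf(String partitionKey) {
        return ring.ceilingEntry(token(partitionKey)).getValue();
    }

    /** Counts how many of keyCount distinct, made-up keys land on each node. */
    Map<String, Integer> countByOwner(int keyCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < keyCount; i++) {
            counts.merge(ownerOf("product-" + i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch("node-a", "node-b", "node-c");
        // Many distinct keys spread across all nodes...
        System.out.println(ring.countByOwner(30_000));
        // ...while two distinct keys can never use more than two of them.
        System.out.println(ring.countByOwner(2));
    }
}
```

With tens of thousands of distinct keys, each node ends up with a similar share. With only a handful of distinct partition key values, some nodes would hold most of the data while others sit idle, which is why high cardinality matters.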

The clustering keys serve two purposes:

  1. Ensure uniqueness
  2. Enforce on-disk sort order

With the data models shown in Figure 1, we’re more concerned with ensuring uniqueness. Primary keys (the combination of partition and clustering keys) in Astra DB are unique. That means it’s important to make uniqueness a part of the data model design effort.

Sort order is typically more important in time series models. As our use cases aren’t time-based, the sort order isn’t as much of a concern, and the default “ascending” sort order will suit us just fine.

Category Navigation

All E-commerce transactions begin with product selection. The idea is that you present a user with a list of “top-level” categories. When they select one of those categories, a new list of categories is returned. All this needs is a simple service to support querying by the parent identifier.

Essentially, the navigation of the product hierarchy happens as the service is called recursively. This process continues until the user has “navigated” their way to the “bottom level” hierarchy categories, where the products list becomes populated.
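The navigation pattern can be sketched with an in-memory map standing in for the category table. This is only an illustration, assuming rows are looked up by parent identifier. The class and method names are hypothetical, and which subcategory sits under which parent is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** In-memory stand-in for a category table that is queried by parent identifier. */
final class CategoryNavigationSketch {

    /** One category row, stored under its parent. */
    static final class Category {
        final UUID parentId;
        final UUID categoryId;
        final String name;

        Category(UUID parentId, UUID categoryId, String name) {
            this.parentId = parentId;
            this.categoryId = categoryId;
            this.name = name;
        }
    }

    // Key = parent identifier (the partition key in this sketch), value = its child rows.
    private final Map<UUID, List<Category>> rowsByParent = new HashMap<>();

    /** Adds a child category under the given parent and returns it. */
    Category add(UUID parentId, String name) {
        Category category = new Category(parentId, UUID.randomUUID(), name);
        rowsByParent.computeIfAbsent(parentId, id -> new ArrayList<>()).add(category);
        return category;
    }

    /** The only query the navigation needs: all categories under one parent. */
    List<Category> findByParentId(UUID parentId) {
        return Collections.unmodifiableList(
                rowsByParent.getOrDefault(parentId, Collections.emptyList()));
    }

    public static void main(String[] args) {
        UUID topLevel = UUID.fromString("ffdac25a-0244-4894-bb31-a0884bc82aa9");
        CategoryNavigationSketch nav = new CategoryNavigationSketch();
        Category clothing = nav.add(topLevel, "Clothing");
        nav.add(topLevel, "Cups and Mugs");
        nav.add(clothing.categoryId, "Hoodies");
        nav.add(clothing.categoryId, "T-Shirts");

        // Each selected categoryId is fed back in as the next parentId,
        // until a category has no children left to show.
        UUID current = topLevel;
        List<Category> children = nav.findByParentId(current);
        while (!children.isEmpty()) {
            for (Category child : children) {
                System.out.println(child.name);
            }
            current = children.get(0).categoryId; // pretend the user clicked the first entry
            children = nav.findByParentId(current);
        }
    }
}
```

Every call is a lookup on a single parent identifier, so each step of the navigation stays a single-partition read no matter how deep the hierarchy gets.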

Pricing

The price table is very simple, and was built with two design considerations:

  1. The web may not be the only place that a retailer is selling their products.
  2. A product’s price has to be modeled separately from the product data.

For the first consideration, the table was designed with an identifier for a store, allowing a single product to have many prices. This provides flexibility in pricing so that a retailer can account for geographic differences. The fact remains that there’s more cost associated with getting products to a store in Upper Manhattan, versus a store in Sheboygan, Wisconsin. Additionally, this design choice allows for a store ID of “web” (“internet,” “online,” etc.) to implement a specific pricing model for products sold on the website.
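As a rough sketch of the first consideration, the following plain-Java example keys prices by the combination of product identifier and store identifier. A map stands in for the price table. The key class, product ID, store IDs, and amounts are all hypothetical.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** In-memory stand-in for a price table keyed by (product, store). */
final class PriceLookupSketch {

    /** Composite key: one product can have a different price in every store. */
    static final class PriceKey {
        final String productId;
        final String storeId;

        PriceKey(String productId, String storeId) {
            this.productId = productId;
            this.storeId = storeId;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof PriceKey)) {
                return false;
            }
            PriceKey that = (PriceKey) other;
            return productId.equals(that.productId) && storeId.equals(that.storeId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(productId, storeId);
        }
    }

    private final Map<PriceKey, BigDecimal> prices = new HashMap<>();

    /** Writing the same (product, store) pair again overwrites the previous price. */
    void setPrice(String productId, String storeId, BigDecimal value) {
        prices.put(new PriceKey(productId, storeId), value);
    }

    Optional<BigDecimal> findPrice(String productId, String storeId) {
        return Optional.ofNullable(prices.get(new PriceKey(productId, storeId)));
    }

    public static void main(String[] args) {
        PriceLookupSketch table = new PriceLookupSketch();
        table.setPrice("HOODIE-BLUE-3XL", "web", new BigDecimal("39.99"));
        table.setPrice("HOODIE-BLUE-3XL", "store-manhattan", new BigDecimal("44.99"));

        System.out.println(table.findPrice("HOODIE-BLUE-3XL", "web"));           // Optional[39.99]
        System.out.println(table.findPrice("HOODIE-BLUE-3XL", "store-unknown")); // Optional.empty
    }
}
```

Because the store identifier is part of the key, adding a new sales channel means adding rows, not changing the product data.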

For the second point, many E-commerce tutorials implement the price simply as a property on the product model. In reality, however, pricing data is subject to different read and write patterns than product data. Pricing data can fluctuate fairly often, while product data tends to be more static. Pricing data will also be read more often and from varying digital channels/devices (web, internal reports, store points of sale, employee handheld devices, etc.).

Price precision

Quick reminder: when storing and working with currency, make sure to choose an appropriate data type. Most databases have a specific, preferred data type for currency. The preferred type for storing currency values in Astra DB is the DECIMAL type, which maps to the BigDecimal Java type.

Traditional floating-point data types like FLOAT and DOUBLE are known as fixed precision types. They store their values in base-2 (binary) using a fixed number of bits, meaning the values are converted from the common base-10 system. Many base-10 fractions (0.1, for example) have no exact binary representation, so the stored value is only the closest approximation. Because of this, rounding errors become likely when converting between binary and base-10, especially as the number of digits increases (on either side of the decimal point).

On the other hand, types like DECIMAL (Cassandra/Astra DB) and BigDecimal (Java) are arbitrary precision types, storing their values in base-10. This makes them ideal for storing numbers where you need to represent an exact amount. For more information, I recommend this Q&A about double vs. big decimal on Stack Overflow.
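A short, self-contained Java comparison shows the difference. The class and method names are made up for illustration, and the amounts are arbitrary.

```java
import java.math.BigDecimal;

/** Adds the same two prices as double and as BigDecimal. */
final class PricePrecisionSketch {

    /** Binary floating point: neither 0.10 nor 0.20 can be represented exactly. */
    static double sumAsDouble() {
        return 0.10 + 0.20;
    }

    /** Arbitrary-precision decimal: built from strings, so the values are exact. */
    static BigDecimal sumAsDecimal() {
        return new BigDecimal("0.10").add(new BigDecimal("0.20"));
    }

    public static void main(String[] args) {
        System.out.println(sumAsDouble());  // prints 0.30000000000000004
        System.out.println(sumAsDecimal()); // prints 0.30
    }
}
```

Two habits are worth keeping when working with BigDecimal. Construct values from a String (or with BigDecimal.valueOf), since `new BigDecimal(0.1)` faithfully copies the double's binary approximation. And compare amounts with compareTo rather than equals, because equals also compares scale, so 0.3 and 0.30 are not "equal".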

Product

Finally, the product services table was designed to be queried by product identifier and contains columns such as a product’s name, description, and brand. One specific feature of this model is that the images column is defined as a collection, which allows a product to be linked to multiple images. We could also take the same approach with linked documents (material safety data sheets, user manuals, etc.).

Remember that the product identifier needs to locate an exact product. This not only helps narrow down and track the exact item that the customer ordered, but it prevents a warehouse worker from having to dig through a bin trying to find the last blue 3XL hoodie. Basically, whichever alpha-numeric naming system you use, just be sure that the “same” products with different colors and sizes all have their own unique product identifiers.

Service layer

We built a service layer with Java and Spring Boot, which also builds repositories via Spring Data. The nice thing about building services with Spring Boot is that it’s also easy to build and test RESTful endpoints. Here we’ll show how we built the product service, starting with the REST controller.

When building the product controller (Figure 2), we want to define the following:

  • The appropriate mapping and HTTP response codes: Here, our mapping serves a GET request on the /product/{productid} endpoint.
  • Our response for an HTTP 200 (success): In the actual class file (see our GitHub repo), there are additional definitions for HTTP codes 400, 404 and 500, which were omitted here for brevity.

Our mapping also ties the endpoint definition to the findByProductId method, which takes an HttpServletRequest object and defines the mapping for the productid parameter. We use the productid in the findById method of the product repository.
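The control flow of such an endpoint can be sketched without any framework. The example below is not the controller from the repository: it leaves out the Spring annotations and the HttpServletRequest parameter so that it runs on the JDK alone. The conditions chosen for 400, 404, and 500 are assumptions, and every type name is hypothetical.

```java
import java.util.Optional;

/** Framework-free sketch of a "find product by ID" endpoint and its status codes. */
final class ProductLookupSketch {

    /** Minimal stand-in for the product returned to the caller. */
    static final class Product {
        final String productId;
        final String name;

        Product(String productId, String name) {
            this.productId = productId;
            this.name = name;
        }
    }

    /** Stand-in for an HTTP response: a status code plus an optional body. */
    static final class Response {
        final int status;
        final Product body;

        Response(int status, Product body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Stand-in for the repository's findById method. */
    interface ProductLookup {
        Optional<Product> findById(String productId);
    }

    /** Maps the outcome of a single-key lookup onto an HTTP status code. */
    static Response findByProductId(ProductLookup repository, String productId) {
        if (productId == null || productId.trim().isEmpty()) {
            return new Response(400, null); // assumed: a missing ID is a bad request
        }
        try {
            return repository.findById(productId)
                    .map(product -> new Response(200, product))
                    .orElseGet(() -> new Response(404, null)); // no such product
        } catch (RuntimeException e) {
            return new Response(500, null); // assumed: lookup failures surface as 500
        }
    }

    public static void main(String[] args) {
        ProductLookup lookup = id -> "P1".equals(id)
                ? Optional.of(new Product("P1", "Sample hoodie"))
                : Optional.empty();

        System.out.println(findByProductId(lookup, "P1").status); // 200
        System.out.println(findByProductId(lookup, "P2").status); // 404
        System.out.println(findByProductId(lookup, " ").status);  // 400
    }
}
```

Keeping the lookup behind a small interface also makes the status-code logic easy to test without a running database.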

As shown in Figure 3, the ProductRepository is simply a Java interface that inherits its methods from the underlying Spring Data CassandraRepository. Since our product query requirements are fairly simple, we don’t need to add any code to the ProductRepository interface. The findById method is made available by the CassandraRepository, and will suit our needs just fine.

Note that the CassandraRepository provides many data access methods, several of which are not appropriate for use with Apache Cassandra or Astra DB. Things like count() and findAll() are just two that you should avoid (honestly, be suspicious of any method containing the word “All”), since they invoke full table scans.

In the same vein, any repository method that takes an Iterable<String> is doing either a multi-key query or a multi-partition batch, neither of which will run efficiently. When deciding whether to use the delivered repository methods of Spring Data Cassandra, consider what the WHERE clause of that method would look like in CQL, and go from there.

The ProductRepository is configured to return objects of ProductEntity. The entity class allows us to provide a mapping to the table definition in Astra DB. We can start by annotating the class definition with the table name: @Table("product").

Each column on the table is defined as a private property, annotated with its defined data type (using @CassandraType) and either the @Column or @PrimaryKey annotation with its column name. The column/key name mapping, in particular, is how differently-worded column names are mapped from Astra DB to Java (e.g., “product_id” in Cassandra, to “productId” in Java). Each property also gets its own pair of public getter/setter accessor methods.

Also note that in the case of a compound primary key, you’ll need to use an additional primary key class. Inside this class (annotated with @PrimaryKeyClass), you should map the individual parts of the key with the @PrimaryKeyColumn annotation, where you can specify a partitioned or clustered key type. See the CategoryPrimaryKey.java or PricePrimaryKey.java classes in GitHub for more information.

The only other class to be concerned with is the Product class, which our RESTful controller method returns. Fortunately, Product is a plain old Java object (POJO) that’s just the collection of column values we’ve read from the database and returned in our HTTP response.

The final piece of the service layer is to build a short main class (Figure 5) to invoke Spring Boot.

Putting it all together

With the data models and service layers ready, we will build a simple web UI that calls the service layers for data. From there, we can run the site and see what happens with each service call we’ve discussed in this post.

Category

We start by clicking on the “Products” tab. To seed the product navigation, we define the top-level parent identifier (ffdac25a-0244-4894-bb31-a0884bc82aa9) as a constant in our web code. By passing that parentId to the category service, the topmost categories in our product hierarchy are returned: Clothing, Cups and Mugs, Tech Accessories, and Wall Decor.

Figure 6. Product hierarchy navigation.

Our product navigation is further rounded out, as each selected categoryId is sent back into the category service (as a parentId). This returns the next level categories of our product hierarchy, as shown above in Figure 6: Hoodies, T-Shirts, Jackets, Travel Mugs, Coffee Mugs, and Cups.

Figure 7. Navigating to the product group level.

If we click on “T-Shirts” from the product navigation shown in Figure 6, we’re taken to the product group level. In Figure 7, four product groups have been returned and are available for us to click on, revealing the product page.

Product and price

Figure 8. The product page.

The final view, in Figure 8, shows the results of the product and price services. On this page, we can see the full product description, choose the appropriate size for the product, and then see the updated price for each sizing option (larger sizes cost more). From there, we can add the product to the shopping cart.

To conclude

In this post we discussed data modeling, implementation, and other design considerations for an E-commerce product backend system. In brief, here are the key takeaways:

  • Price data is subject to change, so it should be in its own table.
  • Prices should be stored using arbitrary precision types like DECIMAL and BigDecimal.
  • Consider how the “top level” of the product hierarchy will be discovered.
  • Each product should have an identifier that distinguishes it from similar products with different attributes (color, size, etc.).
  • Define helpful descriptions for the HTTP response codes in the RESTful controller.
  • The Entity classes let you map the column names to the property names (translate between snake case and camel case).
  • Be careful when using certain predefined repository methods. They might be doing things “under the hood” that don’t work well with distributed databases.

Note that there are many other subsystems that help make a good E-commerce application (shopping cart, user profile, etc.). What you just read is a simple approach to address the basic needs for interacting with Astra DB in a scalable way. While the aspects described above may not address all of the nuances of building similar systems, this should be a good foundation to get you started.

Follow the DataStax Tech Blog for more developer stories. Check out our YouTube channel for tutorials, and follow DataStax Developers on Twitter for the latest news about our developer community.

Resources

  1. Shopping Cart Data Modeling Example | DataStax
  2. Building an E-commerce Website | DataStaxDevs Workshop
  3. Double vs. BigDecimal? | Stack Overflow
  4. DataStax managed services powered by Apache Cassandra | DataStax Astra DB
