How to Load Image Embeddings into Astra DB with Roboflow Inference

DataStax
Nov 28, 2023

By James Gallagher, Roboflow, and Aaron Ploetz, DataStax

Image generated with DALL-E 2

Embeddings have taken the fields of natural language processing and computer vision by storm. Embeddings are a numeric representation of images and text that encode semantics. This property has a range of applications.

For example, you can use image embeddings to find images related to a text prompt for use in an image search engine, cluster image datasets, and more.

As you build an application with embeddings, you need a place where you can store your embeddings and query them at scale. With DataStax Astra DB’s vector database, you can store embeddings for use in language and machine vision tasks.

In this guide, we’ll show how to generate CLIP image embeddings to store in Astra DB using Roboflow Inference, an open source inference server. These embeddings can then be queried for use in a range of computer vision applications. By the end, you’ll have a vector database ready for use in computer vision applications that use embeddings.

We have created a Notebook that you can run to follow along with this guide.

Let’s get started!

What is CLIP?

Contrastive Language-Image Pre-Training (CLIP) is a computer vision model developed by OpenAI. CLIP enables you to calculate text and image embeddings that can be compared. You can compare image embeddings to identify the similarity between two images. You can also compare text embeddings with image embeddings to find images related to a text prompt.
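To make "comparing embeddings" concrete, here is a minimal sketch of cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors below are made-up stand-ins for illustration; real CLIP embeddings are much longer, and the function name is our own, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
cat_photo = [0.9, 0.1, 0.0]
cat_text = [0.8, 0.2, 0.1]
car_text = [0.0, 0.1, 0.9]

similar = cosine_similarity(cat_photo, cat_text)
dissimilar = cosine_similarity(cat_photo, car_text)
print(similar, dissimilar)  # the first score is much closer to 1
```

A score near 1 means the two vectors point in nearly the same direction, which is how a text prompt and a matching image end up ranked together.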

The CLIP model architecture

Read more about use cases for CLIP. To use CLIP embeddings in an application, you need to:

  1. Calculate the embeddings for text and images
  2. Store your embeddings in a vector database for use in your application.

In this guide, we will calculate CLIP embeddings with Roboflow Inference, which enables you to use a range of state of the art models across vision domains, from object detection to segmentation to optical character recognition.

You can use both foundation models like CLIP, which require no additional training, and fine-tuned models over HTTP. With Inference, we don’t need to write any model execution logic. Instead, we can make an HTTP request.

Let’s walk through all the steps to calculate image embeddings and load them into Astra DB.

Step #1: Install Roboflow Inference

If you are using Inference in a Notebook, you can use the hosted Roboflow Inference server and skip this installation process.

Roboflow Inference can be installed with Docker or pip.

For this guide, we’ll show how to set up Inference with Docker. The benefit of using Docker is you can separate the embedding process from your application logic. This is particularly useful for applications that need to scale to thousands of queries (e.g. video processing, image search, and creating embeddings for exploring large datasets).

If you do not already have Docker installed, visit the official Docker installation instructions and follow the steps required to install Docker on your device. Next, install the Roboflow Inference CLI:

pip install inference-cli

To set up an Inference server with which you can calculate image embeddings, run:

inference server start

We now have a server ready to accept embedding requests. Next, let’s set up an Astra DB vector database. Once we have our database set up, we can come back to our Inference server and start making requests.

Step #2: Set up a DataStax Astra DB vector database

Create a free DataStax account. Once you have an account, follow the DataStax guide to create a database in Astra DB. This database is where we will store our embeddings. During the database setup process, select the “Vector Database” option:

With a database ready, you can begin configuring and loading embeddings into the database.

First, install the required dependencies for this project:

pip install cassandra-driver supervision inference

Create a new Python file and add the following code:

import os
import base64

import requests
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

import cv2
import supervision as sv

# Read the Astra DB application token and keyspace name from the environment
ASTRA_DB_TOKEN_BASED_PASSWORD = os.environ["ASTRA_DB_TOKEN"]
ASTRA_DB_KEYSPACE = os.environ["ASTRA_DB_KEYSPACE"]

SECURE_CONNECT_BUNDLE_PATH = "secure-connect-vector-search-db.zip"
ASTRA_CLIENT_ID = 'token'
ASTRA_CLIENT_SECRET = ASTRA_DB_TOKEN_BASED_PASSWORD
KEYSPACE_NAME = ASTRA_DB_KEYSPACE
TABLE_NAME = 'images'

# Authenticate with the Secure Connect bundle and token, then open a session
cloud_config = {
    'secure_connect_bundle': SECURE_CONNECT_BUNDLE_PATH
}
auth_provider = PlainTextAuthProvider(ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET)
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider, protocol_version=4)
session = cluster.connect()

print(f"Creating table {TABLE_NAME} in keyspace {KEYSPACE_NAME}")
session.execute(f"CREATE TABLE IF NOT EXISTS {KEYSPACE_NAME}.{TABLE_NAME} (id int PRIMARY KEY, name TEXT, description TEXT, item_vector VECTOR<FLOAT, 512>)")

print(f"Creating index image_ann_index on table {TABLE_NAME}")
session.execute(f"CREATE CUSTOM INDEX IF NOT EXISTS image_ann_index ON {KEYSPACE_NAME}.{TABLE_NAME}(item_vector) USING 'StorageAttachedIndex'")

# TRUNCATE removes every row in the table, so we start from an empty table
print(f"Truncate table {TABLE_NAME} in keyspace {KEYSPACE_NAME}")
session.execute(f"TRUNCATE TABLE {KEYSPACE_NAME}.{TABLE_NAME}")

There is a lot going on in this code so let’s walk through it step by step. First, we import the required dependencies. We then specify a few configuration variables that we will use to authenticate with Astra DB. We then use those variables to authenticate and start a session.

Next, we run three commands on our database:

  1. We create a table called “images”
  2. We create an index on the “images” table
  3. We truncate the “images” table

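The table name “images” is set by the TABLE_NAME variable; change it if you want to store your embeddings in a table with a different name. The keyspace is read from the ASTRA_DB_KEYSPACE environment variable, so set that variable to the keyspace of the database you created in Astra DB.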

This code will do everything you need to prepare your database. Note that you only need to run the table and index creation commands once. Because TRUNCATE deletes every row in the table, remove the setup commands from your script once your database is configured so that later runs do not erase your embeddings.
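The item_vector column in the code above is declared as VECTOR<FLOAT, 512>, so every embedding you insert needs exactly 512 numeric values. As a sketch, a small check like the one below can catch a mismatch before it reaches the database. The function name check_embedding and the EMBEDDING_DIM constant are our own, introduced only for illustration.

```python
EMBEDDING_DIM = 512  # matches VECTOR<FLOAT, 512> in the CREATE TABLE statement


def check_embedding(embedding, expected_dim=EMBEDDING_DIM):
    """Return the embedding as a list of floats, or raise if it is malformed."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} values, got {len(embedding)}"
        )
    for value in embedding:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"non-numeric value in embedding: {value!r}")
    return [float(value) for value in embedding]


# A placeholder vector of the right length passes the check.
checked = check_embedding([0.0] * EMBEDDING_DIM)
print(len(checked))
```

If you change the vector size in the table definition, change EMBEDDING_DIM to match.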

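To run the code, you will need to download your Secure Connect bundle from DataStax. Save the bundle as “secure-connect-vector-search-db.zip”, the file name that SECURE_CONNECT_BUNDLE_PATH points to in the code above (or update that variable to match your bundle’s file name), and make sure the zip file is in the same directory as the Python script you are writing.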
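Before connecting, it can help to confirm that the environment variables and the bundle file are in place, since a missing one otherwise surfaces as a KeyError or a connection failure. The following is a minimal sketch using only the standard library; the helper name missing_settings is hypothetical, and the variable names and bundle path are taken from the script above.

```python
import os

REQUIRED_ENV_VARS = ("ASTRA_DB_TOKEN", "ASTRA_DB_KEYSPACE")


def missing_settings(env, bundle_path):
    """Return a list of human-readable setup problems (empty if all is well)."""
    problems = [
        f"environment variable {name} is not set"
        for name in REQUIRED_ENV_VARS
        if not env.get(name)
    ]
    if not os.path.isfile(bundle_path):
        problems.append(f"Secure Connect bundle not found at {bundle_path}")
    return problems


# Report anything that needs fixing before the script tries to connect.
for problem in missing_settings(os.environ, "secure-connect-vector-search-db.zip"):
    print("Setup problem:", problem)
```

If nothing is printed, the script has what it needs to authenticate.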

Now that we have a database, we can start generating embeddings.

Step #3: Load embeddings into the database

For this guide, we are going to work with the COCO 128 dataset. This dataset contains a wide variety of images, which makes it ideal for showing the capabilities of vector search with Astra DB. You can download the dataset for free from Roboflow Universe. You can also use your own dataset.

Add the following code to the Python file you created earlier:

IMAGE_DIR = "images/"
API_KEY = os.environ.get("ROBOFLOW_API_KEY")
SERVER_URL = "http://localhost:9001"

results = []

for i, image_name in enumerate(os.listdir(IMAGE_DIR)):
    # Read the image and base64-encode it for the request body
    with open(os.path.join(IMAGE_DIR, image_name), "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("utf-8")

    # Define request payload
    infer_clip_payload = {
        # Images can be provided as URLs or as base64-encoded strings
        "image": {
            "type": "base64",
            "value": encoded_image,
        },
    }

    res = requests.post(
        f"{SERVER_URL}/clip/embed_image?api_key={API_KEY}",
        json=infer_clip_payload,
    )
    res.raise_for_status()

    # The response holds a list of embeddings, one per image sent; we sent one image
    embedding = res.json()["embeddings"][0]

    results.append((i, image_name, "description", embedding))

# Pass the values as query parameters so the driver handles quoting
insert_query = f"INSERT INTO {KEYSPACE_NAME}.{TABLE_NAME} (id, name, description, item_vector) VALUES (%s, %s, %s, %s)"
for result in results:
    session.execute(insert_query, result)

This code loads all images in a folder of images, sends the image to an instance of Roboflow Inference to calculate embeddings, then records the embeddings and associated image names to a list. At the end of our script, we add each image and embedding record to the database.
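A folder of images often contains other files too (annotation files, hidden system files), and os.listdir returns all of them. As a sketch, the two helpers below filter a directory down to image files and build the request payload in the same shape used above. The helper names and the list of extensions are our own assumptions, not part of Roboflow Inference.

```python
import base64
import os
import tempfile

# Assumed set of extensions; extend it to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(image_dir):
    """Return the sorted names of files in image_dir that look like images."""
    return sorted(
        name
        for name in os.listdir(image_dir)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )


def build_image_payload(image_bytes):
    """Wrap raw image bytes in the base64 payload shape used in this guide."""
    return {
        "image": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


# Demonstrate the filter on a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.PNG", "a.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "wb").close()
    found = list_image_files(tmp)

print(found)  # ['a.jpg', 'b.PNG']
```

Sorting the file names also keeps the ids assigned by enumerate stable from one run to the next.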

To use the code above, you will need a Roboflow API key. Learn how to retrieve a Roboflow API key. Run the following command to set up your API key in your environment:

export ROBOFLOW_API_KEY=""

The code above sends requests to a local Inference server at http://localhost:9001, set up as described at the beginning of this guide. If you want to compute embeddings in the cloud instead, you can do so using the infer.roboflow.com server: set SERVER_URL to https://infer.roboflow.com in the code snippet above.
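One way to switch between the local and hosted servers without editing the script is to read the server URL from an environment variable. The sketch below assumes a variable named INFERENCE_SERVER_URL, which is a name we made up for this example; it is not something Roboflow Inference reads itself.

```python
import os

LOCAL_SERVER_URL = "http://localhost:9001"
HOSTED_SERVER_URL = "https://infer.roboflow.com"


def resolve_server_url(env):
    """Use INFERENCE_SERVER_URL (hypothetical) if set, else the local server."""
    return env.get("INFERENCE_SERVER_URL", LOCAL_SERVER_URL).rstrip("/")


def embed_image_url(server_url, api_key):
    """Build the CLIP image embedding endpoint URL used in this guide."""
    return f"{server_url}/clip/embed_image?api_key={api_key}"


SERVER_URL = resolve_server_url(os.environ)
print(SERVER_URL)
```

With this in place, running `export INFERENCE_SERVER_URL=https://infer.roboflow.com` before the script would point it at the hosted server.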

When you run the code above, your image embeddings will be indexed in Astra DB and made available for use in querying. Now for the fun part: let’s try out our database with a query!

Step #4: Test run a vector search

Suppose we want to find all images that match the text prompt “cat.” We can do so by querying our Astra DB vector database with a text embedding generated using CLIP. To generate the text embedding, we can use Roboflow Inference. Let’s search for cats.

infer_clip_payload = {
    "text": "cat",
}

res = requests.post(
    f"{SERVER_URL}/clip/embed_text?api_key={API_KEY}",
    json=infer_clip_payload,
)

# The response holds a list of embeddings, one per text prompt sent
text_embedding = res.json()["embeddings"][0]

query = f"SELECT name, description, item_vector FROM {KEYSPACE_NAME}.{TABLE_NAME} ORDER BY item_vector ANN OF {text_embedding} LIMIT 1"
for row in session.execute(query):
    print("\t" + str(row))
    # Load the matching image from IMAGE_DIR and display it
    image = cv2.imread(os.path.join(IMAGE_DIR, row.name))
    sv.plot_image(image)

This code will calculate a text embedding for a search query (here, “cat”), then run an approximate nearest neighbor search in the Astra DB vector database. The top result will be displayed in the notebook. Let’s run our code and see what happens:
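To retrieve more than one match, you can raise the LIMIT in the query. As a sketch, the helper below builds the same ANN query for any embedding and result count. The function name build_ann_query is our own, and the keyspace name in the example call is a placeholder. Because the keyspace and table names are interpolated directly into the query string, only pass names you control.

```python
def build_ann_query(keyspace, table, embedding, limit=1):
    """Build a CQL approximate nearest neighbor query for the given embedding."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Render the embedding as a CQL vector literal, e.g. [0.1, 0.2, 0.3]
    vector_literal = "[" + ", ".join(repr(float(v)) for v in embedding) + "]"
    return (
        f"SELECT name, description FROM {keyspace}.{table} "
        f"ORDER BY item_vector ANN OF {vector_literal} LIMIT {int(limit)}"
    )


# Placeholder keyspace and a toy 3-value vector, for illustration only.
query = build_ann_query("demo_keyspace", "images", [0.1, 0.2, 0.3], limit=3)
print(query)
```

You could then pass the returned string to session.execute and loop over the rows as in the code above.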

Our code successfully identified a cat in our dataset. Great!

Conclusion

The vector search capabilities in Astra DB provide a scalable way in which you can store image and text embeddings for use in building applications. To retrieve image embeddings for use in vision projects, you can use Roboflow Inference. Inference is the technology that Roboflow uses to power millions of API calls to large enterprises each month.

After walking through how to load image embeddings into Astra DB, you now have all the knowledge you need to start building applications with image embeddings and vector search.

If you haven’t already, try Astra DB for free.
