Unlocking Local AI: Using Ollama with Agents

DataStax
5 min read · Feb 13, 2025


By David Jones-Gilardi

Image: Unsplash

By now, there’s a good chance you’ve heard about generative AI or agentic flows (if you’re not familiar with agents and how they work, watch this video to get up to speed). There’s plenty of information out there about building agents with providers like OpenAI or Anthropic. However, not everyone is comfortable exposing their data to public model providers, and we hear a steady drumbeat of questions from folks wondering if there’s a more secure and cheaper way to run agents. Ollama is the answer.

If you’ve ever wondered how to run AI models securely on your own machine without sharing your data with external providers, well, here you go!

If you’d rather watch this content, here’s a video covering the same topic.

Why use Ollama?

Ollama enables you to run models locally, ensuring that your data remains private and secure. Not only that, it won’t cost you anything in API tokens. With Ollama, you can confidently run models on your own hardware, knowing that your data never leaves your machine.

Getting started with Ollama

Step 1: Install the model

If you haven’t used Ollama before, you’ll need to install it locally first. Download and install the version needed for your operating system. It takes about five minutes.

Download and install Ollama

Then, navigate to the models section of the Ollama site and filter by “Tools.” It’s crucial to choose a model that supports tool calling when you want to build an agent.

Choose "models" in the menu
Choose "Tools" from the model filter at the top of the page

For this post, we’ll use Alibaba’s Qwen 2.5 7-billion-parameter model, which is a great choice for local tool calling and agent interactions. It’s only a 4.7GB download (Llama 3.1 405b is 243GB!) and can run on most machines.

Copy the "ollama run qwen2.5" command

After installing Ollama, copy the run command and paste it into your terminal. Once the download is complete, you’re ready to start working with the model!

Execute "ollama run qwen2.5" in your terminal to download and install the model in Ollama

Step 2: Setting up Langflow

Next, we’ll use Langflow, a visual IDE that enables you to build generative and agentic AI flows in a low-code or no-code environment. If you’re not familiar with Langflow, check out this link for more information.

1. Install Langflow: Run “uv pip install langflow” in your terminal to install Langflow locally, then start it with “langflow run”.

2. Create a New Flow: Choose the “Simple Agent” template.

Choose the "Simple Agent" template from the provided options

Once opened, you’ll see a ready-made simple agentic flow complete with an agent (defaulting to OpenAI’s gpt-4o-mini LLM), both URL and calculator tools, and chat input and output components.

The Simple Agent flow defaults to using OpenAI's gpt-4o-mini

Transitioning to Ollama

Now, let’s switch from OpenAI to Ollama:

1. Select a custom model: Since our goal is to use Ollama and not OpenAI, click the “Model Provider” dropdown in the agent component and choose “Custom.”

Choose "Custom" from the Language Model dropdown

2. Add Ollama component: Drag and drop the Ollama model into your flow and connect the “Language Model” nodes.

Pull the Ollama model component over from the left hand side menu and connect the "Language Model" nodes

3. Refresh the model list and choose qwen2.5: Refresh the model name dropdown to populate the available models. Ollama must be running locally for this step to work.

To use an Ollama model with your agent, it must support tool calling. In Langflow, enable the “Tool Model Enabled” radio button to filter the list down to models with that capability, then select qwen2.5 (see the sketch after the screenshot below for what tool calling looks like under the hood).

Enable the "Tool Model Enabled" radio button to filter on tool-enabled models only
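
To see what tool calling actually means here, below is a rough sketch of the same mechanism through the Ollama Python client. The “convert_currency” tool schema is hypothetical and for illustration only; in Langflow, the real tools (URL, calculator) are wired up for you:

    import ollama

    # A minimal, OpenAI-style schema for a hypothetical currency-conversion tool.
    tools = [{
        "type": "function",
        "function": {
            "name": "convert_currency",  # hypothetical; for illustration only
            "description": "Convert an amount from one currency to another.",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number"},
                    "from_currency": {"type": "string"},
                    "to_currency": {"type": "string"},
                },
                "required": ["amount", "from_currency", "to_currency"],
            },
        },
    }]

    response = ollama.chat(
        model="qwen2.5",
        messages=[{"role": "user", "content": "Convert 200 USD to INR"}],
        tools=tools,
    )

    # A tool-capable model answers with a structured tool call instead of prose.
    for call in response["message"].get("tool_calls") or []:
        print(call["function"]["name"], call["function"]["arguments"])

A model that doesn’t support tool calling will simply answer in prose (or refuse), which is why the filter above matters.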

Running your query

Now, let’s run a query using the Ollama model. Open the “Playground” and try typing in an example like “convert 200 USD to INR”. If everything is wired up correctly, the model will attempt to answer your query using the tools at its disposal.

Open the Playground using the top right hand corner menu
Try an example like "convert 200 USD to INR"

Keep in mind that local models may take longer to process, especially larger ones. However, Qwen 2.5 is optimized for smaller machines, making it pretty solid for local use.

Experimenting with inputs

When working with smaller local models, you may need to experiment with your inputs. Sometimes, you might have to explicitly instruct the model to do something, like use the web to find the latest exchange rates. Adjusting the model’s temperature settings can also help; starting with a conservative value (like 0.10) is a good practice, but feel free to increase it for more creative responses.
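
For example, with the Ollama Python client the same knob looks roughly like this (the “options” value shown is a starting point to experiment with, not a required setting); Langflow’s Ollama component exposes a matching Temperature field:

    import ollama

    # A lower temperature keeps answers deterministic; raise it for more creative output.
    response = ollama.chat(
        model="qwen2.5",
        messages=[{"role": "user", "content": "Convert 200 USD to INR."}],
        options={"temperature": 0.10},  # conservative starting point
    )
    print(response["message"]["content"])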

The smaller model didn't quite get it right the first time. Give it a nudge with an extra instruction like "use the web to get the latest exchange rates"

Notice how the agent updated its approach when I told it to “use the web to get the latest exchange rates” and returned the correct answer. This time, it used the URL tool to grab the latest exchange rate from the web rather than relying solely on the knowledge it was trained on.

Now we can see it properly used the URL tool to fetch exchange rates

Conclusion

Finally, once your Ollama agent is set up within Langflow, you can integrate it into your applications via the Langflow API, giving your apps full agentic capability.

Use the API option to connect AI flows to your applications
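
As a rough sketch, calling your flow from Python might look like the following. The flow ID is a placeholder you’d copy from the flow’s API dialog, 7860 is Langflow’s default local port, and the exact response shape can vary between Langflow versions:

    import requests

    # Placeholder flow ID: copy the real one from your flow's API dialog.
    url = "http://127.0.0.1:7860/api/v1/run/YOUR-FLOW-ID"

    payload = {
        "input_value": "convert 200 USD to INR",  # same text you'd type in the Playground
        "input_type": "chat",
        "output_type": "chat",
    }

    resp = requests.post(url, json=payload, timeout=120)
    resp.raise_for_status()

    # The agent's answer is nested inside flow metadata; inspect the JSON to find it.
    print(resp.json())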

That’s all it takes to harness the power of local models securely with Ollama and your agents. If you have any questions or need further assistance, feel free to reach out on our Discord.

Happy coding!
