By now, there’s a good chance you’ve heard about generative AI and agentic flows (if you’re not familiar with agents and how they work, watch this video to get up to speed). There’s plenty of information out there about building agents with providers like OpenAI or Anthropic. However, not everyone is comfortable exposing their data to public model providers. We hear a steady drumbeat of questions from folks wondering whether there’s a more secure and cheaper way to run agents. Ollama is the answer.
If you’ve ever wondered how to run AI models securely on your own machine without sharing your data with external providers, well, here you go!
If you’d rather watch this content, here’s a video covering the same topic.
Why use Ollama?
Ollama lets you run models locally, so your data never leaves your machine. On top of that, there are no per-token API costs. With Ollama, you can confidently run models on your own hardware, knowing your data stays private.
Getting started with Ollama
Step 1: Install the model
If you haven’t used Ollama before, you’ll need to install it locally first. Download and install the version needed for your operating system. It takes about five minutes.
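If you’re on Linux, there’s also an official install script; macOS and Windows users can grab the installer from ollama.com. The script below is the documented Linux one-liner:

```
# Official Linux install script from ollama.com/download
curl -fsSL https://ollama.com/install.sh | sh

# Verify the install
ollama --version
```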
Then, navigate to the models section on the Ollama site and filter by “tools.” It’s crucial to choose a model that supports tool calling when you want to build an agent.
For this post, we’ll use Alibaba’s Qwen 2.5 7B (seven-billion-parameter) model, a great choice for local tool calling and agent interactions. It’s only a 4.7GB download (Llama 3.1 405B is 243GB!) and runs comfortably on most machines.
Copy the installation command from the model page and paste it into your terminal. Once the download completes, you’re ready to start working with the model!
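For reference, the terminal session looks like this (the plain qwen2.5 tag pulls the 7B variant by default):

```
# Pull Qwen 2.5 7B from the Ollama registry
ollama pull qwen2.5

# Optional smoke test: chat with the model directly in the terminal
ollama run qwen2.5 "Say hello in one sentence."
```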
Step 2: Setting up Langflow
Next, we’ll use Langflow, a visual IDE that enables you to build generative and agentic AI flows in a low-code or no-code environment. If you’re not familiar with Langflow, check out this link for more information.
1. Install Langflow: Use “uv pip install langflow” in your terminal to install Langflow locally, then start it with “langflow run” (see the commands after this list).
2. Create a New Flow: Choose the “Simple Agent” template.
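If you’re starting from a clean environment, the install and launch steps look roughly like this (assuming uv is already installed; Langflow serves on port 7860 by default):

```
# Create a virtual environment and install Langflow into it
uv venv
uv pip install langflow

# Start the Langflow server, then open http://127.0.0.1:7860 in your browser
uv run langflow run
```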
Once the flow opens, you’ll see a ready-made simple agentic flow complete with an Agent component (defaulting to OpenAI’s gpt-4o-mini), URL and Calculator tools, and Chat Input and Chat Output components.
Transitioning to Ollama
Now, let’s switch from OpenAI to Ollama:
1. Select a custom model: Since our goal is to use Ollama rather than OpenAI, click the “Model Provider” dropdown in the Agent component and choose “Custom.”
2. Add the Ollama component: Drag and drop the Ollama model component into your flow and connect its “Language Model” port to the agent’s.
3. Refresh the model list and choose qwen2.5: Refresh the model name dropdown to populate the available models. Ollama must be running locally for this step to work.
To use an Ollama model with your agent, it must support tool calling. In Langflow, enable the “Tool Model Enabled” option to filter the list to models with this capability, then select qwen2.5.
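Before refreshing the dropdown, it’s worth confirming that the Ollama server is up and that qwen2.5 is installed; the Langflow component talks to Ollama’s local API on port 11434:

```
# List the models installed locally
ollama list

# The same list via the local API Langflow connects to; expect a JSON response
curl http://localhost:11434/api/tags
```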
Running your query
Now, let’s run a query using the Ollama model. Open the “Playground” and try a query like “convert 200 USD to INR”. If everything is wired up correctly, the model will attempt to answer using the tools at its disposal.
Keep in mind that local models can take longer to respond, especially larger ones. Qwen 2.5, however, is optimized for smaller machines, making it pretty solid for local use.
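If you’re curious what “tool calling” means under the hood, here’s a minimal sketch using Ollama’s Python client (pip install ollama). The convert_currency tool and its schema are hypothetical, purely for illustration; the point is that a tool-capable model like qwen2.5 responds with a structured tool call rather than plain text:

```python
import ollama

# A hypothetical currency-conversion tool, described in the JSON-schema
# format the Ollama client expects; the model decides when to call it
tools = [{
    "type": "function",
    "function": {
        "name": "convert_currency",
        "description": "Convert an amount from one currency to another",
        "parameters": {
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "from_currency": {"type": "string"},
                "to_currency": {"type": "string"},
            },
            "required": ["amount", "from_currency", "to_currency"],
        },
    },
}]

response = ollama.chat(
    model="qwen2.5",
    messages=[{"role": "user", "content": "convert 200 USD to INR"}],
    tools=tools,
)

# A tool-capable model returns structured tool calls instead of plain text
for call in response["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```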
Experimenting with inputs
When working with smaller local models, you may need to experiment with your inputs. Sometimes, you might have to explicitly instruct the model to do something, like use the web to find the latest exchange rates. Adjusting the model’s temperature settings can also help; starting with a conservative value (like 0.10) is a good practice, but feel free to increase it for more creative responses.
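In Langflow, temperature is a field on the Ollama component. If you’re experimenting against Ollama directly, the equivalent knob is the options parameter in the Python client; here’s a minimal sketch:

```python
import ollama

# A conservative temperature keeps answers focused; raise it for more
# creative responses
response = ollama.chat(
    model="qwen2.5",
    messages=[{"role": "user", "content": "convert 200 USD to INR"}],
    options={"temperature": 0.1},
)
print(response["message"]["content"])
```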
Notice how the agent updated its approach when I told it to “use the web to get the latest exchange rates” and returned the correct answer. This time, it used the URL tool to fetch the latest exchange rate from the web instead of relying solely on its training data.
Conclusion
Once your Ollama agent is set up in Langflow, you can integrate it into your applications via the Langflow API, giving your apps full agentic capability.
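As a sketch of what that integration can look like, here’s a call to Langflow’s run endpoint using Python’s requests library. The flow ID is a placeholder (copy yours from the flow’s API pane), and the exact payload shape may vary between Langflow versions:

```python
import requests

# Hypothetical flow ID; replace with the one from your Langflow API pane
FLOW_ID = "your-flow-id"
url = f"http://127.0.0.1:7860/api/v1/run/{FLOW_ID}?stream=false"

payload = {
    "input_value": "convert 200 USD to INR",
    "input_type": "chat",
    "output_type": "chat",
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()

# The agent's reply is nested inside the run outputs JSON
print(response.json())
```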
That’s all it takes to harness the power of local models securely with Ollama and your agents. If you have any questions or need further assistance, feel free to reach out on our Discord.
Happy coding!