DGX Spark Series (Part 4): Setting up the Hermes Agentic Assistant

Jared Lander
Jun 16
8 min read

Everyone wants an AI assistant. Here's how to build one that keeps your data where it belongs.

Every company I talk to wants the same thing right now: an AI assistant that handles email, calendar, scheduling, code and research, without shipping the company’s data off to someone else’s servers. The catch is that “give an agent access to everything” and “keep our data governed and compliant” usually feel like opposites. They don’t have to be. I built a private, fully-local assistant on a single DGX Spark. What came out the other side is a useful pattern for how an organization might stand one up for itself.

The internet is buzzing with agentic assistants like OpenClaw, NanoClaw, NemoClaw and KiloClaw, which we recently set up with our friend Alex Gold (and recorded on YouTube). Perhaps the most talked about, recently, is Hermes from Nous Research.

Getting it ready for use involves a combination of command line and menu-driven steps, the latter still running through the CLI. I will walk through everything needed to get it working, though I will hand-wave a bit at some of the gateway implementations. The same steps that gave me a personal assistant are the steps a team would follow to give every employee one, or to give the company itself one.

Local Ollama Model

While Hermes can be used out of the box with one of the commercial LLM providers like OpenAI and Anthropic, I have this powerful GPU with the DGX Spark that I wanted to use. More importantly, since this assistant would process potentially sensitive details, I wanted a fully-local setup. For a privacy conscious company, that “more importantly” is the whole ballgame. Fully-local gives you a much better starting point: the model calls do not have to leave your network. That does not make you compliant by itself, but it gives governance, residency and privacy teams something real to work with. Thankfully, I already have Ollama running on the DGX, which makes this very easy.

Getting a new model into Ollama is fairly simple. From the command line run:

After it downloads you can run it like any other model in Ollama, either from the chat interface or through the API.

For example, to say “hi” to the model via curl, you can run this.

This returns the following:

Our Ollama runs inside a Docker container, for isolation and easy deployments. This is a pattern we recommend for almost any local model deployment, and one we are always happy to talk through.

Why Gemma4?

There is no shortage of open source models to choose from, so why did we go with Gemma4? In short, Gemma4 had a lot of people excited, saying things like, “this is the first local model that feels like paid models.” But that is something said about almost every new model these days. The more honest answer is that it seemed like a sensible starting point and well suited to the DGX. The lesson for a team is less about Gemma4 specifically and more about the workflow: pick a starting point, then let real usage tell you whether to move.

Why Gemma4:26b?

Gemma4 comes in a number of different sizes. There is often a perception that more parameters means a better model. But more parameters also means slower compute. I first tried the 31b version and found it really slow. Since I am using this as a sort of personal assistant, I wanted a faster model. In my experience 26b is significantly faster while being just about as “good,” however that is defined. This is largely because 31b is a dense model and 26b is a mixture of experts.

This is exactly the tradeoff every team deploying local models runs into — the biggest model is rarely the right model. You are balancing quality against latency and the cost of the hardware you have to keep fed, and for an interactive assistant that people are waiting on, a faster mixture-of-experts model that is “about as good” usually wins. That decision is worth making deliberately rather than defaulting to the largest number you can fit.

Initial Hermes Setup

We will be running Hermes in a Docker container, so we will ultimately use Docker Compose, but for the first setup we want to run the container by itself. The Hermes stack will live on a different machine than Ollama, so we will use a little Tailscale magic to make sure they can talk to each other. These extra steps are unnecessary if you are running Hermes on the same machine as Ollama, or if you are using a commercial LLM provider. The payoff for going through them is reproducibility. Containerized services wired together over a private network are deployable infrastructure, not a one-off someone has to remember how to rebuild.

This command starts the initial Hermes setup, which is menu driven:

This brings up a list of providers:

Depending on the selected model, you will receive different instructions. To use our local Ollama model we select “Custom endpoint (enter URL manually)” and then enter “https://local.domain.com/ollama/v1” for the URL. Notice there is no trailing slash after “v1”.

You can leave the API Key blank if you are using Ollama (unless you have put up some security around Ollama) and set the Context Length to 262144 so that you can use plenty of tokens in your conversations.

Next, you will be asked to set up a gateway, which is how you communicate with Hermes, via another menu:

I initially set up Slack and Telegram. Some of the other gateways verge too close to violating service terms or make me worried about leaking personal information, so I left them alone. That instinct is the same one a company should apply — meeting people where they already work (Slack, Teams) is great, but every gateway you turn on is another surface to govern, so turn on what you need and leave the rest.

There is a fair amount of prework that needs to be done before setting up gateways (bot tokens, app registrations and such), so rather than walk through each platform, I will point you to this KiloClaw end-to-end guide as a starting point.

Docker Compose

After the initial setup, it is time to persist the Hermes agent. Similar to the docker run command above, this will handle Tailscale and volumes for persistence. I will briefly explain the different services, then add annotations in comments. This Compose file is the deliverable, the thing you check into version control and hand to whoever operates it. Once this is in Compose, the setup becomes something you can review, version and rebuild.

ts-hermes: This is a sidecar for Hermes so that it can funnel all of its traffic through Tailscale. If you are not using Tailscale then this is not necessary.
hermes: This is the main container where Hermes does all of its work (and dispatches to the LLM). All of its traffic is routed through the ts-hermes container so that it can communicate with other devices in our Tailnet. It needs one volume to store its settings and memory. I provided an optional volume for it to have a persistent workspace for git repos.
dashboard: While this uses the same container as hermes, its function is a user-friendly dashboard.

Adding Models or Gateways

After Hermes is set up with Docker Compose, you can add additional models by running the appropriate setup command using docker run like this.

Giving the Agent Capabilities Without Giving Up Control

Now you can interact with Hermes via your favorite messaging platform. You can use it as a director for having other agents write code, as a researcher, as a personal assistant or as whatever else you can imagine. The key pattern is telling Hermes to remember work it has done, or to build skills, and that allows it to improve over time. I gave it the URL of a YouTube video and it remembered how to summarize the video for the future. That is the part that matters most for a business case: this is not a tool that performs the same on day one and day one hundred. With persistent memory and a growing library of skills, the assistant compounds in value the longer a team uses it, which is the opposite of how most software ages.

Beyond the standard chat interface, I gave Hermes greater capabilities by setting it up with a Lander Analytics Workspace account, meaning it can access email, Google Drive and Calendar. This is where the security pattern lives, and it is one any company can copy. The agent gets its own identity, not mine. It can only access its own email, which only contains messages where it was CC’d. It can only access Drive files shared with it. And for the calendar, I shared mine with just meeting times and not content, so it knows how to schedule for me without reading what the meetings are about. That is least-privilege access, scoped deliberately so the agent has exactly what it needs and nothing more.

I did the same thing on the engineering side by giving Hermes its own GitHub account in my organization. Every code change it makes is committed under that account; you can look at any commit and know a human or the agent made it, and you can audit accordingly. Scoping access this tightly, a dedicated identity, narrow permissions, an audit trail, is, in my view, the whole point of running an agent yourself. You get the capabilities without handing over more than you intend to, and you get the paper trail a regulated business needs.

What is Next?

As I get comfortable with what Hermes can do, I want to make it more autonomous, moving from “ask it to do a thing” toward scheduled tasks where it takes in information and acts on it on its own. For a personal assistant that means it starts handling the recurring stuff before I ask. For a team, that is where this stops being a chat toy and starts being infrastructure: an assistant that triages the morning’s email, prepares the day’s schedule, kicks off a research summary, or opens a pull request, on a schedule, attributable, and entirely on your own hardware. Self-hosted open models, scoped credentials, containerized deployment, agents with persistent memory and accumulating skills, that is what enterprise agentic infrastructure is going to look like, and you can see the shape of it on a single DGX Spark today. I will write again as I make progress.

This is the kind of problem we work on at Lander Analytics. We help teams stand up local and private LLM infrastructure, deploy agentic assistants safely with scoped access and real auditability, and decide when self-hosted open models are enough versus when you need commercial APIs. Whether you want to give your people a private AI assistant that keeps company data in house, or you are sketching out what your enterprise agent stack should look like, reach out at info@landeranalytics.com. We’re here to help.

Jared P. Lander

Founder and Chief Data Scientist

Lander Analytics

Subscribe to our Substack and below to our monthly emails for practical AI strategies for your organization: what to build, what to avoid, and how to make systems reliable in the real world.

Work with us: If you want help identifying the right first workflow, building a permissioned knowledge base, or training your team to ship responsibly, reach out at info@landeranalytics.com.

About the author: Jared P. Lander is Chief Data Scientist and founder of Lander Analytics, where he helps organizations build practical, measurable AI workflows grounded in strong data foundations.