Third Language, Same Result: MultiHopRAG in Go

We added a third language. Go compiles to a single static binary, uses goroutines instead of threads, and embeds its vector store in-process. No runtime, no server, no dependencies.

Accuracy: 81.6%. Within 2.2 percentage points of Python and C#.

Metric	Python	C#	Go	Delta (max)
Overall Accuracy	82.9%	83.8%	81.6%	2.2pp
Avg Iterations	1.9	1.4	2.6	1.2

All three use the same model (gpt-oss-120b via Fireworks AI) and the same embeddings (nomic-embed-text). Same framework abstractions. Different everything else.

All three outperform every published baseline on MultiHopRAG.

Per-Type Breakdown

Query Type	Python	C#	Go	Best Prior (IRCoT + RAG)
Inference	88.0%	91.3%	87.7%	80.1%
Comparison	78.2%	78.4%	73.9%	66.2%
Temporal	76.5%	74.4%	75.0%	60.4%
Null	94.7%	97.3%	99.3%	93.5%

Go scores highest on null queries at 99.3%, misclassifying only 2 out of 301. On temporal queries it lands between Python and C#. The gap is on comparison queries, where metadata filtering limitations in the embedded vector store make source-specific retrieval harder.

What Changed This Time

Component	Python	C#	Go
Language	Python 3.12	C# / .NET 10	Go 1.22
Vector Store	ChromaDB (external)	LiteDB (embedded)	chromem-go (embedded)
Code Execution	AST sanitizer + `exec()`	Roslyn scripting	AST interpreter (core-go)
Metadata	Runtime introspection	Source generators	`go generate` directives
Type System	Pydantic `GoalContext`	Generic `GoalSeeking<T>`	Go structs + interfaces
Concurrency	Synchronous	`async/await`	Goroutines + semaphore
Packaging	`pyproject.toml` + uv	NuGet + `dotnet` CLI	`go.mod` -> single binary

Three languages. Three runtimes. Three vector stores. Three concurrency models. The framework abstractions (10 primitives, the introspection boundary, the goal-seeking loop) remain identical. The decompositions are intentionally different: Go uses 13 phased patterns where Python and C# use 7. More on that below.

What's Different About Go

Single binary deployment. `go build -o multihop-rag .` produces one executable. No Python environment, no .NET runtime, no database server. The vector store is embedded in the process and persists to disk.

Goroutine parallelism. The benchmark runs 10 parallel workers using a semaphore pattern. Each worker gets its own LLM client and agent instance with no shared state.
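A minimal sketch of the semaphore pattern described above, not the benchmark's actual code. The `llmClient` type and `runBenchmark` function are hypothetical stand-ins, and the client is a stub that makes no network call. For simplicity the sketch creates a fresh client per query rather than per worker.

```go
package main

import (
	"fmt"
	"sync"
)

// llmClient is a hypothetical stand-in for a real LLM client.
type llmClient struct{ id int }

// answer is a stub; a real client would call the model here.
func (c *llmClient) answer(q string) string {
	return fmt.Sprintf("answer to %q", q)
}

// runBenchmark answers every query with at most maxWorkers goroutines in flight.
func runBenchmark(queries []string, maxWorkers int) []string {
	results := make([]string, len(queries)) // each goroutine writes only its own index
	sem := make(chan struct{}, maxWorkers)  // buffered channel used as a counting semaphore
	var wg sync.WaitGroup

	for i, q := range queries {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while maxWorkers goroutines are busy
		go func(i int, q string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this query finishes

			client := &llmClient{id: i} // assumption: fresh client per query, so nothing is shared
			results[i] = client.answer(q)
		}(i, q)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range runBenchmark([]string{"q1", "q2", "q3"}, 10) {
		fmt.Println(r)
	}
}
```

Writing each result to its own slice index keeps the output in query order without a mutex, and the channel's capacity is the only knob that controls parallelism.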

Phased decompositions. The Go implementation uses 13 decomposition patterns (vs 7 in Python/C#), split into hop-1 (retrieve + extract + assess) and hop-2 (assess + synthesize). This forces the agent to spread work across iterations rather than cramming everything into one plan. The trade-off: higher average iterations (2.6 vs 1.4-1.9) but more thorough evidence gathering.
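To illustrate why phasing raises the iteration count, here is a toy sketch of a phased loop. Everything in it is hypothetical: the `phase`, `state`, `step`, and `seek` names, the `needed` threshold that stands in for the sufficiency assessment, and the stubbed evidence strings. It is not the framework's goal-seeking loop, only the shape of the idea.

```go
package main

import "fmt"

// phase restricts which decomposition patterns an iteration may use.
type phase int

const (
	hop1 phase = iota // retrieve + extract + assess
	hop2              // assess + synthesize
)

// state is hypothetical working memory carried between iterations.
type state struct {
	evidence []string
	answer   string
}

// step runs one iteration in the given phase and returns the next phase
// and whether the goal has been reached. All work is stubbed.
func step(p phase, s *state, question string, needed int) (phase, bool) {
	if p == hop1 {
		// Stub for retrieve + extract: gather one more piece of evidence.
		s.evidence = append(s.evidence, fmt.Sprintf("evidence %d for %q", len(s.evidence)+1, question))
		// Stub for assess: move on only once enough evidence is collected.
		if len(s.evidence) >= needed {
			return hop2, false
		}
		return hop1, false
	}
	// hop-2: synthesize an answer from the gathered evidence.
	s.answer = fmt.Sprintf("synthesized from %d evidence item(s)", len(s.evidence))
	return hop2, true
}

// seek loops until the goal is reached or maxIter iterations are spent.
// It returns the answer (empty if unfinished) and the iterations used.
func seek(question string, needed, maxIter int) (string, int) {
	s := &state{}
	p := hop1
	for i := 1; i <= maxIter; i++ {
		next, done := step(p, s, question, needed)
		if done {
			return s.answer, i
		}
		p = next
	}
	return "", maxIter
}

func main() {
	answer, iters := seek("example question", 2, 5)
	fmt.Println(answer, iters)
}
```

Because synthesis is only available in hop-2, no query in this sketch can finish in fewer than two iterations, and each extra round of evidence gathering adds one more.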

Metadata filtering limits. chromem-go supports only exact-match metadata filters. Date-range queries require post-filtering in Go code, whereas ChromaDB handles them natively. This likely accounts for the comparison-query gap, where source-specific retrieval is less precise under exact-match constraints.
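A sketch of what date-range post-filtering can look like, assuming retrieval hits carry string metadata. The `result` type, the `published_at` key, and the `YYYY-MM-DD` format are assumptions made for illustration. They are not chromem-go's types or the benchmark's schema.

```go
package main

import (
	"fmt"
	"time"
)

// dateLayout is the assumed metadata date format (YYYY-MM-DD).
const dateLayout = "2006-01-02"

// result is a hypothetical retrieval hit with string-valued metadata.
type result struct {
	ID       string
	Metadata map[string]string
}

// mustDate parses a YYYY-MM-DD date and panics on malformed input.
// It exists only to keep the sketch short.
func mustDate(s string) time.Time {
	t, err := time.Parse(dateLayout, s)
	if err != nil {
		panic(err)
	}
	return t
}

// filterByDateRange keeps hits whose "published_at" value falls within
// [from, to], inclusive. Hits with a missing or unparseable date are dropped.
func filterByDateRange(hits []result, from, to time.Time) []result {
	var kept []result
	for _, h := range hits {
		t, err := time.Parse(dateLayout, h.Metadata["published_at"])
		if err != nil {
			continue
		}
		if !t.Before(from) && !t.After(to) {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	hits := []result{
		{ID: "a", Metadata: map[string]string{"published_at": "2023-09-30"}},
		{ID: "b", Metadata: map[string]string{"published_at": "2023-10-15"}},
		{ID: "c", Metadata: map[string]string{"published_at": "2023-11-02"}},
	}
	for _, h := range filterByDateRange(hits, mustDate("2023-10-01"), mustDate("2023-10-31")) {
		fmt.Println(h.ID)
	}
}
```

Since the filter runs after similarity search, it can only discard hits. A caller would typically request more than `k` results up front so that enough survive the date check.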

Code Generation Across Three Languages

The same query, planned in three languages:

```python
# Python
docs = self.retrieve("cryptocurrency trial guilty verdict", k=5)
evidence = self.extract_evidence(self.combine_contexts(docs), question)
self.assess_sufficiency(question, evidence)
```

```csharp
// C#
var docs = await Retrieve("cryptocurrency trial guilty verdict", k: 5);
var evidence = await ExtractEvidence(CombineContexts(docs), question);
await AssessSufficiency(question, evidence);
```

```go
// Go
docs := self.Retrieve("cryptocurrency trial guilty verdict", 5)
evidence := self.ExtractEvidence(self.CombineContexts(docs), question)
self.AssessSufficiency(question, evidence)
```

Same logic. Same flow. The LLM adapts the syntax to the target language. The framework provides the structure.

The Pattern Holds

Three languages is no longer a pair. It's a pattern. Python for ML pipelines. C# for enterprise backends. Go for infrastructure and CLIs. Each team uses their stack. Each team gets 80%+ accuracy on a benchmark where the best prior method tops out at 75%.

The framework is the invariant.

Reproduce It

The Go MultiHopRAG benchmark is available on request. Email rajkumar@opensymbolic.ai for access.