Compass NL→Cypher
A natural language to Cypher query translator for knowledge graphs, using a grammar-guided decoding strategy to guarantee syntactically valid output. Designed for non-technical users querying Neo4j databases.
Python · Neo4j · Hugging Face · ANTLR4 · FastAPI
Overview
Compass translates natural language questions into Cypher for non-technical users who need to query knowledge graphs stored in Neo4j. Off-the-shelf text-to-SQL and text-to-query approaches frequently generate syntactically invalid queries, an unacceptable failure mode in production when the target audience cannot debug Cypher.
Compass addresses this through grammar-guided decoding: at each generation step, only tokens that keep the partial output a valid prefix under the Cypher ANTLR4 grammar are permitted. This hard constraint eliminates syntactic errors entirely while still allowing flexible natural language input.
Approach
The underlying model is a fine-tuned sequence-to-sequence transformer, trained on a curated dataset of (natural language question, Cypher query) pairs drawn from a combination of public benchmarks and internally generated synthetic data. The grammar constraint is implemented as a logit mask over the vocabulary, computed dynamically from the partial parse state of the output sequence.
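The logit-mask idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nine-token VOCAB, the hand-written TRANSITIONS state machine (a stand-in for the partial parse state that Compass derives from the ANTLR4 grammar), and the length_biased_scores function (a stand-in for the fine-tuned model). It is not the Compass implementation, only a small instance of masking disallowed tokens to negative infinity before picking the next token.

```python
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["MATCH", "(n)", "(n:Person)", "WHERE", "n.age > 30",
         "RETURN", "n", "n.name", "<eos>"]

# Hypothetical stand-in for the grammar: each state lists the tokens that may
# come next and the state each one leads to.
TRANSITIONS = {
    "start": {"MATCH": "after_match"},
    "after_match": {"(n)": "after_pattern", "(n:Person)": "after_pattern"},
    "after_pattern": {"WHERE": "after_where", "RETURN": "after_return"},
    "after_where": {"n.age > 30": "after_predicate"},
    "after_predicate": {"RETURN": "after_return"},
    "after_return": {"n": "after_item", "n.name": "after_item"},
    "after_item": {"<eos>": "done"},
}


def parse_state(prefix):
    """Replay the prefix through the state machine (raises KeyError if invalid)."""
    state = "start"
    for token in prefix:
        state = TRANSITIONS[state][token]
    return state


def allowed_mask(prefix):
    """Boolean mask over VOCAB: True where the token extends a valid prefix."""
    allowed = TRANSITIONS.get(parse_state(prefix), {})
    return [token in allowed for token in VOCAB]


def mask_logits(logits, mask):
    """Set the logits of disallowed tokens to -inf so they can never be chosen."""
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]


def constrained_greedy_decode(score_fn, max_steps=10):
    """Greedy decoding in which every step is restricted by the grammar mask."""
    prefix = []
    for _ in range(max_steps):
        logits = mask_logits(score_fn(prefix), allowed_mask(prefix))
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        if VOCAB[best] == "<eos>":
            break
        prefix.append(VOCAB[best])
    return prefix


def length_biased_scores(prefix):
    """Stand-in for the model: ignores the prefix and prefers longer tokens."""
    return [float(len(token)) for token in VOCAB]


# Unmasked, this scorer would pick "(n:Person)" at every step.
print(" ".join(constrained_greedy_decode(length_biased_scores)))
# prints: MATCH (n:Person) RETURN n.name
```

The scorer here is deliberately bad, so the valid output comes entirely from the mask. In a Hugging Face generation loop the same masking step would typically sit inside a custom LogitsProcessor; that wiring is omitted here.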
Schema awareness is handled through a lightweight schema injection mechanism: at inference time, the relevant node labels, relationship types, and property names from the target graph are prepended to the input as structured context. This allows a single model to operate across multiple graphs without retraining.
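As a rough sketch of schema injection, the snippet below serializes a graph schema and prepends it to the question. The GraphSchema class, the [SCHEMA] and [QUESTION] markers, and the serialization layout are all hypothetical; the source does not specify the format Compass actually uses.

```python
from dataclasses import dataclass


@dataclass
class GraphSchema:
    """Hypothetical container for the schema elements named in the text."""
    node_labels: list
    relationship_types: list
    properties: dict  # maps a node label to its list of property names


def serialize_schema(schema):
    """Render the schema as one deterministic line (sorted for stable inputs)."""
    props = "; ".join(
        f"{label}({', '.join(sorted(names))})"
        for label, names in sorted(schema.properties.items())
    )
    return " | ".join([
        "nodes: " + ", ".join(sorted(schema.node_labels)),
        "relationships: " + ", ".join(sorted(schema.relationship_types)),
        "properties: " + props,
    ])


def build_model_input(question, schema):
    """Prepend the serialized schema to the user's question."""
    return f"[SCHEMA] {serialize_schema(schema)} [QUESTION] {question.strip()}"


# Example with an assumed movie graph.
movie_schema = GraphSchema(
    node_labels=["Person", "Movie"],
    relationship_types=["ACTED_IN"],
    properties={"Person": ["name", "born"], "Movie": ["title"]},
)
print(build_model_input("Who acted in Heat?", movie_schema))
```

Sorting the labels and properties keeps the model input identical for the same schema regardless of the order in which the database returns them. Switching to another graph then only requires building a different GraphSchema, with no change to the model.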
Evaluation
On the Spider-CQL benchmark, Compass achieves 71.3% exact match, competitive with approaches that do not use grammar-guided decoding, with the added advantage of a 0% syntactic error rate. In testing with non-technical users, the query retry rate fell by 40% relative to a baseline without grammar constraints.
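For readers unfamiliar with the metric, exact match can be computed along the following lines. The normalization shown (collapsing whitespace and dropping a trailing semicolon) is an assumption; the benchmark's own matching rules are not described here and may differ.

```python
import re


def normalize(query):
    """Assumed normalization: collapse whitespace, drop a trailing semicolon."""
    collapsed = re.sub(r"\s+", " ", query).strip()
    return collapsed.rstrip(";").strip()


def exact_match_rate(predictions, references):
    """Fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Two hypothetical predictions: the first differs only in spacing and a
# semicolon, the second uses a different variable name and so does not match.
predictions = ["MATCH (n:Person)  RETURN n.name;", "MATCH (n) RETURN n"]
references = ["MATCH (n:Person) RETURN n.name", "MATCH (m) RETURN m"]
print(exact_match_rate(predictions, references))  # prints 0.5
```

The second pair shows why exact match is a strict measure: the two queries return the same results but count as a miss because the strings differ.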