GrepAI disappointed me because, how sad, amid the AI hype I once again did not expect a project with “grep” in the name to turn out to be just a source code indexer. It offers none of:
- a straight-to-the-point command line interface
- single-line matching as the primary behavior, with context only secondary
- abstraction over arbitrary text rather than specialization in source code
- one-off actions; if there has to be indexing, there should be a way to run it without starting a watcher
- any composability without deserialization; it is designed with imperative programs, not LLMs, in mind
But its documentation did tell me some useful things, like which good (and quite ethical and lightweight) embedding model to use and how the candidates compare.
```
ollama serve
ollama pull nomic-embed-text-v2-moe
grepai init
grepai watch --background
grepai search "blah blah"
```
```
Found 1 results for: "blah blah"

─── Result 1 (score: 0.2583) ───
File: blah.txt:1-2

   1 │
   2 │
   3 │
```
(2 matched lines; 0.7–0.9 s per search, and up to ~6 s with warmup, on a decent ultrabook)
And from its codebase, I learnt the steps it goes through:
- It chunks files into huge blob-chunks with 10% overlap, but we would do this line by line
- For each chunk it sends one REST request with the chunk as “prompt” in the JSON, in order to receive a vector of float32 back in the JSON as “embedding”
- It does a cosine similarity search, comparing the vector embedding of the query phrase with the vectors of all the indexed chunks
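The steps above can be sketched in plain Python. The endpoint and field names (`/api/embeddings`, `prompt`, `embedding`) are ollama's; the chunking is line-by-line as I'd prefer it, and the function names, the exact request shape, and the brute-force `search` are my own assumptions, not GrepAI's code:

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # ollama's default port
MODEL = "nomic-embed-text-v2-moe"                     # the model pulled above


def embed(text: str) -> list[float]:
    """One REST request per chunk: the chunk goes in as "prompt",
    a float vector comes back as "embedding"."""
    body = json.dumps({"model": MODEL, "prompt": text}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def chunk_lines(text: str) -> list[str]:
    """Line-by-line chunking instead of huge overlapping blobs."""
    return [line for line in text.splitlines() if line.strip()]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def search(query: str, chunks: list[str], k: int = 3):
    """Exact cosine-similarity search over all indexed chunks."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, reverse=True)[:k]
```

This `search` re-embeds every chunk per query just to keep the sketch short; in practice the chunk embeddings would of course be computed once at index time and stored.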
And so I set out to search for a good embedded database solution to keep the embedding-chunk key-value pairs in (I suppose it’s not worth even trying to keep file positions:
- they may keep shifting,
- files can be searched conventionally really cheaply,
- and even if something were to get altered, there can be cheap Levenshtein fuzziness, mayhaps fzf, to recover the find).
First I found out there’s sqlite-vec. That would be good, but I dug more.
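Before reaching for sqlite-vec proper, plain stdlib sqlite3 can already hold the chunk→embedding pairs, with the vector packed into a float32 BLOB. A sketch under my own assumptions (table name, schema, and helper names are all made up here):

```python
import sqlite3
import struct


def pack(vec: list[float]) -> bytes:
    # store as float32, matching what the embedding API returns
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


db = sqlite3.connect(":memory:")  # or a file living next to the repo
db.execute("CREATE TABLE IF NOT EXISTS chunks (text TEXT PRIMARY KEY, emb BLOB)")


def put(text: str, vec: list[float]) -> None:
    db.execute("INSERT OR REPLACE INTO chunks VALUES (?, ?)", (text, pack(vec)))


def all_chunks() -> list[tuple[str, list[float]]]:
    return [(t, unpack(b)) for t, b in db.execute("SELECT text, emb FROM chunks")]


put("hello world", [0.1, 0.2, 0.3])
```

The similarity search then stays in application code, scanning `all_chunks()`; sqlite-vec would move that scan into the database itself.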
Then I found a part of Daniel T. Perry’s series on creating an embedding-based Steam search over game descriptions and reviews: Part 3, Querying the Embeddings.
Daniel lays out the options: FAISS, the Facebook AI Similarity Search library, and hnswlib, an implementation of the Hierarchical Navigable Small World algorithm.
- FAISS failed Daniel because the only way to install it is via Conda.
Idk about FAISS, but HNSW gives you an approximate result. hnswlib is a header-only C++ library with Python bindings, and it seems to produce an index file. I guess I would make a command line tool to generate an index and to query it…
But Daniel also remarks that searching 10 000 embeddings takes 3 seconds on his machine, and it seems to me his vectors might be larger than mine, judging by his chosen model, Instructor. He only considered a performant index when expecting millions of embeddings to search.
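For a sense of scale: assuming ~768-dimensional vectors (what the nomic-embed models reportedly output; an assumption of mine, as is all the synthetic data below), an exact brute-force cosine search over 10 000 embeddings is a single matrix-vector product in NumPy, which runs in milliseconds rather than seconds:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 768                                   # assumed embedding dimension
index = rng.normal(size=(10_000, dim)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize once, at index time


def brute_force_search(query: np.ndarray, k: int = 3) -> np.ndarray:
    """Exact nearest neighbors by cosine similarity: one mat-vec product."""
    q = query / np.linalg.norm(query)
    scores = index @ q                      # cosine == dot product on unit vectors
    return np.argsort(scores)[::-1][:k]     # indices of the top-k chunks


# a query identical to row 42 must rank row 42 first
top = brute_force_search(index[42])
```

Which suggests the 3 seconds came from somewhere other than the arithmetic itself, and that at my data sizes an approximate index may be entirely optional.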
There’s a reasonable chance the loopback-interface networking, serialization-deserialization, and model memory-allocation delays are worse than that on my data, and so I’m better off:
- Just focusing on memoizing the embeddings of my queries
- Perhaps I could be searching through them too
- I could try adding a select to preserve the successful find
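The query-embedding memoization can be a tiny persisted dict; a sketch under my own assumptions (the cache file name and the injected `embed` callable are hypothetical):

```python
import json
import os

CACHE_PATH = "query_cache.json"  # hypothetical location, next to the index


def load_cache(path: str = CACHE_PATH) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def save_cache(cache: dict, path: str = CACHE_PATH) -> None:
    with open(path, "w") as f:
        json.dump(cache, f)


def cached_embed(query: str, embed, cache: dict) -> list[float]:
    """Memoize: only hit ollama for queries not seen before.
    The cached query strings themselves become searchable text, too."""
    if query not in cache:
        cache[query] = embed(query)
    return cache[query]
```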
But 3 seconds can still be something, my netbook can turn out to be not quite something, and I might turn out too lazy to implement even a simpler persisted Approximate Nearest Neighbors index. It might be coolest to just use hnswlib. We’ll see where I get with that. Maybe there will be a next post with some scripting to do all that.
Also, it seems a warm-up strategy for ollama will be necessary.