---
title: "Getting Started with XI Lucent"
description: "Install Xio.Lucent, provision assets, ingest your first document, and run your first query."
published: 2026-05-14T12:11:27.420936+00:00
updated: 2026-05-14T12:11:27.420936+00:00
tags: ["getting-started", "lucent", "tutorial"]
url: https://xiobjects.com/docs/xio/lucent/getting-started
source: XI Objects
---


# Getting Started with XI Lucent

This guide gets you from a blank project to a working semantic search pipeline in a few minutes.

## Prerequisites

- .NET 10.0 or later
- A project that uses dependency injection (ASP.NET Core, Worker Service, or a test host)

## Install

```bash
dotnet add package Xio.Lucent
```

## Register the engine

One call registers everything: the ONNX embedder, semantic chunker, SQLite vector store, FTS5 full-text store, and hybrid retrieval strategy.

```csharp
builder.Services.AddLucent();
```

By default, Lucent stores its databases under `.lucent/` relative to the process working directory. To move them somewhere else:

```csharp
builder.Services.AddLucent(opts =>
{
    opts.StorageRoot = "/var/data/lucent";
});
```

## Provision assets

Before the engine can embed anything, it needs two assets: the `nomic-embed-text-v1.5` ONNX model and the sqlite-vec native extension. Call `ProvisionAsync` once at startup.

```csharp
var engine = app.Services.GetRequiredService<IKnowledgeEngine>();
await engine.ProvisionAsync();
```

`ProvisionAsync` downloads both assets and verifies them against pinned checksums. Once the assets are on disk, subsequent restarts skip the download. You can also provision ahead of time and ship the assets with your container image.
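If you want provisioning to complete before the host begins serving traffic, one option is to wrap the call in a hosted service. A minimal sketch; the `ProvisionHostedService` class is illustrative, not part of Xio.Lucent:

```csharp
// Illustrative wrapper (not part of Xio.Lucent): runs ProvisionAsync
// during host startup, before requests are handled.
public sealed class ProvisionHostedService : IHostedService
{
    private readonly IKnowledgeEngine _engine;

    public ProvisionHostedService(IKnowledgeEngine engine) => _engine = engine;

    // Downloads and verifies the model and native extension on first run;
    // a no-op on later restarts once the assets are cached.
    public Task StartAsync(CancellationToken cancellationToken)
        => _engine.ProvisionAsync();

    public Task StopAsync(CancellationToken cancellationToken)
        => Task.CompletedTask;
}

// Register alongside AddLucent():
builder.Services.AddHostedService<ProvisionHostedService>();
```

This keeps the provisioning step out of your request path without changing how the engine is resolved elsewhere.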

## Ingest a document

`AddDocumentAsync` accepts a collection ID and a request. Collections are created implicitly on first write.

```csharp
var result = await engine.AddDocumentAsync("docs", new AddDocumentRequest
{
    Content = "# XI Lucent\n\nA semantic ingestion and retrieval library for .NET 10.",
    DocumentId = "readme",
    ContentTypeHint = "text/markdown"
});

Console.WriteLine($"Chunked into {result.ChunkCount} chunks in {result.ChunkingDuration.TotalMilliseconds:F0}ms");
```

`DocumentId` is optional. If you omit it, Lucent generates one from a BLAKE3 hash of the content. If you ingest the same document ID twice with unchanged content and the same embedder model, Lucent short-circuits and returns immediately without re-embedding.
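You can exercise the idempotent re-ingest directly. A sketch using only the request shape shown above; the content and ID are illustrative:

```csharp
var request = new AddDocumentRequest
{
    Content = "Release notes for v2.1: adds hybrid retrieval.",
    DocumentId = "release-notes-2.1",
    ContentTypeHint = "text/plain"
};

// First write: chunks and embeds the content.
var first = await engine.AddDocumentAsync("docs", request);

// Second write with the same DocumentId and unchanged content:
// Lucent detects the matching content hash and returns without re-embedding.
var second = await engine.AddDocumentAsync("docs", request);
```

This makes ingestion safe to re-run on startup or in a scheduled sync job without paying the embedding cost twice.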

## Run a query

```csharp
var result = await engine.QueryAsync("docs", new QueryRequest
{
    Text = "what is lucent",
    TopK = 5
});

foreach (var hit in result.Chunks)
{
    var chunk = hit.Chunk;
    var meta  = chunk.Metadata;

    Console.WriteLine($"[{hit.Score:F3}] {chunk.Content}");
    Console.WriteLine($"  Document: {chunk.DocumentId}");
    Console.WriteLine($"  Heading:  {meta.Heading}");
    Console.WriteLine($"  Section:  {meta.SectionPath}");
    Console.WriteLine($"  Page:     {meta.PageNumber}");   // null for non-paginated content
}
```

Every result carries where in the source material the chunk came from: which document, which heading, which section, which page (for PDFs), which slide (for PPTX), which row (for CSV), which DOM element (for HTML). The content never comes back without its origin.
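Because every hit carries its provenance, you can aggregate results by source without extra lookups. A sketch using LINQ over the `result.Chunks` shape shown above:

```csharp
// Group hits by originating document and surface the best score per document.
var byDocument = result.Chunks
    .GroupBy(hit => hit.Chunk.DocumentId)
    .Select(g => new
    {
        DocumentId = g.Key,
        BestScore  = g.Max(h => h.Score),
        Sections   = g.Select(h => h.Chunk.Metadata.SectionPath).Distinct()
    })
    .OrderByDescending(d => d.BestScore);

foreach (var doc in byDocument)
{
    Console.WriteLine($"{doc.DocumentId} (best {doc.BestScore:F3})");
    foreach (var section in doc.Sections)
        Console.WriteLine($"  - {section}");
}
```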

## Ingest from a file

For binary formats like PDF or Word, pass a stream:

```csharp
await using var file = File.OpenRead("spec.pdf");
var result = await engine.AddDocumentAsync("docs", new AddDocumentRequest
{
    ContentStream = file,
    Source = new SourceInfo { FileName = "spec.pdf" }
});
```

Lucent detects the format, decomposes the file to text, and threads per-page provenance through the chunking stage. A chunk from page 12 of the PDF comes back with `PageNumber = 12`. A chunk from slide 4 of a PPTX comes back with `SlideIndex = 4`. You don't configure any of this; it comes from the decomposer automatically.
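Since per-page provenance survives chunking, you can post-filter query hits by location. A sketch, assuming the same `QueryAsync` result shape as above; the query text and page cutoff are illustrative:

```csharp
var result = await engine.QueryAsync("docs", new QueryRequest
{
    Text = "acceptance criteria",
    TopK = 10
});

// Keep only hits from the first ten pages of the PDF.
// PageNumber is null for non-paginated sources; the relational
// pattern below rejects nulls automatically.
var earlyPages = result.Chunks
    .Where(hit => hit.Chunk.Metadata.PageNumber is <= 10)
    .ToList();

foreach (var hit in earlyPages)
    Console.WriteLine($"p.{hit.Chunk.Metadata.PageNumber}: [{hit.Score:F3}] {hit.Chunk.Content}");
```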

## What the default setup gives you

| Component | Default |
|-----------|---------|
| Embedder | `OnnxEmbedder` — `nomic-embed-text-v1.5`, 768 dimensions, runs locally |
| Chunking | `SemanticChunker` — splits on cosine similarity drop-offs, respects structural boundaries |
| Vector store | `SqliteVectorStore` — sqlite-vec cosine KNN |
| Full-text store | `SqliteFtsStore` — FTS5 BM25 in the same database |
| Retrieval | `HybridRetrievalStrategy` — vector + FTS5 fused via Reciprocal Rank Fusion |
| Re-ranker | `NoOpScorer` — pass-through; swap in `CrossEncoderScorer` if you need it |

## Next steps

- Read how the [Ingestion & Retrieval Pipeline](/docs/xio/lucent/concepts/pipeline) works end to end
- Explore [Chunking Strategies](/docs/xio/lucent/concepts/chunking) and when to choose each one
- Learn to [Swap Adapters](/docs/xio/lucent/guides/swap-adapters) to replace the embedder or vector store
- Expose Lucent over HTTP with [Xio.Lucent.Api](/docs/xio/lucent/api/rest) or as an MCP server with [Xio.Lucent.Mcp](/docs/xio/lucent/api/mcp)