Favor clear data boundaries, observable pipelines, and least-privilege credentials. The platform gives you parsing, chunking, search, and structured document routes—you still own corpus design, evaluation, and how evidence is shown to end users.
Use the bullets as a checklist: read once, then keep this open while you implement so you don’t miss the small but important details (like token type, scopes, and strict query rules).
- Do
- Align each knowledge base with one domain or compliance boundary so settings and access patterns stay explainable.
- Tune retrieval and rerank with labeled queries before changing LLM prompts—bad evidence rarely fixes itself in the prompt.
- Enable source metadata in search when user-facing answers must cite documents.
- Avoid
- One oversized mixed corpus—split KBs and use metadata filters where the API allows.
- Shipping without citation metadata or retrieval metrics (Glossary — integration quality).
- Recycling one machine credential across unrelated services (Scope model).
Start from HTTP status and Problem JSON (title, detail, optional code), then map to token type, scopes, KB settings, or provider configuration. Cross-check the API reference for the exact route you called.
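The first triage step can be sketched in Python. This sketch assumes the error body has the `application/problem+json` shape described in this guide (`status`, `title`, `detail`, optional `code`). The `triage` function and its hint strings are hypothetical; they only paraphrase the checks listed here.

```python
import json

# Hypothetical first-pass hints keyed by HTTP status. They paraphrase the
# symptom → check list in this guide and are not returned by the API.
HINTS = {
    400: "Validation error: fix payload, ids, or query params per `detail`.",
    401: "Missing/invalid token, or wrong token type (developer vs project).",
    403: "Missing scope, or projectId in path does not match the token's project.",
    404: "Unknown resource or missing webhook config.",
    409: "State conflict: read `detail` before retrying.",
    429: "Rate limited: honor Retry-After and back off with jitter.",
}


def triage(status: int, body: bytes, content_type: str) -> str:
    """Summarize an error as 'status title: detail [code] -> hint'."""
    title, detail, code = "", "", None
    if content_type.split(";")[0].strip() == "application/problem+json":
        try:
            problem = json.loads(body)
            title = problem.get("title", "")
            detail = problem.get("detail", "")
            code = problem.get("code")  # optional field
        except (ValueError, AttributeError):
            pass  # malformed body: fall back to the status code alone
    hint = HINTS.get(status, "Transient or unknown: retry with a cap and log `detail`.")
    suffix = f" [{code}]" if code else ""
    return f"{status} {title}: {detail}{suffix} -> {hint}"
```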
- Symptom → check
- 401 / 403 — Wrong token type (developer vs project), wrong `projectId` in path, or missing scope (Available scopes).
- Search errors or empty hits — `format` must match parse outputs available for the document; confirm jobs completed (Jobs).
- Rerank failures — KB `rerankConfig` and provider credentials; see Embedding models & rerank.
- 429 — Throttle concurrency; read limit headers (Throttling); use exponential backoff with jitter.
- Low relevance — Widen `retrieveTopK`, adjust hybrid settings, or revisit chunking (Advanced retrieval).
- Webhooks — Verify `X-Indexify-Signature` on the raw body (Verification).
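For the 429 row, here is a sketch of a retry delay in Python. It honors `Retry-After` when the server sends it as a number of seconds. Otherwise it draws a random wait up to an exponentially growing ceiling, the "full jitter" variant of exponential backoff. The `backoff_delay` name and the `base`/`cap` defaults are assumptions, not documented Indexify values.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    If the server sent Retry-After as delta-seconds, honor it (capped).
    Otherwise use full jitter: uniform in [0, min(cap, base * 2**attempt)].
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not parsed in this sketch
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

Sleep for the returned value between attempts and give up after a fixed attempt count. The jitter spreads out retries from many workers so they do not hit the limit again in lockstep.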
Use this to quickly map an endpoint returning empty/missing data to the likely KB setting or missing artifact (pipeline stage, output format, or multimodal structure).
- Parsed document
- Endpoints
- KB settings or artifacts to check
- `settings.pipeline` includes `parse`
- Requested representation is in `settings.parse.outputFormats` (via `Accept`)
- Parse job finished successfully
- Document chunks
- Endpoints
- GET …/chunks (`?format=…` per spec)
- KB settings or artifacts to check
- `settings.pipeline` includes `chunk` (or includes `index`, since index ⇒ chunk)
- Requested `format` is in `settings.parse.outputFormats`
- Chunking job finished successfully
- Search
- Endpoints
- POST …/search (`?format=…` query param)
- KB settings or artifacts to check
- `settings.pipeline` includes `index` (embeddings exist)
- Requested `format` is in `settings.parse.outputFormats`
- Elements, relationships, sections, tables, figures
- Endpoints
- KB settings or artifacts to check
- `settings.parse.outputFormats` includes `json`
- Parse (JSON) finished successfully
- Search with section or element narrowing
- Endpoints
- POST …/search (with `groupBy: section` and/or `elementTypes`)
- KB settings or artifacts to check
- `settings.pipeline` includes `index` (embeddings available)
- `settings.parse.outputFormats` includes `json` when relying on section/element metadata
- Parse and index jobs finished successfully
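The endpoint-to-settings map can be turned into a client-side preflight check. You can run it before calling an endpoint, or when an endpoint returns nothing. This Python sketch assumes `settings.pipeline` is a list of stage names and `settings.parse.outputFormats` is a list of format names; confirm both in the OpenAPI schema. The route labels are made up for the example, and job completion still has to be checked separately through the Jobs routes.

```python
from typing import Any, Dict, List, Optional


def preflight(settings: Dict[str, Any], route: str, fmt: Optional[str] = None) -> List[str]:
    """List KB-setting gaps that would explain empty or missing data for `route`.

    Hypothetical route labels: "parsed", "chunks", "search", "structured",
    "search_sections" (search narrowed with groupBy/elementTypes).
    """
    stages = set(settings.get("pipeline", []))
    formats = set(settings.get("parse", {}).get("outputFormats", []))
    problems: List[str] = []

    def need_format(name: Optional[str]) -> None:
        if name and name not in formats:
            problems.append(f"'{name}' is not in settings.parse.outputFormats")

    if route == "parsed":
        if "parse" not in stages:
            problems.append("settings.pipeline does not include 'parse'")
        need_format(fmt)
    elif route == "chunks":
        if not stages & {"chunk", "index"}:  # index implies chunk
            problems.append("settings.pipeline includes neither 'chunk' nor 'index'")
        need_format(fmt)
    elif route in ("search", "search_sections"):
        if "index" not in stages:
            problems.append("settings.pipeline does not include 'index' (no embeddings)")
        need_format(fmt)
        if route == "search_sections":
            need_format("json")  # section/element metadata comes from the JSON parse
    elif route == "structured":
        need_format("json")
    else:
        raise ValueError(f"unknown route label: {route!r}")
    return list(dict.fromkeys(problems))  # drop duplicate messages, keep order
```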
Short definitions for terms used across this guide and the API reference. For procedures, prefer linked sections: Authentication, Ingestion, Retrieval, MCP.
If you’re reading this as a reference, feel free to jump around—each section is written to stand on its own, and the left-hand search is the fastest way to find a specific endpoint or concept.
Who is calling and what they may do. How-to: Authentication and Security, Appendix — token matrix.
- Definitions
- Developer — human user of the product (signup, login, account UI). Often authenticated with a session or JWT for Control & Ingestion Plane actions.
- Developer JWT — bearer token representing the logged-in developer, used for account- and project-level operations (not the same as machine project tokens).
- Project — top-level container for credentials, knowledge bases, and billing-style boundaries. Admin routes (projects, credentials, KB listing): GET /projects, POST /projects, GET /projects/{projectId}, POST …/credentials, DELETE …/credentials/{credentialId}, GET …/kbs — full matrix in API reference.
- Project access token — short-lived OAuth2 access token obtained with `client_credentials` using a project’s client id/secret. Used for KB, document, job, search, and webhook calls against that project.
- Machine credential / project credential — the client id + secret pair created on a project; used only on trusted servers to mint project access tokens.
- Scope — fine-grained permission string on a project token (for example `docs:read`, `search:run`). Requests fail with 403 if the token lacks the required scope.
- Least privilege — practice of issuing narrowly scoped credentials per service (ingest-only vs search-only) to limit blast radius if a secret leaks.
- Control & Ingestion Plane vs data plane — Control & Ingestion Plane: projects, settings, credential CRUD. Data plane: runtime ingestion, search, and webhooks on KBs.
- Rate limit key — identifier used for throttling (typically per project when using a project token, else per developer or client IP).
- Idempotency key — client-supplied header value so retries of the same logical operation do not create duplicates.
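Here is a minimal sketch of the machine-auth flow in Python, using only the standard library. It builds the `client_credentials` form POST to `/oauth/token` and keeps the resulting short-lived token in an in-memory cache that refreshes shortly before expiry. The form field names follow the standard OAuth 2.0 client credentials grant. Whether Indexify also accepts HTTP Basic client authentication, and which fields the response contains, are assumptions to confirm in the API reference. `token_request`, `TokenCache`, and the `mint` callback are hypothetical names.

```python
import threading
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple


def token_request(base_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client_credentials form POST to /oauth/token."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",  # standard OAuth 2.0 field names
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/oauth/token", data=form, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


class TokenCache:
    """Keep one short-lived project token in memory; refresh before expiry."""

    def __init__(self, mint: Callable[[], Tuple[str, float]], margin: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._mint = mint          # returns (access_token, ttl_seconds)
        self._margin = margin      # refresh this many seconds early
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._lock = threading.Lock()

    def get(self) -> str:
        with self._lock:
            if self._token is None or self._clock() >= self._expires_at - self._margin:
                token, ttl = self._mint()
                self._token, self._expires_at = token, self._clock() + ttl
            return self._token

    def invalidate(self) -> None:
        """Forget the token (e.g. after a 401) so the next get() mints a new one."""
        with self._lock:
            self._token = None
```

In a service, `mint` would send `token_request(...)` with `urllib.request.urlopen`, parse the JSON response, and return the token and its lifetime. Keep this code on trusted servers only, since it handles the client secret.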
Projects, KBs, documents, chunks, and structured outputs. See Knowledge Base Design and Structured document access.
- Definitions
- Knowledge Base (KB) — named configuration and storage boundary under a project. Each KB has its own parse/chunk/index/search settings, documents, jobs, and optional webhook.
- Document — a single uploaded file or ingestion unit in a KB. Has metadata (name, type, size), processing jobs, and derived artifacts (parsed tree, chunks, vectors).
- Source file / blob — raw bytes stored for a document; processing reads this to produce parsed outputs.
- Parsed document — structured output of the parse stage: layout-aware representation (sections, elements, tables, figures, etc.) rather than a single plain string.
- Element — typed node in a parsed document (paragraph, heading, table cell region, figure, etc.). Useful for UI and agents that need more than flat text.
- Section — logical subdivision of a document (often heading-derived). API routes expose sections and their child elements.
- Chunk — text (and metadata) segment stored for retrieval. Created in the chunk stage; embedding vectors typically correspond to chunks.
- Chunking — policy that splits or merges parsed content into chunk boundaries (size, hierarchy, markdown tables, metadata attachment).
- Embedding vector — fixed-length numeric representation of chunk or query text used for similarity search in vector space.
- Index / indexing — stage that writes embeddings and retrieval structures so search can run. Distinct from “search index” as a generic term.
- Corpus — the set of chunks (and associated metadata) searchable within a KB after successful processing.
Parse → chunk → index and async jobs. Operational guide: Jobs and webhooks, Document ingestion.
- Definitions
- Pipeline — ordered processing stages for each document, reflected in job payloads (GET …/jobs/{jobId}); typically parse → chunk → index.
- Parse stage — converts source format into structured parsed form and chosen output formats (markdown, html, json, doctags, etc.).
- Chunk stage — transforms parsed content into chunks according to KB chunk settings.
- Job — asynchronous unit of work for a document ingest or reprocess. GET …/jobs/{jobId} returns status, optional `stages`, and errors when failed.
- Job stage — fine-grained state within a job (for example parsing, chunking, indexing) with success/failure and timing.
- Reprocess / retry — trigger processing again after failures or settings changes; may create new attempts or follow idempotency rules.
- Processing complete — state where required stages succeeded and search can return results for the affected content (subject to eventual consistency).
- Failure / terminal error — job or stage stopped with an error payload; clients should surface codes and retryability hints when present.
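Here is a polling sketch in Python for cases where webhooks are not available. `fetch_job` is a hypothetical callback that wraps GET …/jobs/{jobId} and returns the parsed JSON. The terminal status strings `completed` and `failed` are assumptions; check the job schema for the real values.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(fetch_job: Callable[[], Dict[str, Any]],
                 terminal: frozenset = frozenset({"completed", "failed"}),
                 interval: float = 2.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a job until its status is terminal, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in terminal:
            return job  # caller inspects errors/stages on failure
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Prefer webhooks for production pipelines. Polling like this suits scripts and smoke tests.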
Search modes, top-K, rerank. Guides: Retrieval fundamentals, Advanced retrieval, Embedding models & rerank.
- Definitions
- Search / query — HTTP request to a KB’s search endpoint with a natural language or keyword string plus optional settings overrides.
- Semantic search — retrieval using embedding similarity between query and chunks (meaning-based match).
- Keyword search — retrieval matching lexical terms (important for SKUs, error codes, API paths).
- Hybrid search — combines semantic and keyword signals so both paraphrases and exact tokens can rank well.
- Retriever — first stage that pulls a candidate set of chunks (often top-K by similarity or hybrid score).
- Rerank / reranker — second-stage model that scores each query–chunk pair for finer relevance; reduces noise before the LLM sees context.
- retrieveTopK — setting controlling how many candidates the first stage returns before rerank.
- rerank.topK — setting controlling how many chunks survive reranking into the final context set.
- Top-K — generic term for “K best results” after a scoring step.
- Similarity / distance — geometric relationship between vectors; higher similarity usually means closer in embedding space.
- Embedding model — provider-specific model id (for example `openai:text-embedding-3-small`) configured on the KB and used for both index and query embedding when applicable.
- Query embedding — vector computed for the user query at search time using the KB’s embedding configuration.
- Format (search) — response projection for chunk text (`markdown`, `html`, `text`, `json`, `doctags`). Must align with parse output formats available for the document.
- Pagination cursor — opaque token (`nextCursor`) for stable continuation of large result sets.
- Source metadata / citation metadata — fields tying a hit back to document, offsets, or section paths so answers can cite sources.
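To make the interplay between `retrieveTopK` and `rerank.topK` concrete, here is a toy in-memory two-stage search in Python. Indexify runs both stages server-side. The cosine retriever and the `rerank` callable below are stand-ins: a real reranker is a provider model, not word overlap. `two_stage_search` is a hypothetical name.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_search(query_vec: Sequence[float], query: str,
                     chunks: Dict[str, Tuple[Sequence[float], str]],
                     rerank: Callable[[str, str], float],
                     retrieve_top_k: int, rerank_top_k: int) -> List[str]:
    """Return chunk ids: cosine top-K retrieval, then rerank and keep top-K."""
    # Stage 1 (retriever): keep the retrieve_top_k most similar chunks.
    candidates = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid][0]),
                        reverse=True)[:retrieve_top_k]
    # Stage 2 (reranker): score each (query, chunk text) pair; keep rerank_top_k.
    reranked = sorted(candidates, key=lambda cid: rerank(query, chunks[cid][1]),
                      reverse=True)
    return reranked[:rerank_top_k]
```

The design point is that a chunk dropped by the first stage never reaches the reranker. This is why low relevance is often fixed by widening `retrieveTopK` rather than by changing `rerank.topK`.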
Patterns for models plus tools. See Building agent workflows and MCP Integration.
- Definitions
- RAG (retrieval-augmented generation) — pattern where an LLM answers using retrieved chunks as context, reducing hallucinations when evidence exists.
- Grounding — anchoring model output in retrieved passages; strong grounding implies citations trace to real chunks.
- Hallucination — plausible but unsupported statement; RAG and citations mitigate but do not eliminate.
- Context window — maximum tokens an LLM can attend to; rerank and top-K exist to fit the best evidence within this budget.
- Prompt — instructions and retrieved text sent to the model; quality of evidence often matters more than prompt tricks.
- Agent — system that plans multiple steps (tool calls, retrieval, refinement) rather than one-shot prompt/answer.
- Tool / function calling — agent invokes external APIs (including your wrappers around Indexify search) with structured arguments.
- MCP (Model Context Protocol) — standard way for agent hosts to expose tools to a model; Indexify exposes a subset of the REST surface as MCP tools (MCP Integration).
- Orchestration layer — your own scheduler or a third-party agent framework that sequences tool calls and manages state between steps.
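Here is a Python sketch of fitting ranked evidence into a context window while keeping citations. The hit keys `text` and `source` are hypothetical; map them from whatever your search response returns with `includeSourceMetadata`. Whitespace-separated word counts stand in for real model tokens.

```python
from typing import Dict, List


def pack_context(hits: List[Dict[str, str]], budget_tokens: int) -> str:
    """Concatenate ranked hits into a numbered, cited block within a budget."""
    parts: List[str] = []
    used = 0
    for i, hit in enumerate(hits, start=1):
        cost = len(hit["text"].split())  # rough token estimate (word count)
        if used + cost > budget_tokens:
            break  # hits are already ranked, so stop at the first overflow
        parts.append(f"[{i}] ({hit['source']}) {hit['text']}")
        used += cost
    return "\n".join(parts)
```

The numbered markers give the prompt a stable way to ask the model to cite `[1]`, `[2]`, and so on, which your UI can map back to the source metadata.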
Push notifications to your HTTPS endpoint. Setup: Jobs and webhooks, Webhook security.
- Definitions
- Webhook — HTTPS callback URL registered on a KB to receive JSON payloads when subscribed events occur.
- Webhook event — named occurrence (`job.pending`, `job.completed`, `job.failed`, coarse stage completions, granular milestones like `job.parsing_format_completed`, partial-failure signals, etc.).
- Payload — JSON body POSTed to your URL, including event type, timestamps, and nested project/KB/document/job data.
- Delivery — single HTTP attempt or retry sequence for one webhook payload; failures may retry per webhook retry policy.
- Retry policy — max attempts and backoff between deliveries when your endpoint returns non-success or times out (configured on the webhook).
- Signing secret — shared HMAC secret; Indexify may send `X-Indexify-Signature` so you can verify authenticity.
- At-least-once delivery — duplicates are possible; your handler should be idempotent (dedupe by delivery or event id).
- Test webhook — synthetic POST (`webhook.test`) to validate connectivity and signature verification without waiting for real jobs.
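Because delivery is at-least-once, the handler needs deduplication. This Python sketch wraps a handler so that each delivery runs at most once per process. It assumes the payload carries a top-level `deliveryId`. If the handler raises, the id is released so a redelivery can try again. `DeliveryDeduper` is a hypothetical name. Multi-process deployments need a shared store (for example a database unique key) instead of this in-memory map.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Dict


class DeliveryDeduper:
    """Run `handler` at most once per deliveryId (in-memory, bounded)."""

    def __init__(self, handler: Callable[[Dict[str, Any]], None], max_ids: int = 10_000):
        self._handler = handler
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._max_ids = max_ids
        self._lock = threading.Lock()

    def __call__(self, payload: Dict[str, Any]) -> bool:
        """Return True if the handler ran, False for a duplicate delivery."""
        key = str(payload["deliveryId"])
        with self._lock:
            if key in self._seen:
                return False  # duplicate: still acknowledge with 2xx
            self._seen[key] = None
            if len(self._seen) > self._max_ids:
                self._seen.popitem(last=False)  # evict the oldest id
        try:
            self._handler(payload)
        except Exception:
            with self._lock:
                self._seen.pop(key, None)  # allow a redelivery to retry
            raise
        return True
```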
Layout-aware content from JSON parse. Guide: Structured document access.
- Definitions
- Structured access — per-document elements, sections, tables, figures, and relationships exposed via document-scoped GET routes.
- Relationship — typed edge between elements within a document (containment, reading order, captions, etc.).
- Multimodal — processing that retains non-textual structure (tables as tables, figures with captions, layout) rather than flattening everything to prose.
- Table (document) — structured grid extracted as first-class content; may support row-level retrieval modes when enabled.
- Figure — image or diagram with caption and metadata.
- Bounding box / bbox — coordinates grounding content in a page; useful for PDF viewers and provenance.
- OCR — optical character recognition for scanned pages or images inside documents.
- Output format — parse-time representation choice (`markdown`, `html`, etc.) influencing what search can return.
HTTP semantics, secrets, and SLO-style thinking. See Reliability and operations and Troubleshooting.
- Definitions
- TLS / HTTPS — required for webhook URLs in typical configurations; protects payloads in transit.
- Secret management — storing client secrets and webhook secrets in vaults or managed secret stores, not in repos or frontends.
- 403 Forbidden — authentication succeeded but scopes or ownership checks failed.
- 401 Unauthorized — missing or invalid token.
- 429 Too Many Requests — rate limit exceeded; honor `Retry-After` and back off.
- 504 / upstream errors — transient faults from providers or dependencies; retry with limits.
- Correlation id — client-generated id propagated across logs to tie webhook handling to originating API calls.
- SLO / SLA — service objectives you define (freshness, latency); Indexify provides metrics hooks via jobs and headers you can chart.
- Monitoring email — optional project setting to receive failure/recovery notifications for jobs and webhooks.
Testing vocabulary for retrieval systems. Apply with Best practices and API Runner.
- Definitions
- Smoke test — minimal automated path (auth → upload or existing doc → job success → search hit) run after deploys or config changes.
- Golden query set — fixed list of questions with expected source documents or passages; used to regression-test retrieval.
- Precision@K — fraction of top-K results that are relevant; common offline metric for retrieval tuning.
- Recall — fraction of all relevant documents (or chunks) found in the candidate set; trades off with precision when K is small.
- Regression — retrieval quality or latency gets worse after a change; guard with golden sets and dashboards.
- Shadow mode — run new retrieval settings or providers in parallel without serving users, compare scores offline.
- Feature flag — toggle in your app to route traffic between retrieval profiles or Indexify KBs.
- Canary — roll out a change to a small slice of users or traffic before full cutover.
- Dead letter queue — store failed webhook deliveries (or your handler’s failures) for manual replay after fixing bugs.
- Backpressure — slow downstream consumers when ingestion or search load spikes to avoid cascading failures.
- Cold start — first query or first document in a new KB may pay one-time latency until caches warm.
- Warm path — steady-state requests after caches and connections are established.
- Determinism — same inputs producing the same stored chunks and vectors given fixed embedding/rerank models and KB settings.
- Model drift — embedding or rerank model changes upstream so vectors are no longer comparable without reprocessing.
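The two core metrics can be computed against a golden query set with a few lines of Python. The data shapes (a dict mapping each query to its relevant ids, and a dict mapping each query to its ranked result ids) are assumptions for the example. Precision@K here divides by K even when fewer than K results come back.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked ids that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k ranked ids."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(golden: Dict[str, Set[str]], results: Dict[str, Sequence[str]],
             k: int) -> Dict[str, float]:
    """Mean precision@k and recall@k over a golden query set."""
    n = len(golden)
    if n == 0:
        return {"precision@k": 0.0, "recall@k": 0.0}
    p = sum(precision_at_k(results.get(q, []), rel, k) for q, rel in golden.items())
    r = sum(recall_at_k(results.get(q, []), rel, k) for q, rel in golden.items())
    return {"precision@k": p / n, "recall@k": r / n}
```

Running `evaluate` before and after a settings change, with the same golden set and K, gives a simple regression signal for shadow-mode or canary comparisons.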
HTTP, errors, and spec terms. Primary references: API reference, Appendix.
- Definitions
- API — HTTP JSON interface to Indexify (`api.indexify.dev` in production examples).
- REST — resource-oriented HTTP patterns (paths, verbs, status codes) used by most endpoints.
- Problem JSON — API errors often use `Content-Type: application/problem+json` with a JSON body `{ status, title, detail, code? }` (`code` is optional). This is RFC 7807–inspired, but Indexify does not set a `type` URI field.
- OpenAPI / spec — machine-readable description of endpoints; the Indexify landing page may expose the spec for download and for the docs UI.
- Environment — logical deployment tier (production, staging, development); use separate projects and credentials per tier.
- Deprecation — older fields or behaviors scheduled for removal; check changelog and migration notes.
- KB settings — JSON blob on create/update controlling pipeline, parse, chunk, index embedding, search rerank.
- Provider — third-party AI vendor (OpenAI, Cohere, Voyage, Jina, Google, AWS Bedrock, Nomic) behind embedding or rerank configuration.
- Token (LLM) — length unit for language model input; distinct from OAuth access token.
- Token (OAuth) — bearer credential for API authorization.
- Base URL — host prefix for all API calls; examples use `https://api.indexify.dev`, but your deployed region or vanity host may differ.
- Content-Type — HTTP header; JSON bodies use `application/json`, uploads typically `multipart/form-data`.
- Accept — HTTP header indicating preferred response shapes where negotiated (most Indexify APIs return JSON).
- User-agent — optional client identifier string; useful for support when debugging abuse or quotas.
Dense reference: how the API surface is grouped, which token types apply, HTTP semantics, limits, webhooks, and KB settings keys. For project-token scope strings, see Available scopes. Use this alongside the API reference and API Runner for path-level detail and interactive calls.
- URLs & auth classes
- Host — production examples use `https://api.indexify.dev`; use the Base URL for your account/region.
- URL layout — project admin: GET /projects, POST /projects, GET /projects/{projectId}; credentials: POST …/credentials, DELETE …/credentials/{credentialId}; KB index: GET …/kbs, POST …/kbs. KB-scoped data plane routes live under `/projects/{projectId}/kbs/{kbId}/` (see API reference). `projectId` in the path must match the project access token’s project.
- OAuth machine auth — POST /oauth/token (`client_credentials`, form body).
- Developer (human) auth — POST /auth/signup, POST /auth/login and related routes in API reference; not project tokens.
- Use only paths documented in the public API reference; undocumented URLs are unsupported.
- Errors & lists
- Problem JSON — `application/problem+json` with `status`, `title`, `detail`, optional `code` (no `type` URI today).
- Pagination — `nextCursor` in body; pass `cursor` (or the name given in the spec) on the follow-up request.
- Cursors are not immortal — re-list after large data changes.
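Cursor paging can be wrapped in a generator. This Python sketch assumes a hypothetical `fetch_page(cursor)` callback that sends the follow-up request with the cursor parameter named in the spec and returns the JSON body. The `items` key is a placeholder for the list field of the route you call.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], Dict[str, Any]],
               items_key: str = "items", max_pages: int = 1000) -> Iterator[Any]:
    """Yield items across pages by following nextCursor until it is absent."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)  # first call passes None (no cursor)
        yield from page.get(items_key, [])
        cursor = page.get("nextCursor")
        if not cursor:
            return
    raise RuntimeError("page limit reached; start a fresh listing")
```

The `max_pages` guard is a safety net against a runaway loop. After large data changes, start a fresh listing rather than resuming an old cursor.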
Rule of thumb: developer JWT for Control & Ingestion Plane operations; project token for data plane operations.
- Token types
- Developer JWT — Projects, account, credentials, settings (e.g. monitoring email). Never in public clients.
- Project access token — KBs, docs, jobs, search, parsed content, webhooks; minted via `client_credentials`.
- Mismatch — `401`/`403` often means the wrong token type, or `projectId` in the path ≠ the token’s project.
- TTL — Short-lived; cache in memory and refresh; never commit tokens or secrets.
- Rotation — Create a new credential, update your services, revoke the old credential; brief overlap avoids downtime.
- Success
- `200` / `201` / `204` — Parse JSON when present.
- Client errors
- `400` — Validation; fix payload, ids, or query params per `detail`.
- `401` — Missing/invalid token, or wrong token type for the route.
- `403` — Scopes or project mismatch.
- `404` — Unknown resource or missing webhook config.
- `409` — State conflict; read `detail` to decide between retrying and changing strategy.
- `413` / `415` / `422` — Size, media type, or semantic config rejection.
- `429` — Rate limit; use `Retry-After` and `X-RateLimit-*`.
- Server errors & retries
- `502` / upstream — Transient; back off with a cap; log `detail`.
- Idempotent retries — GETs are safe; POST/PUT use `Idempotency-Key` where documented.
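Here is a sketch of the idempotent-retry rule in Python. It generates one `Idempotency-Key` per logical operation and sends the same value on every attempt, so a retried POST cannot create a duplicate. `send` is a hypothetical callback that performs the request with the given headers and returns the status code. Retrying only on 429 and 5xx, and the delay schedule, are assumptions.

```python
import time
import uuid
from typing import Callable, Dict, Tuple


def post_with_retries(send: Callable[[Dict[str, str]], int], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Retry a mutating call, reusing ONE Idempotency-Key for every attempt."""
    key = str(uuid.uuid4())  # generated once per logical operation, not per attempt
    status = 0
    for attempt in range(max_attempts):
        status = send({"Idempotency-Key": key})
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_attempts - 1:
            break
        sleep(min(30.0, 0.5 * 2 ** attempt))  # capped exponential backoff
    return status, key
```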
Figures below match the public API spec where stated; platform or plan limits may vary, so treat error responses and the docs as the source of truth.
- Uploads & batches
- Multipart `file` parts; per-file max in spec (often tens of MB) → `413` if exceeded.
- Batch file count cap per endpoint — check the spec.
- Search, webhooks, concurrency
- Large `retrieveTopK` / `rerank.topK` → higher latency and cost; tune with evals.
- Webhook `retryPolicy`: `maxAttempts` 1–10, `backoffSeconds` 1–300, `(maxAttempts-1)*backoffSeconds ≤ 300`.
- Match client concurrency to rate-limit headers to avoid 429 storms.
- Huge queries/filters may hit provider/HTTP limits before app logic.
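The webhook `retryPolicy` limits can be checked before you upsert a webhook, so mistakes fail fast in your own code. This Python sketch encodes only the three constraints listed here. The server remains the authority, and `validate_retry_policy` is a hypothetical helper.

```python
from typing import List


def validate_retry_policy(max_attempts: int, backoff_seconds: int) -> List[str]:
    """Return violations of the documented webhook retryPolicy limits."""
    errors: List[str] = []
    if not 1 <= max_attempts <= 10:
        errors.append("maxAttempts must be between 1 and 10")
    if not 1 <= backoff_seconds <= 300:
        errors.append("backoffSeconds must be between 1 and 300")
    # Documented combined cap: (maxAttempts - 1) * backoffSeconds <= 300.
    if (max_attempts - 1) * backoff_seconds > 300:
        errors.append("(maxAttempts - 1) * backoffSeconds must be <= 300")
    return errors
```

For example, `maxAttempts` 4 with `backoffSeconds` 100 sits exactly on the 300-second limit, while 5 attempts at 100 seconds exceed it.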
- Protocol
- `POST` + `application/json` unless docs say otherwise.
- Subscribable events include job lifecycle events (full enum in API docs).
- `webhook.test` is sent only from the test endpoint.
- Security & handler design
- Verify `X-Indexify-Signature` (HMAC-SHA256, raw body, constant-time comparison).
- Return `2xx` fast; queue heavy work.
- Dedupe with `deliveryId` or composite keys — duplicates happen.
- Repeated failures exhaust retries → possible monitoring email.
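Here is a verification sketch in Python using the standard `hmac` module. It assumes the header carries a hex-encoded HMAC-SHA256 of the raw request body, optionally prefixed with `sha256=`. The exact encoding and prefix are assumptions to confirm in the Verification guide.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an X-Indexify-Signature value against the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value.strip()
    if received.startswith("sha256="):  # assumed optional prefix
        received = received[len("sha256="):]
    # Constant-time comparison; comparing bytes avoids errors on non-ASCII input.
    return hmac.compare_digest(expected.encode("ascii"), received.lower().encode("utf-8"))
```

Compute the digest over the bytes exactly as received, before any JSON parsing or re-serialization, and reject the request when verification fails.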
Reference for the main blocks inside a knowledge base configuration. Full field-level schemas are in ./api-docs and the OpenAPI components.
- Top-level `settings` keys
- `pipeline` — Stages per ingest (`parse`, `chunk`, `index`, …).
- `parse` — Output formats, `parse.pdf`.
- `chunk` — `method` (`hierarchical` | `hybrid`), `size`, `mergePeers`, `tokenizer`, `attachMetadata`.
- `index` — `embeddingConfig` + optional multimodal index flags.
- `search` — Default `rerankConfig` + multimodal retrieval defaults.
- Unknown top-level keys are rejected — PATCH only documented shapes.
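Because unknown top-level keys are rejected, a small client-side check can catch typos before a PATCH. This Python sketch knows only the keys and chunk methods listed above. The example values (such as `size` 512, and `pipeline` as a list of stage names) are illustrative assumptions, and the OpenAPI components stay authoritative.

```python
from typing import Any, Dict, List

# Top-level keys and chunk methods as listed in this guide.
KNOWN_TOP_LEVEL = {"pipeline", "parse", "chunk", "index", "search"}
CHUNK_METHODS = {"hierarchical", "hybrid"}

# Illustrative settings; values and shapes are assumptions, not defaults.
EXAMPLE_SETTINGS: Dict[str, Any] = {
    "pipeline": ["parse", "chunk", "index"],
    "parse": {"outputFormats": ["markdown", "json"]},
    "chunk": {"method": "hierarchical", "size": 512},
}


def check_settings(settings: Dict[str, Any]) -> List[str]:
    """Flag unknown top-level keys and unsupported chunk methods."""
    problems = [f"unknown top-level key: {k}"
                for k in sorted(set(settings) - KNOWN_TOP_LEVEL)]
    method = settings.get("chunk", {}).get("method")
    if method is not None and method not in CHUNK_METHODS:
        problems.append(f"chunk.method must be one of {sorted(CHUNK_METHODS)}, got {method!r}")
    return problems
```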
Search request bodies and query strings are defined in the OpenAPI document; use it for exact field names. This lists concepts you will see repeatedly.
- Concepts
- `query` — Embedded with the KB’s embedding model.
- `format` — Hit text projection (`markdown`, `html`, …).
- `settings` overrides — Merge with KB defaults (`searchMode`, `retrieveTopK`, `rerank`, `rerank.topK`, `includeSourceMetadata`, …).
- `includeSourceMetadata` — Citation-friendly metadata in hits.
- `nextCursor` — Paging for large result sets.
- Standard headers
- `Authorization: Bearer` — Developer JWT or project token per route.
- `Content-Type` — `application/json`, or `multipart/form-data` for uploads.
- `Idempotency-Key` — On supported mutating calls (e.g. webhook upsert).
- `X-RateLimit-*` — Client-side pacing.
- `Retry-After` — On `429`.
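Putting the concepts and headers together, a search call can be built with the standard library. This Python sketch assumes that the search route is `POST /projects/{projectId}/kbs/{kbId}/search` and that `format` goes in the query string. It also assumes per-request overrides sit under a `settings` key, with nested objects for dotted names like `rerank.topK`. Confirm each of these in the OpenAPI document. `search_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def search_request(base_url: str, project_id: str, kb_id: str, token: str,
                   query: str, fmt: str = "markdown",
                   overrides: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a KB search request."""
    path = "/projects/{}/kbs/{}/search".format(
        urllib.parse.quote(project_id, safe=""), urllib.parse.quote(kb_id, safe=""))
    # `format` selects the hit text projection; it must be one of the KB's
    # settings.parse.outputFormats.
    url = base_url.rstrip("/") + path + "?" + urllib.parse.urlencode({"format": fmt})
    body: Dict[str, Any] = {"query": query}
    if overrides:
        body["settings"] = overrides  # assumed key; merged with KB defaults server-side
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # project access token
            "Content-Type": "application/json",
        },
    )
```

For example, `overrides={"retrieveTopK": 50, "rerank": {"topK": 5}, "includeSourceMetadata": True}` widens the candidate pool, keeps five reranked hits, and requests citation metadata. Send the request with `urllib.request.urlopen(req)`, and on a `429` read `Retry-After` before retrying.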