Preview / closed beta. Interested? Email info@indexify.dev.

Developer Guide

Build Your KB Pipeline

A complete, hands-on guide for developers building agents: start from scratch, get a working integration shipped, then level it up with advanced retrieval, structured document access, and the operational details you’ll want in production.

Knowledge Base Design

A knowledge base is the unit you configure and query. It holds the parse and chunk policies, the embedding and rerank providers, and per-KB webhooks. Structured elements are produced automatically whenever json is among the parse output formats. Create a KB with POST …/kbs and update it with PATCH …/kbs/{kbId}; request/response schemas and validation rules live in the API reference. Good design up front reduces reprocessing and retrieval surprises later.

KB settings examples by product scenario

Copy-ready POST …/kbs bodies for common integration shapes. They align with the scenario cards in Start here — what you can build; treat them as starting points and adjust using Retrieval and Advanced retrieval experiments.

Use the bullets as a checklist: read once, then keep this open while you implement so you don’t miss the small but important details (like token type, scopes, and strict query rules).

  • Using these examples
    • Each card is a full JSON body for POST …/kbs (name, description, settings). Run it with a project access token.
    • Updating an existing KB: PATCH …/kbs/{kbId} with only changed top-level keys; omit settings to leave stored settings unchanged. See the API reference for request shapes and validation rules.
    • After create, run through Document ingestion, then validate with search and (if enabled) Structured knowledge.
    • Prefer one KB profile per product scenario in production so behavior, monitoring, and rollback stay traceable.
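
The PATCH rule above (send only changed top-level keys, omit settings to leave it untouched) can be sketched as a small helper. This is a minimal illustration, not part of the API: build_patch_body and the sample dictionaries are hypothetical, and whether a supplied settings object is merged or replaced on the server is defined by the API reference, not by this sketch.

```python
from typing import Any


def build_patch_body(stored: dict[str, Any], desired: dict[str, Any]) -> dict[str, Any]:
    """Return only the top-level keys whose values differ from the stored KB.

    Keys absent from `desired` are never emitted, so leaving out "settings"
    keeps the stored settings unchanged.
    """
    return {
        key: value
        for key, value in desired.items()
        if key not in stored or stored[key] != value
    }


# Hypothetical stored KB and the fields we want after the update.
stored_kb = {
    "name": "product-ask-anything",
    "description": "Grounded Q&A with citations across one or more corpora.",
    "settings": {"pipeline": ["parse", "chunk", "index"]},
}
desired_kb = {
    "name": "product-ask-anything",
    "description": "Grounded Q&A with citations (v2).",
}

patch_body = build_patch_body(stored_kb, desired_kb)
print(patch_body)
```

Only description ends up in the body here, because name is unchanged and settings was never supplied.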

Ask-anything knowledge chatbot

End users ask open questions and expect short answers grounded in your docs, with citations they can trust. Configure parse → chunk → index on the KB (hybrid chunks, optional rerank); poll or subscribe via webhooks until jobs complete, then call search for passage-level hits and includeSourceMetadata for inline cites and debugging.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-ask-anything",
  "description": "Grounded Q&A with citations across one or more corpora.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid",
        "mergePeers": true
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      }
    }
  }
}'

KB-aware tool-calling agent

An agent first retrieves grounded facts from your KB, then calls business tools (tickets, runbooks, APIs) with traceable context. Indexify adds doctags, typed element graphs (from JSON parse), and relationship edges so tool arguments can reference stable element- and section-level evidence, with Jina rerank when you need tighter precision on technical corpora.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-tool-calling-agent",
  "description": "Structured parse outputs and rerank for tool-ready evidence.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json",
        "doctags"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid"
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "jina",
        "config": {
          "model": "jina-reranker-v2-base-multilingual"
        }
      }
    }
  }
}'

Document navigation copilot

Users move through long manuals by headings, sections, tables, and related content—not only flat text search. Indexify enables structured parsing (JSON → elements) and section-native routes so headings and typed elements surface in the API; after ingest, use GET …/sections, GET …/sections/{sectionId}/elements, and POST …/search for navigation-style lookups.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-doc-navigation",
  "description": "Multimodal parse for section-aware UX via JSON elements and per-document section routes.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json",
        "doctags"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid",
        "mergePeers": true
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      }
    }
  }
}'

Workflow automation via HTTP API

Your backends classify, route, or summarize when webhooks fire or on a schedule: call POST …/search, parsed content (GET …/parsed), and structure routes with a project access token. Use JSON/doctags parse outputs, multimodal retrieval defaults (tables, figures), and Voyage rerank where configured. For IDE agents, MCP mirrors many of the same reads (see Agents & MCP).

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-workflow-automation",
  "description": "Machine-readable outputs and multimodal search for pipelines.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "json",
        "doctags",
        "markdown"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid"
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "voyage",
        "config": {
          "model": "rerank-2.5-lite"
        }
      },
      "multimodal": {
        "retrievalModes": [
          "chunk",
          "table_row",
          "figure_caption"
        ],
        "grounding": {
          "includeSectionPath": true,
          "includeBbox": false
        }
      }
    }
  }
}'

Compliance and policy Q&A

Teams answer regulatory or internal-policy questions where wrong paraphrases carry risk. Indexify supports accurate PDF/table extraction (OCR + table modes), hierarchical chunking for statute-like docs, section paths in search grounding defaults, and typed element relationships from JSON parse so citations map to stable elements; combine search with sections and elements routes for navigation across policy corpora.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-compliance-qa",
  "description": "Audit-oriented parsing and hierarchical chunks for policy corpora.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json"
      ],
      "pdf": {
        "isOcrEnabled": true,
        "tableStructureMode": "ACCURATE"
      }
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hierarchical"
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      },
      "multimodal": {
        "grounding": {
          "includeSectionPath": true
        }
      }
    }
  }
}'

IDE knowledge agent (MCP)

Developers query internal runbooks and API docs from Cursor or another MCP host while coding. Tune the KB for hybrid search, rerank, and JSON/doctags parse so section and element tools have signal when the agent drills past search snippets.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-mcp-ide-agent",
  "description": "Developer-docs KB with hybrid search and structured parse for MCP retrieval.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json",
        "doctags"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid",
        "mergePeers": true
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      }
    }
  }
}'

KB with raised file size limit (25 MB)

For corpora that include large PDFs, presentation decks, or scanned reports, raise settings.maxFileSizeMb above the server default (10 MB). The per-KB limit applies to both direct file uploads and server-fetched URL content. Files exceeding the limit return 413 file_too_large.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-large-docs",
  "description": "KB for large PDFs and scanned reports up to 25 MB.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown",
        "json"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid",
        "mergePeers": true
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "openai:text-embedding-3-small",
        "config": {
          "dimensions": 1536
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      }
    },
    "maxFileSizeMb": 25
  }
}'

KB with custom tokenizer for hybrid chunking

Hybrid chunk size is measured in tokens. By default the platform uses sentence-transformers/all-MiniLM-L6-v2 (recommended for OpenAI, Cohere, Google, AWS, and Voyage embedding models). When using nomic:nomic-embed-text-v1.5, set chunk.strategy.tokenizer to nomic-ai/nomic-embed-text-v1.5 so chunk boundaries match that model's vocabulary.

Create KB (full request)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "product-nomic-kb",
  "description": "KB using Nomic embeddings with matching hybrid chunk tokenizer.",
  "settings": {
    "pipeline": [
      "parse",
      "chunk",
      "index"
    ],
    "parse": {
      "outputFormats": [
        "markdown"
      ]
    },
    "chunk": {
      "attachMetadata": true,
      "strategy": {
        "method": "hybrid",
        "size": 512,
        "tokenizer": "nomic-ai/nomic-embed-text-v1.5"
      }
    },
    "index": {
      "embeddingConfig": {
        "model": "nomic:nomic-embed-text-v1.5",
        "config": {
          "dimensionality": 768
        }
      }
    },
    "search": {
      "rerankConfig": {
        "provider": "cohere",
        "config": {
          "model": "rerank-v3.5"
        }
      }
    }
  }
}'
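
If you generate KB bodies in code, the tokenizer rule can live in one place. The sketch below assumes a hypothetical helper, chunk_strategy_for, and a lookup table that contains only the Nomic pairing named in this guide. It omits tokenizer for every other model so the platform default applies.

```python
# Platform default named in this guide; shown for reference only.
DEFAULT_TOKENIZER = "sentence-transformers/all-MiniLM-L6-v2"

# Embedding models that need a non-default tokenizer. Only the Nomic entry
# comes from this guide; extend the table from the API reference as needed.
TOKENIZER_OVERRIDES = {
    "nomic:nomic-embed-text-v1.5": "nomic-ai/nomic-embed-text-v1.5",
}


def chunk_strategy_for(embedding_model: str, size: int = 512) -> dict:
    """Build a hybrid chunk.strategy object matched to the embedding model."""
    strategy = {"method": "hybrid", "size": size}
    tokenizer = TOKENIZER_OVERRIDES.get(embedding_model)
    if tokenizer is not None:
        # Only set the field when the default would not match the model.
        strategy["tokenizer"] = tokenizer
    return strategy


print(chunk_strategy_for("nomic:nomic-embed-text-v1.5"))
print(chunk_strategy_for("openai:text-embedding-3-small"))
```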

Document Ingestion

Ingestion is how you add or replace source material in a knowledge base: multipart upload creates a document row and an asynchronous job that runs parse → chunk → index. When json is among settings.parse.outputFormats, parse also produces a structured representation of the document (elements, tables, figures, and relationships). You observe completion by polling GET …/jobs/{jobId} or by webhooks. After success, run search or fetch structured outputs depending on your product. Per-route detail: API reference.

Tip: the snippets are meant to be copy/paste friendly—start with the happy path, then intentionally poke at scopes, missing fields, and bad inputs so you know exactly what your app will see in production.

List documents in the KB (docs:read; optional limit, cursor, sort, order; for the next page pass the nextCursor document id from the previous response as cursor; bad cursor → 400 invalid_cursor)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents?limit=20" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Get one document by id (docs:read; soft-deleted → 404 document_not_found)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Get parsed document body (docs:parsed:read; no query params — use Accept only; 404 parsed_not_found if format not stored yet)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/parsed" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Accept: text/markdown"

Paginated chunks (docs:parsed:read; format must be in KB parse.outputFormats; 422 chunking_not_completed until job completes)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/chunks?limit=20&format=markdown" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Accept: application/json"
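
The read routes above name a handful of status and error-code pairs. One way to keep client behavior consistent is a lookup from each pair to a next step. This is a sketch under assumptions: the Action names and next_action are hypothetical, and how the error code is carried in the response body is left to the API reference, so the function takes it as a plain argument.

```python
from enum import Enum


class Action(Enum):
    RETRY_LATER = "retry_later"   # the resource may appear once the job advances
    FIX_REQUEST = "fix_request"   # the request itself is wrong
    GIVE_UP = "give_up"           # nothing to wait for


# (HTTP status, error code) pairs named in this guide for the read routes.
KNOWN_ERRORS = {
    (400, "invalid_cursor"): Action.FIX_REQUEST,
    (404, "document_not_found"): Action.GIVE_UP,
    (404, "parsed_not_found"): Action.RETRY_LATER,
    (422, "chunking_not_completed"): Action.RETRY_LATER,
}


def next_action(status: int, error_code: str) -> Action:
    """Map a documented error to a client action; unknown pairs give up."""
    return KNOWN_ERRORS.get((status, error_code), Action.GIVE_UP)


print(next_action(422, "chunking_not_completed").value)
```

Treating parsed_not_found as retryable only makes sense when the requested format is in the KB's parse.outputFormats; otherwise it will never be stored.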

Typical ingestion workflow

Repeatable sequence for automation and runbooks; adjust names and ids to your environment.
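
The sequence is: upload with POST …/documents, wait for the job to finish, then search or fetch structured outputs. The waiting step can be sketched as a bounded polling loop. Assumptions in this sketch: the job exposes a status string, and the terminal values are completed, failed, and canceled (inferred from the webhook event names; only failed appears as a status filter in this guide). wait_for_job and fetch_status are hypothetical names, and the HTTP call is replaced by a stand-in so the block runs offline.

```python
import time
from typing import Callable

# Assumed terminal statuses, inferred from job.completed / job.failed / job.canceled.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Stand-in for GET …/jobs/{jobId}; a real client would issue the HTTP call here.
fake_statuses = iter(["pending", "started", "completed"])
final = wait_for_job(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(final)  # completed
```

Bounding the loop keeps a stuck job from blocking automation forever. Pair it with webhooks when you need faster reactions.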

Delete and re-ingest

This section covers how deletion, re-ingest, and reprocess affect derived outputs (parsed text, chunks, vectors, and structured elements).

  • Delete a document
    • DELETE …/documents/{documentId} removes the row and all derived data for that document (parsed text, elements, chunks, vectors, jobs). The call returns a conflict while a job is still running. MCP has no delete; use REST (docs:delete).
  • Re-ingest / replace a document
    • GET …/content downloads the original uploaded bytes with the stored MIME type and normal download headers—useful for backups or clients that need the source file. Not exposed in MCP.
    • PUT …/content replaces the file under the same document id: the API wipes derived data first (parsed output, chunks, vectors, jobs for this doc), then stores the new bytes and queues a new pipeline job so stale chunks cannot linger. Optional Idempotency-Key follows the same rules as POST …/documents. MCP does not wrap replaces.
    • If job creation fails mid-flight, the API may revert the document record and remove the new blob while derived data is already gone—use reprocess (below) or re-upload as needed.
  • Reprocess (same bytes, new settings)
    • POST …/reprocess keeps the same blob but re-runs the pipeline under the KB’s current settings (it still purges derived outputs first). Send a JSON body of {}. Not available in MCP; for errors and scopes, see POST …/reprocess in the API reference.

Supported formats and constraints

The upload API accepts common office, markup, tabular, and (when enabled on the KB) image/audio inputs. Exact limits, MIME types, and error codes are documented with POST …/documents—do not rely on undocumented extensions.

  • Supported inputs
    • Rich documents: PDF, DOCX, PPTX.
    • Markup: HTML, Markdown, AsciiDoc, WebVTT.
    • Tabular: XLSX, CSV.
    • Images and audio when OCR or transcription is enabled in KB settings—behavior follows the public schema for those settings.
  • File size limit
    • The default maximum is 10 MB per file (applies to both multipart/form-data uploads and server-fetched URL content).
    • To raise or lower the limit for a specific KB, set settings.maxFileSizeMb (integer, must be > 1) when creating or patching the KB via POST …/kbs or PATCH …/kbs/{kbId}. Files exceeding the effective limit return 413 file_too_large.
    • Example: { "settings": { "maxFileSizeMb": 25 } } raises the ceiling to 25 MB for that KB only. Send null to revert to the server default.
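
A client can check file size before uploading to avoid a round trip that ends in 413 file_too_large. The sketch below is an illustration only: effective_limit_mb and fits_limit are hypothetical helpers, and the bytes-per-MB constant is an assumption, since this guide does not say whether a megabyte means 10^6 or 2^20 bytes. The server remains the authority.

```python
SERVER_DEFAULT_MAX_MB = 10  # server default stated in this guide

# Assumption: 1 MB = 2**20 bytes. Confirm against the API reference.
BYTES_PER_MB = 1024 * 1024


def effective_limit_mb(kb_settings: dict) -> int:
    """Per-KB limit if set; a missing or null value falls back to the default."""
    limit = kb_settings.get("maxFileSizeMb")
    return SERVER_DEFAULT_MAX_MB if limit is None else limit


def fits_limit(size_bytes: int, kb_settings: dict) -> bool:
    """True when a file of size_bytes is within the KB's effective limit."""
    return size_bytes <= effective_limit_mb(kb_settings) * BYTES_PER_MB


twelve_mb = 12 * BYTES_PER_MB
print(fits_limit(twelve_mb, {}))                      # False
print(fits_limit(twelve_mb, {"maxFileSizeMb": 25}))   # True
```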

Idempotent uploads and retries

Send Idempotency-Key on POST …/documents so HTTP retries and duplicate submissions replay the same outcome when the fingerprint matches (see that operation in API reference). Pair with the job polling pattern in Jobs and webhooks.

  • Idempotency
    • Generate one key per logical upload (e.g. stable id from your CMS); reuse the same key for HTTP-layer retries, not a new key per attempt.
    • Store the key next to your documentId / jobId so automated jobs can retry safely after outages.

Upload with Idempotency-Key: same project + KB + same file bytes may replay cached 201 (~24h TTL per OpenAPI); use a new key when bytes or kbId change

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Idempotency-Key: f1e2d3c4-b5a6-7890-abcd-ef1234567890" \
  -F "file=@./handbook.pdf"
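
One way to get a key that is stable across retries but changes when the bytes or kbId change is to derive it from those inputs. This is a sketch, not a required scheme: idempotency_key and source_id (your CMS identifier) are hypothetical, and the API only needs the header value to be reused for retries of the same logical upload.

```python
import hashlib
import uuid


def idempotency_key(kb_id: str, source_id: str, file_bytes: bytes) -> str:
    """Derive a deterministic key from the KB, the source record, and the content.

    Retries with identical inputs produce the same key; new bytes or a
    different KB produce a new one.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{kb_id}/{source_id}/{digest}"))


kb = "660e8400-e29b-41d4-a716-446655440002"
first = idempotency_key(kb, "cms-article-42", b"handbook v1")
retry = idempotency_key(kb, "cms-article-42", b"handbook v1")
changed = idempotency_key(kb, "cms-article-42", b"handbook v2")
print(first == retry, first == changed)  # True False
```

Store the resulting key next to your documentId and jobId so a job restarted after an outage can resend the same header.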

Jobs and Webhook Lifecycle

Long-running work (parsing, chunking, indexing) is modeled as jobs you poll or list, and optionally as webhook deliveries to your HTTPS endpoint. Both surfaces are part of the public contract: responses and payloads are documented in the API reference. Pair polling with webhooks when you need low-latency reactions; keep polling as a backstop when a delivery fails.

List jobs in the KB (optional status filter; next page: same query + cursor=<nextCursor> UUID from prior response)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/jobs?limit=20&status=failed" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"
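
Cursor pagination on the list routes follows one loop: request a page, consume it, then repeat with cursor=<nextCursor> until none is returned. A minimal sketch, assuming the response carries an items array alongside nextCursor (only nextCursor is confirmed by this guide for list routes). iter_items and the fake pages are hypothetical stand-ins for real HTTP calls.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following nextCursor until it is absent."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            return


# Stand-in for GET …/jobs?limit=…&cursor=…; keys are the cursor values requested.
pages = {
    None: {"items": [{"id": "job-1"}, {"id": "job-2"}], "nextCursor": "job-2"},
    "job-2": {"items": [{"id": "job-3"}], "nextCursor": None},
}
job_ids = [item["id"] for item in iter_items(lambda cursor: pages[cursor])]
print(job_ids)  # ['job-1', 'job-2', 'job-3']
```

Keep the other query parameters (such as status) identical between pages, as the route description requires.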

Inspect one job — PROJECT_ID, KB_ID, JOB_ID must be UUIDs (invalid → 400 invalid_input; wrong project vs token → 401)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/jobs/880e8400-e29b-41d4-a716-446655440004" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Cancel a non-terminal job (scope jobs:cancel). Terminal job → 200 noop; job.canceled webhook only when status transitions to canceled.

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/jobs/880e8400-e29b-41d4-a716-446655440004/cancel" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Retry one failed/canceled job (scope jobs:retry). Non-retryable status → 400 job_not_retryable.

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/jobs/880e8400-e29b-41d4-a716-446655440004/retry" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Retry every failed/canceled job in the KB (scope jobs:retry). 202 + { items: [...] }; items may be empty.

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/jobs/retry" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

What a webhook is and when to use it

When subscribed events occur for a knowledge base, the product POSTs JSON to your HTTPS URL. You implement the receiver; Indexify implements delivery, retries, and (when configured) signing. This is strictly an agent-developer-facing callback.

  • Why webhooks
    • Trigger downstream workflows when ingestion finishes: refresh caches, notify users, enqueue summarization, or update an external CMS index.
    • Reduce polling load versus tight loops on GET …/jobs/{jobId}—still keep polling as a safety net for missed deliveries.
    • Treat delivery as at-least-once: dedupe using stable ids from the payload (Glossary — webhooks).
    • Verify X-Indexify-Signature on the raw body before mutations (Webhook signature verification).
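
Signature checks are normally done over the exact bytes received, before any JSON parsing. The sketch below assumes the header carries a hex-encoded HMAC-SHA256 of the raw body keyed with the webhook secret. That scheme is an assumption made for illustration; the actual algorithm and encoding are defined in Webhook signature verification. sign and is_valid_signature are hypothetical helpers.

```python
import hashlib
import hmac


def sign(secret: str, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, see lead-in)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    expected = sign(secret, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


secret = "replace-with-signing-secret"
body = b'{"type":"job.completed"}'
header = sign(secret, body)  # what the sender would put in X-Indexify-Signature

print(is_valid_signature(secret, body, header))                 # True
print(is_valid_signature(secret, body + b" ", header))          # False
```

Verifying the raw bytes matters because re-serializing parsed JSON can change whitespace or key order and invalidate the signature.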

How to configure webhooks

Each KB may have one webhook configuration: HTTPS URL, subscribed events, active flag, optional signing secret, and retry policy. Operations and request bodies are listed under the KB webhook routes in the API reference.

Read webhook config for the KB (webhooks:read; no secret in body; 404 kb_not_found or webhook_not_found)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/webhook" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Create or replace webhook for a KB (201 if new row, 200 if replace; optional Idempotency-Key replays same JSON)

curl -s -X PUT "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/webhook" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.example.com/indexify/webhook",
    "active": true,
    "secret": "replace-with-signing-secret",
    "events": [
      "job.pending",
      "job.started",
      "job.parsing_started",
      "job.parsing_format_completed",
      "job.parsing_structure_ready",
      "job.parsing_partial_failure",
      "job.parsing_completed",
      "job.chunking_started",
      "job.chunking_format_selected",
      "job.chunking_partial_failure",
      "job.chunking_completed",
      "job.indexing_started",
      "job.indexing_completed",
      "job.completed",
      "job.failed",
      "job.canceled"],
    "retryPolicy": { "maxAttempts": 3, "backoffSeconds": 30 }
  }'

Partially update webhook (an empty {} body is a no-op that returns 200 with the Webhook shape, without the secret; include secret to rotate it, which returns WebhookWithSecret)

curl -s -X PATCH "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/webhook" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Content-Type: application/json" \
  -d '{"active":false}'

Delete webhook for the KB (webhooks:delete; 204 empty body; 404 kb_not_found / webhook_not_found)

curl -s -o /dev/null -w "%{http_code}\n" -X DELETE "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/webhook" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Test webhook delivery (202 + deliveryId; 502 webhook_delivery_failed if URL non-2xx/inactive; 404 if no webhook)

curl -s -X POST "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/webhook/test" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Webhook event catalog and meanings

Subscribe only to events your automation needs. Job events mirror GET …/jobs/{jobId}. Payload shapes and fields (data.parse, data.chunking, data.indexing) are documented per event in the API reference.

  • Job lifecycle
    • job.pending — Job row queued (upload, replace, reprocess, or retry); arrives before processing begins.
    • job.started — Work began; useful for in-progress UI.
    • job.parsing_completed / job.chunking_completed / job.indexing_completed — Stage success milestones.
    • job.completed — Safe default hook to mark content searchable and run post-processing.
    • job.failed — Terminal failure; correlate with job detail and optional monitoring email.
    • job.canceled — Emitted when a job transitions to canceled (operator/API cancel); not sent again on idempotent POST …/cancel if the job was already terminal.
  • Per-stage progress (optional)
    • job.parsing_started / job.chunking_started / job.indexing_started — Right before that pipeline step runs.
    • job.parsing_format_completed — One parse output format stored; data.parse: format, ordinal, total.
    • job.parsing_structure_ready — Multimodal structure saved after JSON parse; data.parse: structureReady.
    • job.chunking_format_selected — Chunks created from original_file or a parsed format; data.chunking: source.
  • Partial failures (non-terminal)
    • job.parsing_partial_failure / job.chunking_partial_failure — Some formats or paths failed while the job continued; data.parse / data.chunking include succeeded and failed arrays.
  • Test-only
    • webhook.test — Emitted only when you call the test endpoint; use it to validate URL, TLS, and signature verification without waiting for real jobs.
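
A receiver built on this catalog usually does two things: drop duplicate deliveries, then branch on the event name. The sketch below is illustrative only. The payload field names id and type are hypothetical placeholders for the stable id and event name (take the real ones from the API reference), the returned action strings are invented for the example, and the in-memory set stands in for a shared store.

```python
from typing import Optional


class WebhookHandler:
    """Dedupe at-least-once deliveries, then pick an action per event."""

    def __init__(self) -> None:
        # In production, use a shared store with a TTL instead of process memory.
        self._seen: set[str] = set()

    def handle(self, payload: dict) -> Optional[str]:
        event_id = payload["id"]      # hypothetical field name
        event_type = payload["type"]  # hypothetical field name
        if event_id in self._seen:
            return None  # duplicate delivery, already processed
        self._seen.add(event_id)
        if event_type == "job.completed":
            return "mark_searchable"
        if event_type in ("job.failed", "job.canceled"):
            return "flag_for_retry"
        if event_type == "webhook.test":
            return "ack_test"
        return "ignore"  # progress events this receiver does not act on


handler = WebhookHandler()
print(handler.handle({"id": "evt-1", "type": "job.completed"}))  # mark_searchable
print(handler.handle({"id": "evt-1", "type": "job.completed"}))  # None
```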