Preview / closed beta. Interested? Email info@indexify.dev.

Developer Guide

Structured Knowledge

A complete, hands-on guide for developers building agents: start from scratch, get a working integration shipped, then level it up with advanced retrieval, structured document access, and the operational details you’ll want in production.

Structured Document Access

The API returns layout-aware structure rather than only flat text: per-document elements, sections, tables, figures, and relationships derived from the JSON parse. Start with search to obtain a documentId, then use the document-scoped GET routes below. For full schemas, see the API reference; over MCP, use the sections, tables, and figures tools.

Tip: the snippets are meant to be copy/paste friendly—start with the happy path, then intentionally poke at scopes, missing fields, and bad inputs so you know exactly what your app will see in production.

  • Per-document structure
    • List or fetch elements, sections, tables, figures, and relationships for a single documentId. Typical uses: show every table in a policy, caption↔figure linking, section-scoped tools in a viewer.
  • Elements
  • Element types (public enum)
    • Documented values include: title, heading, paragraph, list, table, figure, caption, code_block, section, container, other—confirm the current list in the API reference.
    • Headings / sections — table of contents and in-document navigation.
    • Table, figure, caption — multimodal and layout-grounded answers.
    • List, paragraph — structured summaries; code_block — technical documentation.
    • container / other — preserved hierarchy when the parser cannot map to a finer type.
  • Relationship types
    • Each relationship has a type label plus source and target element ids within the document.
    • document_contains_element — document root to structural descendants.
    • parent_child — hierarchy within a document.
    • precedes — reading or layout order between elements.
    • caption_of — caption linked to its figure.
  • Authorization scopes
  • Listing elements
  • Integration patterns
    • Navigation assistants — List sections or filter elements → fetch chunks for the chosen branch.
    • Audit / policy — Use search with elementTypes constrained to heading / section / table, then walk relationships to supporting paragraphs.
    • Retrieval + structure — Use sections or elements to choose where to read, then search or chunk fetch for what to pass to a model, keeping citations consistent.
  • Pair structured reads with search for end-to-end flows.
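
The relationship types listed above form a small edge list per document. As a minimal sketch (not the API's actual response shape), assume each relationship arrives as a dict with `type`, `sourceId`, and `targetId` keys, and that a `caption_of` edge points from the caption to its figure. Both the field names and the edge direction are assumptions to confirm in the API reference.

```python
from collections import defaultdict


def captions_by_figure(relationships):
    """Group caption element ids under the figure they describe."""
    grouped = defaultdict(list)
    for rel in relationships:
        # Assumed direction: source = caption, target = figure.
        if rel["type"] == "caption_of":
            grouped[rel["targetId"]].append(rel["sourceId"])
    return dict(grouped)


def children_of(relationships, parent_id):
    """Return child element ids of one parent, in the order received."""
    return [
        rel["targetId"]
        for rel in relationships
        if rel["type"] == "parent_child" and rel["sourceId"] == parent_id
    ]


# Hypothetical sample edges; the ids are placeholders, not real API output.
SAMPLE_RELATIONSHIPS = [
    {"type": "parent_child", "sourceId": "sec-1", "targetId": "fig-1"},
    {"type": "parent_child", "sourceId": "sec-1", "targetId": "cap-1"},
    {"type": "caption_of", "sourceId": "cap-1", "targetId": "fig-1"},
    {"type": "precedes", "sourceId": "fig-1", "targetId": "cap-1"},
]

print(captions_by_figure(SAMPLE_RELATIONSHIPS))
print(children_of(SAMPLE_RELATIONSHIPS, "sec-1"))
```

Keeping the traversal in small pure functions like these makes it easy to swap in the real field names once you have checked them against a live response.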

Parsed document body (Accept header; no ?query — duplicate keys → 400 invalid_input)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/parsed" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Accept: text/markdown"

Chunks page (docs:parsed:read; next page: append cursor=<nextCursor> from prior JSON)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/chunks?limit=20&format=markdown" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk" \
  -H "Accept: application/json"
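
The cursor rule in the label above (append cursor=<nextCursor> from the prior JSON) can be wrapped in a small loop. The sketch below assumes the response carries `nextCursor` as a top-level key and that the key is missing or empty on the last page; both are assumptions to verify against the API reference. `iter_pages` and `fetch_json` are illustrative helper names, not part of any SDK.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url, token):
    """GET a URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_pages(base_url, params, fetch):
    """Yield pages until the response stops returning a cursor.

    `fetch` is any callable that takes a URL and returns the decoded page,
    which keeps the loop testable without network access.
    """
    cursor = None
    while True:
        query = dict(params)
        if cursor:
            query["cursor"] = cursor
        page = fetch(f"{base_url}?{urllib.parse.urlencode(query)}")
        yield page
        # Assumed: a missing or empty nextCursor marks the last page.
        cursor = page.get("nextCursor")
        if not cursor:
            break
```

In a real client you would pass something like `lambda url: fetch_json(url, token)` as `fetch`, with `base_url` set to the …/chunks route and `params` set to `{"limit": 20, "format": "markdown"}`.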

List document elements (docs:parsed:read; omit include= for full payload+grounding; cursor = prior nextCursor UUID)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/elements?limit=50&type=table,figure&include=payload,grounding" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"
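
If you build this URL in code rather than by hand, a small helper keeps the comma-separated `type` and `include` values consistent with the cURL call above. This is a sketch: `elements_url` is a hypothetical helper, and `KNOWN_TYPES` is a snapshot of the element types listed in this guide, which may lag behind the API reference.

```python
from urllib.parse import urlencode

# Snapshot of the documented element types; confirm the current list
# in the API reference before relying on this check.
KNOWN_TYPES = {
    "title", "heading", "paragraph", "list", "table", "figure",
    "caption", "code_block", "section", "container", "other",
}


def elements_url(base, project_id, kb_id, document_id,
                 types=None, include=None, limit=50, cursor=None):
    """Build the list-elements URL for one document."""
    unknown = set(types or []) - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown element types: {sorted(unknown)}")
    path = (f"{base}/projects/{project_id}/kbs/{kb_id}"
            f"/documents/{document_id}/elements")
    params = {"limit": limit}
    if types:
        params["type"] = ",".join(types)
    if include:
        params["include"] = ",".join(include)
    if cursor:
        params["cursor"] = cursor
    # safe="," keeps the comma separators readable, as in the cURL example.
    return f"{path}?{urlencode(params, safe=',')}"


print(elements_url(
    "https://api.indexify.dev",
    "550e8400-e29b-41d4-a716-446655440001",
    "660e8400-e29b-41d4-a716-446655440002",
    "770e8400-e29b-41d4-a716-446655440003",
    types=["table", "figure"],
    include=["payload", "grounding"],
))
```

Each parameter is set at most once, which avoids the duplicate-key case that the guide says is rejected with 400 invalid_input.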

Get one element by id (docs:parsed:read; only ?include=… is accepted; add relationships to include embedded edges, capped)

curl -s -X GET "https://api.indexify.dev/projects/550e8400-e29b-41d4-a716-446655440001/kbs/660e8400-e29b-41d4-a716-446655440002/documents/770e8400-e29b-41d4-a716-446655440003/elements/aa0e8400-e29b-41d4-a716-446655440006?include=payload,grounding,relationships" \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJtYWNoaW5lIn0.dBjftJeZ4CVP-mB92K27uhbUJU1p1b_wW1gFWFOEjXk"

Why structured access matters for agents

Vector search finds relevant passages; structured data-plane routes supply tables as data, figure assets, section trees, cross-references, and processing artifacts so tools and UIs stay grounded. Use per-document GET …/documents/{documentId}/… APIs together with POST …/search when you need cross-document discovery—see Building agents with Indexify for end-to-end scenarios.

Use the bullets as a checklist: read once, then keep this open while you implement so you don’t miss the small but important details (like token type, scopes, and strict query rules).

  • Beyond chunk-only RAG
    • Chunks are lossy and overlapping; listing or fetching elements, sections, tables, and figures by stable ids gives the model layout-aware inputs—especially when the answer depends on row/cell accuracy or visual content, not a paraphrase of PDF text.
  • Safer tool-calling
  • Scoped retrieval and long-document UX
    • GET …/sections, POST …/search with groupBy: section or elementTypes, and per-document chunks / elements let the agent or user narrow to a chapter, clause, or manual branch before pulling evidence—patterns that are awkward when you only have an unordered set of text chunks.
  • Explainability and audit
    • GET …/relationships and related-elements routes expose cross-references (“see Figure 2”, “per §3.1”) so citations and policy answers map to explicit regions.
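
To make the scoped-retrieval pattern concrete, here is a sketch of a request body for POST …/search. The `elementTypes` and `groupBy` names come from the text above; the `query` key, and the assumption that `groupBy` takes the plain string `section`, are guesses to check against the API reference. `build_search_body` is a hypothetical helper.

```python
import json


def build_search_body(text, element_types=None, group_by_section=False):
    """Serialize a search request narrowed by element type and section."""
    body = {"query": text}  # "query" is an assumed field name
    if element_types:
        body["elementTypes"] = list(element_types)
    if group_by_section:
        body["groupBy"] = "section"  # assumed value format
    return json.dumps(body)


print(build_search_body(
    "data retention period",
    element_types=["heading", "section", "table"],
    group_by_section=True,
))
```

The documentId values in the search results are then the input to the per-document GET routes, so evidence and citations refer to the same stable ids.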