DocuSift API

One API key. Two ways to get the data.

One REST resource plus a webhook push. Upload a PDF or image (JPEG, PNG, or TIFF) and we return a predictable { data: … } envelope with classification confidence, extraction confidence, and the extracted fields.

Quickstart

Three steps from signup to extracted data.

  1. Sign up and mint an API key under Settings → API Keys. Keys look like ds_live_… and are tenant-scoped.
  2. POST a file to /api/v1/extract with the X-API-Key header. The 202 response contains the new document's id.
  3. Either poll GET /api/v1/extract/:id until status is extracted (or approved), or register a webhook and we’ll POST the result to you.

Authentication

Every request sends the key in an X-API-Key header. Keys are tenant-scoped, revocable from the dashboard, and carry a per-key rate limit configurable by the tenant admin.

X-API-Key: ds_live_3f8a…
  • Transport: HTTPS only. HTTP requests are refused.
  • Rotation: generate a new key, switch clients over, then revoke the old one. Never ship keys in browser or mobile code.
  • Rate limit: default 60 requests per minute per key; surfaced as HTTP 429 with a Retry-After header.
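
The sketch below turns the Retry-After header on a 429 into a wait time. It assumes the header may use either form HTTP allows (delta-seconds or an HTTP-date), because this page doesn't say which one DocuSift sends. The 1 s fallback is an arbitrary choice, and retryAfterMs is a hypothetical helper name.

```typescript
// Converts a Retry-After header value into a delay in milliseconds.
// Handles both forms HTTP allows: delta-seconds ("120") and an HTTP-date
// ("Wed, 21 Oct 2026 07:28:00 GMT"). Which form DocuSift sends is an assumption.
function retryAfterMs(
  header: string | null,
  now: number = Date.now(),
  fallbackMs = 1_000, // hypothetical default when the header is missing or unparsable
): number {
  if (header === null) return fallbackMs;
  const trimmed = header.trim();

  // delta-seconds: a non-negative integer count of seconds
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1_000;

  // HTTP-date: wait until that instant, never returning a negative delay
  const at = Date.parse(trimmed);
  if (!Number.isNaN(at)) return Math.max(0, at - now);

  return fallbackMs;
}
```

In a fetch loop this would be used as `await new Promise((r) => setTimeout(r, retryAfterMs(res.headers.get('Retry-After'))));` before resending the request.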

Endpoints

Two endpoints cover the full lifecycle. Base URL: https://docusift.co.

| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/extract | Upload a document (multipart/form-data). Returns 202 with the created document id. |
| GET | /api/v1/extract/:id | Read document status and, once complete, the extracted data plus an inlined ocr object with the rendered Markdown and raw OCR JSON. |

Upload a document

One or more files per request. Files that fail validation land in errors while valid files still process.

curl
curl -X POST https://docusift.co/api/v1/extract \
  -H "X-API-Key: ds_…" \
  -F "file=@invoice-1042.pdf"

# 202 Accepted
{
  "data": [
    {
      "id": "doc_01HZ…",
      "file_name": "invoice-1042.pdf",
      "mime_type": "application/pdf",
      "source": "api",
      "status": "uploaded"
    }
  ],
  "errors": []
}
node.js
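import { readFileSync } from 'node:fs';

const form = new FormData();
form.set(
  'file',
  new Blob([readFileSync('invoice-1042.pdf')], { type: 'application/pdf' }),
  'invoice-1042.pdf',
);

// fetch sets the multipart Content-Type (including the boundary) automatically.
const res = await fetch('https://docusift.co/api/v1/extract', {
  method: 'POST',
  headers: { 'X-API-Key': process.env.DOCUSIFT_API_KEY! },
  body: form,
});
if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);

const { data, errors } = await res.json();
// Files that failed validation are listed in `errors`; valid files are in `data`.
if (errors.length > 0) console.warn(errors);
const [doc] = data;
console.log(doc.id);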
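
A request can carry several files, so a batch upload might look like the sketch below. Sending each file as a repeated file field is an assumption, since the examples above only show one file. The helper names mimeFor and uploadBatch are also hypothetical. The MIME map mirrors the File support table.

```typescript
import { readFile } from 'node:fs/promises';
import { basename, extname } from 'node:path';

// MIME types per extension, mirroring the File support table.
const MIME_BY_EXT: Record<string, string | undefined> = {
  '.pdf': 'application/pdf',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.tif': 'image/tiff',
  '.tiff': 'image/tiff',
};

function mimeFor(path: string): string {
  return MIME_BY_EXT[extname(path).toLowerCase()] ?? 'application/octet-stream';
}

// Uploads several files in one multipart request. Repeating the `file` field
// once per file is an assumption; the documented examples send a single file.
async function uploadBatch(paths: string[], apiKey: string) {
  const form = new FormData();
  for (const path of paths) {
    const bytes = new Uint8Array(await readFile(path));
    form.append('file', new Blob([bytes], { type: mimeFor(path) }), basename(path));
  }

  const res = await fetch('https://docusift.co/api/v1/extract', {
    method: 'POST',
    headers: { 'X-API-Key': apiKey },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);

  // Valid files come back in `data` and rejected ones in `errors`. The shape of
  // an `errors` entry is not documented here, so it stays `unknown`.
  const { data, errors } = (await res.json()) as {
    data: { id: string; file_name: string; status: string }[];
    errors: unknown[];
  };
  return { accepted: data, rejected: errors };
}
```

Usage: `const { accepted, rejected } = await uploadBatch(['a.pdf', 'b.png'], process.env.DOCUSIFT_API_KEY!);`. Each accepted entry's id can then be polled on its own.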

Poll for the result

Processing typically completes within a few seconds. Poll every 1 s, backing off to 5 s, or skip polling entirely by registering a webhook.

curl
curl https://docusift.co/api/v1/extract/doc_01HZ… \
  -H "X-API-Key: ds_…"

# 200 OK
{
  "data": {
    "id": "doc_01HZ…",
    "file_name": "invoice-1042.pdf",
    "doc_type": "invoice",
    "status": "extracted",
    "classification_confidence": 0.99,
    "extraction_confidence": 0.96,
    "data": {
      "invoice_number": "INV-1042",
      "vendor_name": "Acme Co.",
      "total_amount": 1284.50,
      "currency": "USD",
      "line_items": [ /* … */ ]
    },
    "ocr": {
      "markdown": "# Invoice INV-1042\n\n...",
      "json": { "pages": [ { "page_number": 1, "words": [ /* … */ ] } ] }
    }
  }
}
node.js
// `id` is the document id returned by the upload call.
const res = await fetch(
  `https://docusift.co/api/v1/extract/${id}`,
  { headers: { 'X-API-Key': process.env.DOCUSIFT_API_KEY! } },
);
if (!res.ok) throw new Error(`Lookup failed with HTTP ${res.status}`);
const { data } = await res.json();

if (data.status === 'extracted' || data.status === 'approved') {
  // data.data holds the extracted fields
  console.log(data.data.invoice_number);
}
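
The polling advice above could be wrapped in a loop like the following sketch. Doubling the delay (1 s, 2 s, 4 s, then 5 s) is one way to back off from 1 s to 5 s. Several other choices are assumptions: the 30-attempt limit, stopping on needs_review, and the names pollDelayMs and waitForDocument.

```typescript
type DocStatus = 'uploaded' | 'classifying' | 'extracted' | 'needs_review' | 'approved';

// Delay before the next poll, for a 0-based attempt: 1 s, 2 s, 4 s, then capped at 5 s.
// Doubling is an assumed growth factor; only the 1 s start and 5 s cap are documented.
function pollDelayMs(attempt: number, startMs = 1_000, capMs = 5_000): number {
  return Math.min(capMs, startMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the document leaves the in-progress states (uploaded, classifying).
// Stopping on needs_review is an assumption: that state waits on a human
// reviewer, so polling on toward approved could take far longer than seconds.
async function waitForDocument(id: string, apiKey: string, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`https://docusift.co/api/v1/extract/${encodeURIComponent(id)}`, {
      headers: { 'X-API-Key': apiKey },
    });
    if (!res.ok) throw new Error(`Lookup failed with HTTP ${res.status}`);
    const { data } = (await res.json()) as { data: { status: DocStatus; [key: string]: unknown } };
    if (data.status !== 'uploaded' && data.status !== 'classifying') return data;
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`Document ${id} still processing after ${maxAttempts} polls`);
}
```

The caller then branches on the returned status. For extracted and approved, the fields are in data.data. For needs_review, the document is waiting on review, and a webhook is the cheaper way to learn when it is approved.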

Response envelope

Every successful response wraps its payload in a data key. For a single document the shape is { data: {…fields} }; for collections it’s { data: [ … ], errors: [ … ] }. Extracted fields live under data.data.

  • Status lifecycle: uploaded → classifying → extracted / needs_review → approved.
  • Confidence: classification_confidence and extraction_confidence are both floats in [0, 1].
  • Timestamps: all timestamps are ISO-8601 UTC strings or null.
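
For TypeScript clients, the envelope can be described with types like these. They are a hypothetical sketch built from the field names in the examples on this page, not an official SDK. The errors entries and ocr.json stay loosely typed because their full shape isn't documented.

```typescript
type DocStatus = 'uploaded' | 'classifying' | 'extracted' | 'needs_review' | 'approved';

// Fields seen in the upload and lookup examples; optional where they only
// appear once processing has progressed (or only in the upload response).
interface DocumentRecord {
  id: string;
  file_name: string;
  mime_type?: string;
  source?: string;
  doc_type?: string;
  status: DocStatus;
  classification_confidence?: number; // float in [0, 1]
  extraction_confidence?: number; // float in [0, 1]
  data?: Record<string, unknown>; // extracted fields
  ocr?: { markdown: string; json: unknown };
}

interface Single<T> { data: T }
interface Collection<T> { data: T[]; errors: unknown[] }

// True once extracted fields are readable under data.data.
function hasExtractedFields(doc: DocumentRecord): boolean {
  return (doc.status === 'extracted' || doc.status === 'approved') && doc.data !== undefined;
}

// Parses a single-document body, rejecting anything without data.id
// (for example an error body shaped like { detail: string }).
function parseSingle(body: string): Single<DocumentRecord> {
  const parsed = JSON.parse(body) as Single<DocumentRecord>;
  if (typeof parsed?.data?.id !== 'string') throw new Error('Missing data.id in response');
  return parsed;
}
```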

File support

| MIME type | Extensions | Max size |
|---|---|---|
| application/pdf | .pdf | 30 MB |
| image/jpeg | .jpg, .jpeg | 30 MB |
| image/png | .png | 30 MB |
| image/tiff | .tif, .tiff | 30 MB |

Scanned, native, handwritten, and multi-page documents all run through the same pipeline. Oversized or unsupported files appear in the errors array without blocking the rest of the batch.
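
Checking files locally before upload avoids round-trips for files the API will reject anyway. This sketch assumes a decimal reading of 30 MB (30,000,000 bytes), which is the stricter of the two possible readings. precheck is a hypothetical helper, and the server's validation remains authoritative.

```typescript
import { extname } from 'node:path';

// Supported extensions and MIME types, copied from the table above.
const SUPPORTED: Record<string, string | undefined> = {
  '.pdf': 'application/pdf',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.tif': 'image/tiff',
  '.tiff': 'image/tiff',
};

// Whether "30 MB" means 30 * 1000 * 1000 or 30 * 1024 * 1024 bytes is not stated;
// the smaller decimal value is assumed so oversized files are never let through.
const MAX_BYTES = 30 * 1000 * 1000;

type Precheck = { ok: true; mime: string } | { ok: false; reason: string };

// Mirrors the documented limits so obviously bad files are skipped before upload.
function precheck(fileName: string, sizeBytes: number): Precheck {
  const mime = SUPPORTED[extname(fileName).toLowerCase()];
  if (mime === undefined) return { ok: false, reason: 'unsupported type (415)' };
  if (sizeBytes > MAX_BYTES) return { ok: false, reason: 'exceeds 30 MB (413)' };
  return { ok: true, mime };
}
```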

Webhooks

Register a target URL in Settings → Integrations. When a document finishes processing, we POST a signed JSON payload for the document.processed event. Failed deliveries are retried up to 3 times with exponential backoff (after 60 s, 5 min, and 15 min).

example payload
POST https://your-app.example.com/webhooks/docusift
Content-Type: application/json
X-DocuSift-Event: document.processed
X-DocuSift-Signature: sha256=5e3a…b9f1

{
  "event": "document.processed",
  "timestamp": "2026-04-23T12:04:11.203Z",
  "data": {
    "id": "doc_01HZ…",
    "doc_type": "invoice",
    "status": "approved",
    "data": { /* same extracted fields */ }
  }
}
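
Because failed deliveries are retried, your endpoint may receive the same event more than once. The sketch below shows idempotent handling, meant to run after the signature check. It keys on event, document id, and status, which is an assumption because no delivery id is documented here. It also keeps seen keys in memory purely for illustration. DocuSiftWebhook and handleWebhook are hypothetical names.

```typescript
// Payload shape taken from the example above; the extracted fields stay loosely typed.
interface DocuSiftWebhook {
  event: string;
  timestamp: string; // ISO-8601 UTC
  data: { id: string; doc_type: string; status: string; data: Record<string, unknown> };
}

// In-memory only for illustration; a real service would persist these keys
// (for example behind a unique database constraint) so they survive restarts.
const seenDeliveries = new Set<string>();

// Parses a verified body and runs `onProcessed` at most once per key.
// Keying on event + document id + status is an assumption: no delivery id is documented.
function handleWebhook(
  rawBody: string,
  onProcessed: (payload: DocuSiftWebhook) => void,
): 'processed' | 'duplicate' | 'ignored' {
  const payload = JSON.parse(rawBody) as DocuSiftWebhook;
  if (payload.event !== 'document.processed') return 'ignored';

  const key = `${payload.event}:${payload.data.id}:${payload.data.status}`;
  if (seenDeliveries.has(key)) return 'duplicate';

  // Record the key only after the work succeeds, so a throw lets the retry through.
  onProcessed(payload);
  seenDeliveries.add(key);
  return 'processed';
}
```

The usual pattern is to acknowledge a duplicate with the same success response as the first delivery. This page doesn't state which status codes DocuSift treats as delivered, though.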

Verify the signature

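The X-DocuSift-Signature header carries sha256= followed by the hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. Compute it over the raw body before any JSON parsing, and always compare in constant time.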

node.js
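import { createHmac, timingSafeEqual } from 'node:crypto';

// Returns true when `header` (the X-DocuSift-Signature value) matches the
// HMAC-SHA256 of the raw, unparsed request body keyed with your webhook secret.
export function verifyDocuSiftSignature(
  rawBody: string | Buffer,
  header: string,
  secret: string,
): boolean {
  const expected = 'sha256=' +
    createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(header);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}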
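
To test your endpoint locally, you can sign fixture bodies with the same scheme and pass them to the verifier. This is a sketch: signDocuSiftBody and the test secret are hypothetical, and the construction simply mirrors the header format described above.

```typescript
import { createHmac } from 'node:crypto';

// Builds an X-DocuSift-Signature value: "sha256=" plus the hex HMAC-SHA256
// of the raw body, keyed with the webhook secret (the scheme described above).
function signDocuSiftBody(rawBody: string | Buffer, secret: string): string {
  return 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
}

// A signed fixture for local tests. The secret is a made-up placeholder.
const testSecret = 'local-test-secret';
const fixtureBody = JSON.stringify({
  event: 'document.processed',
  data: { id: 'doc_test', status: 'approved' },
});
const fixtureHeaders = {
  'Content-Type': 'application/json',
  'X-DocuSift-Event': 'document.processed',
  'X-DocuSift-Signature': signDocuSiftBody(fixtureBody, testSecret),
};
```

With this fixture, verifyDocuSiftSignature(fixtureBody, fixtureHeaders['X-DocuSift-Signature'], testSecret) should return true. Changing even one byte of the body should make it return false.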

Error codes

Error responses carry a { detail: string } body and one of the status codes below.

| Status | Meaning | What to do |
|---|---|---|
| 400 | Malformed request body or missing file | Check the multipart encoding and retry with a valid file. |
| 401 | Missing or invalid API key | Confirm the X-API-Key header. Revoked keys stop working immediately. |
| 404 | Document does not belong to the caller’s tenant | Check the id and which tenant the key belongs to. |
| 413 | File exceeds 30 MB | Downscale, split, or compress the source. |
| 415 | Unsupported MIME type | Upload a PDF, JPEG, PNG, or TIFF. |
| 429 | Per-key rate limit hit | Wait at least the Retry-After interval, then back off with jitter. |
| 5xx | Transient server error | Retry with exponential backoff; contact support if the error persists. |
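
The retry guidance in the table could be implemented as in this sketch. Only 429 and 5xx are treated as retryable, and other client errors are surfaced immediately. The full-jitter backoff formula and its 500 ms base and 30 s cap are assumptions. For 429, prefer the Retry-After value when it is present. isRetryable, backoffMs, and errorDetail are hypothetical names.

```typescript
// 429 and 5xx are transient. 400, 401, 404, 413 and 415 need a change on the
// caller's side, so resending the same request will not help.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// "Full jitter" exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
// Base and cap are illustrative; `random` is injectable to make the delay testable.
function backoffMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Reads the { detail } message from an error response, falling back to the status.
async function errorDetail(res: Response): Promise<string> {
  try {
    const body = (await res.json()) as { detail?: unknown };
    return typeof body.detail === 'string' ? body.detail : `HTTP ${res.status}`;
  } catch {
    return `HTTP ${res.status}`;
  }
}
```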