Schema Extraction API
Creates a schema extraction job for a previously uploaded file. The response identifies the new job and reports its current status.
Endpoint: POST /schema/extract
Request Headers
Header | Type | Required | Description |
---|---|---|---|
Authorization | string | Yes | Bearer token authentication |
project_id | string | Yes | Project identifier |
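As a minimal sketch, the two required headers above can be assembled by a small helper. The function name `build_headers` is hypothetical and not part of the API; the `Bearer` prefix follows the examples on this page.

```python
def build_headers(api_key: str, project_id: str) -> dict:
    """Return the two headers that POST /schema/extract requires.

    build_headers is an illustrative helper, not something the API provides.
    """
    if not api_key or not project_id:
        raise ValueError("api_key and project_id are both required")
    return {
        "Authorization": f"Bearer {api_key}",  # Bearer token authentication
        "project_id": project_id,              # project identifier
    }


headers = build_headers("your-api-key", "your-project-id")
```

Rejecting empty values locally avoids sending a request that is missing a required header.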
Request Body
SchemaExtractionRequest
Field | Type | Required | Description |
---|---|---|---|
file_id | string | Yes | ID of the uploaded file to process |
document_schema | DocumentSchema | Yes | Schema definition for extraction |
DocumentSchema
Field | Type | Required | Description |
---|---|---|---|
name | string | Yes | Name of the schema |
description | string | Yes | Description of the schema |
fields | SchemaField[] | Yes | List of top-level fields |
SchemaField
Field | Type | Required | Description |
---|---|---|---|
name | string | Yes | Field name |
description | string | Yes | Field description |
type | string | Yes | One of: "string", "number", "email", "phone", "date", "object", "array" |
required | boolean | Yes | Whether the field is required |
fields | SchemaField[] | No | Nested fields for object and array types |
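A schema can be checked against the tables above before it is sent. The sketch below is a client-side check only: the function names are hypothetical, and treating `fields` on a non-object, non-array type as a problem is an assumption drawn from the table wording, not documented server behavior.

```python
ALLOWED_TYPES = {"string", "number", "email", "phone", "date", "object", "array"}
NESTABLE_TYPES = {"object", "array"}


def validate_field(field: dict, path: str = "") -> list:
    """Return a list of problems found in one SchemaField dict (empty if none)."""
    errors = []
    name = field.get("name", "<unnamed>")
    where = f"{path}.{name}" if path else name

    # Every SchemaField must carry these four keys.
    for key in ("name", "description", "type", "required"):
        if key not in field:
            errors.append(f"{where}: missing required key '{key}'")

    ftype = field.get("type")
    if ftype is not None and ftype not in ALLOWED_TYPES:
        errors.append(f"{where}: unknown type '{ftype}'")

    nested = field.get("fields")
    if nested is not None:
        # Assumption: nested fields only make sense on object and array types.
        if ftype not in NESTABLE_TYPES:
            errors.append(f"{where}: 'fields' is only used with object and array types")
        for child in nested:
            errors.extend(validate_field(child, where))
    return errors


def validate_schema(schema: dict) -> list:
    """Return a list of problems found in a DocumentSchema dict (empty if none)."""
    errors = [
        f"schema: missing required key '{key}'"
        for key in ("name", "description", "fields")
        if key not in schema
    ]
    for field in schema.get("fields", []):
        errors.extend(validate_field(field))
    return errors
```

An empty list means the schema matches the documented shape; each message is prefixed with the dotted path of the offending field, for example `line_items.amount`.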
Example request body:
{
  "file_id": "123e4567-e89b-12d3-a456-426614174000",
  "document_schema": {
    "name": "Invoice Schema",
    "description": "Schema for processing invoice documents",
    "fields": [
      {
        "name": "invoice_number",
        "description": "Unique invoice identifier",
        "type": "string",
        "required": true
      },
      {
        "name": "issue_date",
        "description": "Date when invoice was issued",
        "type": "date",
        "required": true
      },
      {
        "name": "line_items",
        "description": "List of items in the invoice",
        "type": "array",
        "required": true,
        "fields": [
          {
            "name": "description",
            "description": "Item description",
            "type": "string",
            "required": true
          },
          {
            "name": "amount",
            "description": "Item amount",
            "type": "number",
            "required": true
          }
        ]
      }
    ]
  }
}
Response
SchemaExtractionResponse
Field | Type | Required | Description |
---|---|---|---|
job_id | string | Yes | ID of the created extraction job |
status | string | Yes | Current status of the job |
created_at | datetime | Yes | When the job was created |
result | object or null | No | Extraction results |
Example response:
{
  "job_id": "98765432-abcd-efgh-ijkl-123456789000",
  "status": "PENDING",
  "created_at": "2024-01-01T12:00:00Z",
  "result": null
}
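The response body can be turned into a typed object on the client. This is a sketch under two assumptions: `created_at` is always an ISO 8601 timestamp like the one in the example, and the dataclass and `parse_response` names are illustrative rather than part of any SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class SchemaExtractionResponse:
    """Client-side mirror of the documented response fields."""
    job_id: str
    status: str
    created_at: datetime
    result: Optional[Any] = None


def parse_response(body: dict) -> SchemaExtractionResponse:
    """Build a SchemaExtractionResponse from the decoded JSON body."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    created_at = datetime.fromisoformat(body["created_at"].replace("Z", "+00:00"))
    return SchemaExtractionResponse(
        job_id=body["job_id"],
        status=body["status"],
        created_at=created_at,
        result=body.get("result"),  # optional field; may be absent or null
    )


job = parse_response({
    "job_id": "98765432-abcd-efgh-ijkl-123456789000",
    "status": "PENDING",
    "created_at": "2024-01-01T12:00:00Z",
    "result": None,
})
```

Parsing `created_at` into a timezone-aware `datetime` makes it safe to compare against other aware timestamps.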
Examples
cURL
curl -X POST 'https://api.example.com/schema/extract' \
  -H 'Authorization: Bearer your-api-key' \
  -H 'project_id: your-project-id' \
  -H 'Content-Type: application/json' \
  -d '{
    "file_id": "123e4567-e89b-12d3-a456-426614174000",
    "document_schema": {
      "name": "Invoice Schema",
      "description": "Schema for processing invoice documents",
      "fields": [
        {
          "name": "invoice_number",
          "description": "Unique invoice identifier",
          "type": "string",
          "required": true
        },
        {
          "name": "total_amount",
          "description": "Total invoice amount",
          "type": "number",
          "required": true
        }
      ]
    }
  }'
Python
import requests

url = 'https://api.example.com/schema/extract'
headers = {
    'Authorization': 'Bearer your-api-key',
    'project_id': 'your-project-id'
}
payload = {
    'file_id': '123e4567-e89b-12d3-a456-426614174000',
    'document_schema': {
        'name': 'Invoice Schema',
        'description': 'Schema for processing invoice documents',
        'fields': [
            {
                'name': 'invoice_number',
                'description': 'Unique invoice identifier',
                'type': 'string',
                'required': True
            },
            {
                'name': 'total_amount',
                'description': 'Total invoice amount',
                'type': 'number',
                'required': True
            }
        ]
    }
}

# json= serializes the payload and sets Content-Type: application/json
response = requests.post(url, headers=headers, json=payload)
print(response.json())
Node.js
const axios = require("axios");

const url = "https://api.example.com/schema/extract";
const headers = {
  Authorization: "Bearer your-api-key",
  project_id: "your-project-id",
};
const payload = {
  file_id: "123e4567-e89b-12d3-a456-426614174000",
  document_schema: {
    name: "Invoice Schema",
    description: "Schema for processing invoice documents",
    fields: [
      {
        name: "invoice_number",
        description: "Unique invoice identifier",
        type: "string",
        required: true,
      },
      {
        name: "total_amount",
        description: "Total invoice amount",
        type: "number",
        required: true,
      },
    ],
  },
};

axios
  .post(url, payload, { headers })
  .then((response) => console.log(response.data))
  .catch((error) => console.error(error));
Response Codes
Code | Description |
---|---|
201 | Job created successfully |
400 | Missing project_id header |
500 | Internal server error |
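One way a client might act on these codes is sketched below. The exception class and `check_status` function are hypothetical; the messages are copied from the table, and any code not listed there is reported as undocumented rather than interpreted.

```python
class SchemaExtractionError(Exception):
    """Raised when POST /schema/extract returns anything other than 201."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


# Descriptions copied from the response code table.
DOCUMENTED_ERRORS = {
    400: "Missing project_id header",
    500: "Internal server error",
}


def check_status(status_code: int) -> None:
    """Return quietly on 201; raise SchemaExtractionError for any other code."""
    if status_code == 201:
        return
    message = DOCUMENTED_ERRORS.get(status_code, "Undocumented response code")
    raise SchemaExtractionError(status_code, message)
```

With the Python example above, this would be called as `check_status(response.status_code)` before reading `response.json()`.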