Searching Over Encrypted Documents
Use blind indexing to search your encrypted documents without the server seeing your search terms or results.
Learn how to search across your encrypted documents without the server ever seeing your search terms or results. This guide walks you through the complete search flow: generating a search token on your device, sending it to the server, and decrypting the results.
Prerequisites:
- A registered account with an active access token
- At least one uploaded and processed document (status: `processed`)
- Your blind index key (BIK), derived from your master key during account setup
- A cryptography library capable of HMAC computation and symmetric decryption
Overview
Literal uses blind indexing to let you search encrypted documents. Both your documents and your search queries are transformed into one-way cryptographic tokens before they reach the server. The server compares tokens to find matches — without ever knowing what the tokens represent.
At a high level:
- At upload time — your document's searchable content was normalized and converted into blind index tokens using your search key. These tokens were stored alongside your encrypted document.
- At search time — your device creates a matching token from your search query using the same key and normalization process, then sends it to the server.
- The server compares tokens — it finds stored tokens that match your query token and returns the corresponding encrypted documents.
- You decrypt locally — your device unwraps the document encryption key and decrypts the metadata.
The server sees only opaque tokens in both directions. It cannot determine what you searched for or what the results contain.
For a deeper explanation of the cryptographic model, see Encrypted Search. For how document references are anonymized, see Blind Routing.
Step 1 — Normalize Your Search Term
Before generating a search token, your search term must be normalized to ensure it matches the tokens that were created at upload time. Normalization converts text to a consistent format so that differences like capitalization or extra whitespace don't cause misses.
The normalization process applies:
- Unicode NFC normalization — ensures characters with multiple representations (e.g., accented letters) are stored in a single canonical form
- Case folding — converts all text to lowercase
- Whitespace normalization — collapses runs of spaces, tabs, and newlines into a single space, and trims leading and trailing whitespace
For example:
| Input | Normalized Output |
|---|---|
" Driver's License " | "driver's license" |
"PASSPORT" | "passport" |
"José García" | "josé garcía" |
Normalization is deterministic — the same input always produces the same output. It is also idempotent — normalizing an already-normalized string returns the same string.
Important: You must use the same normalization logic that was used when the document was indexed. If you are building a custom client, use the Literal normalizer library (available as a WASM module for browser and Node.js environments) to ensure parity.
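The three rules above can be sketched in Python. This is a minimal illustration, not the Literal normalizer itself — use the official normalizer library for production parity:

```python
import re
import unicodedata

def normalize(term: str) -> str:
    """Apply the three rules: NFC normalization, lowercasing, whitespace collapse."""
    term = unicodedata.normalize("NFC", term)  # canonical Unicode form
    term = term.lower()                        # case folding
    term = re.sub(r"\s+", " ", term).strip()   # collapse runs of whitespace, trim ends
    return term

print(normalize("  Driver's License  "))  # → driver's license
print(normalize("José García"))           # → josé garcía
```

`str.lower()` is used here to mirror the "converts to lowercase" rule as stated; a stricter Unicode case fold (`str.casefold()`) behaves identically for these examples.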
Step 2 — Generate A Search Token
Once your search term is normalized, compute a blind index token from it using your blind index key (BIK). The token is an HMAC-SHA256 output: a 32-byte value derived from the normalized text.
Token Generation Algorithm
The token is computed as:

```
token = HMAC-SHA256(bik, normalized_text)
```

Where:

- `bik` is your blind index key (32 bytes)
- `normalized_text` is the normalized search term from Step 1
- Output is a 32-byte HMAC-SHA256 digest
This is a one-way transformation — the server cannot reverse the token to recover your search term, and different search terms produce entirely different tokens.
Implementation
Your client code needs to:
- Normalize the search term (Step 1)
- Compute HMAC-SHA256 using your BIK and the normalized text
- Base64-encode the resulting 32 bytes
JavaScript/TypeScript example (using `@noble/hashes`):

```ts
import { hmac } from '@noble/hashes/hmac';
import { sha256 } from '@noble/hashes/sha256';

// bik is a Uint8Array (32 bytes)
// normalizedText is a string
const token = hmac(sha256, bik, normalizedText);
const tokenBase64 = btoa(String.fromCharCode(...token));
```

Python example (using the `hmac` standard library):
```python
import hmac
import hashlib
import base64

token = hmac.new(
    bik,                              # 32 bytes
    normalized_text.encode('utf-8'),
    hashlib.sha256
).digest()
token_base64 = base64.b64encode(token).decode('utf-8')
```

Library Recommendations
We recommend using battle-tested cryptography libraries:
- JavaScript/TypeScript: `@noble/hashes` — minimal, audited, fast
- Python: built-in `hmac` + `hashlib` (standard library)
- Rust: `hmac` crate + `sha2` crate
Important: The Literal SDK (coming soon) will handle token generation automatically. For now, ensure your client implementation produces tokens that match the algorithm above. Use test vectors to verify your implementation.
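Official Literal test vectors are not reproduced here, but you can at least sanity-check the HMAC-SHA256 primitive itself against the public vectors in RFC 4231 (test case 2 shown):

```python
import hashlib
import hmac

# RFC 4231 test case 2: key "Jefe", data "what do ya want for nothing?"
digest = hmac.new(b"Jefe", b"what do ya want for nothing?", hashlib.sha256).hexdigest()
expected = "5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843"
assert digest == expected
print("HMAC-SHA256 primitive OK")
```

Passing this check confirms only the primitive; token-level parity with Literal's indexer still depends on identical normalization and key material.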
Which Key To Use
- Personal search — use your personal BIK (derived from your User Master Key during account creation)
- Organization search — use the organization's BIK (derived from the Entity Master Key; the enclave generates entity-scoped tokens during document sharing)
Step 3 — Send The Search Request
Send a POST request to the search endpoint with your base64-encoded token:
Personal (Consumer) Search
```shell
curl -X POST https://api.kyndex.co/v1/search \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "token": "<base64-search-token-32-bytes>",
    "scope": "consumer"
  }'
```

Organization (Entity) Search
```shell
curl -X POST https://api.kyndex.co/v1/search \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "token": "<base64-search-token-32-bytes>",
    "scope": "entity",
    "entity_token": "<base64-entity-token>"
  }'
```

Request Body Reference
| Field | Type | Required | Description |
|---|---|---|---|
| `token` | string | Yes | Base64-encoded blind index search token (32 bytes decoded) |
| `scope` | string | Yes | `consumer` for personal documents, `entity` for organization documents |
| `index_types` | string[] | No | Filter results by index category (see Filtering by index type below) |
| `entity_token` | string | Only for `entity` scope | Cryptographic token identifying the organization to search within |
POST /v1/search accepts one token per request. Multi-term queries require one request per term;
intersect the result sets on your device to find documents matching all terms.
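Client-side intersection for multi-term (AND) queries can be sketched as follows; `search_fn` is a stand-in for the real normalize → tokenize → POST round trip:

```python
def search_all_terms(terms, search_fn):
    """AND-combine results: keep only doc_tokens returned for every term.

    search_fn(term) is a placeholder for a call that normalizes the term,
    generates its blind index token, POSTs to /v1/search, and returns the
    doc_tokens from the response.
    """
    result_sets = [set(search_fn(t)) for t in terms]
    return set.intersection(*result_sets) if result_sets else set()

# Toy stand-in for real API calls:
fake_index = {"passport": {"docA", "docB"}, "license": {"docB", "docC"}}
matches = search_all_terms(["passport", "license"], lambda t: fake_index[t])
print(matches)  # → {'docB'}
```

Because each term is a separate request, the server also never learns that the terms belong to the same query.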
Step 4 — Handle The Response
A successful search returns a list of matching documents, each containing encrypted data that only you can read:
Consumer Scope Response
```json
{
  "documents": [
    {
      "doc_token": "<base64-document-reference>",
      "metadata_encrypted": "<base64-encrypted-metadata>",
      "wrapped_dek_umk": "<base64-dek-wrapped-with-your-key>"
    }
  ],
  "count": 1
}
```

Entity Scope Response
```json
{
  "documents": [
    {
      "doc_token": "<base64-document-reference>",
      "metadata_encrypted": "<base64-encrypted-metadata>",
      "wrapped_dek": "<base64-dek-wrapped-with-entity-key>"
    }
  ],
  "count": 1
}
```

Response Fields
| Field | Description |
|---|---|
| `doc_token` | An opaque, one-way reference to the matching document. Not a readable document ID. |
| `metadata_encrypted` | The document's metadata (title, type, tags, fields), encrypted with the document encryption key (DEK). |
| `wrapped_dek_umk` | (Consumer scope) The DEK wrapped with your personal master key. Unwrap this to decrypt the metadata. |
| `wrapped_dek` | (Entity scope) The DEK wrapped with the organization's key. Unwrap using the entity master key. |
| `count` | Total number of matching documents. |
Step 5 — Decrypt The Results
The server returns only encrypted data. To read the actual document metadata, decrypt client-side:
- Unwrap the DEK — use your master key to unwrap `wrapped_dek_umk` (consumer scope) or the entity key to unwrap `wrapped_dek` (entity scope). This gives you the document's symmetric encryption key.
- Decrypt the metadata — use the DEK to decrypt `metadata_encrypted`. The result contains the document's title, type, tags, extracted fields, and other metadata.
- Use `doc_token` for follow-up operations — the `doc_token` is a blind reference you can use in subsequent API calls (e.g., downloading the document or managing access grants).
Zero-Knowledge Guarantee: The server never sees your search term, the decrypted metadata, or the DEK. It only compares opaque tokens and returns encrypted blobs.
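As a concrete (but assumed) illustration of Step 5, the sketch below uses AES-256-GCM with a 12-byte nonce prepended to each ciphertext; the actual wrapping and encryption scheme is defined by the Literal protocol and may differ. It generates local stand-ins for the server-supplied values so the round trip is self-contained (requires the `cryptography` package):

```python
import json
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def aead_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with AES-256-GCM, prepending the 12-byte nonce to the ciphertext."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def aead_decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse of aead_encrypt: split off the nonce, then decrypt."""
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)

# Local stand-ins for the values a real client would already hold or receive:
umk = AESGCM.generate_key(bit_length=256)   # user master key (from key derivation)
dek = AESGCM.generate_key(bit_length=256)   # document encryption key
wrapped_dek_umk = aead_encrypt(umk, dek)    # as returned in the search response
metadata_encrypted = aead_encrypt(dek, json.dumps({"title": "John's Passport"}).encode())

# Step 5: unwrap the DEK with the master key, then decrypt the metadata with the DEK
recovered_dek = aead_decrypt(umk, wrapped_dek_umk)
metadata = json.loads(aead_decrypt(recovered_dek, metadata_encrypted))
print(metadata["title"])  # → John's Passport
```

The two-layer structure is the important part: the master key only ever unwraps DEKs, and each document's metadata is decryptable only with its own DEK.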
Filtering By Index Type
Literal indexes several categories of document information. You can narrow your search by specifying which categories to match against:
| Index Type | What It Matches |
|---|---|
| `doc_type` | The kind of document (e.g., "passport", "driver's license") |
| `doc_field` | Specific extracted fields (e.g., a name, ID number) |
| `doc_date` | Date values associated with the document |
| `doc_tag` | User-applied labels and tags |
| `text_content` | Words and phrases extracted from the document's text content |
Example: Search only for documents whose type matches your query:
```json
{
  "token": "<base64-search-token>",
  "scope": "consumer",
  "index_types": ["doc_type"]
}
```

Omitting `index_types` searches across all categories.
What The Server Sees vs. Doesn't
Understanding the privacy boundary is essential when building on Literal:
The Server Sees:
- That a search request was made (the HTTP request itself)
- An opaque 32-byte token (your search query, transformed)
- How many results matched
- Which encrypted documents correspond to matching tokens
The Server Does NOT See:
- What you searched for — tokens are one-way and cannot be reversed to recover the original term. The server can observe that the same token was submitted more than once, but cannot determine what it represents.
- What the matching documents contain (all metadata is encrypted)
- Why those particular documents matched
- Any correlation between your search activity and other users' activity
- Who performed the search — your identity is represented by a blind member token derived from your personal key, not your user ID
Even the document references in the response (`doc_token`) are blind tokens — the server cannot map them back to document identifiers without your key.
For more on this privacy model, see Zero-Knowledge Model.
Error Handling
The search endpoint returns standard error responses:
| Status | Meaning | Action |
|---|---|---|
| 400 | Invalid request — token is not valid base64, not 32 bytes, or missing required fields | Check that your token is correctly base64-encoded and exactly 32 bytes when decoded |
| 401 | Unauthorized — session expired or token invalid | Re-authenticate and obtain a new access token |
| 403 | Forbidden — not a member of the specified entity (entity scope only) | Verify your entity membership before searching in organization scope |
| 429 | Rate limited | Back off and retry after the indicated period |
| 500 | Internal error | Retry with exponential backoff |
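For the 429 and 500 rows, a jittered exponential backoff loop along these lines is a reasonable client default (`send` is a placeholder returning an HTTP status code):

```python
import random
import time

def post_with_backoff(send, max_attempts=5, base_delay=0.5):
    """Retry transient failures (429/5xx) with jittered exponential backoff.

    send() is a placeholder for issuing the POST and returning its status code.
    Returns the first non-retryable status, or the last status after exhausting
    max_attempts.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in (429, 500, 502, 503):
            return status
        # Sleep base_delay * 2^attempt, randomized to avoid synchronized retries
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    return status

# Demo: two transient failures, then success
attempts = iter([500, 429, 200])
print(post_with_backoff(lambda: next(attempts), base_delay=0.01))  # → 200
```

If the 429 response carries a `Retry-After` header, prefer that value over the computed delay.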
All errors use the RFC 9457 Problem Details format:
```json
{
  "type": "https://api.kyndex.co/errors/INVALID_REQUEST",
  "title": "Bad Request",
  "status": 400,
  "detail": "Search token must be 32 bytes (base64 encoded)",
  "instance": "/v1/search"
}
```

Searching After Claiming A Shared Document
When someone shares a document with you via a grant, it doesn't automatically appear in your personal search results. After you claim a grant, you need to submit your own blind index tokens for that document so it becomes searchable in your consumer scope.
The flow is:
- Claim the grant — obtain the DEK for the shared document
- Decrypt the document metadata — extract the searchable fields
- Generate blind index tokens — normalize each field and compute tokens using your personal BIK
- Submit tokens — send them to the consumer indexes endpoint for that document
Submit your tokens with an active session:
```shell
curl -X POST https://api.kyndex.co/v1/documents/consumer-indexes \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "doc_token": "<base64-document-reference>",
    "tokens": [
      { "token": "<base64-blind-index-token>", "index_type": "doc_type" },
      { "token": "<base64-blind-index-token>", "index_type": "text_content" }
    ]
  }'
```

Once submitted, the document will appear in future consumer scope searches that match any of your submitted tokens.
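Steps 2–4 of this flow can be sketched in Python. The metadata fields and the all-zero BIK below are placeholders for illustration only:

```python
import base64
import hashlib
import hmac
import json
import re
import unicodedata

def normalize(term: str) -> str:
    """NFC normalization, lowercasing, whitespace collapse (see Step 1)."""
    term = unicodedata.normalize("NFC", term)
    return re.sub(r"\s+", " ", term.lower()).strip()

def blind_token(bik: bytes, text: str) -> str:
    """HMAC-SHA256 over the normalized text, base64-encoded."""
    digest = hmac.new(bik, normalize(text).encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")

bik = bytes(32)  # placeholder personal BIK; a real key comes from your key derivation
# Illustrative fields extracted from the decrypted metadata of a claimed grant
metadata = {"doc_type": "passport", "text_content": ["travel", "identity"]}

body = {
    "doc_token": "<base64-document-reference>",
    "tokens": [{"token": blind_token(bik, metadata["doc_type"]), "index_type": "doc_type"}]
    + [{"token": blind_token(bik, w), "index_type": "text_content"}
       for w in metadata["text_content"]],
}
print(json.dumps(body, indent=2))
```

Because the tokens are computed with your personal BIK, only your own future searches can match them.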
Example — Full Search Flow
Here is the complete search flow in pseudocode:
```
// 1. Normalize the search term
searchTerm = "Driver's License"
normalized = normalize(searchTerm)            // → "driver's license"

// 2. Generate a blind index token
// Compute HMAC-SHA256(bik, normalized_text)
token = hmacSha256(personalBIK, normalized)   // → 32 bytes
tokenBase64 = base64Encode(token)

// 3. Send the search request
response = POST /v1/search {
  token: tokenBase64,
  scope: "consumer",
  index_types: ["doc_type"]
}

// 4. Handle the response
if response.count == 0 {
  // No documents match this search term
  return "No results found"
}

// 5. Decrypt each result
for doc in response.documents {
  // Unwrap the document encryption key (DEK) using your user master key
  dek = unwrapKey(userMasterKey, doc.wrapped_dek_umk)

  // Decrypt the metadata using the DEK
  metadata = decrypt(dek, doc.metadata_encrypted)

  // metadata now contains: title, type, tags, extracted fields, etc.
  print metadata.title  // e.g., "John's Passport"

  // Use doc.doc_token for follow-up operations (download, grant, etc.)
}
```

Related Resources
- Encrypted Search — understand the cryptographic foundations of blind indexing
- Blind Routing — learn why document references are anonymized
- Zero-Knowledge Model — see the full picture of what the server can and can't observe
- Getting Started — authentication, first upload, and API basics