
Searching Over Encrypted Documents

Search encrypted documents without the server seeing your search terms or results using blind indexing.

Learn how to search across your encrypted documents without the server ever seeing your search terms or results. This guide walks you through the complete search flow: generating a search token on your device, sending it to the server, and decrypting the results.

Prerequisites:

  • A registered account with an active access token
  • At least one uploaded and processed document (status: processed)
  • Your blind index key (BIK), derived from your master key during account setup
  • A cryptography library capable of HMAC computation and symmetric decryption

Overview

Kyndex uses blind indexing to let you search encrypted documents. Both your documents and your search queries are transformed into one-way cryptographic tokens before they reach the server. The server compares tokens to find matches — without ever knowing what the tokens represent.

At a high level:

  1. At upload time — your document's searchable content was normalized and converted into blind index tokens using your search key. These tokens were stored alongside your encrypted document.
  2. At search time — your device creates a matching token from your search query using the same key and normalization process, then sends it to the server.
  3. The server compares tokens — it finds stored tokens that match your query token and returns the corresponding encrypted documents.
  4. You decrypt locally — your device unwraps the document encryption key and decrypts the metadata.

The server sees only opaque tokens in both directions. It cannot determine what you searched for or what the results contain.

For a deeper explanation of the cryptographic model, see Encrypted Search. For how document references are anonymized, see Blind Routing.

Step 1 — Normalize Your Search Term

Before generating a search token, your search term must be normalized to ensure it matches the tokens that were created at upload time. Normalization converts text to a consistent format so that differences like capitalization or extra whitespace don't cause misses.

The normalization process applies:

  • Unicode NFC normalization — ensures characters with multiple representations (e.g., accented letters) are stored in a single canonical form
  • Case folding — converts all text to lowercase
  • Whitespace normalization — collapses runs of spaces, tabs, and newlines into a single space, and trims leading and trailing whitespace

For example:

Input                | Normalized Output
" Driver's License " | "driver's license"
"PASSPORT"           | "passport"
"José García"        | "josé garcía"

Normalization is deterministic — the same input always produces the same output. It is also idempotent — normalizing an already-normalized string returns the same string.
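The three rules above can be sketched in a few lines of Python. This is an illustrative implementation, not the official normalizer — use the platform's normalizer library for byte-exact parity:

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Apply NFC normalization, case folding, and whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = text.casefold()                      # case folding per Step 1
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs, trim ends
    return text

print(normalize(" Driver's License "))  # → driver's license
```

Because each rule is deterministic and idempotent, composing them is too: running normalize twice returns the same string as running it once.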

Important: You must use the same normalization logic that was used when the document was indexed. If you are building a custom client, use the Kyndex normalizer library (available as a WASM module for browser and Node.js environments) to ensure parity.

Step 2 — Generate A Search Token

Once your search term is normalized, compute a blind index token from it using your blind index key (BIK). The token is an HMAC-SHA256 output: a 32-byte value derived from the normalized text.

Token Generation Algorithm

The token is computed as:

token = HMAC-SHA256(bik, normalized_text)

Where:

  • bik is your blind index key (32 bytes)
  • normalized_text is the normalized search term from Step 1
  • Output is a 32-byte HMAC-SHA256 digest

This is a one-way transformation — the server cannot reverse the token to recover your search term, and different search terms produce entirely different tokens.

Implementation

Your client code needs to:

  1. Normalize the search term (Step 1)
  2. Compute HMAC-SHA256 using your BIK and the normalized text
  3. Base64-encode the resulting 32 bytes

JavaScript/TypeScript example (using @noble/hashes):

import { hmac } from '@noble/hashes/hmac';
import { sha256 } from '@noble/hashes/sha256';

// bik is a Uint8Array (32 bytes)
// normalizedText is a string
const token = hmac(sha256, bik, normalizedText);
// btoa works in browsers; in Node.js, use Buffer.from(token).toString('base64')
const tokenBase64 = btoa(String.fromCharCode(...token));

Python example (using hmac standard library):

import hmac
import hashlib
import base64

token = hmac.new(
    bik,  # 32 bytes
    normalizedText.encode('utf-8'),
    hashlib.sha256
).digest()
tokenBase64 = base64.b64encode(token).decode('utf-8')

Library Recommendations

We recommend using battle-tested cryptography libraries:

  • JavaScript/TypeScript: @noble/hashes — minimal, audited, fast
  • Python: Built-in hmac + hashlib (standard library)
  • Rust: hmac crate + sha2 crate

Important: The Kyndex SDK (coming soon) will handle token generation automatically. For now, ensure your client implementation produces tokens that match the algorithm above. Use test vectors to verify your implementation.
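Combining Steps 1 and 2, a client-side helper might look like the sketch below. The normalization mirrors the rules from Step 1 (use the official normalizer for parity), and bik is your 32-byte blind index key:

```python
import base64
import hashlib
import hmac
import re
import unicodedata

def generate_search_token(bik: bytes, term: str) -> str:
    """Normalize a search term and derive its base64 blind index token."""
    # Step 1: NFC normalization, case folding, whitespace collapsing
    text = unicodedata.normalize("NFC", term).casefold()
    text = re.sub(r"\s+", " ", text).strip()
    # Step 2: HMAC-SHA256 over the normalized text, then base64-encode
    digest = hmac.new(bik, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")
```

The same input always yields the same token, so "PASSPORT" and " passport " produce identical digests; the server only ever sees the 32-byte output.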

Which Key To Use

  • Personal search — use your personal BIK (derived from your User Master Key during account creation)
  • Organization search — use the organization's BIK (derived from the Entity Master Key; the enclave generates entity-scoped tokens during document sharing)

Step 3 — Send The Search Request

Send a POST request to the search endpoint with your base64-encoded token:

Consumer scope:

curl -X POST https://api.kyndex.co/v1/search \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "token": "<base64-search-token-32-bytes>",
    "scope": "consumer"
  }'

Entity scope:

curl -X POST https://api.kyndex.co/v1/search \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "token": "<base64-search-token-32-bytes>",
    "scope": "entity",
    "entity_token": "<base64-entity-token>"
  }'

Request Body Reference

Field        | Type     | Required              | Description
token        | string   | Yes                   | Base64-encoded blind index search token (32 bytes decoded)
scope        | string   | Yes                   | consumer for personal documents, entity for organization documents
index_types  | string[] | No                    | Filter results by index category (see Filtering by index type below)
entity_token | string   | Only for entity scope | Cryptographic token identifying the organization to search within
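A request body can be assembled mechanically from these fields. A sketch (build_search_request is a hypothetical helper, not part of any SDK):

```python
import json

def build_search_request(token_b64, scope, entity_token=None, index_types=None):
    """Assemble a /v1/search request body from the fields described above."""
    body = {"token": token_b64, "scope": scope}
    if scope == "entity":
        if entity_token is None:
            raise ValueError("entity scope requires an entity_token")
        body["entity_token"] = entity_token
    if index_types:  # omitted entirely when not filtering
        body["index_types"] = index_types
    return json.dumps(body)
```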

POST /v1/search accepts one token per request. Multi-term queries require one request per term; intersect the result sets on your device to find documents matching all terms.
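Since each request carries a single token, a multi-term AND query means one request per term followed by a local intersection on doc_token. A sketch, where result_sets is the list of documents arrays returned for each term:

```python
def intersect_results(result_sets):
    """Keep only documents whose doc_token appears in every per-term result set."""
    common = set.intersection(*({d["doc_token"] for d in rs} for rs in result_sets))
    by_token = {d["doc_token"]: d for rs in result_sets for d in rs}
    return [by_token[t] for t in sorted(common)]
```

The intersection happens entirely on your device, so the server never learns that the individual token lookups belong to one combined query.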

Step 4 — Handle The Response

A successful search returns a list of matching documents, each containing encrypted data that only you can read:

Consumer Scope Response

{
  "documents": [
    {
      "doc_token": "<base64-document-reference>",
      "metadata_encrypted": "<base64-encrypted-metadata>",
      "wrapped_dek_umk": "<base64-dek-wrapped-with-your-key>"
    }
  ],
  "count": 1
}

Entity Scope Response

{
  "documents": [
    {
      "doc_token": "<base64-document-reference>",
      "metadata_encrypted": "<base64-encrypted-metadata>",
      "wrapped_dek": "<base64-dek-wrapped-with-entity-key>"
    }
  ],
  "count": 1
}

Response Fields

Field              | Description
doc_token          | An opaque, one-way reference to the matching document. Not a readable document ID.
metadata_encrypted | The document's metadata (title, type, tags, fields), encrypted with the document encryption key (DEK).
wrapped_dek_umk    | (Consumer scope) The DEK wrapped with your personal master key. Unwrap this to decrypt the metadata.
wrapped_dek        | (Entity scope) The DEK wrapped with the organization's key. Unwrap using the entity master key.
count              | Total number of matching documents.

Step 5 — Decrypt The Results

The server returns only encrypted data. To read the actual document metadata, decrypt client-side:

  1. Unwrap the DEK — Use your master key to unwrap wrapped_dek_umk (consumer scope) or the entity key to unwrap wrapped_dek (entity scope). This gives you the document's symmetric encryption key.
  2. Decrypt the metadata — Use the DEK to decrypt metadata_encrypted. The result contains the document's title, type, tags, extracted fields, and other metadata.
  3. Use doc_token for follow-up operations — The doc_token is a blind reference you can use in subsequent API calls (e.g., downloading the document or managing access grants).

Zero-Knowledge Guarantee: The server never sees your search term, the decrypted metadata, or the DEK. It only compares opaque tokens and returns encrypted blobs.

Filtering By Index Type

Kyndex indexes several categories of document information. You can narrow your search by specifying which categories to match against:

Index Type   | What It Matches
doc_type     | The kind of document (e.g., "passport", "driver's license")
doc_field    | Specific extracted fields (e.g., a name, ID number)
doc_date     | Date values associated with the document
doc_tag      | User-applied labels and tags
text_content | Words and phrases extracted from the document's text content

Example: Search only for documents whose type matches your query:

{
  "token": "<base64-search-token>",
  "scope": "consumer",
  "index_types": ["doc_type"]
}

Omitting index_types searches across all categories.

What The Server Sees vs. Doesn't

Understanding the privacy boundary is essential when building on Kyndex:

The Server Sees:

  • That a search request was made (the HTTP request itself)
  • An opaque 32-byte token (your search query, transformed)
  • How many results matched
  • Which encrypted documents correspond to matching tokens

The Server Does NOT See:

  • What you searched for — tokens are one-way and cannot be reversed to recover the original term. The server can observe that the same token was submitted more than once, but cannot determine what it represents.
  • What the matching documents contain (all metadata is encrypted)
  • Why those particular documents matched
  • Any correlation between your search activity and other users' activity
  • Who performed the search — your identity is represented by a blind member token derived from your personal key, not your user ID

Even the document references in the response (doc_token) are blind tokens — the server cannot map them back to document identifiers without your key.

For more on this privacy model, see Zero-Knowledge Model.

Error Handling

The search endpoint returns standard error responses:

Status | Meaning                                                                                | Action
400    | Invalid request — token is not valid base64, not 32 bytes, or missing required fields | Check that your token is correctly base64-encoded and exactly 32 bytes when decoded
401    | Unauthorized — session expired or token invalid                                        | Re-authenticate and obtain a new access token
403    | Forbidden — not a member of the specified entity (entity scope only)                   | Verify your entity membership before searching in organization scope
429    | Rate limited                                                                           | Back off and retry after the indicated period
500    | Internal error                                                                         | Retry with exponential backoff

All errors use the RFC 9457 Problem Details format:

{
  "type": "https://api.kyndex.co/errors/INVALID_REQUEST",
  "title": "Bad Request",
  "status": 400,
  "detail": "Search token must be 32 bytes (base64 encoded)",
  "instance": "/v1/search"
}
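The 429 and 500 rows above call for retries. A minimal backoff loop, where send is a placeholder for your actual HTTP call returning a status code and body:

```python
import random
import time

def post_with_retry(send, max_attempts=5, base_delay=1.0):
    """Retry transient failures (429, 500) with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in (429, 500):
            return status, body
        # wait base_delay * 2^attempt seconds (capped at 30s), plus jitter
        time.sleep(min(base_delay * (2 ** attempt), 30) + random.uniform(0, base_delay))
    return status, body  # give up after max_attempts; surface the last error
```

For 429 responses, prefer the server's indicated retry period when one is provided; the exponential schedule here is a fallback.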

Searching After Claiming A Shared Document

When someone shares a document with you via a grant, it doesn't automatically appear in your personal search results. After you claim a grant, you need to submit your own blind index tokens for that document so it becomes searchable in your consumer scope.

The flow is:

  1. Claim the grant — obtain the DEK for the shared document
  2. Decrypt the document metadata — extract the searchable fields
  3. Generate blind index tokens — normalize each field and compute tokens using your personal BIK
  4. Submit tokens — send them to the consumer indexes endpoint for that document

Submit your tokens with an active session:

curl -X POST https://api.kyndex.co/v1/documents/consumer-indexes \
  -H 'Authorization: Bearer <access_token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "doc_token": "<base64-document-reference>",
    "tokens": [
      { "token": "<base64-blind-index-token>", "index_type": "doc_type" },
      { "token": "<base64-blind-index-token>", "index_type": "text_content" }
    ]
  }'

Once submitted, the document will appear in future consumer scope searches that match any of your submitted tokens.
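Steps 2 to 4 of this flow can be sketched as follows. The metadata field names (type, tags) are illustrative — use whatever searchable fields your decrypted metadata actually contains — and token derivation follows the algorithm from Step 2 (normalize each value first):

```python
import base64
import hashlib
import hmac

def blind_token(bik, text):
    """HMAC-SHA256 blind index token, base64-encoded (text must be pre-normalized)."""
    digest = hmac.new(bik, text.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")

def build_consumer_index_payload(bik, doc_token, metadata):
    """Build a /v1/documents/consumer-indexes body for a claimed document."""
    entries = [("doc_type", metadata["type"])]
    entries += [("doc_tag", tag) for tag in metadata.get("tags", [])]
    return {
        "doc_token": doc_token,
        "tokens": [{"token": blind_token(bik, value), "index_type": itype}
                   for itype, value in entries],
    }
```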

Example — Full Search Flow

Here is the complete search flow in pseudocode:

// 1. Normalize the search term
searchTerm = "Driver's License"
normalized = normalize(searchTerm)  // → "driver's license"

// 2. Generate a blind index token
// Compute HMAC-SHA256(bik, normalized_text)
token = hmacSha256(personalBIK, normalized)  // → 32 bytes
tokenBase64 = base64Encode(token)

// 3. Send the search request
response = POST /v1/search {
  token: tokenBase64,
  scope: "consumer",
  index_types: ["doc_type"]
}

// 4. Handle the response
if response.count == 0 {
  // No documents match this search term
  return "No results found"
}

// 5. Decrypt each result
for doc in response.documents {
  // Unwrap the document encryption key (DEK) using your user master key
  dek = unwrapKey(userMasterKey, doc.wrapped_dek_umk)

  // Decrypt the metadata using the DEK
  metadata = decrypt(dek, doc.metadata_encrypted)

  // metadata now contains: title, type, tags, extracted fields, etc.
  print metadata.title  // e.g., "John's Passport"

  // Use doc.doc_token for follow-up operations (download, grant, etc.)
}
