
How People Partner Manages Context

The Pipeline (Every Message)

Every message you send goes through 7 stages before you see a response:

User Input → PII Scan → Query Classification → Mention Extraction → Context Assembly → Token Budget → Prompt Assembly → LLM → Answer Verification → Response

All of it orchestrated between the React frontend and the Rust backend. No cloud preprocessing. No external API calls until the LLM itself. Here's how each stage works.


Stage 1: PII Scanning

Before anything touches the AI, the user's input is scanned for financial PII — Social Security numbers, credit card numbers, bank account patterns — using regex in Rust. If detected, the text is auto-redacted and replaced with placeholders. The user sees a notification, never a blocking modal. Security without friction.

"John's SSN is 123-45-6789" → "John's SSN is [SSN REDACTED]"

The redacted version is what gets sent to the LLM. The original is never logged.
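
The SSN pattern, at least, is simple enough to sketch in std-only Rust (helper names are hypothetical; the real scanner also covers credit card and bank account patterns):

```rust
// Checks each whitespace token against the ddd-dd-dddd shape.
fn is_ssn(token: &str) -> bool {
    let b = token.as_bytes();
    b.len() == 11
        && b.iter().enumerate().all(|(i, &c)| match i {
            3 | 6 => c == b'-',           // dashes at fixed positions
            _ => c.is_ascii_digit(),      // digits everywhere else
        })
}

// Replaces any SSN-shaped token with a placeholder.
fn redact_ssns(input: &str) -> String {
    input
        .split_whitespace()
        .map(|t| if is_ssn(t) { "[SSN REDACTED]" } else { t })
        .collect::<Vec<_>>()
        .join(" ")
}
```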

Stage 2: Query Classification

The system classifies every query into one of 6 types using keyword heuristics — no LLM call needed:

| Query Type | Example | What It Retrieves |
|---|---|---|
| Individual | "Tell me about Sarah" | Full employee profiles (max 3) |
| Comparison | "Top performers in Engineering" | Multiple full profiles (max 8) |
| List | "Who's in Marketing?" | Lightweight summaries (max 30) |
| Aggregate | "What's our headcount?" | Org-wide stats only, no individual data |
| Attrition | "Who left this year?" | Recent terminations with full context |
| General | "How should I handle this PIP?" | Balanced sample (max 5) |

The classification uses priority-based logic: explicit names always win (→ Individual), ranking keywords beat department keywords (→ Comparison over List), and so on. This means the system retrieves exactly the right depth of data for every question — no wasted tokens, no missing context.
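
A stripped-down version of that priority logic might look like this (the keyword, department, and stopword lists here are illustrative, not the app's actual lists):

```rust
#[derive(Debug, PartialEq)]
enum QueryType { Individual, Comparison, List, Aggregate, Attrition, General }

// Hypothetical exclusion lists; the real heuristics are broader.
const DEPARTMENTS: [&str; 3] = ["Engineering", "Marketing", "Sales"];
const STOPWORDS: [&str; 4] = ["I", "PIP", "HR", "OK"];

fn classify(query: &str) -> QueryType {
    let q = query.to_lowercase();
    // Explicit names win: a capitalized non-leading word that isn't a
    // department or a common English/HR term.
    let has_name = query.split_whitespace().skip(1).any(|w| {
        let w = w.trim_matches(|c: char| !c.is_alphanumeric());
        w.chars().next().map_or(false, |c| c.is_uppercase())
            && !DEPARTMENTS.contains(&w)
            && !STOPWORDS.contains(&w)
    });
    if has_name { return QueryType::Individual; }
    // Ranking keywords beat department keywords.
    if ["top", "best", "bottom", "highest"].iter().any(|k| q.contains(k)) { return QueryType::Comparison; }
    if ["who's in", "everyone in", "list"].iter().any(|k| q.contains(k)) { return QueryType::List; }
    if ["headcount", "how many", "average"].iter().any(|k| q.contains(k)) { return QueryType::Aggregate; }
    if ["left", "quit", "terminated"].iter().any(|k| q.contains(k)) { return QueryType::Attrition; }
    QueryType::General
}
```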

Stage 3: Query Mention Extraction

A heuristic NER-like system extracts structured signals from the raw query:

  • Employee names — Capitalized words that aren't common English/HR terms, with possessive handling ("Sarah's" → "Sarah")
  • Department names — Word-boundary matching ("IT" matches only at word boundaries, not inside "wITh")
  • Intent flags — is_performance_query, is_tenure_query, is_top_performer_query, is_theme_query, and more
  • Tenure direction — Distinguishes "who's been here longest" vs "newest hires" vs "upcoming anniversaries"
  • Theme detection — Maps semantic variants to canonical themes ("people skills" → communication, "coaching" → mentoring)
  • Target field — Whether theme queries target strengths ("excels at") or opportunities ("struggles with")

All of this happens in Rust, in microseconds, with zero network calls.
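
Two of those rules are easy to sketch in std-only Rust (function names are hypothetical):

```rust
// Word-boundary department matching: split on non-alphanumeric chars so
// "IT" can't match inside "wITh".
fn mentions_department(query: &str, dept: &str) -> bool {
    query.split(|c: char| !c.is_alphanumeric()).any(|w| w == dept)
}

// Possessive handling: "Sarah's" (straight or curly apostrophe) → "Sarah".
fn strip_possessive(name: &str) -> &str {
    name.strip_suffix("'s")
        .or_else(|| name.strip_suffix("’s"))
        .unwrap_or(name)
}
```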

Stage 4: Context Assembly

Based on the query type, the system assembles context from 5 sources:

4a. Company Context

Single SQL query: company name, state, industry, active employee count, department count.

4b. Organization Aggregates

A battery of SQL queries computed for every message (~2K characters when formatted):

  • Headcount breakdown (total, active, terminated, on leave)
  • Department distribution with percentages
  • Performance rating distribution (exceptional / exceeds / meets / needs improvement)
  • eNPS score with promoter/passive/detractor breakdown
  • YTD attrition stats (voluntary/involuntary, average tenure, annualized turnover rate)

This is what makes aggregate queries accurate — the LLM doesn't guess, it reads pre-computed ground truth.
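
The annualized turnover figure, for instance, presumably scales year-to-date terminations to a full year. A sketch, with the exact formula being an assumption:

```rust
// Hypothetical formula: YTD terminations over average headcount,
// scaled from months elapsed to 12 months, expressed as a percentage.
fn annualized_turnover_pct(ytd_terminations: u32, avg_headcount: f64, months_elapsed: f64) -> f64 {
    (ytd_terminations as f64 / avg_headcount) * (12.0 / months_elapsed) * 100.0
}
```

So 5 departures against an average headcount of 100 by mid-year annualizes to 10%.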

4c. Employee Context (Query-Type-Adaptive)

This is the key innovation. Different queries retrieve different depths:

  • Individual: Full profile with all ratings, all eNPS scores, career summary, extracted review highlights (strengths, opportunities, themes, quotes), and trend analysis
  • Comparison: Same full profiles but for multiple employees, with specialized retrieval for top/bottom performers or theme-based filtering
  • List: Lightweight summary structs (~70 characters each) — just name, department, title, status
  • Aggregate: No individual employee data at all — the org aggregates are sufficient
  • Attrition: Recent terminations with termination reason, tenure, and performance history

Employee retrieval priority: selected employee (from UI panel) → name matches → department matches → specialized queries → random sample fallback.
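
That fallback chain maps naturally onto Rust's `Option::or` — a sketch with each retrieval step stubbed out as a pre-computed `Option` (the real steps are SQL queries):

```rust
// The first step that produced matches wins; the random sample is the
// guaranteed fallback.
fn pick_employees(
    selected: Option<Vec<u32>>,
    name_matches: Option<Vec<u32>>,
    dept_matches: Option<Vec<u32>>,
    specialized: Option<Vec<u32>>,
    random_sample: Vec<u32>,
) -> Vec<u32> {
    selected
        .or(name_matches)
        .or(dept_matches)
        .or(specialized)
        .unwrap_or(random_sample)
}
```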

4d. Cross-Conversation Memory

Hybrid search for up to 3 relevant past conversation summaries:

  1. First tries summary-only FTS5 search (more focused)
  2. Falls back to full conversation FTS if no summary matches

Summaries are 2–3 sentence AI-generated distillations of past conversations. No vector database. No embedding API calls. SQLite FTS5 — works completely offline.
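
The two-pass fallback can be sketched with in-memory stand-ins for the FTS5 queries (types and names here are hypothetical; the real passes run against SQLite FTS5 tables):

```rust
#[derive(Debug, Clone, PartialEq)]
struct Memory { summary: String, full_text: String }

// Focused pass: match against summaries only, capped at 3 results.
fn search_summaries<'a>(mems: &'a [Memory], term: &str) -> Vec<&'a Memory> {
    mems.iter().filter(|m| m.summary.contains(term)).take(3).collect()
}

// Broader pass: match against full conversation text.
fn search_full<'a>(mems: &'a [Memory], term: &str) -> Vec<&'a Memory> {
    mems.iter().filter(|m| m.full_text.contains(term)).take(3).collect()
}

// Summary search first; fall back to full-text only when it finds nothing.
fn hybrid_search<'a>(mems: &'a [Memory], term: &str) -> Vec<&'a Memory> {
    let hits = search_summaries(mems, term);
    if hits.is_empty() { search_full(mems, term) } else { hits }
}
```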

4e. Document Context

FTS5 search across ingested company documents (employee handbook, policies, etc.). Returns ranked chunks with source metadata for citation. Token-budgeted to avoid crowding out other context.

Stage 5: Token Budget Management

The system operates within a 200K context window:

| Allocation | Tokens | Notes |
|---|---|---|
| System prompt | 20K | Persona + company + employees + docs + memory |
| Conversation history | 150K | 75% of window |
| Output reserved | 8K | Space for the response |
| Safety buffer | 22K | Headroom |

Within the system prompt budget, each query type gets a different token allocation:

| Query Type | Employee Tokens | Memory Tokens | Total |
|---|---|---|---|
| Aggregate | 0 | 500 | 1,000 |
| Individual | 4,000 | 1,000 | 5,000 |
| Comparison | 3,000 | 500 | 3,500 |
| List | 2,000 | 500 | 2,500 |
| Attrition | 2,000 | 500 | 2,500 |
| General | 2,000 | 1,000 | 3,000 |

When conversation history exceeds the budget, the oldest user/assistant message pairs are silently dropped. This preserves recent context without notification — a deliberate UX decision. The user shouldn't have to worry about context management.

Employee context is capped at 10 employees and 16K characters (~4K tokens) to prevent any single query from blowing the budget.
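
A minimal sketch of the trimming loop, assuming a crude characters-divided-by-4 token estimate (the app's real token counter may differ):

```rust
// Drop oldest (user, assistant) pairs until the estimate fits the budget;
// the most recent pair is always kept.
fn trim_history(mut pairs: Vec<(String, String)>, budget_tokens: usize) -> Vec<(String, String)> {
    let estimate = |ps: &[(String, String)]| -> usize {
        ps.iter().map(|(u, a)| (u.len() + a.len()) / 4).sum()
    };
    while pairs.len() > 1 && estimate(&pairs) > budget_tokens {
        pairs.remove(0); // silently drop the oldest pair
    }
    pairs
}
```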

Stage 6: System Prompt Assembly

Everything gets composed into a structured system prompt:

  • Persona preamble — The selected AI persona with company-specific variables
  • Communication style — Persona-specific guidelines
  • Company context — Name, state, employee count, departments
  • Organization data — Formatted aggregates (headcount, ratings, eNPS, attrition)
  • Context awareness rules — State-specific employment law guidance, employee name references, document citation preferences
  • Boundaries — Guidance not legal advice, recommend counsel for litigation
  • Relevant employees — Formatted profiles or summaries
  • Relevant documents — Document chunks with source citations
  • Relevant past conversations — Memory summaries

Stage 7: Answer Verification

Post-response, not pre-response. For aggregate queries only, the system:

  1. Extracts numeric claims from the AI's response using regex (headcount patterns, percentages, ratings, eNPS scores)
  2. Compares each claim against the pre-computed ground truth aggregates
  3. Returns a verification result: Verified, Partial Match, or Unverified
  4. The frontend displays a verification badge on the message

This catches hallucinated numbers — if the AI says "82 active employees" but the database has 85, it's flagged. No silent errors.
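
The comparison step might look like this, with claim extraction (the regex pass) omitted and the metric names hypothetical:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Verification { Verified, PartialMatch, Unverified }

// Compare extracted (metric, value) claims against pre-computed aggregates.
fn verify(claims: &[(&str, i64)], truth: &HashMap<&str, i64>) -> Verification {
    let matched = claims.iter().filter(|(k, v)| truth.get(k) == Some(v)).count();
    if !claims.is_empty() && matched == claims.len() {
        Verification::Verified
    } else if matched == 0 {
        Verification::Unverified
    } else {
        Verification::PartialMatch
    }
}
```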

Background Operations

These run on a default model regardless of user selection, keeping costs predictable:

| Operation | Trigger | Model |
|---|---|---|
| Conversation summaries | After conversation ends | Default (Sonnet) |
| Title generation | After first message | Default (Sonnet) |
| Highlight extraction | After review import | Default (Sonnet) |
| Employee career summaries | After highlights exist | Default (Sonnet) |

The model selector only affects interactive chat. Background operations always use the cost-efficient default.

Model-Aware Adaptation

When users switch models (e.g., Sonnet → Opus), the system adapts automatically. The selected model's context window determines the conversation token budget (75% of window). Gemini's 1M context window gets 750K for conversation; OpenAI's 128K gets 96K. Max output tokens are passed through to the provider.
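
The budget arithmetic itself is a one-liner:

```rust
// Conversation history gets 75% of the selected model's context window.
fn conversation_budget(context_window_tokens: u64) -> u64 {
    context_window_tokens * 3 / 4
}
```

A 1M-token window yields a 750K conversation budget; a 128K window yields 96K.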

What Makes This Different

  1. Zero-LLM-call context routing — Query classification and employee retrieval happen entirely in Rust with SQL and heuristics. No "pre-processing" LLM call.
  2. Query-adaptive retrieval — An aggregate question doesn't waste tokens on individual profiles. A name query doesn't waste tokens on org-wide stats.
  3. Ground truth verification — The LLM's numeric claims are checked against SQL-computed facts post-response.
  4. PII never leaves the device unredacted — Scanning happens before context building, in Rust, with no network calls.
  5. Cross-conversation memory without embeddings — SQLite FTS5 hybrid search. No vector database, no embedding API calls. Works offline.
  6. Token observability — Every query tracks retrieval metrics: employees found/included, memories found/included, token budget vs actual usage, retrieval time in milliseconds.

The result: every answer is grounded in your actual data, verified against your actual numbers, and processed entirely on your machine.


People Partner is a local-first HR knowledge platform that runs entirely on your Mac. Your employee data, policies, and documents stay on your machine — processed by on-device AI, never uploaded to the cloud. $99 one-time purchase.

People Partner brings your scattered HR data into one place — private, local, and paired with AI.
