How People Partner Manages Context
The Pipeline (Every Message)
Every message you send goes through 7 stages before you see a response:
User Input → PII Scan → Query Classification → Mention Extraction → Context Assembly → Token Budget → System Prompt Assembly → LLM → Answer Verification → Response
All of it orchestrated between the React frontend and the Rust backend. No cloud preprocessing. No external API calls until the LLM itself. Here's how each stage works.
Stage 1: PII Scanning
Before anything touches the AI, the user's input is scanned for financial PII — Social Security numbers, credit card numbers, bank account patterns — using regex in Rust. If detected, the text is auto-redacted and replaced with placeholders. The user sees a notification, never a blocking modal. Security without friction.
"John's SSN is 123-45-6789" → "John's SSN is [SSN REDACTED]"
The redacted version is what gets sent to the LLM. The original is never logged.
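A stdlib-only sketch of the SSN pass. The shipped scanner covers more patterns (credit cards, bank accounts) and likely uses a real regex engine; `redact_ssn` is a hypothetical name, not the actual backend function.

```rust
/// Replace every standalone ddd-dd-dddd run with a placeholder.
fn redact_ssn(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        // Reject matches glued to surrounding digits (e.g. "1234-56-7890").
        let prev_digit = i > 0 && chars[i - 1].is_ascii_digit();
        let next_digit = chars.get(i + 11).map_or(false, |c| c.is_ascii_digit());
        if !prev_digit && !next_digit && is_ssn(&chars[i..]) {
            out.push_str("[SSN REDACTED]");
            i += 11; // ddd-dd-dddd is 11 characters
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

/// True if the slice starts with the ddd-dd-dddd shape.
fn is_ssn(s: &[char]) -> bool {
    s.len() >= 11
        && s[..11].iter().enumerate().all(|(j, c)| match j {
            3 | 6 => *c == '-',
            _ => c.is_ascii_digit(),
        })
}
```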
Stage 2: Query Classification
The system classifies every query into one of 6 types using keyword heuristics — no LLM call needed:
| Query Type | Example | What It Retrieves |
|---|---|---|
| Individual | "Tell me about Sarah" | Full employee profiles (max 3) |
| Comparison | "Top performers in Engineering" | Multiple full profiles (max 8) |
| List | "Who's in Marketing?" | Lightweight summaries (max 30) |
| Aggregate | "What's our headcount?" | Org-wide stats only, no individual data |
| Attrition | "Who left this year?" | Recent terminations with full context |
| General | "How should I handle this PIP?" | Balanced sample (max 5) |
The classification uses priority-based logic: explicit names always win (→ Individual), ranking keywords beat department keywords (→ Comparison over List), and so on. This means the system retrieves exactly the right depth of data for every question — no wasted tokens, no missing context.
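The priority cascade can be sketched as an ordered series of checks, where the first hit wins. The keyword lists below are illustrative stand-ins, not the production heuristics:

```rust
#[derive(Debug, PartialEq)]
enum QueryType { Individual, Comparison, List, Aggregate, Attrition, General }

/// Priority-ordered keyword classification: names beat ranking keywords,
/// ranking keywords beat department/list keywords, and so on.
fn classify(query: &str, known_names: &[&str]) -> QueryType {
    let q = query.to_lowercase();
    // 1. Explicit employee names always win.
    if known_names.iter().any(|n| q.contains(n.to_lowercase().as_str())) {
        return QueryType::Individual;
    }
    // 2. Ranking/comparison keywords beat everything below.
    if ["top", "best", "lowest", "compare"].iter().any(|k| q.contains(*k)) {
        return QueryType::Comparison;
    }
    // 3. Attrition signals.
    if ["who left", "attrition", "turnover", "terminated"].iter().any(|k| q.contains(*k)) {
        return QueryType::Attrition;
    }
    // 4. Org-wide stat signals.
    if ["headcount", "how many", "average", "enps"].iter().any(|k| q.contains(*k)) {
        return QueryType::Aggregate;
    }
    // 5. Roster-style questions.
    if q.starts_with("who's in") || q.contains("everyone in") || q.contains("list ") {
        return QueryType::List;
    }
    QueryType::General
}
```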
Stage 3: Query Mention Extraction
A heuristic NER-like system extracts structured signals from the raw query:
- Employee names — Capitalized words that aren't common English/HR terms, with possessive handling ("Sarah's" → "Sarah")
- Department names — Word-boundary matching ("IT" matches only at word boundaries, not inside "wITh")
- Intent flags — `is_performance_query`, `is_tenure_query`, `is_top_performer_query`, `is_theme_query`, and more
- Tenure direction — Distinguishes "who's been here longest" vs "newest hires" vs "upcoming anniversaries"
- Theme detection — Maps semantic variants to canonical themes ("people skills" → communication, "coaching" → mentoring)
- Target field — Whether theme queries target strengths ("excels at") or opportunities ("struggles with")
All of this happens in Rust, in microseconds, with zero network calls.
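Two of those extractors are easy to sketch. The acronym rule below (all-caps departments match case-sensitively, others case-insensitively) is an assumption for illustration, not the shipped policy:

```rust
/// Word-boundary department match: "IT" matches the token "IT" but never
/// the letters inside "wITh". Splitting on non-alphanumeric characters
/// gives the boundary behavior without a regex.
fn mentions_department(query: &str, dept: &str) -> bool {
    let acronym = dept.chars().all(|c| c.is_ascii_uppercase());
    query
        .split(|c: char| !c.is_alphanumeric())
        .any(|tok| if acronym { tok == dept } else { tok.eq_ignore_ascii_case(dept) })
}

/// Possessive handling: "Sarah's" → "Sarah" (straight or curly apostrophe).
fn strip_possessive(word: &str) -> &str {
    word.strip_suffix("'s").or_else(|| word.strip_suffix("’s")).unwrap_or(word)
}
```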
Stage 4: Context Assembly
Based on the query type, the system assembles context from 5 sources:
4a. Company Context
Single SQL query: company name, state, industry, active employee count, department count.
4b. Organization Aggregates
A battery of SQL queries computed for every message (~2K characters when formatted):
- Headcount breakdown (total, active, terminated, on leave)
- Department distribution with percentages
- Performance rating distribution (exceptional / exceeds / meets / needs improvement)
- eNPS score with promoter/passive/detractor breakdown
- YTD attrition stats (voluntary/involuntary, average tenure, annualized turnover rate)
This is what makes aggregate queries accurate — the LLM doesn't guess, it reads pre-computed ground truth.
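As one concrete example of a pre-computed aggregate, annualized turnover can be derived from YTD terminations and average headcount. The backend's exact formula may differ; this is the standard annualization:

```rust
/// Annualized turnover rate (%): YTD terminations scaled to a full year,
/// divided by average headcount over the period.
fn annualized_turnover(terminations_ytd: f64, avg_headcount: f64, days_elapsed: f64) -> f64 {
    (terminations_ytd / avg_headcount) * (365.0 / days_elapsed) * 100.0
}
```

Five departures in the first half of the year against an average headcount of 100 annualizes to roughly 10% turnover.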
4c. Employee Context (Query-Type-Adaptive)
This is the key innovation. Different queries retrieve different depths:
- Individual: Full profile with all ratings, all eNPS scores, career summary, extracted review highlights (strengths, opportunities, themes, quotes), and trend analysis
- Comparison: Same full profiles but for multiple employees, with specialized retrieval for top/bottom performers or theme-based filtering
- List: Lightweight summary structs (~70 characters each) — just name, department, title, status
- Aggregate: No individual employee data at all — the org aggregates are sufficient
- Attrition: Recent terminations with termination reason, tenure, and performance history
Employee retrieval priority: selected employee (from UI panel) → name matches → department matches → specialized queries → random sample fallback.
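That priority chain amounts to "first non-empty source wins". A minimal sketch, assuming the sources have already been materialized (the real code presumably evaluates each SQL source lazily and stops at the first hit):

```rust
/// Try each retrieval source in priority order; return the first
/// non-empty result, or an empty list if every source misses.
fn first_nonempty(sources: Vec<Vec<String>>) -> Vec<String> {
    sources.into_iter().find(|s| !s.is_empty()).unwrap_or_default()
}
```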
4d. Cross-Conversation Memory
Hybrid search for up to 3 relevant past conversation summaries:
- First tries summary-only FTS5 search (more focused)
- Falls back to full conversation FTS if no summary matches
Summaries are 2–3 sentence AI-generated distillations of past conversations. No vector database. No embedding API calls. SQLite FTS5 — works completely offline.
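The two-step lookup can be sketched as follows. The SQL text, table names, and column names are assumptions for illustration, not the actual schema:

```rust
/// Assumed FTS5 queries: summaries first, full conversations as fallback.
const SUMMARY_FTS_SQL: &str =
    "SELECT summary FROM summaries_fts WHERE summaries_fts MATCH ?1 ORDER BY rank LIMIT 3";
const FULL_FTS_SQL: &str =
    "SELECT summary FROM conversations_fts WHERE conversations_fts MATCH ?1 ORDER BY rank LIMIT 3";

/// Hybrid search: run the focused summary-only search; only if it comes
/// back empty, fall back to full-conversation FTS. Both steps cap at 3.
fn hybrid_search(
    run_summary_fts: impl Fn(&str) -> Vec<String>,
    run_full_fts: impl Fn(&str) -> Vec<String>,
    query: &str,
) -> Vec<String> {
    let hits = run_summary_fts(query);
    if hits.is_empty() { run_full_fts(query) } else { hits }
}
```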
4e. Document Context
FTS5 search across ingested company documents (employee handbook, policies, etc.). Returns ranked chunks with source metadata for citation. Token-budgeted to avoid crowding out other context.
Stage 5: Token Budget Management
The system operates within a 200K context window:
| Allocation | Tokens | Notes |
|---|---|---|
| System prompt | 20K | Persona + company + employees + docs + memory |
| Conversation history | 150K | 75% of window |
| Output reserved | 8K | Space for the response |
| Safety buffer | 22K | Headroom |
Within the system prompt budget, each query type gets a different token allocation:
| Query Type | Employee Tokens | Memory Tokens | Total |
|---|---|---|---|
| Aggregate | 0 | 500 | 1,000 |
| Individual | 4,000 | 1,000 | 5,000 |
| Comparison | 3,000 | 500 | 3,500 |
| List | 2,000 | 500 | 2,500 |
| Attrition | 2,000 | 500 | 2,500 |
| General | 2,000 | 1,000 | 3,000 |
When conversation history exceeds the budget, the oldest user/assistant message pairs are silently dropped. This preserves recent context without notification — a deliberate UX decision. The user shouldn't have to worry about context management.
Employee context is capped at 10 employees and 16K characters (~4K tokens) to prevent any single query from blowing the budget.
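The trimming loop described above, sketched with the rough chars/4 token estimate (the backend's token counter may be more precise):

```rust
/// Drop the oldest user/assistant pair until the estimated token count
/// fits the budget. Recent messages survive; the user is not notified.
fn trim_history(mut history: Vec<String>, budget_tokens: usize) -> Vec<String> {
    // Rough heuristic: ~4 characters per token.
    let est = |msgs: &[String]| msgs.iter().map(|m| m.len() / 4).sum::<usize>();
    while est(&history) > budget_tokens && history.len() >= 2 {
        history.drain(..2); // oldest user + assistant pair
    }
    history
}
```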
Stage 6: System Prompt Assembly
Everything gets composed into a structured system prompt:
- Persona preamble — The selected AI persona with company-specific variables
- Communication style — Persona-specific guidelines
- Company context — Name, state, employee count, departments
- Organization data — Formatted aggregates (headcount, ratings, eNPS, attrition)
- Context awareness rules — State-specific employment law guidance, employee name references, document citation preferences
- Boundaries — Guidance not legal advice, recommend counsel for litigation
- Relevant employees — Formatted profiles or summaries
- Relevant documents — Document chunks with source citations
- Relevant past conversations — Memory summaries
Stage 7: Answer Verification
Post-response, not pre-response. For aggregate queries only, the system:
- Extracts numeric claims from the AI's response using regex (headcount patterns, percentages, ratings, eNPS scores)
- Compares each claim against the pre-computed ground truth aggregates
- Returns a verification result: Verified, Partial Match, or Unverified
- The frontend displays a verification badge on the message
This catches hallucinated numbers — if the AI says "82 active employees" but the database has 85, it's flagged. No silent errors.
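A reduced sketch of that check, covering only plain integer claims (the real extractor also matches percentage, rating, and eNPS patterns via regex):

```rust
#[derive(Debug, PartialEq)]
enum Verification { Verified, Partial, Unverified }

/// Extract every integer from the response and compare each against the
/// pre-computed ground-truth value. All match → Verified; some → Partial.
fn verify_headcount(response: &str, actual_active: u32) -> Verification {
    let claims: Vec<u32> = response
        .split(|c: char| !c.is_ascii_digit())
        .filter(|s| !s.is_empty())
        .filter_map(|s| s.parse().ok())
        .collect();
    if claims.is_empty() {
        return Verification::Unverified; // nothing to check
    }
    let matched = claims.iter().filter(|&&c| c == actual_active).count();
    if matched == claims.len() {
        Verification::Verified
    } else if matched > 0 {
        Verification::Partial
    } else {
        Verification::Unverified
    }
}
```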
Background Operations
These run on a default model regardless of user selection, keeping costs predictable:
| Operation | Trigger | Model |
|---|---|---|
| Conversation summaries | After conversation ends | Default (Sonnet) |
| Title generation | After first message | Default (Sonnet) |
| Highlight extraction | After review import | Default (Sonnet) |
| Employee career summaries | After highlights exist | Default (Sonnet) |
The model selector only affects interactive chat. Background operations always use the cost-efficient default.
Model-Aware Adaptation
When users switch models (e.g., Sonnet → Opus), the system adapts automatically. The selected model's context window determines the conversation token budget (75% of window). Gemini's 1M context window gets 750K for conversation; OpenAI's 128K gets 96K. Max output tokens are passed through to the provider.
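The budget rule itself is a one-liner, and the section's figures fall out of it directly:

```rust
/// Conversation budget = 75% of the selected model's context window.
fn conversation_budget(context_window: u64) -> u64 {
    context_window * 3 / 4
}
```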
What Makes This Different
- Zero-LLM-call context routing — Query classification and employee retrieval happen entirely in Rust with SQL and heuristics. No "pre-processing" LLM call.
- Query-adaptive retrieval — An aggregate question doesn't waste tokens on individual profiles. A name query doesn't waste tokens on org-wide stats.
- Ground truth verification — The LLM's numeric claims are checked against SQL-computed facts post-response.
- PII never leaves the device unredacted — Scanning happens before context building, in Rust, with no network calls.
- Cross-conversation memory without embeddings — SQLite FTS5 hybrid search. No vector database, no embedding API calls. Works offline.
- Token observability — Every query tracks retrieval metrics: employees found/included, memories found/included, token budget vs actual usage, retrieval time in milliseconds.
The result: every answer is grounded in your actual data, verified against your actual numbers, and processed entirely on your machine.
People Partner is a local-first HR knowledge platform that runs entirely on your Mac. Your employee data, policies, and documents stay on your machine — all context processing happens locally, and only the redacted prompt is ever sent to the LLM. $99 one-time purchase.