# How People Partner Manages Context

> A technical deep-dive into the 7-stage pipeline that turns every question into an accurate, grounded HR answer — from PII scanning to answer verification.

*Canonical URL: https://peoplepartner.io/blog/how-people-partner-manages-context*
*Published: 2026-03-03*
*Keywords: ai context pipeline, hr ai architecture, local ai processing, token management, pii scanning, ai answer verification*

## The Pipeline (Every Message)

Every message you send goes through 7 stages before you see a response:

**User Input → PII Scan → Query Classification → Mention Extraction → Context Assembly → Token Budget → Prompt Assembly → LLM → Answer Verification → Response**

All of it orchestrated between the React frontend and the Rust backend. No cloud preprocessing. No external API calls until the LLM itself. Here's how each stage works.

---

## Stage 1: PII Scanning

Before anything touches the AI, the user's input is scanned for financial PII — Social Security numbers, credit card numbers, bank account patterns — using regex in Rust. If detected, the text is auto-redacted and replaced with placeholders. The user sees a notification, never a blocking modal. Security without friction.

`"John's SSN is 123-45-6789"` → `"John's SSN is [SSN REDACTED]"`

The redacted version is what gets sent to the LLM. The original is never logged.
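
The shipped scanner uses regex patterns; as a dependency-free illustration of the idea, here is a minimal sketch of just the SSN case (the `DDD-DD-DDDD` shape), hand-rolled so it runs with no crates. Unlike the production patterns, it does not check word boundaries or other PII types.

```rust
/// Redact SSN-shaped substrings (DDD-DD-DDDD) from user input.
/// A dependency-free sketch of the Stage 1 idea; the real pipeline
/// uses regex patterns covering SSNs, credit cards, and bank accounts.
fn redact_ssns(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        if matches_ssn(&chars[i..]) {
            out.push_str("[SSN REDACTED]");
            i += 11; // DDD-DD-DDDD is 11 characters
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn matches_ssn(s: &[char]) -> bool {
    if s.len() < 11 {
        return false;
    }
    // 'd' = ASCII digit, '-' = literal hyphen
    s.iter().zip("ddd-dd-dddd".chars()).all(|(c, p)| match p {
        'd' => c.is_ascii_digit(),
        _ => *c == '-',
    })
}

fn main() {
    let msg = "John's SSN is 123-45-6789";
    assert_eq!(redact_ssns(msg), "John's SSN is [SSN REDACTED]");
}
```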

## Stage 2: Query Classification

The system classifies every query into one of 6 types using keyword heuristics — no LLM call needed:

| Query Type | Example | What It Retrieves |
|---|---|---|
| Individual | "Tell me about Sarah" | Full employee profiles (max 3) |
| Comparison | "Top performers in Engineering" | Multiple full profiles (max 8) |
| List | "Who's in Marketing?" | Lightweight summaries (max 30) |
| Aggregate | "What's our headcount?" | Org-wide stats only, no individual data |
| Attrition | "Who left this year?" | Recent terminations with full context |
| General | "How should I handle this PIP?" | Balanced sample (max 5) |

The classification uses priority-based logic: explicit names always win (→ Individual), ranking keywords beat department keywords (→ Comparison over List), and so on. This means the system retrieves exactly the right depth of data for every question — no wasted tokens, no missing context.
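
A sketch of that priority ordering, with illustrative keyword lists (the shipped set is larger, and uses word-boundary matching rather than bare substring checks):

```rust
#[derive(Debug, PartialEq)]
enum QueryType { Individual, Comparison, List, Aggregate, Attrition, General }

/// Priority-ordered keyword heuristics sketching the Stage 2 routing.
/// Keyword lists here are illustrative, not the shipped set.
fn classify(query: &str, known_names: &[&str]) -> QueryType {
    let q = query.to_lowercase();
    // 1. Explicit employee names always win.
    if known_names.iter().any(|n| q.contains(&n.to_lowercase())) {
        return QueryType::Individual;
    }
    // 2. Ranking keywords beat department keywords.
    if ["top", "best", "bottom", "compare"].iter().any(|k| q.contains(k)) {
        return QueryType::Comparison;
    }
    // 3. Attrition signals.
    if ["left", "quit", "terminated", "attrition"].iter().any(|k| q.contains(k)) {
        return QueryType::Attrition;
    }
    // 4. Org-wide stats.
    if ["headcount", "how many", "enps", "turnover"].iter().any(|k| q.contains(k)) {
        return QueryType::Aggregate;
    }
    // 5. Department rosters.
    if q.contains("who's in") || q.contains("who is in") {
        return QueryType::List;
    }
    QueryType::General
}

fn main() {
    assert_eq!(classify("Top performers in Engineering", &[]), QueryType::Comparison);
    assert_eq!(classify("What's our headcount?", &[]), QueryType::Aggregate);
    assert_eq!(classify("Tell me about Sarah", &["Sarah"]), QueryType::Individual);
}
```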

## Stage 3: Query Mention Extraction

A heuristic NER-like system extracts structured signals from the raw query:

- **Employee names** — Capitalized words that aren't common English/HR terms, with possessive handling ("Sarah's" → "Sarah")
- **Department names** — Word-boundary matching ("IT" matches only at word boundaries, not inside "wITh")
- **Intent flags** — `is_performance_query`, `is_tenure_query`, `is_top_performer_query`, `is_theme_query`, and more
- **Tenure direction** — Distinguishes "who's been here longest" vs "newest hires" vs "upcoming anniversaries"
- **Theme detection** — Maps semantic variants to canonical themes ("people skills" → communication, "coaching" → mentoring)
- **Target field** — Whether theme queries target strengths ("excels at") or opportunities ("struggles with")

All of this happens in Rust, in microseconds, with zero network calls.
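
The name-extraction piece can be sketched in a few lines. The stopword list below is a tiny illustrative sample, not the real one:

```rust
/// Extract candidate employee names: capitalized tokens that aren't
/// common English/HR terms, with possessives normalized ("Sarah's" -> "Sarah").
/// The stopword list here is illustrative; the real list is much larger.
fn extract_names(query: &str) -> Vec<String> {
    const STOPWORDS: [&str; 6] = ["Tell", "Who", "What", "How", "The", "PIP"];
    query
        .split_whitespace()
        // strip surrounding punctuation, keeping apostrophes for now
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric() && c != '\''))
        // possessive handling: "Sarah's" -> "Sarah"
        .map(|w| w.strip_suffix("'s").unwrap_or(w))
        .filter(|w| w.chars().next().map_or(false, |c| c.is_uppercase()))
        .filter(|w| !STOPWORDS.contains(w))
        .map(String::from)
        .collect()
}

fn main() {
    assert_eq!(extract_names("What is Sarah's latest rating?"), vec!["Sarah"]);
}
```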

## Stage 4: Context Assembly

Based on the query type, the system assembles context from 5 sources:

### 4a. Company Context

Single SQL query: company name, state, industry, active employee count, department count.

### 4b. Organization Aggregates

A battery of SQL queries computed for every message (~2K characters when formatted):

- Headcount breakdown (total, active, terminated, on leave)
- Department distribution with percentages
- Performance rating distribution (exceptional / exceeds / meets / needs improvement)
- eNPS score with promoter/passive/detractor breakdown
- YTD attrition stats (voluntary/involuntary, average tenure, annualized turnover rate)

This is what makes aggregate queries accurate — the LLM doesn't guess, it reads pre-computed ground truth.

### 4c. Employee Context (Query-Type-Adaptive)

This is the key innovation. Different queries retrieve different depths:

- **Individual:** Full profile with all ratings, all eNPS scores, career summary, extracted review highlights (strengths, opportunities, themes, quotes), and trend analysis
- **Comparison:** Same full profiles but for multiple employees, with specialized retrieval for top/bottom performers or theme-based filtering
- **List:** Lightweight summary structs (~70 characters each) — just name, department, title, status
- **Aggregate:** No individual employee data at all — the org aggregates are sufficient
- **Attrition:** Recent terminations with termination reason, tenure, and performance history

Employee retrieval priority: selected employee (from UI panel) → name matches → department matches → specialized queries → random sample fallback.
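
That priority order is a first-non-empty fallback chain. A sketch, with the inputs standing in for the real SQL-backed lookups:

```rust
/// The Stage 4c priority chain: selected employee -> name matches ->
/// department matches -> specialized queries -> random sample.
/// Inputs here are stand-ins for the real SQL-backed retrieval results.
fn retrieve_employees(
    selected: Option<String>,
    name_matches: Vec<String>,
    dept_matches: Vec<String>,
    specialized: Vec<String>,
    sample: Vec<String>,
) -> Vec<String> {
    // A selection made in the UI panel always wins.
    if let Some(sel) = selected {
        return vec![sel];
    }
    // Otherwise, the first non-empty tier wins.
    for tier in [name_matches, dept_matches, specialized, sample] {
        if !tier.is_empty() {
            return tier;
        }
    }
    Vec::new()
}

fn main() {
    // No selection, no name hit: falls through to department matches.
    let out = retrieve_employees(None, vec![], vec!["Ana".into()], vec![], vec![]);
    assert_eq!(out, vec!["Ana".to_string()]);
}
```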

### 4d. Cross-Conversation Memory

Hybrid search for up to 3 relevant past conversation summaries:

1. First tries summary-only FTS5 search (more focused)
2. Falls back to full conversation FTS if no summary matches

Summaries are 2–3 sentence AI-generated distillations of past conversations. No vector database. No embedding API calls. SQLite FTS5 — works completely offline.
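
The two-step recall reduces to a simple fallback, sketched below. The search functions are hypothetical stand-ins for the actual SQLite FTS5 `MATCH` queries:

```rust
/// Stage 4d hybrid recall: summary-only FTS first, full-conversation FTS
/// as a fallback, capped at 3 results. The search closures are hypothetical
/// stand-ins for SQLite FTS5 queries against the summaries and messages tables.
fn recall_memories(
    summary_fts: impl Fn(&str) -> Vec<String>,
    full_fts: impl Fn(&str) -> Vec<String>,
    query: &str,
) -> Vec<String> {
    let mut hits = summary_fts(query);
    if hits.is_empty() {
        hits = full_fts(query); // fall back to full-conversation search
    }
    hits.truncate(3); // at most 3 past-conversation summaries
    hits
}

fn main() {
    let no_summaries = |_: &str| Vec::new();
    let full = |_: &str| vec!["PIP discussion from last month".to_string()];
    let out = recall_memories(no_summaries, full, "performance improvement plan");
    assert_eq!(out.len(), 1);
}
```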

### 4e. Document Context

FTS5 search across ingested company documents (employee handbook, policies, etc.). Returns ranked chunks with source metadata for citation. Token-budgeted to avoid crowding out other context.

## Stage 5: Token Budget Management

With the default model, the system operates within a 200K-token context window:

| Allocation | Tokens | Notes |
|---|---|---|
| System prompt | 20K | Persona + company + employees + docs + memory |
| Conversation history | 150K | 75% of window |
| Output reserved | 8K | Space for the response |
| Safety buffer | 22K | Headroom |

Within the system prompt budget, each query type gets a different token allocation:

| Query Type | Employee Tokens | Memory Tokens | Total |
|---|---|---|---|
| Aggregate | 0 | 500 | 1,000 |
| Individual | 4,000 | 1,000 | 5,000 |
| Comparison | 3,000 | 500 | 3,500 |
| List | 2,000 | 500 | 2,500 |
| Attrition | 2,000 | 500 | 2,500 |
| General | 2,000 | 1,000 | 3,000 |

When conversation history exceeds the budget, the oldest user/assistant message pairs are silently dropped. This preserves recent context without notification — a deliberate UX decision. The user shouldn't have to worry about context management.

Employee context is capped at 10 employees and 16K characters (~4K tokens) to prevent any single query from blowing the budget.
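
The oldest-pair trimming loop can be sketched as follows. The ~4-characters-per-token estimate is an illustrative stand-in for the real tokenizer-based counts:

```rust
/// Trim conversation history to a token budget by dropping the oldest
/// user/assistant pairs (Stage 5). The chars/4 cost estimate is an
/// illustrative stand-in for real tokenizer counts.
fn trim_history(mut pairs: Vec<(String, String)>, budget: usize) -> Vec<(String, String)> {
    let cost = |p: &(String, String)| (p.0.len() + p.1.len()) / 4;
    while pairs.len() > 1 && pairs.iter().map(|p| cost(p)).sum::<usize>() > budget {
        pairs.remove(0); // silently drop the oldest pair
    }
    pairs
}

fn main() {
    let pair = |s: &str| (s.to_string(), s.to_string());
    // Three pairs of ~20 "tokens" each; a 40-token budget keeps the newest two.
    let history = vec![pair(&"a".repeat(40)), pair(&"b".repeat(40)), pair(&"c".repeat(40))];
    let trimmed = trim_history(history, 40);
    assert_eq!(trimmed.len(), 2);
    assert!(trimmed[0].0.starts_with('b')); // oldest pair was dropped
}
```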

## Stage 6: System Prompt Assembly

Everything gets composed into a structured system prompt:

- **Persona preamble** — The selected AI persona with company-specific variables
- **Communication style** — Persona-specific guidelines
- **Company context** — Name, state, employee count, departments
- **Organization data** — Formatted aggregates (headcount, ratings, eNPS, attrition)
- **Context awareness rules** — State-specific employment law guidance, employee name references, document citation preferences
- **Boundaries** — Guidance not legal advice, recommend counsel for litigation
- **Relevant employees** — Formatted profiles or summaries
- **Relevant documents** — Document chunks with source citations
- **Relevant past conversations** — Memory summaries
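
The composition step itself is straightforward. A sketch with placeholder sections; the real builder interpolates persona variables and the formatted aggregates:

```rust
/// Compose the system prompt from (title, body) sections in the Stage 6
/// order. Section contents here are placeholders; the real builder fills
/// in persona variables, formatted aggregates, and retrieved context.
fn build_system_prompt(sections: &[(&str, &str)]) -> String {
    sections
        .iter()
        .filter(|(_, body)| !body.is_empty()) // skip sections with nothing to say
        .map(|(title, body)| format!("## {title}\n{body}"))
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let prompt = build_system_prompt(&[
        ("Persona", "You are an HR partner."),
        ("Relevant past conversations", ""), // empty: dropped
    ]);
    assert!(prompt.contains("## Persona"));
    assert!(!prompt.contains("Relevant past conversations"));
}
```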

## Stage 7: Answer Verification

Post-response, not pre-response. For aggregate queries only, the system:

1. Extracts numeric claims from the AI's response using regex (headcount patterns, percentages, ratings, eNPS scores)
2. Compares each claim against the pre-computed ground truth aggregates
3. Returns a verification result: **Verified**, **Partial Match**, or **Unverified**
4. The frontend displays a verification badge on the message

This catches hallucinated numbers — if the AI says "82 active employees" but the database has 85, it's flagged. No silent errors.
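
A sketch of the comparison step. Regex extraction of claims from the response text is elided here; this takes the already-extracted `(label, value)` claims as input:

```rust
#[derive(Debug, PartialEq)]
enum Verification { Verified, PartialMatch, Unverified }

/// Check extracted numeric claims against precomputed ground truth
/// (Stage 7). Labels and values here are illustrative; the real system
/// extracts claims from the response with regex first.
fn verify(claims: &[(&str, i64)], truth: &[(&str, i64)]) -> Verification {
    let matched = claims
        .iter()
        .filter(|(label, value)| truth.iter().any(|(tl, tv)| tl == label && tv == value))
        .count();
    match matched {
        m if m == claims.len() && !claims.is_empty() => Verification::Verified,
        0 => Verification::Unverified,
        _ => Verification::PartialMatch,
    }
}

fn main() {
    let truth = [("active_employees", 85), ("departments", 6)];
    // AI claimed 82 active employees; the database says 85 -> flagged.
    assert_eq!(verify(&[("active_employees", 82)], &truth), Verification::Unverified);
    assert_eq!(verify(&[("active_employees", 85)], &truth), Verification::Verified);
}
```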

## Background Operations

These run on a default model regardless of user selection, keeping costs predictable:

| Operation | Trigger | Model |
|---|---|---|
| Conversation summaries | After conversation ends | Default (Sonnet) |
| Title generation | After first message | Default (Sonnet) |
| Highlight extraction | After review import | Default (Sonnet) |
| Employee career summaries | After highlights exist | Default (Sonnet) |

The model selector only affects interactive chat. Background operations always use the cost-efficient default.

## Model-Aware Adaptation

When users switch models (e.g., Sonnet → Opus), the system adapts automatically. The selected model's context window determines the conversation token budget (75% of window). Gemini's 1M context window gets 750K for conversation; OpenAI's 128K gets 96K. Max output tokens are passed through to the provider.
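
The budget derivation is just the 75% rule applied to the selected model's window:

```rust
/// Conversation token budget: 75% of the selected model's context window.
/// Window sizes below are the ones quoted in this post.
fn conversation_budget(context_window: usize) -> usize {
    context_window * 3 / 4
}

fn main() {
    assert_eq!(conversation_budget(1_000_000), 750_000); // Gemini's 1M window
    assert_eq!(conversation_budget(128_000), 96_000);    // OpenAI's 128K window
    assert_eq!(conversation_budget(200_000), 150_000);   // the 200K default
}
```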

## What Makes This Different

1. **Zero-LLM-call context routing** — Query classification and employee retrieval happen entirely in Rust with SQL and heuristics. No "pre-processing" LLM call.
2. **Query-adaptive retrieval** — An aggregate question doesn't waste tokens on individual profiles. A name query doesn't waste tokens on org-wide stats.
3. **Ground truth verification** — The LLM's numeric claims are checked against SQL-computed facts post-response.
4. **PII never leaves the device unredacted** — Scanning happens before context building, in Rust, with no network calls.
5. **Cross-conversation memory without embeddings** — SQLite FTS5 hybrid search. No vector database, no embedding API calls. Works offline.
6. **Token observability** — Every query tracks retrieval metrics: employees found/included, memories found/included, token budget vs actual usage, retrieval time in milliseconds.

The result: every answer is grounded in your actual data, verified against your actual numbers, and processed entirely on your machine.

---

[People Partner](https://peoplepartner.io) is a local-first HR knowledge platform that runs entirely on your Mac. Your employee data, policies, and documents stay on your machine — processed by on-device AI, never uploaded to the cloud. $99 one-time purchase.
