A chat-first, human-in-the-loop copilot that rides alongside the CSR during a live Lexis Advance / Lexis+ legal-research interaction. For each inbound request it suggests the right source, a tuned Boolean search, and a Lexis+ Ask prompt, each with a one-line rationale; every suggestion goes to the CSR for review and is never auto-sent. Near-term it is a CSR assist; long-term, the same capability could reach customers directly, through a delivery vehicle that is still undecided.
When a legal-research customer contacts LexisNexis support, the CSR formulates the search by hand — choosing the source, composing a Boolean query with Lexis connectors (w/5, /p, pre/2), and returning a permalink. This "gold-standard" move is supervisable and repeatable: the CSR does intent capture → clarifying question → connector composition → permalink hand-off, each step observable. But it is entirely manual, depends on individual CSR skill, and the search result the AI product was meant to deliver still routes through a person.
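The connector-composition step can be sketched as plain string building. This is a minimal illustration, not a Lexis API: the helper names (`phrase`, `w`, `pre`, `compose`) are hypothetical, and the sketch assumes only the connector meanings CSRs already use (w/n for within n words in either order, pre/n for the first term preceding the second within n words, /p for the same paragraph).

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as an exact phrase."""
    term = " ".join(term.split())  # collapse stray whitespace
    return f'"{term}"' if " " in term else term


def w(n: int) -> str:
    """w/n: both terms within n words of each other, in either order."""
    return f"w/{n}"


def pre(n: int) -> str:
    """pre/n: the first term precedes the second by no more than n words."""
    return f"pre/{n}"


SAME_PARAGRAPH = "/p"  # both terms in the same paragraph


def compose(*parts: str) -> str:
    """Join alternating terms and connectors: term (connector term)*.

    Terms are quoted as phrases where needed; connectors pass through as-is.
    """
    if not parts or len(parts) % 2 == 0:
        raise ValueError("expected term (connector term)*, i.e. an odd number of parts")
    query = phrase(parts[0])
    for connector, term in zip(parts[1::2], parts[2::2]):
        query += f" {connector} {phrase(term)}"
    return query


# Example: a hypothetical non-compete research request.
print(compose("covenant not to compete", w(10), "enforceable", SAME_PARAGRAPH, "california"))
# prints: "covenant not to compete" w/10 enforceable /p california
```

Passing terms and connectors as separate arguments keeps phrase quoting consistent and makes each connector choice (for example w/10 rather than /p) an explicit value the one-line rationale can explain.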
In a sample of 6,057 support interactions analysed by Data Science, roughly 83% were legal-research in nature, roughly 63% of chats resolved with a CSR-crafted permalink, and the chat archive contains 1,446 hand-built Boolean searches. These figures are a directional anchor for the hypothesis, not a proven measure of scale or rate; no annualized volume is claimed here.
Therese Steiner identified this as her #1 priority for 2026, with no comparable backup if it struggles. The competitive wedge: LexisNexis already operates reference attorneys, a capability Thomson Reuters/Westlaw has but doesn't productise and Harvey lacks entirely. The training signal is a byproduct of the CS operation that is already running.
The bet & how we'll know
The bet: we believe a copilot can produce source + Boolean + Lexis+ Ask-prompt suggestions good enough that CSRs accept them as-is most of the time, without slowing the interaction. We'll know by measuring accept-as-is rate and wrong-source rate on graded interactions — offline first, then in live suggest-only use. If accept-as-is doesn't clear the bar (or citation-hallucination stays above its ceiling), we stop before any integration investment.
Near-term metric (measurable now): accept-as-is rate and wrong-source rate, graded by senior CSRs. Guardrail: zero hallucinated legal answers, and citation-hallucination below its ceiling. Thresholds are proposed, not SME-confirmed.
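A minimal sketch of how the graded offline eval could roll up into these metrics and the stop/continue gate. The `GradedSuggestion` fields are a hypothetical grading schema, and citation-hallucination is assumed here to mean the share of suggestions with at least one citation that does not resolve. Because the thresholds are not SME-confirmed, they are parameters rather than constants.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GradedSuggestion:
    """One senior-CSR grade of one copilot suggestion (hypothetical schema)."""
    accepted_as_is: bool        # CSR would send it unedited
    wrong_source: bool          # the suggested source is not the right one
    unresolved_citations: int   # citations that do not match a real document
    legal_answer: bool          # strays into a substantive legal answer


def gate(grades: List[GradedSuggestion], min_accept: float, max_citation_halluc: float) -> dict:
    """Roll grades up into rates and apply the proposed go/stop gate."""
    n = len(grades)
    if n == 0:
        raise ValueError("no graded suggestions")
    accept = sum(g.accepted_as_is for g in grades) / n
    wrong_source = sum(g.wrong_source for g in grades) / n
    citation_halluc = sum(g.unresolved_citations > 0 for g in grades) / n
    legal_answers = sum(g.legal_answer for g in grades)
    return {
        "accept_as_is": accept,
        "wrong_source": wrong_source,  # reported, not gated, in this sketch
        "citation_hallucination": citation_halluc,
        "legal_answers": legal_answers,
        # Go only if accept-as-is clears the bar, citations stay under the
        # ceiling, and there are zero legal answers (a hard guardrail).
        "go": accept >= min_accept and citation_halluc <= max_citation_halluc and legal_answers == 0,
    }
```

Wrong-source rate is reported but not gated here, matching the stop rule in the bet, which names accept-as-is and citation-hallucination.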
Baseline (CS Ops, Therese, 2026-05-29): today, unaided, Legal Chat average handle time (AHT) is 28 min, customer satisfaction (CSat) ~78%, and first-contact resolution (FCR) 71%; there is no copilot, and quality varies by CSR. The offline eval sets the first model data point (accept-as-is on held-out graded questions); the AHT/CSat/FCR baselines anchor the live-use outcome read.
Outcome metric (the ROI bet, provable in live use): Legal Chat AHT 28 → 21 min (−25%, CS Ops' target) without degrading CSat (78%) or FCR (71%). It becomes measurable once CSRs adopt the copilot and use it steadily in live alpha: not at offline eval, and not deferred to the customer-facing phase. Therese's own caveat: the AHT win depends on the build plus adoption and steady CSR use, and a suggest-only assist can add review time before it removes any. If AHT doesn't move toward target under steady use within the alpha window, the HITL ROI thesis is wrong and we pivot.
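The target arithmetic is (28 - 21) / 28 = 7 / 28 = 0.25. A minimal sketch of the live-use outcome read, assuming "without degrading" means CSat and FCR stay at or above baseline; the optional `tolerance` is a hypothetical knob, not a CS Ops figure.

```python
BASELINE = {"aht_min": 28.0, "csat": 0.78, "fcr": 0.71}  # CS Ops baseline, 2026-05-29
TARGET_AHT_REDUCTION = 0.25  # 28 -> 21 min


def outcome_read(aht_min: float, csat: float, fcr: float, tolerance: float = 0.0) -> dict:
    """Compare live-alpha metrics against the unaided baseline."""
    reduction = (BASELINE["aht_min"] - aht_min) / BASELINE["aht_min"]
    quality_held = csat >= BASELINE["csat"] - tolerance and fcr >= BASELINE["fcr"] - tolerance
    return {
        "aht_reduction": reduction,
        "quality_held": quality_held,
        "target_met": reduction >= TARGET_AHT_REDUCTION and quality_held,
    }


print(outcome_read(aht_min=21, csat=0.78, fcr=0.71))
# prints: {'aht_reduction': 0.25, 'quality_held': True, 'target_met': True}
```

An AHT of 24 min would be a 4/28 ≈ 14% reduction: movement toward target without meeting it, which is the case the pivot rule has to judge over the alpha window.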
Riskiest assumption: real-time ingestion — consuming the live conversation fast enough to stay ahead of the CSR. If it's infeasible, the live copilot doesn't exist. The offline eval proves suggestion quality cheaply; live-ingestion feasibility must be tested early, before integration spend.
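To make the latency question concrete, here is a minimal asyncio sketch of per-turn ingestion. It assumes the platform can push each chat turn as it arrives (exactly what is unconfirmed for NICE), treats the maximum acceptable latency as a parameter because it is unresolved, and uses a hypothetical `generate` coroutine in place of the model call. A suggestion that misses the budget, or is superseded by a newer turn, is dropped rather than shown late.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Generator = Callable[[str], Awaitable[str]]  # hypothetical model call: turn text -> suggestion


async def suggest_within_budget(turn: str, generate: Generator, budget_s: float) -> Optional[str]:
    """Return a suggestion if it is ready within the latency budget, else None."""
    try:
        return await asyncio.wait_for(generate(turn), timeout=budget_s)
    except asyncio.TimeoutError:
        return None  # too late to help: the CSR has likely moved on


class TurnIngestor:
    """Keeps only the newest turn's suggestion in flight; older work is cancelled."""

    def __init__(self, generate: Generator, budget_s: float) -> None:
        self.generate = generate
        self.budget_s = budget_s
        self._pending: Optional[asyncio.Task] = None

    def on_turn(self, turn: str) -> asyncio.Task:
        """Call from inside the event loop whenever a new chat turn arrives."""
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()  # a newer turn supersedes the stale suggestion
        self._pending = asyncio.create_task(
            suggest_within_budget(turn, self.generate, self.budget_s))
        return self._pending
```

Dropping late suggestions is a design choice in this sketch: a suggestion that arrives after the CSR has already answered adds review time without helping, which is the failure mode the AHT caveat warns about.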
Details
The copilot is chat-first, CSR-assist, and human-in-the-loop, scoped to Lexis Advance and Lexis+ (not Protege — a different product with a different support pattern). For each inbound research request it proposes a source, a tuned Boolean, and a Lexis+ Ask prompt, each with a short rationale. Suggestions never auto-send; the CSR copies, edits, or rejects, and that action is logged as the feedback signal. The copilot starts with chat. Voice is a harder, later channel: a phone call transcribed live so the copilot suggests a search for the CSR to review on the call, with the result emailed to the customer afterward — email is voice's delivery step, not a separate channel. Voice stacks live voice-to-text on top of the real-time-ingestion problem, so it comes after chat.
Scope is deliberately bounded to product help and search formulation — never legal answers. CSRs cannot practice law, and a hallucinated legal answer is a catastrophic risk; the copilot helps the CSR find and formulate, it does not advise.
How it works — one interaction, turn by turn
A single research request, start to finish. The copilot's suggestions appear in the CSR's side panel; the CSR accepts, edits, or rejects — and that choice is logged as the training signal for the next refresh. Illustrative example.
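The suggestion-and-feedback loop above can be sketched as a small data model. This is a hypothetical schema, not an existing one: `Suggestion`, `CsrAction`, and `feedback_record` are illustrative names, and JSON lines is an assumed log format. Nothing in it sends anything to the customer; the CSR's own copy is the send step.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    SOURCE = "source"
    BOOLEAN = "boolean"
    ASK_PROMPT = "ask_prompt"


class CsrAction(Enum):
    ACCEPTED = "accepted"  # copied as-is
    EDITED = "edited"      # copied after the CSR changed it
    REJECTED = "rejected"  # not used


@dataclass(frozen=True)
class Suggestion:
    interaction_id: str
    kind: Kind
    text: str
    rationale: str  # the one-line "why" shown in the side panel


def feedback_record(s: Suggestion, action: CsrAction, final_text: Optional[str] = None) -> str:
    """Serialize the CSR's decision as one log line: the training signal."""
    if action is CsrAction.EDITED and not final_text:
        raise ValueError("an edited suggestion must log the CSR's final text")
    record = asdict(s)
    record["kind"] = s.kind.value  # store the enum as plain text
    record["action"] = action.value
    record["final_text"] = final_text
    return json.dumps(record)
```

Logging the CSR's final text for edits, not just the action, is what would let a later refresh learn the correction itself rather than only that one was needed.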
Target Audience
Initial audience: the Legal Research CSR team (~50 CSRs and supervisors). They are the day-one users of the suggest-only copilot.
Ultimate audience: end customers (attorneys, paralegals, researchers). Long-term, the same capability could reach them directly, but the delivery vehicle is still open: in-product Lexis Advance/Lexis+, or a customer-facing chatbot via a unified NICE deployment. That choice depends on feasibility and existing architecture, and it is a downstream decision owned by another team, outside this build's scope.
What Customer Operations (Therese Steiner) is betting on is a 25% Legal Chat AHT reduction (28 → 21 min) without degrading CSat (78%) or FCR (71%). Whether a human-in-the-loop copilot delivers that is the hypothesis the pilot is built to test, not a foregone conclusion: Therese's caveat is that the AHT win depends on the build plus adoption and steady CSR use, and a suggest-only tool can add review time before it removes any. So near-term success is measured on suggestion quality and acceptance; the AHT/CSat/FCR outcome is read once the tool is in steady live use.
Proposed Solution
A gated capability progression, human-in-the-loop first. Each phase ends at a gate, not a deadline — if a gate misses, the surface holds position or pivots rather than graduating. The copilot encodes existing CSR behavior; it does not invent a greenfield model. Real-time ingestion of the live conversation is the central technical unknown, so the early phases stay offline and suggest-only, and the live, latency-sensitive integration is scoped to later stages.
These four phases are the project's spine. The same four drive the delivery overview, where they are sequenced with gates and dependencies, and each maps to a stage in Mark Koussa's lifecycle: Phase 1 Concept Build (Stage 3) → Phase 2 Build Alpha (Stage 4) → Phase 3 Alpha Testing (Stage 5) → Phase 4 Beta+ (Stage 7).
Phase 1 · Concept Build · now
Feasibility evaluation (offline)
On the existing transcript sample, prove the model produces a good source + Boolean + Ask-prompt on held-out questions, graded by senior CSRs. There is no live tool and no CSR usage yet; this is an offline accuracy proof. Also re-cluster our corpus under a documented method to set the target taxonomy and eval set (the prior 35-cluster analysis is a year old, was run on different data, and used an unknown method). Scope and method are defined first; duration follows from scope, not a fixed window.
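One way to build the held-out set, sketched under assumptions: each interaction carries a date, the split is by time rather than a random shuffle (so held-out questions come after everything used for tuning), and held-out questions whose normalised text already appears on the training side are dropped. The names are hypothetical, and this is not the documented method the re-clustering is meant to establish.

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Tuple


class Interaction(NamedTuple):
    interaction_id: str
    day: date
    question: str


def _normalise(text: str) -> str:
    """Case- and whitespace-insensitive key for spotting repeated questions."""
    return " ".join(text.lower().split())


def time_split(items: Iterable[Interaction],
               cutoff: date) -> Tuple[List[Interaction], List[Interaction]]:
    """Tune on interactions before `cutoff`; hold out later, unseen questions."""
    items = list(items)
    train = [i for i in items if i.day < cutoff]
    seen = {_normalise(i.question) for i in train}
    held_out = [i for i in items
                if i.day >= cutoff and _normalise(i.question) not in seen]
    return train, held_out
```

A single recent pull covers a narrow time window, so this guards only against leakage within the sample; it does not address seasonality, which is why the larger rolling window matters.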
Phase 2 · Build Alpha · suggest-only
CSR Copilot (HITL)
Build the copilot and deploy it suggest-only — every suggestion logged, none auto-sent. How it reaches CSRs is open: the fast start is a standalone EI-hosted app the CSR runs beside Oracle AgentWeb (EI-owned, no cross-team integration); integrating into AgentWeb is the alternative (needs another team, slower). This is where the tool is observed in real CSR use ("shadowed"). Starts narrow — a subset of the top question clusters, in chat, with a small set of CSRs — because we can't hit target accuracy on every cluster out of the gate.
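The narrow start could be expressed as an allowlist the copilot checks before it suggests anything. A minimal sketch; `Rollout` and the cluster and CSR identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class Rollout:
    """Suggest-only rollout scope for Build Alpha (hypothetical config)."""
    clusters: Set[str] = field(default_factory=set)  # question clusters switched on
    csr_ids: Set[str] = field(default_factory=set)   # pilot CSRs
    channels: FrozenSet[str] = frozenset({"chat"})   # chat first; voice comes later

    def should_suggest(self, cluster: str, csr_id: str, channel: str) -> bool:
        """Suggest only inside the pilot's channel, clusters, and CSR group."""
        return channel in self.channels and cluster in self.clusters and csr_id in self.csr_ids


pilot = Rollout(clusters={"boolean-syntax-help"}, csr_ids={"csr-07"})
pilot.clusters.add("shepardize-how-to")  # in this sketch, widening coverage is a config change
```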
Phase 3 · Alpha Testing
Expand to more (eventually all) of the top clusters, still in chat, adding jurisdiction-aware connector tuning and Shepardize / secondary-source / form recommendations. The expansion is cluster coverage, not new channels.
Gate (proposed — not SME-confirmed): accept-as-is rate holds as cluster coverage widens · citation-hallucination below ceiling.
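One reading of "holds as cluster coverage widens", sketched with hypothetical names: compare the pooled accept-as-is rate on the widened cluster set against the narrower set it replaced, and keep per-cluster rates visible so a weak new cluster is not hidden by strong existing ones. The tolerance and ceiling are parameters because the thresholds are unconfirmed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Grade = Tuple[str, bool]  # (cluster, accepted_as_is)


def accept_by_cluster(grades: Iterable[Grade]) -> Dict[str, float]:
    """Accept-as-is rate per question cluster."""
    totals: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for cluster, ok in grades:
        totals[cluster] += 1
        accepted[cluster] += int(ok)
    return {c: accepted[c] / totals[c] for c in totals}


def pooled_rate(grades: List[Grade]) -> float:
    """Accept-as-is rate across all graded suggestions."""
    return sum(ok for _, ok in grades) / len(grades)


def expansion_gate(before: List[Grade], after: List[Grade], citation_halluc: float,
                   halluc_ceiling: float, tolerance: float = 0.0) -> bool:
    """Pass if pooled accept-as-is holds after widening and citations stay under the ceiling."""
    holds = pooled_rate(after) >= pooled_rate(before) - tolerance
    return holds and citation_halluc <= halluc_ceiling
```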
Phase 4 · Beta+ · customer-facing
Customer-facing agent (horizon)
Expose the same capability directly to customers, with sampled human review as a quality service. The vehicle is undecided: in-product (Lexis Advance/Lexis+) or a customer-facing chatbot via NICE. The choice depends on feasibility and existing architecture, and it is a downstream decision owned by another team, outside this build's scope. This is a long-term horizon. A directional extension discussed with Therese, nudging customers toward LexisNexis proprietary content to deepen stickiness, would live here, distinct from the deferred upsell-detection workstream.
Data Sources
Our chat/voice corpus · In hand
5,732 chat + 325 voice interactions (6,057) — the recent pull from Therese (working belief: ~March 2026; date being confirmed), plus 1,446 natural-language → Boolean pairs in the chat archive. The volume substrate and initial training seed; source of the research-rate and permalink-rate figures. Sufficient to test the bet, not to train against.
Prior clustering analysis (separate, older dataset) · In hand
606 graded docs + 35 question clusters from a June 2025 run by a prior DS team (the QA workbooks). Source of the cluster, transfer-rate, and urgency figures. Zero file overlap with our corpus above — the two must not be blended, and which dataset should anchor the eval/training set is open (see Risks).
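Keeping the two datasets apart can be enforced mechanically. A minimal sketch with hypothetical names and provenance tags; which dataset anchors the eval set is still open, so it is a parameter.

```python
from typing import Iterable, List, NamedTuple


class Doc(NamedTuple):
    doc_id: str
    dataset: str  # provenance tag, e.g. "corpus-2026" or "clusters-2025-06" (hypothetical)


def assert_disjoint(a: Iterable[Doc], b: Iterable[Doc]) -> None:
    """Fail loudly if the two datasets share any document ID."""
    shared = {d.doc_id for d in a} & {d.doc_id for d in b}
    if shared:
        raise ValueError(f"datasets overlap on {len(shared)} document(s)")


def build_eval_set(docs: Iterable[Doc], anchor: str) -> List[Doc]:
    """Refuse to build an eval set that mixes provenance tags."""
    docs = list(docs)
    sources = {d.dataset for d in docs}
    if sources != {anchor}:
        raise ValueError(f"eval set must come only from {anchor!r}; found {sorted(sources)}")
    return docs
```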
CX1 raw transcript access (12-month rolling) · Required · pending
Source-truth transcripts (not the AWS-summarised layer) for a rolling window, needed for training-grade volume (an estimated 200k–250k interactions). Owner path: Therese → LN Murthy. Blocks Build Alpha.
Agent Web dispositions + auto-summary · Required · pending
Pick-list disposition codes and auto-summary text joined to the transcript ID — the source of outcome labels (resolved / escalated / repeat-caller) without manual review. Read-only view. Blocks Build Alpha outcome labelling.
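The join itself is simple; the open question is the code-to-label mapping. A minimal sketch, assuming the AgentWeb view can be read as rows keyed by transcript ID. The disposition codes below are placeholders, since the real pick-list is not yet in hand.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pick-list codes mapped to outcome labels.
DISPOSITION_TO_LABEL: Dict[str, str] = {
    "RES-PERMALINK": "resolved",
    "RES-GUIDANCE": "resolved",
    "ESC-REFATTY": "escalated",
    "REPEAT": "repeat-caller",
}


def join_outcomes(transcript_ids: Iterable[str],
                  agentweb_rows: Dict[str, Tuple[str, str]]) -> Dict[str, dict]:
    """Left-join transcripts to (disposition_code, auto_summary) rows by transcript ID.

    Transcripts with no row, or with an unmapped code, get label None so the
    unlabelled share stays visible instead of being guessed.
    """
    out: Dict[str, dict] = {}
    for tid in transcript_ids:
        row: Tuple[Optional[str], Optional[str]] = agentweb_rows.get(tid, (None, None))
        code, summary = row
        out[tid] = {"label": DISPOSITION_TO_LABEL.get(code), "code": code, "summary": summary}
    return out
```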
Customer engagement telemetry · Needed for outcomes · home open
Did the customer click the permalink and use what we sent? The transcript shows it was sent, not clicked — so this needs engagement telemetry from a system we don't own (chat platform, instrumented-link layer, or Lexis+ for in-product use); where it lives is an open question. The richest outcome signal, layered on top of the CS-side metrics, not a substitute.
Knowledge base inventory · Needed · pending
KB index + per-article metadata (practice area, source type) so the copilot grounds its rationale in known-good content rather than free-generating it.
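A grounding check could then be a lookup against that index before a rationale is shown. A minimal sketch, assuming the copilot's rationale cites KB article IDs and that the index exposes the practice-area and source-type metadata described above; the names are hypothetical, and requiring a practice-area match is a deliberately strict assumption.

```python
from typing import Dict, List, NamedTuple


class KbArticle(NamedTuple):
    article_id: str
    practice_area: str
    source_type: str


def ungrounded(cited_ids: List[str], practice_area: str, kb: Dict[str, KbArticle]) -> List[str]:
    """Cited IDs that are missing from the KB index or sit in another practice area."""
    return [a for a in cited_ids if a not in kb or kb[a].practice_area != practice_area]


def is_grounded(cited_ids: List[str], practice_area: str, kb: Dict[str, KbArticle]) -> bool:
    """A rationale must cite at least one article, and every citation must resolve."""
    return bool(cited_ids) and not ungrounded(cited_ids, practice_area, kb)
```

A suggestion that fails this check would be withheld or flagged to the CSR rather than shown with a free-generated rationale.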
Prior data-science POC · Unconfirmed
Therese referenced a prior data-science POC (2026-04-23) that "showed promise but isn't done." Owner, scope, and reusability are unconfirmed — no verified attribution. To be located and reviewed.
Scope boundary
Search formulation, not legal answers. Every data source and capability above serves the copilot's job of helping the CSR find and formulate a search and hand off a permalink. None of it extends to generating legal advice or substantive answers — that boundary is load-bearing for the whole concept and is what keeps the human-in-the-loop model defensible.
Risks, Gaps, and Unknowns
Real-time conversation ingestion is the biggest unknown. A live copilot must consume the conversation as it happens and stay ahead of the CSR, and that real-time path is unproven. Whether NICE can push each chat turn/session payload (vs. a one-time CSR-initiated request), and the maximum acceptable response latency, are unresolved. This sits on the critical path to Build Alpha.
Voice is a measurement black box. In the analysed sample, voice transcripts contained zero Lexis permalinks — voice CSRs deliver by email and print, so the search, result list, and outcome live outside the transcript. This is a measurement gap, not a service gap, but it means voice can't be learned from or evaluated until instrumentation lands. Voice is also a harder copilot to build than chat — it needs live voice-to-text feeding a suggestion the CSR reviews on the call — so it's a later channel, after chat clusters are covered, not a near-term phase.
The sample is a hypothesis anchor, not training-grade. 6,057 interactions is a strong anchor for testing the bet but too small to train against without seasonality leak; Deepak estimates ~200k–250k interactions are needed. No rate or annualized-volume claim is made from this sample. Separately, the cluster and transfer-rate evidence comes from an older June 2025 dataset (606 docs) with no file overlap with this corpus — the two are kept distinct, not blended.
Hallucinated legal advice is catastrophic. Scope must stay product-help + search formulation. Any drift into substantive legal answers exposes LexisNexis to the exact risk the human-in-the-loop model exists to prevent. Guardrails and the never-auto-send rule are non-negotiable.
Protege migration may shrink the addressable base. As customers migrate to Protege (which handles prompting natively), the CSR-assist use case shrinks. The feasibility work should test the migration curve and the residual non-Protege user base that justifies the build (flagged by Jack Diamond, 2026-05-01) before heavy integration investment.
Conversational platform decision is unresolved. Amelia (current, NICE) vs Cognigy (proposed, more natively agentic with HITL built in) — pros/cons, migration cost, vendor lock-in, and team learning curve are open and need a decision framework. The architecture split (EI builds the backend API service; LN owns the NICE integration) is a confirmed working assumption, not yet a built contract.
Gate thresholds are proposed, not confirmed. The accept-as-is and hallucination-ceiling figures that govern phase graduation originate as a proposal and have not been validated with the CSR SMEs or QA owners who would enforce them. They must be confirmed before they gate anything.