LexisNexis Embedded Innovation · Customer Ops Discovery

Legal Research Copilot — Delivery Overview

How the Legal Research Copilot is sequenced: the evidence behind the bet, how Therese Steiner's direction maps to phases, where each phase sits in Mark Koussa's lifecycle, the integration it depends on, and what we need from you to unlock each step. The plan is gated, not dated — phases graduate on evidence and unlocked dependencies, not a calendar.

The evidence behind the bet

The first set of figures below comes from our corpus (Dataset A). The second set comes from the prior June 2025 clustering analysis, run on a separate dataset, which found that demand concentrates: the top 15 of 35 clusters cover ≈ 59% of graded questions. The two datasets are shown side by side and are not stacked into one funnel.

From our corpus · 5,732 chat + 325 voice

6,057

CS interactions in our corpus (5,732 chat + 325 voice)

~83%

are legal-research in nature — the volume the copilot targets

~63%

of chats resolve with a CSR-crafted Lexis permalink — the gold-standard move

From a prior clustering analysis · separate, older dataset · 606 graded docs (Jun 2025)

59%

of graded questions fall in the top 15 of 35 clusters — concentrated, MVP-friendly

29.5%

of graded docs were transferred to another team — leakage the copilot can reduce

60.6%

of graded docs carry deadline / urgency language

Concentration (prior analysis): the top 5 clusters alone (Federal Civ Pro, Civ Pro motions, citation/source location, contracts, FRE depositions) cover ~27% of graded questions, a tight Phase-1 target. Competitive wedge: LexisNexis already runs reference attorneys; Westlaw has them but doesn't productise them; Harvey has no human layer.

Two distinct datasets — not blended. The volume figures (top row, green) come from our chat/voice corpus — 5,732 chat + 325 voice (6,057), the recent pull from Therese (working belief: ~March 2026; date being confirmed). The cluster, transfer, and urgency figures (amber) come from a separate, older clustering analysis — 606 graded docs from June 2025, run by a prior DS team, with no file overlap with our corpus. All figures are a directional hypothesis anchor, not proven scale; sufficient to test the bet, not to train against (≈200k–250k interactions estimated for training). No annualized volume or rate is claimed.

Alpha scope boundary: what we're building to first. The alpha targets one interaction: a legal-research chat that ends in an actual Lexis link delivered to the customer, i.e. the ~63% of chats that resolve with a CSR-crafted permalink. That is the gold-standard move because it gives a clean, measurable outcome and a clear accept / edit / reject training signal. Chats that resolve without a delivered link are out of current alpha scope, not out of the concept. As coverage widens at beta, the scoping line shifts from link-presence to a broader resolution / outcome signal (source guidance, Shepardize, cite fixes, read from AgentWeb dispositions), and those interactions re-enter. Link-presence is the deliberate alpha gate (confirmed with Therese, 2026-06-08).
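The link-presence gate described above can be sketched as a simple filter over chat turns. This is a minimal illustration, not the production scoping rule: the turn format (`(speaker, text)` tuples), the function names, and the host suffix used to recognise a Lexis link are all assumptions, since this plan does not specify the permalink format.

```python
import re
from urllib.parse import urlparse

# ASSUMPTION: a "Lexis permalink" is approximated as any URL whose host ends
# with this suffix. The real permalink format is not specified in this plan.
ASSUMED_LINK_HOST_SUFFIX = "lexis.com"

URL_PATTERN = re.compile(r"https?://\S+")


def csr_sent_lexis_link(turns, host_suffix=ASSUMED_LINK_HOST_SUFFIX):
    """Return True if any CSR turn contains a URL on the assumed Lexis host.

    `turns` is a list of (speaker, text) tuples; the schema is hypothetical.
    """
    for speaker, text in turns:
        if speaker != "csr":
            continue  # only links the CSR delivered count toward the gate
        for url in URL_PATTERN.findall(text):
            # Strip trailing sentence punctuation before parsing the host.
            host = urlparse(url.rstrip(".,)")).hostname or ""
            if host == host_suffix or host.endswith("." + host_suffix):
                return True
    return False


def in_alpha_scope(is_legal_research, turns):
    """Alpha gate: a legal-research chat in which the CSR delivered a link."""
    return is_legal_research and csr_sent_lexis_link(turns)
```

Keeping the gate as a single predicate means the beta change (from link-presence to a broader resolution signal) would swap one function rather than rework the pipeline.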

Therese's direction → how it maps to phases

From the 26 Mar Embedded Innovation collaboration · Therese's framing

Control

The CSR vets every suggestion before any customer sees it; nothing auto-sends. Who decides what reaches the customer — the safety spine of the near-term build.

Build Alpha + Co-pilot — core

Placement & adoption

The suggestion has to be unmissable in the CSR's flow, or it won't get used — where it appears. That's exactly the tension in the surface decision: a standalone EI app beside AgentWeb ships fast and EI-owned but is a separate window (real adoption risk); AgentWeb integration is genuinely in-flow but slower and cross-team.

Surface open — standalone or AgentWeb

Direct-to-customer

Long-term, the same capability could serve customers directly for routine research, freeing CS for complex work. Vehicle undecided (in-product Lexis Advance/Lexis+, or a customer-facing chatbot via NICE) — a downstream decision owned by another team.

In-product — horizon

Why "shadow" is not a phase that starts today. "Shadow mode" implies a deployed model running alongside CSRs while we watch, which presumes a built tool and a resolved way to receive chat context. There isn't one yet. The preferred Build Alpha path is a live/session-based chat feed into the EI app. If that external enablement cannot land in the alpha window, a deliberately approved CSR copy/paste fallback could support earlier learning, but it should be treated as a contingency to align with Therese rather than the ideal architecture.

Phased plan — gated, not dated

The axis below is relative, not calendar — time runs left to right so you can read the shape and pace, but it is anchored to T0 (when the preferred live-feed access path lands), not to specific months. Phase 0 eval runs now; everything after it waits on access we don't yet have or a conscious fallback decision — shown as a labeled unknown-duration zone, not an assumed date. Each phase graduates only by clearing its gate; a slipped dependency holds position and pushes everything downstream.

◆ gate — must clear to graduate · ▒ unknown-duration zone, gated on access · the axis is elapsed-from-start, not a calendar.
T0 = CX1 raw access + AgentWeb bridge + real-time ingestion path in place for the preferred live-feed alpha. If external enablement slips, a manual CSR copy/paste alpha can be evaluated as a narrower learning path, but only after Therese alignment.

Concept Build · Mark Stage 3 · now

Feasibility eval

~weeks · scope-dependent

Starts when

Now — no blocking dependency. Uses the sample already in hand.

Does

Offline: model produces source + Boolean + Ask-prompt on held-out questions; senior CSRs grade. No live tool.

Also

Re-cluster our corpus under a documented method to define the target taxonomy + eval set. The prior 35-cluster analysis is a year old, ran on different data, and used an unknown method, so it scopes nothing.

Gate → Qualitative lift demonstrated on the top clusters of the re-run taxonomy, before any infrastructure ask.

Build Alpha · Mark Stage 4 · suggest-only

CSR Copilot (HITL)

pace set by ingestion build

Starts when

Preferred path: CX1 raw access + AgentWeb bridge land, and the real-time ingestion path is solved. Fallback path: if external enablement cannot land in time, a formally approved CSR copy/paste path can support earlier alpha learning while the live/session feed moves to beta or the next release.

Does

Suggest-only, on a side-by-side surface next to LN's stack. The ideal alpha consumes live chat context; the fallback alpha would start from CSR-pasted chat/context with the limitation clearly documented. Top clusters, small CSR group. Accept/edit/reject logged.

Gate → accept-as-is clears threshold · "wrong source" below ceiling (proposed · not SME-confirmed)
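As a rough sketch of how this gate could be read off the accept / edit / reject log: the record schema, the function name, and both numeric values below are hypothetical placeholders. The plan marks the real threshold and ceiling as proposed and not SME-confirmed, and does not state them.

```python
from collections import Counter
from dataclasses import dataclass

# ASSUMPTION: placeholder values only. The plan does not state the real
# threshold or ceiling, and both are still awaiting SME confirmation.
HYPOTHETICAL_ACCEPT_THRESHOLD = 0.50
HYPOTHETICAL_WRONG_SOURCE_CEILING = 0.10


@dataclass(frozen=True)
class SuggestionOutcome:
    """One logged CSR decision on one copilot suggestion (hypothetical schema)."""
    action: str                 # "accept", "edit", or "reject"
    wrong_source: bool = False  # CSR flagged the proposed source as wrong


def gate_report(outcomes,
                accept_threshold=HYPOTHETICAL_ACCEPT_THRESHOLD,
                wrong_source_ceiling=HYPOTHETICAL_WRONG_SOURCE_CEILING):
    """Compute accept-as-is and wrong-source rates and whether the gate clears."""
    if not outcomes:
        raise ValueError("no logged outcomes to evaluate")
    total = len(outcomes)
    counts = Counter(o.action for o in outcomes)
    accept_rate = counts["accept"] / total  # edits do not count as accept-as-is
    wrong_source_rate = sum(o.wrong_source for o in outcomes) / total
    return {
        "accept_as_is_rate": accept_rate,
        "wrong_source_rate": wrong_source_rate,
        "passes": (accept_rate >= accept_threshold
                   and wrong_source_rate < wrong_source_ceiling),
    }
```

Only "accept" counts toward accept-as-is here; an edited suggestion is treated as useful signal but not as a pass, which matches the gate's wording.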

Alpha Testing · Mark Stage 5 · co-pilot

Expanded co-pilot

~a quarter once started

Starts when

Gate 1 passes — top-cluster accuracy cleared.

Does

Expand to more / all clusters — still in chat — with jurisdiction-aware tuning, Shepardize / secondary / forms. Expansion is cluster coverage, not new channels.

Gate → accept-as-is holds as coverage widens · citation-hallucination below ceiling (proposed · not SME-confirmed)

Beta+ · Mark Stage 7 · customer-facing

Customer-facing agent

horizon

Starts when

Gate 2 passes + Lexis+ product sponsorship + production-grade commitments (SLA, SRE, security).

Does

Same capability reaches customers directly; sampled human review as a quality service. Vehicle undecided (in-product Lexis Advance/Lexis+ or a NICE chatbot) — a downstream decision owned by another team. Content-nudge for stickiness lives here.

Gate → production readiness — the direct-to-customer threshold, deliberately conservative.

Each ▸ gate must clear before the next phase starts · a missed dependency holds position; it does not graduate the surface
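The "gated, not dated" rule can be expressed as a small state check: a phase starts only when the previous gate has cleared and its own dependencies have landed, and anything unmet holds every downstream phase. The sketch below is illustrative only. The dependency keys are hypothetical labels for the items named in the plan, and the copy/paste fallback path is deliberately not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    name: str
    # Dependencies that must land before this phase can start.
    dependencies: frozenset = field(default_factory=frozenset)
    # True once this phase's own gate has been cleared on evidence.
    gate_cleared: bool = False


def build_plan():
    """The four phases from the plan; dependency keys are hypothetical labels."""
    return [
        Phase("Feasibility eval"),
        Phase("CSR Copilot (HITL)",
              frozenset({"cx1_raw_access", "agentweb_bridge",
                         "realtime_ingestion"})),
        Phase("Expanded co-pilot"),
        Phase("Customer-facing agent",
              frozenset({"lexis_plus_sponsorship", "production_commitments"})),
    ]


def current_phase(phases, landed):
    """Return the furthest phase the plan may be in, given landed dependencies."""
    position = 0
    for nxt in range(1, len(phases)):
        gate_ok = phases[nxt - 1].gate_cleared
        deps_ok = phases[nxt].dependencies <= set(landed)
        if not (gate_ok and deps_ok):
            break  # hold position; nothing downstream may start either
        position = nxt
    return phases[position]
```

There is no date anywhere in the model: the only inputs are gate evidence and landed dependencies, which is the point of the relative axis.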

How the CSR's day changes

In one research request	Today — no copilot	Build Alpha — suggest-only	Co-pilot — expanded
Find the source	From memory / individual skill; varies CSR to CSR.	Copilot proposes the source + a one-line "why."	+ jurisdiction-aware; Shepardize, secondary sources, forms.
Build the search	Hand-built Boolean with Lexis connectors.	Copilot drafts the Boolean + a Lexis+ Ask prompt; CSR edits.	Connector tuning per jurisdiction.
Clarify intent	Ad hoc, if the CSR thinks to ask.	Copilot suggests the clarifying question.	Same, refined from feedback.
Deliver	CSR pastes the permalink to the customer.	Unchanged — CSR still sends; nothing auto-sends.	Unchanged.
Feedback loop	None — nothing is captured.	Accept / edit / reject logged as the training signal.	Continuous fine-tune from that signal.
Channel	Chat, and voice (phone + emailed result), both manual.	Chat.	Still chat — more clusters. Voice is a later channel.

Integration architecture — what the copilot plugs into

Architecture split (working direction — not yet a built contract). Embedded Innovation builds the backend AI capability and calls the existing LexisNexis search API (managed by Jim Presto) to ground and produce the suggestion; both are handed to LN's team to fold into their stack at alpha, or at beta later. For the alpha, the CSR-facing experience runs side-by-side with LN's existing stack rather than integrated into it — a standalone surface that minimizes integration overhead, so we can put it in front of CSRs and gather early feedback fast, before committing to deeper in-stack integration. The preferred alpha path consumes the live chat feed turn-by-turn. If the external integration path cannot land in the alpha window, CSR copy/paste can be documented as a fallback learning path, but that is not the architectural end-state and should be aligned with Therese before it is positioned externally.

Integration & dependencies — what each phase needs

What every phase needs, who provides it, and what's blocking — in one place. Required must land before that phase starts; Blocking is the hard gate that, if unmet, stops the phase; inherited = carried from the prior phase; — = not needed yet. This replaces a separate "asks" list — the Owner column is what we need from you. Voice is a later channel (live voice-to-text HITL + emailed delivery), not one of these near-term phases; its dependencies are scoped when chat clusters are covered.

Integration / dependency	P1	P2	P3	P4	Owner & why
Real-time ingestion (live conversation feed — preferred alpha path)	—	Blocking	in use	in use	EI + LN (NICE). Blocking for the preferred live-feed alpha: consume the chat live, stay ahead of the CSR, and test cheaply before deeper integration spend. If external enablement cannot land, a manual CSR copy/paste fallback can support a narrower learning alpha, but it needs Therese alignment and does not replace the live/session architecture.
CSR delivery surface (side-by-side for alpha; AgentWeb later)	—	side-by-side	in use	Decide	EI's call. Alpha runs a side-by-side standalone surface next to LN's stack — fast, EI-owned, minimal integration to reach CSRs. That side-by-side surface can use the live feed if available, or a clearly limited CSR copy/paste fallback if approved. Deeper AgentWeb integration is the later / beta decision.
CX1 raw transcript access (rolling window, source-truth)	sample in hand	Required	inherited	inherited	Therese → LN Murthy. The sample tests the bet; training-grade fidelity needs a real rolling window (~200k–250k).
AgentWeb disposition feed (outcome labels)	—	Required	inherited	inherited	Therese (CS-IT). Pick-list dispositions + auto-summary joined to transcript ID → resolved/escalated/repeat labels. Read-only.
Knowledge base inventory (index + per-article metadata)	—	Required	inherited	inherited	Therese + KB owner. Lets the copilot ground its rationale in known-good content, not free-generate it.
Conversational platform decision (Amelia vs Cognigy)	—	Needed	inherited	inherited	Architecture (Ollie / Lakshmi). HITL-native platform; migration cost / lock-in / learning curve. Architecture split: EI builds the API, LN owns the NICE integration.
Voice channel (live voice-to-text HITL + emailed delivery)	—	—	—	—	Later channel — out of the near-term phases. Same suggest-only HITL as chat, but harder: live voice-to-text feeds a suggestion the CSR reviews on the call, then emails the result (email is voice's delivery step, not a separate channel). Instrumenting it = capturing that emailed deliverable + tying it to the call. Scoped when chat clusters are covered. Owner: Therese & Mark.
CS Ops success metrics (AHT · CSat · FCR · hold-time)	—	Required	inherited	inherited	Therese (CS Ops). Baselines already exist — AHT 28→21 min target (−25%, the ROI lever), CSat ~78%, FCR 71% (2026-05-29). Join to the pilot cohort to measure impact. AHT delta is provable only under steady CSR adoption, so it reads in live use, not at offline eval.
Customer engagement with the delivered result	—	—	Required	inherited	Owner depends on where it lives. Did the customer click the permalink and use what we sent? A chat transcript shows the link was sent, not clicked — so this needs click/engagement telemetry from somewhere we don't own: the chat platform (NICE/AgentWeb), an instrumented-link layer, or Lexis+ analytics for in-product use. Where it lives is open. Cross-team either way; the richest outcome signal, lands after the CS-side metrics above.
Customer delivery surface (in-product vs NICE)	—	—	—	TBD	Downstream team (not EI). In-product (Lexis Advance/Lexis+) vs. customer-facing chatbot via NICE — depends on feasibility + existing architecture.
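The AgentWeb disposition row in the table above describes joining pick-list dispositions to transcript IDs to derive resolved / escalated / repeat labels. A minimal sketch of that join follows, assuming a hypothetical record shape (`transcript_id`, `disposition`) and a hypothetical pick-list-to-label mapping; the real AgentWeb fields and pick-list values are not given in this document.

```python
# ASSUMPTION: these pick-list strings and their label mapping are invented
# for illustration; the real AgentWeb dispositions are not listed in the plan.
HYPOTHETICAL_LABELS = {
    "Resolved - link sent": "resolved",
    "Transferred": "escalated",
    "Repeat contact": "repeat",
}


def label_transcripts(transcript_ids, dispositions, mapping=HYPOTHETICAL_LABELS):
    """Join disposition records to transcripts by ID and map them to labels.

    Returns {transcript_id: label}. A transcript with no disposition record
    gets None; a disposition outside the mapping gets "unmapped", so gaps in
    the feed stay visible instead of being silently dropped.
    """
    by_id = {d["transcript_id"]: d for d in dispositions}
    labelled = {}
    for tid in transcript_ids:
        record = by_id.get(tid)
        if record is None:
            labelled[tid] = None
        else:
            labelled[tid] = mapping.get(record["disposition"], "unmapped")
    return labelled
```

The function only reads its inputs, in line with the read-only access noted for the disposition feed.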