Practical Guide to Microsoft Purview eDiscovery

Microsoft Purview eDiscovery provides a powerful, integrated toolkit for legal and compliance investigations. It delivers a closed-loop workflow: Preserve > Collect > Analyze > Produce across your entire Microsoft 365 environment, including Exchange, SharePoint, OneDrive, and Teams.

TL;DR
  • Add Case Data Sources up front. Immediately after you create a case, add known custodians and non-custodial data sources (Teams, Microsoft 365 Groups, SharePoint sites, shared mailboxes). This makes holds, collections, and review sets faster, cleaner, and less error-prone.
  • Use Premium Analytics (Near-Duplicates, Email Threading, Themes) for modern deduplication.
  • Prefer targeted, query-based eDiscovery Holds over broad mailbox-wide Litigation Holds.
  • Draft complex searches in the KeyQL Editor; run Advanced Indexing when partially indexed items are material.
  • The eDiscovery Export Tool still requires ClickOnce; Edge usually has it enabled by default, but confirm this in your browser before you start an export.

Background and Scope

This guide serves as a comprehensive, step-by-step playbook for Security Administrators, Compliance Officers, and Legal Personnel. It consolidates Microsoft's documentation, roadmap updates, and field-tested best practices into a single, actionable reference for running defensible, Zero-Trust investigations.

Prerequisites

Licensing Requirements

Requirement | Minimum license | Notes
eDiscovery (Standard) | M365 E3/A3/E5/A5 | Included with these suites. Provides case management, holds, content searches, results preview, and data export.
eDiscovery (Premium) | M365 E5/A5 or Compliance add-on | Adds custodians/non-custodial sources, advanced indexing, review set analytics (near-duplicates, threading, themes), in-place redaction, and Graph API access.
Validation

Always confirm current licensing and feature availability against the Microsoft 365 Service Description and Message Center posts; capabilities shift periodically.

Roles and Permissions

Access to eDiscovery is controlled through specific role groups in the Purview portal. No other roles grant significant case access.

  • eDiscovery Manager: The standard operator role. Can create and manage cases they are a member of, place holds, run searches, and perform exports. They cannot access cases they haven't been added to.
  • eDiscovery Administrator: A global-level role. Can perform all tasks of an eDiscovery Manager but has access to all eDiscovery cases in the organization. They can also manage global eDiscovery settings.
  • Reviewer: A highly restricted role. Can only view and analyze documents within a Review set to which they have been explicitly granted access. They cannot create cases, holds, or searches.
Role Assignment

Assign users to the most restrictive role that allows them to perform their duties. Role assignments should be approved by a designated authority and documented according to organizational policy.

Core Principles for Effective eDiscovery

  • Front-load Case Data Sources: Add custodians and non-custodial data sources to the case immediately. This creates a curated universe of locations you can reuse for holds and collections, reduces misses, and speeds up scoping.
  • Targeted Preservation: Use eDiscovery case holds with specific queries whenever possible. Avoid broad Litigation Holds to minimize storage impact and focus preservation on relevant content.
  • Modern Deduplication: Leverage Analytics in eDiscovery (Premium) review sets—Near-Duplicate Detection, Email Threading, and Themes—to reduce data volume.
  • Precision Queries: Draft complex searches in the KeyQL Editor for precise, defensible search logic.
  • Index All Data: If your search report shows a material share of "partially indexed items" (for example, more than 1% of items), run Advanced Indexing to reprocess that content so it is fully searchable.
  • Respect the Limits: Be aware of service guardrails (documents per case, review set size, export volume).
  • Automate Repetitive Tasks: Use the Microsoft Graph eDiscovery API (beta) where appropriate.
  • Use Caution with Priority Cleanup: This feature permanently deletes data and overrides all holds; require dual approval (Security + Legal).
  • Export Tool dependency: The eDiscovery Export Tool requires ClickOnce; Edge ships with it enabled by default, but validate in your environment.

KeyQL Quick-Start (Syntax & Patterns)

Key rules

  • Property restrictions use property:value (not =).
  • Use straight quotes "like this" and avoid nested quotation marks.
  • Prefer participants: to match any of From/To/Cc/Bcc in mail and chat compliance items.
  • Use kind: to scope item types (e.g., email, im, microsoftteams).
  • Date ranges are inclusive. For email use received>=YYYY-MM-DD AND received<=YYYY-MM-DD.
  • Use full UPNs for people to avoid alias expansion surprises.
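
The rules above can be sketched as a small query builder. This is a minimal illustration, not part of any Microsoft tooling: the function names `quote` and `build_keyql` are hypothetical, and the output is only a string that you would paste into the KeyQL Editor and validate there.

```python
from datetime import date


def quote(term: str) -> str:
    """Wrap a term in straight double quotes; nested quotes are rejected."""
    if '"' in term:
        raise ValueError(f"nested quotes are not supported: {term!r}")
    return f'"{term}"'


def build_keyql(kinds, participant, keywords, start: date, end: date) -> str:
    """Compose a KeyQL string from property:value restrictions."""
    kind_clause = " OR ".join(f"kind:{k}" for k in kinds)
    keyword_clause = " OR ".join(quote(k) for k in keywords)
    return (
        f"({kind_clause}) AND "
        f"participants:{quote(participant)} AND "
        f"({keyword_clause}) AND "
        # Inclusive date range, written as two comparisons.
        f"received>={start.isoformat()} AND received<={end.isoformat()}"
    )


query = build_keyql(
    kinds=["email", "microsoftteams"],
    participant="user@tamu.edu",  # full UPN, as recommended above
    keywords=["Project Name", "Keyword"],
    start=date(2025, 1, 1),
    end=date(2025, 8, 11),
)
print(query)
```

Generating the string from structured inputs keeps the operators, quoting, and date format consistent across collections, which makes the search logic easier to document and defend.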

Common Building Blocks

# Email + Teams chat/meetings/calls for 1 person, keywords, date range
(
  (kind:email OR kind:microsoftteams) AND
  participants:"user@tamu.edu" AND
  ("Project Name" OR "Keyword" OR "Another Term") AND
  received>=2025-01-01 AND received<=2025-08-11
)

# Email only (no Teams meetings/calls), same scope
(
  kind:email AND participants:"user@tamu.edu" AND
  ("Keyword1" OR "Keyword2") AND
  received>=2025-05-01 AND received<=2025-06-30
)

# Teams chats only (classic IM), exclude meetings/calls
(kind:im AND participants:"user@tamu.edu")

# Search specific fields (match attachments by extension; a leading wildcard such as *.pptx is not supported)
(subject:"Subject Line" AND attachmentnames:.pptx)

Teams Scoping

kind:microsoftteams returns chats, meetings, and calls. Use kind:im to target chat conversations only. Anonymous meeting participants are not currently searchable.

Quote Handling

The KeyQL editor does not support nested quotes and will flag smart quotes. Paste plain-text queries and replace curly quotes with straight quotes.
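
A minimal sketch of the clean-up step described above, assuming queries are drafted in a word processor or chat client that inserts curly quotes. The helper names are hypothetical, and the balance check is only a rough proxy for spotting stray or nested quotation marks; the KeyQL Editor remains the authority on whether a query is valid.

```python
# Map typographic quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def normalize_quotes(query: str) -> str:
    """Replace curly quotes with straight quotes."""
    return query.translate(str.maketrans(SMART_QUOTES))


def has_balanced_quotes(query: str) -> bool:
    """Return True when the number of double quotes is even."""
    return query.count('"') % 2 == 0


pasted = "subject:\u201cSubject Line\u201d AND kind:email"
cleaned = normalize_quotes(pasted)
print(cleaned)
print(has_balanced_quotes(cleaned))
```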

The End-to-End eDiscovery Workflow

Step 1 - Create a Case
  1. Navigate to the Microsoft Purview portal.
  2. Go to eDiscovery (Premium).
  3. Select Create a case.
  4. Enter a unique Case name and optional Description.
  5. Configure case format/settings (for example, new Teams conversation format).
  6. (Premium) Under Settings, configure defaults for analytics (near-duplicates, themes), OCR, and text-to-ignore.
  7. Add investigators under Team members to grant access. The creator is added automatically.
  8. Save the case.

Outcome
A new, isolated eDiscovery case is created. All subsequent holds, collections, and review sets for this matter will be organized within this case.

Step 1.1 - Add Case Data Sources (Custodians & Non-custodial)

What this is
Your curated list of people and places you will investigate: custodians (mailbox + OneDrive) and non-custodial data sources (Teams, Microsoft 365 Groups, SharePoint sites, shared mailboxes, resource mailboxes, public folders if applicable).

Why it matters

  • Establishes the searchable universe before you build holds and collections.
  • Eliminates location ambiguity (aliases, renamed teams, duplicate sites).
  • Speeds up scoping—pick from known sources instead of manually typing paths each time.
  • Improves Review set quality downstream: better inputs plus cleaner analytics and deduplication.

How it works

  1. Open the case and go to Custodians. Add known users (mailbox + OneDrive). (Premium)
  2. Add Non-custodial data sources:
    • Microsoft Teams (including private/shared channels)
    • Microsoft 365 Groups (group mailbox + associated SharePoint site)
    • SharePoint sites (team sites, project sites, legacy sites)
    • Shared/Resource Mailboxes (rooms, services)
  3. Confirm each location resolves to an active directory object and has a clear, unambiguous display name.
  4. Maintain this list as facts emerge (new custodian, newly discovered site, etc.).
Quick Checklist: Add These Up Front
  • Primary custodians (mailbox + OneDrive)
  • Microsoft Teams (team + private/shared channels)
  • Microsoft 365 Groups (group mailbox + site)
  • Key SharePoint sites and legacy subsites
  • Shared/resource mailboxes; departmental shared mailboxes
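
The checklist can also be tracked outside the portal. The sketch below is one hypothetical way to keep a working list of case data sources, drop repeated locations, and flag display names that point at more than one address (the ambiguity problem noted above). The `DataSource` type, its fields, and the `contoso.example` addresses are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    kind: str          # e.g. "custodian", "team", "site", "shared_mailbox"
    display_name: str
    address: str       # UPN, SMTP address, or site URL


def dedupe(sources):
    """Drop repeated locations, comparing addresses case-insensitively."""
    seen, unique = set(), []
    for s in sources:
        key = s.address.lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def ambiguous_names(sources):
    """Return display names shared by more than one distinct address."""
    by_name = defaultdict(set)
    for s in sources:
        by_name[s.display_name.lower()].add(s.address.lower())
    return sorted(name for name, addrs in by_name.items() if len(addrs) > 1)


sources = [
    DataSource("custodian", "Alex Doe", "alex@contoso.example"),
    DataSource("custodian", "Alex Doe", "ALEX@contoso.example"),  # same mailbox
    DataSource("site", "Project Neptune", "https://contoso.example/sites/neptune"),
    DataSource("team", "Project Neptune", "neptune@contoso.example"),
]
unique_sources = dedupe(sources)
print(len(unique_sources))
print(ambiguous_names(unique_sources))
```

Names flagged as ambiguous are the ones worth renaming or annotating before they appear in a hold or collection pick-list.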

Step 2 - Preserve Data with Holds

Holds preserve content relevant to an investigation. Users can still edit or delete items, but a copy of the held content is retained in a secure, hidden location until the hold is released.

Hold Types

Hold Type | Best For | Notes
eDiscovery Hold (Standard & Premium) | Targeted, query-based preservation | Applied within a case to specific mailboxes, sites, Teams/Groups. May use KeyQL to preserve only relevant content.
Custodian/Non-custodial Hold (Premium) | Premium workflow | Applies to locations you added in Case Data Sources (custodians and non-custodial). Unlocks custodian communications/tracking.
Litigation Hold (Exchange) | Broad, indefinite preservation | Mailbox-level; avoid if a narrower eDiscovery hold is sufficient.

Step-by-Step: Create a Hold

  1. In your case, open Holds.
  2. Create a new hold policy.
  3. Name the hold clearly.
  4. Choose locations from your Case Data Sources pick-list (preferred) or specify ad hoc locations.
  5. (Optional) Add KeyQL conditions to scope preservation.
  6. Review and submit.
Propagation Time

Allow up to 24 hours for a hold to fully propagate.

Step 3 - Collect Relevant Data

A collection is a search against your defined data sources to find and gather potentially relevant content.

Build Your Search

  • Condition Builder for simple queries (keywords, dates, participants).
  • KeyQL Editor for complex logic. Example:
(kind:email OR kind:im) AND "Project Neptune" AND (participants:"user@domain.com" OR subject:"Confidential")

Step-by-Step: Create a Collection

  1. In the case, open Collections > New collection.
  2. Name and describe the collection.
  3. Select locations from your Case Data Sources (custodians and/or non-custodial). You can still specify ad hoc locations, but adding them to Case Data Sources first is recommended for reuse and clarity.
  4. Build your query using the Condition Builder or KeyQL Editor.
  5. Preview results to estimate volume and refine.
  6. (Premium) Commit the collection to a review set; (Standard) Save and run the search.
Partially Indexed Items

If a collection reports a significant number of partially indexed items, run Advanced Indexing (Premium) on the relevant custodians or locations, then recommit.
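
What counts as "significant" is a judgment call for the case team. A minimal sketch of the arithmetic, assuming you read the partially indexed and total item counts off the collection statistics and adopt the 1% example threshold mentioned earlier in this guide (the function names and the default threshold are illustrative, not a Microsoft rule):

```python
def partially_indexed_ratio(partial: int, total: int) -> float:
    """Share of collected items that are only partially indexed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return partial / total


def needs_advanced_indexing(partial: int, total: int, threshold: float = 0.01) -> bool:
    """True when the partially indexed share exceeds the chosen threshold."""
    return partially_indexed_ratio(partial, total) > threshold


# 150 of 10,000 items is 1.5%, above the 1% example threshold.
print(needs_advanced_indexing(150, 10_000))
# 50 of 10,000 items is 0.5%, below it.
print(needs_advanced_indexing(50, 10_000))
```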

Step 4 - Analyze Data in Review Sets (Premium)

Once data is in a review set, use analytics to cull the dataset, identify duplicates, and organize content for efficient review. This is the modern approach to deduplication.

Analytics Features

  • Near-Duplicate Detection
  • Email Threading
  • Themes

Run Analytics

  1. Open Review sets in your Premium case and select the set.
  2. Analytics > Run document & email analytics.
  3. Enable Near-Duplicates, Email Threading, and Themes as needed.
  4. Run the job; then use the insights to filter and organize review.
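
To build intuition for what near-duplicate grouping does, the sketch below compares documents by the overlap of their three-word shingles (Jaccard similarity). This is a generic textbook technique used here for illustration only; it is not a description of how Purview computes similarity, and the 0.65 threshold is an assumed value for the example.

```python
def shingles(text: str, k: int = 3) -> set:
    """Split text into overlapping k-word tuples."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict, threshold: float = 0.65) -> list:
    """Return pairs of document ids whose similarity meets the threshold."""
    ids = sorted(docs)
    sets = {i: shingles(docs[i]) for i in ids}
    return [
        (x, y)
        for n, x in enumerate(ids)
        for y in ids[n + 1:]
        if jaccard(sets[x], sets[y]) >= threshold
    ]


docs = {
    "a": "the quarterly budget report for project neptune is attached for review",
    "b": "the quarterly budget report for project neptune is attached for your review",
    "c": "lunch menu for friday",
}
pairs = near_duplicates(docs)
print(pairs)
```

Documents "a" and "b" differ by one word and land in the same group, so a reviewer would only need to read one of them closely.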

Tagging & Redaction
Use tags (e.g., Responsive, Privileged) and the native redaction tool.

Step 5 - Manage Custodian Communications (Premium)

Automate legal hold notifications; track acknowledgments and reminders.

Step-by-Step

  1. Go to Communications > New communication (Issuance, Re-Issuance, Release).
  2. Select custodians (from Case Data Sources).
  3. Customize message and reminder/escalation schedules.
  4. Issue and track acknowledgments on the dashboard.

Step 6 - Export Data for Production

Export prepares the final data set for delivery.

Step-by-Step

  1. From a Review set (Premium) or Exports (Standard), start a new export.
  2. Name the export job and describe the scope.
  3. Configure options:
    • Content: all, selected, or tagged/filtered items
    • Deduplication (Premium): export only pivot items from near-duplicate groups
    • Deduplication (Standard): basic email-only, message-ID based
    • Output: PST, natives, load files, PDF
  4. Start the export.
  5. Download via the eDiscovery Export Tool.
ClickOnce Requirement

The Export Tool requires ClickOnce. Edge typically has it enabled by default; if not, enable ClickOnce or use IE mode. Break very large exports into smaller chunks to avoid timeouts.
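
One way to plan smaller exports is to group items (for example, custodians or tagged batches) by estimated size before starting any job. The sketch below is a simple greedy grouping under assumed inputs: the item names and sizes are made up, and the 100 GB ceiling is the rough figure used in this guide rather than a documented service limit.

```python
GB = 10**9  # decimal gigabyte, used here for simplicity


def chunk_by_size(items, max_bytes):
    """Greedily group (name, size_bytes) pairs so each chunk stays within max_bytes."""
    chunks, current, current_size = [], [], 0
    for name, size in items:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the chunk limit")
        # Close the current chunk when the next item would push it over.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


estimated = [("a", 60 * GB), ("b", 50 * GB), ("c", 30 * GB), ("d", 90 * GB)]
plan = chunk_by_size(estimated, max_bytes=100 * GB)
print(plan)
```

Each inner list then becomes one export job, named so that the set of jobs can be reconciled against the original scope.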

Step 7 - Automate with the Graph API (Beta)
POST: Create eDiscovery case
POST https://graph.microsoft.com/beta/security/cases/ediscoveryCases
Content-Type: application/json

{
  "displayName": "Breach-Investigation-0425",
  "description": "Investigation into credential stuffing incident."
}
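
The same request can be prepared from a script. The sketch below only builds the request object with the Python standard library and does not send it. Token acquisition is out of scope, so the token is a placeholder, and the function name is hypothetical. It assumes the caller already holds an access token with suitable eDiscovery permissions for Microsoft Graph.

```python
import json
import urllib.request

GRAPH_URL = "https://graph.microsoft.com/beta/security/cases/ediscoveryCases"


def build_create_case_request(token: str, display_name: str, description: str):
    """Build (but do not send) the POST request that creates an eDiscovery case."""
    body = json.dumps({"displayName": display_name, "description": description})
    return urllib.request.Request(
        GRAPH_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_case_request(
    token="<access-token>",  # placeholder; obtain a real token separately
    display_name="Breach-Investigation-0425",
    description="Investigation into credential stuffing incident.",
)
print(request.get_method(), request.full_url)
# To send it, pass the object to urllib.request.urlopen(request) and handle errors.
```

Keeping request construction separate from sending makes it easy to log or review exactly what will be submitted before anything changes in the tenant.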

Step 8 - Close and Release the Case

Step-by-Step

  1. In Holds, release each hold.
  2. Confirm holds are off.
  3. Close the case.
  4. (Optional) Delete the case when policy permits.
Hold Release Grace Period

A 30-day delay-hold typically prevents immediate permanent deletion after a hold is removed.
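
For case-closure records it can help to note when the grace period ends. A minimal sketch of that date arithmetic, assuming the 30-day figure above applies to your locations (confirm this before relying on the date; the function name is illustrative):

```python
from datetime import date, timedelta

DELAY_HOLD_DAYS = 30  # grace period stated in this guide; confirm for your tenant


def earliest_purge_date(hold_released: date, delay_days: int = DELAY_HOLD_DAYS) -> date:
    """Date on which the delay hold would lapse after a hold is released."""
    return hold_released + timedelta(days=delay_days)


print(earliest_purge_date(date(2025, 8, 11)))
```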

Reference and Advanced Topics

Auditing Copilot for M365 Interactions

Copilot-generated content is discoverable if it resides in a location on hold. Interactions are also discoverable in the Purview Audit log.

  • How to discover: Include audit log data in your collections when relevant.
  • Example filters for Copilot (verify property names in your tenant before relying on them)
    • operation:CopilotInteraction
    • CopilotInteractionLog.prompts:"confidential project"
    • AppUsed:Teams or AppUsed:Outlook
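
If you export audit records for offline triage, a short filter can isolate Copilot activity. The sketch below assumes the records have been loaded as a list of dictionaries with `Operation` and `UserId` keys; the sample records and addresses are invented for illustration, and real exports may nest or name fields differently.

```python
def copilot_interactions(records, user=None):
    """Keep records whose Operation is CopilotInteraction, optionally for one user."""
    hits = [r for r in records if r.get("Operation") == "CopilotInteraction"]
    if user is not None:
        hits = [r for r in hits if r.get("UserId", "").lower() == user.lower()]
    return hits


# Invented sample records, standing in for an exported audit log.
sample = [
    {"Operation": "CopilotInteraction", "UserId": "alex@contoso.example"},
    {"Operation": "FileAccessed", "UserId": "alex@contoso.example"},
    {"Operation": "CopilotInteraction", "UserId": "sam@contoso.example"},
]
print(len(copilot_interactions(sample)))
print(copilot_interactions(sample, user="ALEX@contoso.example"))
```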

Understanding Service Limits

Be aware of limits that can affect large or complex cases.

Limit | Value (example) | Notes
Docs per Premium case | 40 million | Across all review sets
Load-set size | 1 TB | Per add-to-review-set operation
Review sets per case | 20 | -
Load sets per case | 200 | -
Max export size | Varies | Break large exports into chunks of less than ~100 GB
Daily export volume | e.g., 5 TB/day | Tenant-wide throttle
Holds per case | High | If >100, consider PowerShell for bulk release
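
A planned case can be checked against these guardrails before work starts. The sketch below hard-codes the example values from the table, which may not match current service limits, and uses hypothetical key names; treat it as a template to update from the official documentation.

```python
# Example values copied from the table above; verify against current limits.
LIMITS = {
    "documents_per_case": 40_000_000,
    "load_set_bytes": 10**12,  # 1 TB, decimal
    "review_sets_per_case": 20,
    "load_sets_per_case": 200,
}


def exceeded_limits(plan: dict, limits: dict = LIMITS) -> list:
    """Return the names of planned quantities that exceed their limit."""
    return sorted(k for k, v in plan.items() if k in limits and v > limits[k])


plan = {
    "documents_per_case": 52_000_000,
    "load_set_bytes": 4 * 10**11,
    "review_sets_per_case": 25,
    "load_sets_per_case": 40,
}
print(exceeded_limits(plan))
```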

Troubleshooting Common Issues

Symptom | Potential Root Cause(s) | Suggested Fix(es)
Zero search hits | Wrong locations; KeyQL error; overly tight query | Verify Case Data Sources; broaden KeyQL; confirm data exists in time range
Partially indexed items | Unsupported/encrypted/large files | Run Advanced Indexing; remediate (remove password) and re-add
Export fails/stalls | ClickOnce disabled; network/timeouts | Enable ClickOnce; stabilize network; chunk exports
Location ambiguous during search | Duplicate/conflicting directory objects | Use Get-Recipient to find and fix conflicting objects
500 error during export | Temporary service issue; permissions; .NET | Retry; verify eDiscovery permissions; check .NET framework

Glossary (Selected)

Term | Definition
Case Data Sources | The curated set of custodians and non-custodial data locations associated with a case
Custodian | A user whose mailbox and OneDrive are within scope for a matter
Non-custodial data source | A location (e.g., Teams, Group mailbox/site, SharePoint site) relevant to the case that is not tied to a specific custodian
KeyQL | Keyword Query Language used for precise search queries
Review Set | A container for collected data where analysis and tagging occur