Practical Guide to Microsoft Purview eDiscovery
Microsoft Purview eDiscovery provides a powerful, integrated toolkit for legal and compliance investigations. It delivers a closed-loop workflow: Preserve > Collect > Analyze > Produce across your entire Microsoft 365 environment, including Exchange, SharePoint, OneDrive, and Teams.
- Add Case Data Sources up front. Immediately after you create a case, add known custodians and non-custodial data sources (Teams, Microsoft 365 Groups, SharePoint sites, shared mailboxes). This makes holds, collections, and review sets faster, cleaner, and less error-prone.
- Use Premium Analytics (Near-Duplicates, Email Threading, Themes) for modern deduplication.
- Prefer targeted, query-based eDiscovery Holds over broad mailbox-wide Litigation Holds.
- Draft complex searches in the KeyQL Editor; run Advanced Indexing when partially indexed items are material.
- The eDiscovery Export Tool still requires ClickOnce; Edge enables it by default in most tenants.
Background and Scope
This guide serves as a comprehensive, step-by-step playbook for Security Administrators, Compliance Officers, and Legal Personnel. It consolidates Microsoft's documentation, roadmap updates, and field-tested best practices into a single, actionable reference for running defensible, Zero-Trust investigations.
Prerequisites
Licensing Requirements
| Requirement | Min Version | Notes |
|---|---|---|
| eDiscovery (Standard) | M365 E3/A3/E5/A5 | Included with these suites. Provides case management, holds, content searches, results preview, and data export. |
| eDiscovery (Premium) | M365 E5/A5 or Compliance add-on | Adds custodians/non-custodial sources, advanced indexing, review set analytics (near-duplicates, threading, themes), in-place redaction, and Graph API access. |
Always confirm current licensing and feature availability against the Microsoft 365 Service Description and Message Center posts; capabilities shift periodically.
Roles and Permissions
Access to eDiscovery is controlled through specific role groups in the Purview portal. No other roles grant significant case access.
- eDiscovery Manager: The standard operator role. Can create and manage cases they are a member of, place holds, run searches, and perform exports. They cannot access cases they haven't been added to.
- eDiscovery Administrator: A global-level role. Can perform all tasks of an eDiscovery Manager but has access to all eDiscovery cases in the organization. They can also manage global eDiscovery settings.
- Reviewer: A highly restricted role. Can only view and analyze documents within a Review set to which they have been explicitly granted access. They cannot create cases, holds, or searches.
Assign users to the most restrictive role that allows them to perform their duties. Role assignments should be approved by a designated authority and documented according to organizational policy.
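The least-privilege rule above can be sketched as a simple lookup. This is an illustration only, not a Purview API: the capability labels (review_documents, place_hold, and so on) are hypothetical names paraphrased from the role descriptions, and the assumption that Managers can also review documents should be checked against the role group definitions in your tenant.

```python
# Ordered from most restrictive to least restrictive.
# Capability names are hypothetical labels, paraphrased from the role descriptions.
ROLE_CAPABILITIES = {
    "Reviewer": {"review_documents"},
    "eDiscovery Manager": {
        "review_documents", "create_case", "place_hold", "run_search", "export",
    },
    "eDiscovery Administrator": {
        "review_documents", "create_case", "place_hold", "run_search", "export",
        "access_all_cases", "manage_global_settings",
    },
}


def least_privileged_role(required):
    """Return the first (most restrictive) role that covers every required capability."""
    required = set(required)
    for role, capabilities in ROLE_CAPABILITIES.items():  # dicts keep insertion order
        if required <= capabilities:
            return role
    raise ValueError(f"No role covers: {sorted(required)}")
```

For example, least_privileged_role({"place_hold", "export"}) selects eDiscovery Manager rather than Administrator, which is the outcome the approval process should normally reach.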
Core Principles for Effective eDiscovery
- Front-load Case Data Sources: Add custodians and non-custodial data sources to the case immediately. This creates a curated universe of locations you can reuse for holds and collections, reduces misses, and speeds up scoping.
- Targeted Preservation: Use eDiscovery case holds with specific queries whenever possible. Avoid broad Litigation Holds to minimize storage impact and focus preservation on relevant content.
- Modern Deduplication: Leverage Analytics in eDiscovery (Premium) review sets—Near-Duplicate Detection, Email Threading, and Themes—to reduce data volume.
- Precision Queries: Draft complex searches in the KeyQL Editor for precise, defensible search logic.
- Index All Data: If your search report shows a material share of "partially indexed items" (for example, more than 1% of items), run Advanced Indexing to reprocess that content so it is fully searchable.
- Respect the Limits: Be aware of service guardrails (documents per case, review set size, export volume).
- Automate Repetitive Tasks: Use the Microsoft Graph eDiscovery API (beta) where appropriate.
- Use Caution with Priority Cleanup: This feature permanently deletes data and overrides all holds; require dual approval (Security + Legal).
- Export Tool dependency: The eDiscovery Export Tool requires ClickOnce; Edge ships with it enabled by default, but validate in your environment.
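The indexing principle above reduces to a ratio check. The sketch below assumes you have already read the partially indexed and total item counts from a search statistics report. The function names and the 1% default are illustrative, taken from the example figure in this list rather than from Microsoft guidance.

```python
def partially_indexed_ratio(partially_indexed: int, total_items: int) -> float:
    """Share of items in a search that were only partially indexed."""
    if total_items <= 0:
        return 0.0
    return partially_indexed / total_items


def should_run_advanced_indexing(
    partially_indexed: int, total_items: int, threshold: float = 0.01
) -> bool:
    """True when the partially indexed share exceeds the (assumed) materiality threshold."""
    return partially_indexed_ratio(partially_indexed, total_items) > threshold
```

Set the threshold per matter; a small absolute number of partially indexed items can still be material if they belong to a key custodian.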
KeyQL Quick-Start (Syntax & Patterns)
Key rules
- Property restrictions use property:value (not =).
- Use straight quotes "like this" and avoid nested quotation marks.
- Prefer participants: to match any of From/To/Cc/Bcc in mail and chat compliance items.
- Use kind: to scope item types (e.g., im, microsoftteams).
- Date ranges are inclusive. For email use received>=YYYY-MM-DD AND received<=YYYY-MM-DD.
- Use full UPNs for people to avoid alias expansion surprises.
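Several of these rules can be checked mechanically before a query is pasted into the editor. The following is a rough pre-flight check, not a KeyQL parser. The function names are hypothetical, and the regular expression only approximates the property=value mistake (it ignores whether the match sits inside a quoted phrase).

```python
import re

# Curly quotes mapped to their straight equivalents.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
# A word followed directly by "=" (does not match "received>=" or "received<=").
PROPERTY_EQUALS = re.compile(r"\b[A-Za-z]+=")


def normalize_quotes(query: str) -> str:
    """Replace curly quotes with straight quotes."""
    for smart, plain in SMART_QUOTES.items():
        query = query.replace(smart, plain)
    return query


def lint_keyql(query: str) -> list[str]:
    """Return a list of likely rule violations; an empty list means none were found."""
    problems = []
    if any(ch in query for ch in SMART_QUOTES):
        problems.append("smart quotes")
    if PROPERTY_EQUALS.search(query):
        problems.append("property=value")
    if normalize_quotes(query).count('"') % 2:
        problems.append("unbalanced quotes")
    return problems
```

A clean result does not prove the query is valid; it only rules out the most common copy-and-paste errors.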
Common Building Blocks
# Email + Teams chat/meetings/calls for 1 person, keywords, date range
(
(kind:email OR kind:microsoftteams) AND
participants:"user@tamu.edu" AND
("Project Name" OR "Keyword" OR "Another Term") AND
received>=2025-01-01 AND received<=2025-08-11
)
# Email only (no Teams meetings/calls), same scope
(
kind:email AND participants:"user@tamu.edu" AND
("Keyword1" OR "Keyword2") AND
received>=2025-05-01 AND received<=2025-06-30
)
# Teams chats only (classic IM), exclude meetings/calls
(kind:im AND participants:"user@tamu.edu")
# Search specific fields
(subject:"Subject Line" AND attachmentnames:*.pptx)
kind:microsoftteams returns chats, meetings, and calls. Use kind:im to target chat conversations only. Anonymous meeting participants are not currently searchable.
The KeyQL editor does not support nested quotes and will flag smart quotes. Paste plain-text queries and replace curly quotes with straight quotes.
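When the same pattern is reused across custodians, generating the query text reduces typing errors. The helper below is a minimal sketch that reproduces the first building block above; build_keyql and its parameters are hypothetical names, and it covers only this one pattern (kinds, one participant, keywords, inclusive date range).

```python
from datetime import date


def build_keyql(kinds, participant, keywords, start: date, end: date) -> str:
    """Compose a KeyQL string: kinds AND participant AND keywords AND date range."""
    if start > end:
        raise ValueError("start date must not be after end date")
    for keyword in keywords:
        if '"' in keyword:
            raise ValueError("nested quotation marks are not supported")
    kind_clause = " OR ".join(f"kind:{kind}" for kind in kinds)
    keyword_clause = " OR ".join(f'"{keyword}"' for keyword in keywords)
    return (
        f"(({kind_clause}) AND "
        f'participants:"{participant}" AND '
        f"({keyword_clause}) AND "
        f"received>={start.isoformat()} AND received<={end.isoformat()})"
    )
```

Keeping the generated string in the case notes alongside the collection name gives reviewers an exact record of the search logic that was run.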
The End-to-End eDiscovery Workflow
Step 1 - Create a Case
- Navigate to the Microsoft Purview portal.
- Go to eDiscovery (Premium).
- Select Create a case.
- Enter a unique Case name and optional Description.
- Configure case format/settings (for example, new Teams conversation format).
- (Premium) Under Settings, configure defaults for analytics (near-duplicates, themes), OCR, and text-to-ignore.
- Add investigators under Team members to grant access. The creator is added automatically.
- Save the case.
Outcome
A new, isolated eDiscovery case is created. All subsequent holds, collections, and review sets for this matter will be organized within this case.
Step 1.1 - Add Case Data Sources (Custodians & Non-custodial)
What this is
Your curated list of people and places you will investigate: custodians (mailbox + OneDrive) and non-custodial data sources (Teams, Microsoft 365 Groups, SharePoint sites, shared mailboxes, resource mailboxes, public folders if applicable).
Why it matters
- Establishes the searchable universe before you build holds and collections.
- Eliminates location ambiguity (aliases, renamed teams, duplicate sites).
- Speeds up scoping—pick from known sources instead of manually typing paths each time.
- Improves Review set quality downstream: better inputs plus cleaner analytics and deduplication.
How it works
- Open the case and go to Custodians. Add known users (mailbox + OneDrive). (Premium)
- Add Non-custodial data sources:
- Microsoft Teams (including private/shared channels)
- Microsoft 365 Groups (group mailbox + associated SharePoint site)
- SharePoint sites (team sites, project sites, legacy sites)
- Shared/Resource Mailboxes (rooms, services)
- Confirm each location resolves to an active directory object and has a clear, unambiguous display name.
- Maintain this list as facts emerge (new custodian, newly discovered site, etc.).
- Primary custodians (mailbox + OneDrive)
- Microsoft Teams (team + private/shared channels)
- Microsoft 365 Groups (group mailbox + site)
- Key SharePoint sites and legacy subsites
- Shared/resource mailboxes; departmental shared mailboxes
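If you keep the data source list in a tracking sheet or script outside the portal, a small check can surface ambiguous display names before they cause confusion in holds and collections. This is a sketch under that assumption; the DataSource type and its fields are hypothetical and do not correspond to a Purview object.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    display_name: str
    kind: str        # e.g. "custodian", "team", "group", "site", "shared_mailbox"
    identifier: str  # UPN, SMTP address, or site URL


def ambiguous_display_names(sources):
    """Return display names (lower-cased) that appear on more than one source."""
    counts = Counter(source.display_name.strip().lower() for source in sources)
    return sorted(name for name, count in counts.items() if count > 1)
```

Any name this returns is a candidate for renaming or for always referencing the source by its identifier instead.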
Step 2 - Preserve Data with Holds
Holds preserve content relevant to an investigation. Users can still edit or delete items, but a copy is retained in a secure, hidden location until the hold is released.
Hold Types
| Hold Type | Best For | Notes |
|---|---|---|
| eDiscovery Hold (Standard & Premium) | Targeted, query-based preservation | Applied within a case to specific mailboxes, sites, Teams/Groups. May use KeyQL to preserve only relevant content. |
| Custodian/Non-custodial Hold (Premium) | Premium workflow | Applies to locations you added in Case Data Sources (custodians and non-custodial). Unlocks custodian communications/tracking. |
| Litigation Hold (Exchange) | Broad, indefinite preservation | Mailbox-level; avoid if a narrower eDiscovery hold is sufficient. |
Step-by-Step: Create a Hold
- In your case, open Holds.
- Create a new hold policy.
- Name the hold clearly.
- Choose locations from your Case Data Sources pick-list (preferred) or specify ad hoc locations.
- (Optional) Add KeyQL conditions to scope preservation.
- Review and submit.
Allow up to 24 hours for a hold to fully propagate.
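The same hold can be scripted. The sketch below only builds the request URL and JSON body; it does not send anything. It assumes the Microsoft Graph legalHolds collection under an eDiscovery case and the displayName, description, and contentQuery properties, which should be verified against the current Graph reference before use. The function name and the CASE-ID placeholder are hypothetical.

```python
import json

GRAPH_BASE = "https://graph.microsoft.com/beta"


def build_hold_request(case_id, name, description, content_query=None):
    """Return (url, json_body) for creating a hold policy in a case (assumed endpoint)."""
    body = {"displayName": name, "description": description}
    if content_query:
        # Optional KeyQL condition to scope preservation.
        body["contentQuery"] = content_query
    url = f"{GRAPH_BASE}/security/cases/ediscoveryCases/{case_id}/legalHolds"
    return url, json.dumps(body)
```

Locations are attached to the hold policy in separate calls, so a policy created this way preserves nothing until its sources are added.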
Step 3 - Collect Relevant Data
A collection is a search against your defined data sources to find and gather potentially relevant content.
Build Your Search
- Condition Builder for simple queries (keywords, dates, participants).
- KeyQL Editor for complex logic. Example:
(kind:email OR kind:im) AND "Project Neptune" AND (participants:"user@domain.com" OR subject:"Confidential")
Step-by-Step: Create a Collection
- In the case, open Collections > New collection.
- Name and describe the collection.
- Select locations from your Case Data Sources (custodians and/or non-custodial). You can still specify ad hoc locations, but adding them to Case Data Sources first is recommended for reuse and clarity.
- Build your query using the Condition Builder or KeyQL Editor.
- Preview results to estimate volume and refine.
- (Premium) Commit the collection to a review set; (Standard) Save and run the search.
If a collection reports a significant number of partially indexed items, run Advanced Indexing (Premium) on the relevant custodians or locations, then recommit.
Step 4 - Analyze Data in Review Sets (Premium)
Once data is in a review set, use analytics to cull the dataset, identify duplicates, and organize content for efficient review. This is the modern approach to deduplication.
Analytics Features
- Near-Duplicate Detection
- Email Threading
- Themes
Run Analytics
- Open Review sets in your Premium case and select the set.
- Analytics > Run document & email analytics.
- Enable Near-Duplicates, Email Threading, and Themes as needed.
- Run the job; then use the insights to filter and organize review.
Tagging & Redaction
Use tags (e.g., Responsive, Privileged) and the native redaction tool.
Step 5 - Manage Custodian Communications (Premium)
Automate legal hold notifications; track acknowledgments and reminders.
Step-by-Step
- Go to Communications > New communication (Issuance, Re-Issuance, Release).
- Select custodians (from Case Data Sources).
- Customize message and reminder/escalation schedules.
- Issue and track acknowledgments on the dashboard.
Step 6 - Export Data for Production
Export prepares the final data set for delivery.
Step-by-Step
- From a Review set (Premium) or Exports (Standard), start a new export.
- Name the export job and describe the scope.
- Configure options:
- Content: all, selected, or tagged/filtered items
- Deduplication (Premium): export only pivot items from near-duplicate groups
- Deduplication (Standard): basic email-only, message-ID based
- Output: PST, natives, load files, PDF
- Start the export.
- Download via the eDiscovery Export Tool.
The Export Tool requires ClickOnce. Edge typically has it enabled by default; if not, enable ClickOnce or use IE mode. Break very large exports into smaller chunks to avoid timeouts.
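One way to plan the chunking is to group items (for example, custodians or tagged subsets) greedily by estimated size. The sketch below assumes you have per-item size estimates in GB; the function name is hypothetical, and the 100 GB default is taken from the example figure in the service limits table, not from a documented limit.

```python
def chunk_exports(item_sizes_gb, max_chunk_gb=100.0):
    """Group (name, size_gb) pairs, in order, into chunks no larger than max_chunk_gb."""
    chunks, current, current_size = [], [], 0.0
    for name, size in item_sizes_gb:
        if size > max_chunk_gb:
            raise ValueError(f"{name} alone exceeds {max_chunk_gb} GB; split it further")
        if current and current_size + size > max_chunk_gb:
            # Close the current chunk and start a new one.
            chunks.append(current)
            current, current_size = [], 0.0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each resulting group becomes its own export job, which also makes a failed download cheaper to retry.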
Step 7 - Automate with the Graph API (Beta)
POST https://graph.microsoft.com/beta/security/cases/ediscoveryCases
{
  "displayName": "Breach-Investigation-0425",
  "description": "Investigation into credential stuffing incident."
}
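The request above can be prepared with the Python standard library. This sketch builds the request object but does not send it. It assumes you have already obtained an access token with the appropriate eDiscovery permissions; how you acquire the token is out of scope here, and build_create_case_request is a hypothetical helper name.

```python
import json
import urllib.request

CASES_URL = "https://graph.microsoft.com/beta/security/cases/ediscoveryCases"


def build_create_case_request(access_token, display_name, description):
    """Build (but do not send) the POST request that creates an eDiscovery case."""
    payload = json.dumps(
        {"displayName": display_name, "description": description}
    ).encode("utf-8")
    return urllib.request.Request(
        CASES_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would submit it. Because the endpoint is in beta, pin and retest any automation that depends on it.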
Step 8 - Close and Release the Case
Step-by-Step
- In Holds, release each hold.
- Confirm holds are off.
- Close the case.
- (Optional) Delete the case when policy permits.
A 30-day delay-hold typically prevents immediate permanent deletion after a hold is removed.
Reference and Advanced Topics
Auditing Copilot for M365 Interactions
Copilot-generated content is discoverable if it resides in a location on hold. Interactions are also discoverable in the Purview Audit log.
- How to discover: Include audit log data in your collections when relevant.
- KeyQL for Copilot
operation:CopilotInteraction
CopilotInteractionLog.prompts:"confidential project"
AppUsed:Teams or AppUsed:Outlook
Understanding Service Limits
Be aware of limits that can affect large or complex cases.
| Limit | Value (example) | Notes |
|---|---|---|
| Docs per Premium case | 40 M | Across all review sets |
| Load-set size | 1 TB | Per add-to-review-set operation |
| Review sets per case | 20 | |
| Load sets per case | 200 | |
| Max export size | Varies | Break large exports into chunks of under ~100 GB each |
| Daily export volume | e.g., 5 TB/day | Tenant-wide throttle |
| Holds per case | High | If >100, consider PowerShell for bulk release |
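A planned case can be compared against these guardrails before work starts. The sketch below hard-codes the example values from the table above, so treat them as placeholders and confirm the current limits before relying on the result. The dictionary keys and the function name are hypothetical.

```python
# Example values copied from the table above; confirm current limits before use.
LIMITS = {
    "docs_per_case": 40_000_000,
    "load_set_tb": 1.0,
    "review_sets_per_case": 20,
    "load_sets_per_case": 200,
}


def limit_violations(plan):
    """Return the names of limits that the planned figures would exceed."""
    return sorted(name for name, limit in LIMITS.items() if plan.get(name, 0) > limit)
```

For example, a plan with 41 million documents and 5 review sets returns ["docs_per_case"], which signals that the matter should be split across cases or culled before collection.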
Troubleshooting Common Issues
| Symptom | Potential Root Cause(s) | Suggested Fix(es) |
|---|---|---|
| Zero search hits | Wrong locations; KeyQL error; overly tight query | Verify Case Data Sources; broaden KeyQL; confirm data exists in time range |
| Partially indexed items | Unsupported/encrypted/large files | Run Advanced Indexing; remediate (remove password) and re-add |
| Export fails/stalls | ClickOnce disabled; network/timeouts | Enable ClickOnce; stabilize network; chunk exports |
| Location ambiguous during search | Duplicate/conflicting directory objects | Use Get-Recipient to find and fix conflicting objects |
| HTTP 500 error during export | Temporary service issue; permissions; .NET | Retry; verify eDiscovery permissions; check the installed .NET Framework version |
Glossary (Selected)
| Term | Definition |
|---|---|
| Case Data Sources | The curated set of custodians and non-custodial data locations associated with a case |
| Custodian | A user whose mailbox and OneDrive are within scope for a matter |
| Non-custodial data source | A location (e.g., Teams, Group mailbox/site, SharePoint site) relevant to the case that is not tied to a specific custodian |
| KeyQL | Keyword Query Language used for precise search queries |
| Review Set | A container for collected data where analysis and tagging occurs |