Discovery & Posture Management
With sensitivity labels already deployed, you can now discover where sensitive data lives and measure your security posture with meaningful metrics.
Because you deployed sensitivity labels and auto-labeling in Classification, DSPM can now report:
- Labeled vs. unlabeled sensitive content (adoption metric)
- Under-protected data (detected but not labeled/encrypted)
- Overshared sensitive content (permissions too broad)
- Copilot exposure risk (sensitive data accessible to AI)
Without labels, DSPM shows only where sensitive data exists, not how well it is protected.
Sensitive Information Types & Classifiers {#sits-classifiers}
Refine your detection by adding custom Sensitive Information Types (SITs) and Trainable Classifiers. These feed into auto-labeling, DLP, and DSPM dashboards.
Why Discovery Matters for Higher Ed
| Challenge | Impact | Discovery Solution |
|---|---|---|
| Diverse Data Types | Student records (FERPA), research data (CUI/ITAR), health info (HIPAA) | Custom SITs for each regulatory domain |
| Distributed Ownership | Faculty/staff manage data across departments | Content Explorer visualizes distribution |
| Historical Accumulation | Years of ungoverned storage | DSPM identifies risk hotspots |
Prerequisites
| Requirement | Details |
|---|---|
| Phase 1 Complete | Audit logging enabled, roles assigned |
| Role / Permission | Compliance Administrator or Information Protection Admin |
| Time Allowance | 24-48 hours for Content Explorer initial scan |
Step 0 – Review Built-in Sensitive Information Types
Goal: Understand the 300+ built-in detectors before creating custom ones.
Key SITs for Higher Education:
| Built-in SIT | Regulation | Use Case |
|---|---|---|
| U.S. Social Security Number (SSN) | Multiple | Employee/student records |
| All Full Names | FERPA/HIPAA | Combined with other SITs |
| Credit Card Number | PCI-DSS | Bursar, payments |
| U.S. Passport Number | ITAR | International research |
| All Medical Terms and Conditions | HIPAA | Student health services |
Click-Ops:
- Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Sensitive info types
- Review the list of built-in SITs
- Test a Built-in SIT:
  - Click on U.S. Social Security Number (SSN)
  - Click Test and enter: 123-45-6789
  - Verify detection works as expected
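The same review can be done from Security & Compliance PowerShell. The following is a minimal sketch, assuming the ExchangeOnlineManagement module is installed and your account holds one of the roles listed under Prerequisites. The `Microsoft*` publisher filter is an assumption about how built-in types are reported, and the names are copied from the table above, so adjust them if your tenant lists a SIT under a slightly different name.

```powershell
# Assumes the ExchangeOnlineManagement module is installed and the signed-in
# account can read sensitive information types.
Connect-IPPSSession

# Retrieve every sensitive information type visible to the tenant.
$allSits = Get-DlpSensitiveInformationType

# Assumption: built-in types report a Microsoft publisher; anything else is custom.
$builtIn = $allSits | Where-Object { $_.Publisher -like 'Microsoft*' }
"Built-in SITs: $(@($builtIn).Count) of $(@($allSits).Count) total"

# Names taken from the higher-education table above; they must match the
# tenant's SIT names exactly to be returned.
$names = @(
    'U.S. Social Security Number (SSN)'
    'All Full Names'
    'Credit Card Number'
    'U.S. Passport Number'
    'All Medical Terms and Conditions'
)
$allSits | Where-Object { $_.Name -in $names } | Select-Object Name, Publisher
```

Any name missing from the output is worth looking up in the portal list before you build policies around it.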
Step 1 – Create Custom SIT for UIN
Goal: Create a classifier for your institution's unique identifier format.
Click-Ops (Purview Portal):
- Go to Information Protection > Classifiers > Sensitive info types
- Click + Create sensitive info type
- Name: TAMU Employee/Student UIN
- Description: Detects TAMU UINs with supporting keywords
- Patterns: Click + Create pattern
  - Confidence level: High confidence
  - Primary element: Regular Expression
    - Value: \b\d{3}-?00-?\d{4}\b
  - Supporting elements: Keyword list
    - Add: "UIN", "Universal ID", "Student ID", "TAMU ID"
  - Character proximity: Anywhere in the document
- Click Create
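Before relying on the pattern, it may be worth checking it locally against a few strings. This sketch runs the same regular expression through the .NET regex engine; Purview's own engine can differ in edge cases, so treat it as a sanity check only. All sample values are made up for illustration.

```powershell
# Same expression as the portal pattern above.
$uinPattern = '\b\d{3}-?00-?\d{4}\b'

# Made-up sample strings (not real identifiers) mapped to the expected outcome.
$samples = [ordered]@{
    'UIN: 123-00-4567'     = $true    # hyphenated form
    'Student ID 123004567' = $true    # digits only
    'SSN 123-45-6789'      = $false   # middle group is not 00
    'Order 9123004567'     = $false   # embedded in a longer number
}

$uinResults = foreach ($text in $samples.Keys) {
    [pscustomobject]@{
        Text     = $text
        Expected = $samples[$text]
        Actual   = [regex]::IsMatch($text, $uinPattern)
    }
}
$uinResults | Format-Table -AutoSize
```

Rows where Expected and Actual disagree point to a pattern that is too loose or too strict for your identifier format.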
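PowerShell:
# Connect to Security & Compliance PowerShell (ExchangeOnlineManagement module)
Connect-IPPSSession

# Generate the entity GUID once: the Entity and its LocalizedStrings resource must share it
$EntityId = [guid]::NewGuid().ToString()

# Note: patternsProximity="300" limits supporting keywords to a 300-character window,
# whereas the portal steps above use "Anywhere in the document"; align them as needed.
$RulePackageXML = @"
<?xml version="1.0" encoding="utf-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="$(New-Guid)">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="$(New-Guid)"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>Your University</PublisherName>
        <Name>Custom SIT Package</Name>
        <Description>Custom sensitive information types</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <Entity id="$EntityId" patternsProximity="300" recommendedConfidence="85" relaxProximity="true">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_UIN"/>
        <Any minMatches="1">
          <Match idRef="Keyword_UIN"/>
        </Any>
      </Pattern>
    </Entity>
    <Regex id="Regex_UIN">\b\d{3}-?00-?\d{4}\b</Regex>
    <Keyword id="Keyword_UIN">
      <Group matchStyle="word">
        <Term>UIN</Term>
        <Term>Universal ID</Term>
        <Term>Student ID</Term>
        <Term>TAMU ID</Term>
      </Group>
    </Keyword>
    <LocalizedStrings>
      <Resource idRef="$EntityId">
        <Name default="true" langcode="en-us">TAMU Employee/Student UIN</Name>
        <Description default="true" langcode="en-us">Detects TAMU UINs with supporting keywords</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"@

# Write to a uniquely named temp file (GetTempFileName would leave an orphaned .tmp file behind)
$TempFile = Join-Path ([System.IO.Path]::GetTempPath()) "custom-sit-$EntityId.xml"
$RulePackageXML | Out-File -FilePath $TempFile -Encoding UTF8

# Upload the rule package, then clean up the local copy
New-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes($TempFile))
Remove-Item $TempFile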
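A mistyped `idRef` is an easy error to make in a hand-edited rule package. As a rough pre-upload check, the sketch below parses the XML locally and lists any `IdMatch` or `Match` reference that has no matching `Regex` or `Keyword` definition. The function name `Get-UnresolvedIdRef` and the deliberately broken sample are hypothetical; the check covers only these two reference types, not the full schema.

```powershell
function Get-UnresolvedIdRef {
    param([Parameter(Mandatory)][string]$RulePackageXml)

    # Casting to [xml] throws if the document is not well-formed.
    $doc = [xml]$RulePackageXml

    # local-name() sidesteps the default namespace declared on RulePackage.
    $defined = $doc.SelectNodes("//*[local-name()='Regex' or local-name()='Keyword']") |
        ForEach-Object { $_.GetAttribute('id') }

    $doc.SelectNodes("//*[local-name()='IdMatch' or local-name()='Match']") |
        ForEach-Object { $_.GetAttribute('idRef') } |
        Where-Object { $_ -notin $defined } |
        Sort-Object -Unique
}

# Illustrative fragment with one deliberate mistake (Keyword_Typo is never defined).
$sample = @'
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <Rules>
    <Entity id="00000000-0000-0000-0000-000000000001" patternsProximity="300" recommendedConfidence="85">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_UIN"/>
        <Any minMatches="1">
          <Match idRef="Keyword_UIN"/>
          <Match idRef="Keyword_Typo"/>
        </Any>
      </Pattern>
    </Entity>
    <Regex id="Regex_UIN">\b\d{3}-?00-?\d{4}\b</Regex>
    <Keyword id="Keyword_UIN"><Group matchStyle="word"><Term>UIN</Term></Group></Keyword>
  </Rules>
</RulePackage>
'@

$unresolved = Get-UnresolvedIdRef -RulePackageXml $sample
"Unresolved references: $($unresolved -join ', ')"
```

Passing `$RulePackageXML` from the script above instead of `$sample` should yield an empty list when every reference resolves.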
Step 2 – Create Custom SIT for Research Grants
Goal: Detect documents related to specific high-risk grants (CUI/ITAR).
Click-Ops:
- Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Sensitive info types
- Click + Create sensitive info type
- Name: Research Grant ID
- Patterns:
  - Primary element: Regular Expression
    - Value: \b[A-Z]{2,5}-\d{4}-\d{4,6}\b
  - Supporting elements: Keyword list
    - Add: "Grant", "Award", "Principal Investigator", "PI", "NSF", "NIH", "DOD", "DOE", "DARPA"
- Click Create
For grants involving CUI or ITAR, add keywords like: "ITAR", "EAR", "Export Controlled", "CUI", "Controlled Unclassified", "NOFORN"
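The pairing of a primary pattern with supporting keywords can be approximated locally. This is a sketch under two assumptions: the ID pattern is case-sensitive and keywords are matched as whole words without regard to case. The sample grant numbers are invented, `Test-GrantText` is a hypothetical helper, and the sketch ignores proximity.

```powershell
$grantPattern  = '\b[A-Z]{2,5}-\d{4}-\d{4,6}\b'
$grantKeywords = 'Grant', 'Award', 'Principal Investigator', 'PI', 'NSF', 'NIH', 'DOD', 'DOE', 'DARPA'

function Test-GrantText {
    param([string]$Text)

    # -cmatch keeps the ID match case-sensitive, so lowercase prefixes are ignored.
    $hasId = $Text -cmatch $grantPattern

    # Whole-word keyword search; [regex]::Escape protects any special characters.
    $keywordRegex = '\b(' + (($grantKeywords | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')\b'
    $hasKeyword = $Text -match $keywordRegex

    [pscustomobject]@{
        Text       = $Text
        HasId      = $hasId
        HasKeyword = $hasKeyword
        WouldFlag  = ($hasId -and $hasKeyword)
    }
}

# Invented sample strings for illustration only.
$grantSamples = @(
    'NSF Award DMS-2023-123456'
    'Ticket ABC-2024-0001 closed'
    'Grant application due Friday'
)
$grantResults = $grantSamples | ForEach-Object { Test-GrantText -Text $_ }
$grantResults | Format-Table -AutoSize
```

The second sample shows why the keyword list matters: the pattern alone would flag an ordinary ticket number.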
Content Explorer & Activity {#content-explorer}
With SITs configured, use Content Explorer to visualize data distribution and Activity Explorer to monitor labeling activity.
Step 3 – Content Explorer: Data Visualization
Goal: Visualize where sensitive data lives across M365.
Click-Ops:
- Navigate to Microsoft Purview portal → Solutions → Information Protection → Content explorer
- Locate your custom sensitive information types
- Click on each SIT to analyze distribution:
- SharePoint: Are UINs appearing in "Public" sites?
- OneDrive: Are faculty storing student rosters personally?
- Exchange: Are UINs being emailed externally?
- Export summary report for leadership
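Once an export is in hand, a per-workload count per SIT is often enough for a leadership summary. The sketch below groups rows from a CSV; the column names (`Workload`, `Location`, `Sit`) and the rows themselves are hypothetical stand-ins, so map them to whatever columns your export actually contains.

```powershell
# Synthetic rows standing in for an exported report; column names are hypothetical.
$exportCsv = @'
Workload,Location,Sit
SharePoint,https://contoso.sharepoint.com/sites/Public,TAMU Employee/Student UIN
SharePoint,https://contoso.sharepoint.com/sites/Registrar,TAMU Employee/Student UIN
OneDrive,https://contoso-my.sharepoint.com/personal/faculty1,TAMU Employee/Student UIN
Exchange,faculty1@contoso.edu,Research Grant ID
'@

$rows = $exportCsv | ConvertFrom-Csv

# Count items per SIT and workload, largest groups first.
$summary = $rows |
    Group-Object -Property Sit, Workload |
    Sort-Object -Property Count -Descending |
    Select-Object Count,
        @{ Name = 'Sit';      Expression = { $_.Group[0].Sit } },
        @{ Name = 'Workload'; Expression = { $_.Group[0].Workload } }

$summary | Format-Table -AutoSize
```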
Step 4 – Configure Trainable Classifiers
Goal: Use machine learning to detect document types that patterns can't identify.
Built-in Classifiers (A5):
| Classifier | Use Case |
|---|---|
| Resumes | HR, Career Services |
| Source Code | Research IP protection |
| Harassment | Communication Compliance |
| Threat | Communication Compliance |
Creating a Custom Classifier (e.g., Academic Transcripts):
- Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Trainable classifiers
- Click + Create trainable classifier
- Name: Academic Transcript
- Seed Content:
  - Create SharePoint site: Trainable Classifier Seed Content
  - Upload 50+ positive examples (redacted transcripts)
- Training:
  - Wait 24-48 hours for processing
  - Review items and mark as Match or Not a match
  - Provide 200+ feedback responses
  - Publish when accuracy reaches >80%
Use redacted or synthetic documents for training. If using real documents, restrict the SharePoint site and delete documents after training.
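The publishing thresholds above (50+ seed documents, 200+ feedback responses, accuracy above 80%) can be tracked with a small helper. This is a sketch only: `Test-ClassifierReadiness` is a hypothetical function, and it assumes accuracy means the share of reviewed items where the classifier's prediction agreed with your Match / Not a match feedback.

```powershell
function Test-ClassifierReadiness {
    param(
        [int]$SeedCount,      # positive examples uploaded to the seed site
        [int]$ReviewedCount,  # feedback responses provided so far
        [int]$CorrectCount    # reviewed items where the prediction was right
    )

    # Assumed definition: accuracy = correct predictions / reviewed items.
    $accuracy = if ($ReviewedCount -gt 0) {
        [math]::Round(100 * $CorrectCount / $ReviewedCount, 1)
    } else {
        0
    }

    [pscustomobject]@{
        EnoughSeeds    = $SeedCount -ge 50
        EnoughFeedback = $ReviewedCount -ge 200
        AccuracyPct    = $accuracy
        ReadyToPublish = ($SeedCount -ge 50) -and ($ReviewedCount -ge 200) -and ($accuracy -gt 80)
    }
}

# Illustrative numbers, not measurements.
$readiness = Test-ClassifierReadiness -SeedCount 60 -ReviewedCount 220 -CorrectCount 190
$readiness
```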
Step 5 – Enable Activity Explorer
Goal: Monitor what's happening to sensitive data in near real time.
What Activity Explorer Tracks:
| Activity | Example | Why It Matters |
|---|---|---|
| Label applied | User labeled file "Confidential - FERPA" | Tracks labeling adoption |
| Label changed | Downgrade from "Restricted" to "General" | Potential exfiltration |
| DLP policy matched | Email with SSN detected | Policy effectiveness |
| File copied to USB | Endpoint DLP detection | Data exfiltration |
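Label downgrades are the rows most worth pulling out of an activity export. The sketch below flags records whose new label ranks lower than the old one. The ranking table, the property names (`User`, `File`, `OldLabel`, `NewLabel`), and the sample records are all hypothetical; substitute your own label taxonomy and the fields your export provides.

```powershell
# Hypothetical ranking, lowest to highest sensitivity; adjust to your label taxonomy.
$labelRank = @{ 'General' = 0; 'Confidential - FERPA' = 1; 'Restricted' = 2 }

function Get-LabelDowngrade {
    param([object[]]$Records)

    $Records | Where-Object {
        $old = [string]$_.OldLabel
        $new = [string]$_.NewLabel
        # Only compare labels that appear in the ranking table.
        $labelRank.ContainsKey($old) -and
        $labelRank.ContainsKey($new) -and
        ($labelRank[$new] -lt $labelRank[$old])
    }
}

# Synthetic activity records for illustration.
$activity = @(
    [pscustomobject]@{ User = 'user1'; File = 'roster.xlsx'; OldLabel = 'Restricted'; NewLabel = 'General' }
    [pscustomobject]@{ User = 'user2'; File = 'notes.docx';  OldLabel = 'General';    NewLabel = 'Confidential - FERPA' }
    [pscustomobject]@{ User = 'user3'; File = 'memo.docx';   OldLabel = 'Unknown';    NewLabel = 'General' }
)

$downgrades = Get-LabelDowngrade -Records $activity
$downgrades | Format-Table -AutoSize
```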
Click-Ops:
- Navigate to Microsoft Purview portal → Solutions → Information Protection → Activity explorer
- Set date range (default: last 30 days)
- Filter by Activity: LabelApplied, DLPRuleMatch
- Filter by SIT: Your custom UIN type
- Export for compliance reporting
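For recurring reports, the export can also be scripted with `Export-ActivityExplorerData`. The following is a sketch, assuming an authenticated Security & Compliance PowerShell session with rights to read Activity Explorer data. The output folder name and the 50-page safety cap are arbitrary choices, and the SIT filter from the portal steps is not reproduced here.

```powershell
# Assumes the ExchangeOnlineManagement module and sufficient permissions.
Connect-IPPSSession

$end   = Get-Date
$start = $end.AddDays(-30)   # mirrors the 30-day range in the portal steps

$query = @{
    StartTime    = $start
    EndTime      = $end
    OutputFormat = 'Json'
    PageSize     = 1000
    # Filter1 takes the filter name followed by the values to include.
    Filter1      = @('Activity', 'LabelApplied', 'DLPRuleMatch')
}

# Hypothetical output folder under the current directory.
$outDir = Join-Path (Get-Location) 'activity-explorer-export'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$pageNumber = 0
$page = Export-ActivityExplorerData @query
do {
    $pageNumber++
    # ResultData holds the page's records as a JSON string.
    $page.ResultData | Set-Content -Path (Join-Path $outDir "page-$pageNumber.json") -Encoding UTF8

    $more = -not $page.LastPage
    if ($more) {
        # WaterMark is passed back as the cookie for the next page.
        $page = Export-ActivityExplorerData @query -PageCookie $page.WaterMark
    }
} while ($more -and $pageNumber -lt 50)   # arbitrary cap to avoid a runaway loop

"Saved $pageNumber page(s) to $outDir"
```

Writing one file per page keeps memory use flat on busy tenants; merge the files afterwards if a single report is needed.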
Step 6 (Advanced) – Exact Data Match for Student Records
Goal: Detect data by matching against an actual database of known values.
Why EDM for Higher Ed:
- SITs detect any pattern match
- EDM detects only values in your student database
- Dramatically reduces false positives
High-Level Process:
- Create EDM Schema: Define columns (UIN, Name, Email)
- Prepare and Hash Data: Export from Banner, hash with EDM Upload Agent
- Create EDM-based SIT: Link to your schema
- Test and Deploy: Use in DLP for highest accuracy
The CSV containing student UINs is FERPA data. Generate on secured systems, transfer encrypted, delete immediately after upload.
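The difference between pattern matching and exact matching can be illustrated locally. This sketch is conceptual only: it hashes a made-up list of known values with plain SHA-256 and keeps a pattern hit only when its hash appears in that list. The real hashing and upload are handled by the EDM Upload Agent, whose scheme this does not attempt to reproduce; `Get-Sha256Hex` is a hypothetical helper.

```powershell
function Get-Sha256Hex {
    param([string]$Value)
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Value)
        # Join the hash bytes into a lowercase hex string.
        -join ($sha.ComputeHash($bytes) | ForEach-Object { $_.ToString('x2') })
    } finally {
        $sha.Dispose()
    }
}

# Synthetic "student database" values; only their hashes are kept for lookup.
$knownUins   = '123004567', '987006543'
$hashedTable = [System.Collections.Generic.HashSet[string]]::new()
foreach ($uin in $knownUins) { [void]$hashedTable.Add((Get-Sha256Hex $uin)) }

$text = 'Roster: UIN 123-00-4567, case ref 555-00-1111'

# Pattern-only detection: every value shaped like a UIN is a hit.
$patternHits = [regex]::Matches($text, '\b\d{3}-?00-?\d{4}\b') | ForEach-Object { $_.Value }

# Exact-match detection: keep a hit only if its normalized hash is in the table.
$edmHits = $patternHits | Where-Object {
    $hashedTable.Contains((Get-Sha256Hex ($_ -replace '-', '')))
}

"Pattern hits: $($patternHits -join ', ')"
"Exact-match hits: $($edmHits -join ', ')"
```

Here the case reference has the right shape but is not a known value, so only the first number survives the exact-match step.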
Data Security Posture Management (DSPM) {#dspm}
With labels deployed in Classification, DSPM now shows protection gaps, not just data locations. You can track:
- How much sensitive data is labeled (adoption)
- How much labeled data is encrypted (protection level)
- How much sensitive data is overshared (access risk)
DSPM Dashboard Components
| Element | What It Shows | Action |
|---|---|---|
| Posture Score | Data security health (0-100) | Track trend, set targets |
| Data at Risk | Sensitive content with inadequate protection | Prioritize for labeling |
| Overshared Data | Files with excessive permissions | Review and restrict |
| Recommendations | AI-generated remediation steps | Work through in priority order |
Configure DSPM Dashboard
Click-Ops:
- Navigate to Microsoft Purview portal → Solutions → Data Security Posture Management
- Allow 24-48 hours for DSPM to aggregate data from other tools
- Configure Posture Score Goals:
  - Click Settings
  - Set target score (current + 10 points)
  - Define priority data types (FERPA, Research)
- Review Data at Risk:
  - Filter by sensitivity label status
  - Export for remediation planning
- Analyze Oversharing:
  - Review files shared with "Everyone"
  - Flag for access review
Key Metrics:
| Metric | Target | Remediation |
|---|---|---|
| % sensitive data labeled | >80% | Auto-labeling (Phase 4) |
| Overshared sensitive files | <5% | Access reviews (Phase 6) |
| Unprotected high-sensitivity | 0 | DLP blocking (Phase 5) |
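The three targets in the table reduce to simple ratios over item counts. The sketch below computes them and marks each as met or not; `Get-PostureMetric` is a hypothetical helper and the input counts are illustrative, to be replaced with figures taken from your own DSPM exports.

```powershell
function Get-PostureMetric {
    param(
        [int]$SensitiveItems,              # items where sensitive data was detected
        [int]$LabeledItems,                # of those, items carrying a sensitivity label
        [int]$OversharedItems,             # of those, items with overly broad permissions
        [int]$UnprotectedHighSensitivity   # high-sensitivity items with no protection
    )
    if ($SensitiveItems -le 0) { throw 'SensitiveItems must be greater than zero.' }

    $labeledPct    = [math]::Round(100 * $LabeledItems / $SensitiveItems, 1)
    $oversharedPct = [math]::Round(100 * $OversharedItems / $SensitiveItems, 1)

    @(
        [pscustomobject]@{ Metric = '% sensitive data labeled';     Value = $labeledPct;                 Target = '>80%'; Met = $labeledPct -gt 80 }
        [pscustomobject]@{ Metric = 'Overshared sensitive files';   Value = $oversharedPct;              Target = '<5%';  Met = $oversharedPct -lt 5 }
        [pscustomobject]@{ Metric = 'Unprotected high-sensitivity'; Value = $UnprotectedHighSensitivity; Target = '0';    Met = $UnprotectedHighSensitivity -eq 0 }
    )
}

# Illustrative counts, not measurements.
$metrics = Get-PostureMetric -SensitiveItems 12000 -LabeledItems 9000 -OversharedItems 480 -UnprotectedHighSensitivity 3
$metrics | Format-Table -AutoSize
```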
DSPM for Copilot Readiness
Goal: Assess Copilot exposure risk before enabling M365 Copilot.
Pre-Copilot Checklist:
- DSPM posture score ≥ 70
- <100 overshared sensitive files
- "Highly Confidential" content has restricted access
- Research data excluded from general access
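The checklist can be expressed as a single go / no-go evaluation. In this sketch, `Test-CopilotReadiness` is a hypothetical helper; the score and file count would come from the DSPM dashboard, while the two access conditions are judgments you supply after review.

```powershell
function Test-CopilotReadiness {
    param(
        [int]$PostureScore,
        [int]$OversharedSensitiveFiles,
        [bool]$HighlyConfidentialRestricted,
        [bool]$ResearchDataExcluded
    )

    # Each entry mirrors one line of the pre-Copilot checklist.
    $checks = [ordered]@{
        'DSPM posture score >= 70'                  = $PostureScore -ge 70
        'Fewer than 100 overshared sensitive files' = $OversharedSensitiveFiles -lt 100
        'Highly Confidential content restricted'    = $HighlyConfidentialRestricted
        'Research data excluded from general access' = $ResearchDataExcluded
    }

    $failed = @($checks.Keys | Where-Object { -not $checks[$_] })

    [pscustomobject]@{
        Ready  = ($failed.Count -eq 0)
        Failed = $failed
    }
}

# Illustrative inputs: the score passes, but too many files are overshared.
$copilot = Test-CopilotReadiness -PostureScore 72 -OversharedSensitiveFiles 140 `
    -HighlyConfidentialRestricted $true -ResearchDataExcluded $true
$copilot | Format-List
```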
DSPM findings feed into SharePoint Advanced Management for remediation. When DSPM identifies overshared content, use SAM's Data Access Governance for access reviews (Phase 6).
Validation Checklist
| # | Item | Test Method | Success Criteria |
|---|---|---|---|
| 1 | Built-in SITs Available | Purview > Classifiers > SITs | 300+ types visible |
| 2 | Custom SITs Created | Search SIT list | Custom UIN/Grant SITs appear |
| 3 | SIT Detection Test | Use "Test" function | High confidence match returned |
| 4 | Content Explorer Populated | Check after 24-48 hours | Data locations visible |
| 5 | DSPM Dashboard Active | View posture score | Score calculated |
Next Steps
With discovery complete and DSPM showing your actual security posture, proceed to Access Control to address oversharing and prepare for Copilot.