Discovery & Posture Management

With sensitivity labels already deployed, you can now discover where sensitive data lives and measure your security posture with meaningful metrics.

Why Discovery Comes After Labels

Because you deployed sensitivity labels and auto-labeling in Classification, DSPM can now report:

- Labeled vs. unlabeled sensitive content (adoption metric)
- Under-protected data (detected but not labeled/encrypted)
- Overshared sensitive content (permissions too broad)
- Copilot exposure risk (sensitive data accessible to AI)

Without labels, DSPM can show only where sensitive data exists, not how well it is protected.

Sensitive Information Types & Classifiers {#sits-classifiers}

Objective

Refine your detection by adding custom Sensitive Information Types (SITs) and Trainable Classifiers. These feed into auto-labeling, DLP, and DSPM dashboards.

Why Discovery Matters for Higher Ed

| Challenge | Impact | Discovery Solution |
|---|---|---|
| Diverse Data Types | Student records (FERPA), research data (CUI/ITAR), health info (HIPAA) | Custom SITs for each regulatory domain |
| Distributed Ownership | Faculty/staff manage data across departments | Content Explorer visualizes distribution |
| Historical Accumulation | Years of ungoverned storage | DSPM identifies risk hotspots |

Prerequisites

| Requirement | Details |
|---|---|
| Phase 1 Complete | Audit logging enabled, roles assigned |
| Role / Permission | `Compliance Administrator` or `Information Protection Admin` |
| Time Allowance | 24-48 hours for Content Explorer initial scan |

Step 0 – Review Built-in Sensitive Information Types

Goal: Understand the 300+ built-in detectors before creating custom ones.

Key SITs for Higher Education:

| Built-in SIT | Regulation | Use Case |
|---|---|---|
| U.S. Social Security Number (SSN) | Multiple | Employee/student records |
| All Full Names | FERPA/HIPAA | Combined with other SITs |
| Credit Card Number | PCI-DSS | Bursar, payments |
| U.S. Passport Number | ITAR | International research |
| All Medical Terms and Conditions | HIPAA | Student health services |

Click-Ops:

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Sensitive info types
2. Review the list of built-in SITs
3. Test a built-in SIT:
   - Click U.S. Social Security Number (SSN)
   - Click Test and enter: 123-45-6789
   - Note whether the SIT is detected and at which confidence level (supporting keywords near the number typically raise confidence)
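The same test can be scripted from Security & Compliance PowerShell with `Test-DataClassification`, which reports the sensitive information types it finds in a text string. The sketch below assumes an existing `Connect-IPPSSession` session; the helper name `Test-SitSamples` is hypothetical, and the returned `ClassificationResults` are passed through unparsed rather than relying on their exact shape.

```powershell
# Hypothetical helper: sends each sample string to Test-DataClassification and returns
# the sample alongside whatever SITs the service reports, so keyword effects are easy to compare.
# Assumes Connect-IPPSSession has already been run.
function Test-SitSamples {
    param(
        [Parameter(Mandatory)]
        [string[]]$Samples
    )
    foreach ($sample in $Samples) {
        $result = Test-DataClassification -TextToClassify $sample
        # One object per sample; Results holds the detected SITs as returned by the service
        [pscustomobject]@{
            Sample  = $sample
            Results = $result.ClassificationResults
        }
    }
}
```

For example, `Test-SitSamples -Samples '123-45-6789', 'Employee SSN: 123-45-6789' | Format-List` lets you compare a bare number with the same number next to a keyword.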

Step 1 – Create Custom SIT for UIN

Goal: Create a custom SIT that detects your institution's unique identifier (UIN) format.

Click-Ops (Purview Portal):

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Sensitive info types
2. Click + Create sensitive info type
3. Name: TAMU Employee/Student UIN
4. Description: Detects TAMU UINs with supporting keywords
5. Patterns: Click + Create pattern
   - Confidence level: High confidence
   - Primary element: Regular Expression
     - Value: `\b\d{3}-?00-?\d{4}\b`
   - Supporting elements: Keyword list
     - Add: "UIN", "Universal ID", "Student ID", "TAMU ID"
   - Character proximity: Anywhere in the document
6. Click Create
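Before creating the SIT, it can help to sanity-check the regex and keyword list locally. This is a rough approximation only, not Purview's matching engine, which has its own tokenization and proximity rules. The function name `Find-UinCandidates`, the 300-character default window (borrowed from `patternsProximity` in the PowerShell definition below), and the sample strings are illustrative assumptions.

```powershell
# Rough local approximation of the UIN SIT (NOT Purview's matching engine):
# a regex hit is reported as 'High' only if a supporting keyword appears within
# $Window characters of it; otherwise it is reported as 'NotDetected'.
function Find-UinCandidates {
    param(
        [Parameter(Mandatory)][string]$Text,
        [string]$Pattern = '\b\d{3}-?00-?\d{4}\b',
        [string[]]$Keywords = @('UIN', 'Universal ID', 'Student ID', 'TAMU ID'),
        [int]$Window = 300   # 0 or less = keyword may appear anywhere in the text
    )
    foreach ($m in [regex]::Matches($Text, $Pattern)) {
        if ($Window -le 0) {
            $context = $Text
        } else {
            $start   = [Math]::Max(0, $m.Index - $Window)
            $end     = [Math]::Min($Text.Length, $m.Index + $m.Length + $Window)
            $context = $Text.Substring($start, $end - $start)
        }
        # Whole-word keyword search; PowerShell's -match is case-insensitive
        $hasKeyword = $false
        foreach ($kw in $Keywords) {
            if ($context -match ('\b' + [regex]::Escape($kw) + '\b')) {
                $hasKeyword = $true
                break
            }
        }
        $confidence = if ($hasKeyword) { 'High' } else { 'NotDetected' }
        [pscustomobject]@{ Value = $m.Value; Index = $m.Index; Confidence = $confidence }
    }
}
```

For example, `Find-UinCandidates -Text 'Student UIN: 123-00-4567'` reports the number as `High`, while the same digits in an order number with no nearby keyword come back as `NotDetected`.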

PowerShell:

```powershell
# Requires the ExchangeOnlineManagement module (Security & Compliance PowerShell)
Connect-IPPSSession

# Generate the entity GUID once: LocalizedStrings must reference the same ID as the Entity
$EntityId = [guid]::NewGuid().ToString()

$RulePackageXML = @"
<?xml version="1.0" encoding="utf-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="$(New-Guid)">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="$(New-Guid)"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>Your University</PublisherName>
        <Name>Custom SIT Package</Name>
        <Description>Custom sensitive information types</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>

  <Rules>
    <Entity id="$EntityId" patternsProximity="300" recommendedConfidence="85" relaxProximity="true">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_UIN"/>
        <Any minMatches="1">
          <Match idRef="Keyword_UIN"/>
        </Any>
      </Pattern>
    </Entity>

    <Regex id="Regex_UIN">\b\d{3}-?00-?\d{4}\b</Regex>

    <Keyword id="Keyword_UIN">
      <Group matchStyle="word">
        <Term>UIN</Term>
        <Term>Universal ID</Term>
        <Term>Student ID</Term>
        <Term>TAMU ID</Term>
      </Group>
    </Keyword>

    <LocalizedStrings>
      <Resource idRef="$EntityId">
        <Name default="true" langcode="en-us">TAMU Employee/Student UIN</Name>
        <Description default="true" langcode="en-us">Detects TAMU UINs with supporting keywords</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"@

# Upload the package directly as UTF-8 bytes; no temporary file is needed
$Bytes = [System.Text.Encoding]::UTF8.GetBytes($RulePackageXML)
New-DlpSensitiveInformationTypeRulePackage -FileData $Bytes
```
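Upload errors from `New-DlpSensitiveInformationTypeRulePackage` can be terse, so a local pre-flight check of the XML string may save a round trip. This hypothetical `Test-SitRulePackageXml` function only confirms that the XML parses, that every `IdMatch`/`Match` reference points to an `Entity`, `Regex`, or `Keyword` defined in the same package, and that every `Entity` has a `LocalizedStrings` resource. It is not a schema validator, and references to built-in functions would be reported as unresolved, so treat its output as prompts to double-check.

```powershell
# Hypothetical pre-flight check for a custom SIT rule package (no tenant connection needed).
# Emits one message per problem; no output means the basic cross-references are consistent.
# References to built-in functions (not defined in the package) are also reported.
function Test-SitRulePackageXml {
    param([Parameter(Mandatory)][string]$Xml)

    try {
        $doc = [xml]$Xml
    } catch {
        "XML does not parse: $($_.Exception.Message)"
        return
    }

    # local-name() sidesteps the rule package's default XML namespace
    $definedIds = @(
        $doc.SelectNodes("//*[local-name()='Entity' or local-name()='Regex' or local-name()='Keyword']") |
            ForEach-Object { $_.GetAttribute('id') }
    )
    foreach ($ref in $doc.SelectNodes("//*[local-name()='IdMatch' or local-name()='Match']")) {
        $id = $ref.GetAttribute('idRef')
        if ($definedIds -notcontains $id) { "Unresolved reference: $id" }
    }

    # Every Entity needs a display name/description in LocalizedStrings
    $resourceIds = @(
        $doc.SelectNodes("//*[local-name()='Resource']") | ForEach-Object { $_.GetAttribute('idRef') }
    )
    foreach ($entity in $doc.SelectNodes("//*[local-name()='Entity']")) {
        $id = $entity.GetAttribute('id')
        if ($resourceIds -notcontains $id) { "Entity $id has no LocalizedStrings resource" }
    }
}
```

Run it against the here-string before uploading, e.g. `Test-SitRulePackageXml -Xml $RulePackageXML`; no output means the cross-references line up.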

Step 2 – Create Custom SIT for Research Grants

Goal: Detect documents related to specific high-risk grants (CUI/ITAR).

Click-Ops:

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Sensitive info types
2. Click + Create sensitive info type
3. Name: Research Grant ID
4. Patterns:
   - Primary element: Regular Expression
     - Value: `\b[A-Z]{2,5}-\d{4}-\d{4,6}\b`
   - Supporting elements: Keyword list
     - Add: "Grant", "Award", "Principal Investigator", "PI", "NSF", "NIH", "DOD", "DOE", "DARPA"
5. Click Create

Expanding for Export Control

For grants involving CUI or ITAR, add keywords like: "ITAR", "EAR", "Export Controlled", "CUI", "Controlled Unclassified", "NOFORN"
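One way to use these keywords is to give the Research Grant ID SIT two patterns in the same entity: medium confidence (75) when a grant ID appears near a general grant keyword, and high confidence (85) when an export-control keyword is also nearby. The local sketch below illustrates that tiering; the function name, the 300-character window, and the sample IDs in the usage note are assumptions, and Purview's own keyword and proximity matching may differ in detail.

```powershell
# Hypothetical local sketch of tiered confidence for the Research Grant ID SIT:
#   85 (high)   = grant ID + grant keyword + export-control keyword nearby
#   75 (medium) = grant ID + grant keyword nearby
#   0           = grant ID alone (would not be reported)
function Get-GrantIdConfidence {
    param(
        [Parameter(Mandatory)][string]$Text,
        [int]$Window = 300
    )
    $grantKeywords  = @('Grant', 'Award', 'Principal Investigator', 'PI', 'NSF', 'NIH', 'DOD', 'DOE', 'DARPA')
    $exportKeywords = @('ITAR', 'EAR', 'Export Controlled', 'CUI', 'Controlled Unclassified', 'NOFORN')

    # [regex]::Matches is case-sensitive by default, so [A-Z] really means uppercase
    foreach ($m in [regex]::Matches($Text, '\b[A-Z]{2,5}-\d{4}-\d{4,6}\b')) {
        $idEnd = $m.Index + $m.Length
        $start = [Math]::Max(0, $m.Index - $Window)
        $end   = [Math]::Min($Text.Length, $idEnd + $Window)
        # Exclude the ID itself so a prefix such as "NSF-" is not counted as its own keyword
        $context = $Text.Substring($start, $m.Index - $start) + ' ' + $Text.Substring($idEnd, $end - $idEnd)

        $hasGrant  = [bool]($grantKeywords  | Where-Object { $context -match ('\b' + [regex]::Escape($_) + '\b') })
        $hasExport = [bool]($exportKeywords | Where-Object { $context -match ('\b' + [regex]::Escape($_) + '\b') })

        $confidence = if ($hasGrant -and $hasExport) { 85 } elseif ($hasGrant) { 75 } else { 0 }
        [pscustomobject]@{ GrantId = $m.Value; Confidence = $confidence }
    }
}
```

For example, `Get-GrantIdConfidence -Text 'ITAR controlled. Grant DOD-2023-98765'` reports 85, while a bare `ABC-2024-5555` with no nearby keywords reports 0.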

Content Explorer & Activity {#content-explorer}

With SITs configured, use Content Explorer to visualize data distribution and Activity Explorer to monitor labeling activity.

Step 3 – Content Explorer: Visualize Data Distribution

Goal: Visualize where sensitive data lives across M365.

Click-Ops:

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Content explorer
2. Locate your custom sensitive information types
3. Click each SIT to analyze its distribution:
   - SharePoint: Are UINs appearing in "Public" sites?
   - OneDrive: Are faculty storing student rosters in their personal OneDrive?
   - Exchange: Which mailboxes hold messages containing UINs? (Content Explorer shows where items are stored; use Activity Explorer or DLP policy matches to see whether they were sent externally.)
4. Export a summary report for leadership
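For recurring reports, Content Explorer data can also be pulled with the `Export-ContentExplorerData` cmdlet in Security & Compliance PowerShell. The paging loop below is a sketch: the helper name is hypothetical, and it assumes (as flagged in the comments) that each page starts with a metadata object carrying `MorePagesAvailable` and `PageCookie`. Check `Get-Help Export-ContentExplorerData -Full` for your module version before relying on that.

```powershell
# Sketch: page through Content Explorer items for one SIT in one workload.
# Assumes an existing Connect-IPPSSession session.
# ASSUMPTION: the first object in each page returned by Export-ContentExplorerData is
# paging metadata (MorePagesAvailable, PageCookie) and the rest are items.
function Get-ContentExplorerItems {
    param(
        [Parameter(Mandatory)][string]$SitName,
        [Parameter(Mandatory)][string]$Workload,   # e.g. 'SharePoint'; check Get-Help for accepted values
        [int]$PageSize = 100
    )
    $cookie = $null
    do {
        $params = @{
            TagType  = 'SensitiveInformationType'
            TagName  = $SitName
            Workload = $Workload
            PageSize = $PageSize
        }
        if ($cookie) { $params['PageCookie'] = $cookie }

        $page = @(Export-ContentExplorerData @params)
        if ($page.Count -eq 0) { break }

        $meta = $page[0]
        if ($page.Count -gt 1) { $page[1..($page.Count - 1)] }   # emit the items
        $cookie = $meta.PageCookie
        # '-eq $true' accepts either a Boolean or the string 'True'
    } while ($meta.MorePagesAvailable -eq $true)
}
```

For the leadership summary, something like `Get-ContentExplorerItems -SitName 'TAMU Employee/Student UIN' -Workload SharePoint | Export-Csv uin-sharepoint.csv -NoTypeInformation` produces a file you can pivot by location; treat that CSV as sensitive, since it describes where FERPA data lives.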

Step 4 – Configure Trainable Classifiers

Goal: Use machine learning to detect document types that patterns can't identify.

Built-in Classifiers (A5):

| Classifier | Use Case |
|---|---|
| Resumes | HR, Career Services |
| Source Code | Research IP protection |
| Harassment | Communication Compliance |
| Threat | Communication Compliance |

Creating a Custom Classifier (e.g., Academic Transcripts):

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Classifiers → Trainable classifiers
2. Click + Create trainable classifier
3. Name: Academic Transcript
4. Seed content:
   - Create SharePoint site: Trainable Classifier Seed Content
   - Upload 50+ positive examples (redacted transcripts)
5. Training:
   - Wait 24-48 hours for processing
   - Review items and mark each as Match or Not a match
   - Provide 200+ feedback responses
6. Publish once accuracy exceeds 80%
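Purview reports the classifier's accuracy in the portal. If you also log review decisions elsewhere (for example, to split the 200+ reviews across several staff), a simple tally like this hypothetical `Get-ClassifierReadiness` function applies the same two publishing thresholds. Here "accuracy" is just the share of reviewed predictions marked Match, which may not be exactly how Purview computes its figure.

```powershell
# Hypothetical tally of reviewer feedback for a trainable classifier.
# Each response is 'Match' or 'Not a match' for an item the classifier predicted as a match.
function Get-ClassifierReadiness {
    param(
        [Parameter(Mandatory)][string[]]$Responses,
        [int]$MinResponses = 200,      # "Provide 200+ feedback responses"
        [double]$MinAccuracy = 0.80    # "Publish once accuracy exceeds 80%"
    )
    $total    = $Responses.Count
    $matched  = @($Responses | Where-Object { $_ -eq 'Match' }).Count
    $accuracy = if ($total -gt 0) { $matched / $total } else { 0 }
    [pscustomobject]@{
        Responses      = $total
        Accuracy       = [Math]::Round($accuracy, 3)
        ReadyToPublish = ($total -ge $MinResponses) -and ($accuracy -gt $MinAccuracy)
    }
}
```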

Training Data Privacy

Use redacted or synthetic documents for training. If using real documents, restrict the SharePoint site and delete documents after training.

Step 5 – Enable Activity Explorer

Goal: Monitor what happens to labeled and sensitive content over time (Activity Explorer draws on the audit logging enabled in Phase 1).

What Activity Explorer Tracks:

| Activity | Example | Why It Matters |
|---|---|---|
| Label applied | User labeled file "Confidential - FERPA" | Tracks labeling adoption |
| Label changed | Downgrade from "Restricted" to "General" | Potential exfiltration |
| DLP policy matched | Email with SSN detected | Policy effectiveness |
| File copied to USB | Endpoint DLP detection | Data exfiltration |

Click-Ops:

1. Navigate to Microsoft Purview portal → Solutions → Information Protection → Activity explorer
2. Set the date range (default: last 30 days)
3. Filter by Activity: LabelApplied, DLPRuleMatch
4. Filter by SIT: your custom UIN type
5. Export results for compliance reporting
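The "Label changed" row is the one most worth checking routinely. Once label-change events are exported, spotting downgrades only requires the label priority order. In this sketch, the function name, the `OldLabel`/`NewLabel` property names, and the label order are assumptions: map them to your own label taxonomy and to the fields in your actual export.

```powershell
# Hypothetical downgrade detector for exported label-change events.
# $LabelOrder runs from least to most sensitive; this ordering is an ASSUMPTION.
function Find-LabelDowngrades {
    param(
        [Parameter(Mandatory)][object[]]$Events,
        [string[]]$LabelOrder = @('General', 'Confidential', 'Highly Confidential', 'Restricted')
    )
    foreach ($e in $Events) {
        # Exact, case-sensitive lookup of each label's position in the order
        $old = [array]::IndexOf($LabelOrder, [string]$e.OldLabel)
        $new = [array]::IndexOf($LabelOrder, [string]$e.NewLabel)
        # Labels not in $LabelOrder return -1 and are skipped rather than guessed at
        if ($old -ge 0 -and $new -ge 0 -and $new -lt $old) {
            $e
        }
    }
}
```

After loading an export with `Import-Csv` or `ConvertFrom-Json` and mapping its fields to `OldLabel`/`NewLabel`, the output of `Find-LabelDowngrades` can be piped to `Export-Csv` for review.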

Step 6 (Advanced) – Exact Data Match for Student Records

Goal: Detect data by matching against an actual database of known values.

Why EDM for Higher Ed:

- Pattern-based SITs match any value that fits the pattern
- EDM matches only values that exist in your student database
- This dramatically reduces false positives

High-Level Process:

1. Create EDM Schema: Define columns (UIN, Name, Email)
2. Prepare and Hash Data: Export from Banner, hash with EDM Upload Agent
3. Create EDM-based SIT: Link it to your schema
4. Test and Deploy: Use in DLP for the highest accuracy

Data Handling

The CSV containing student UINs is FERPA data. Generate on secured systems, transfer encrypted, delete immediately after upload.
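A quick local check of the Banner export before hashing can catch missing columns, blank values, and duplicate UINs. The sketch below assumes hypothetical column names `UIN`, `Name`, and `Email` (matching the example schema) and UINs stored as nine digits with `00` in positions 4-5, consistent with the UIN SIT. It reports row numbers only, never values, so its output does not copy FERPA data into logs.

```powershell
# Hypothetical pre-hash check of an EDM source file (e.g. rows from Import-Csv).
# Reports row numbers only, never values, so the output does not leak student data.
function Test-EdmSourceCsv {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Rows,
        [string[]]$RequiredColumns = @('UIN', 'Name', 'Email'),
        [string]$UinPattern = '^\d{3}00\d{4}$'   # ASSUMED: 9 digits, "00" in positions 4-5, no dashes
    )
    if ($Rows.Count -eq 0) { 'File has no data rows'; return }

    $columns = @($Rows[0].PSObject.Properties.Name)
    foreach ($col in $RequiredColumns) {
        if ($columns -notcontains $col) { "Missing column: $col" }
    }
    if ($columns -notcontains 'UIN') { return }

    $seen = @{}   # UIN -> row number where it first appeared
    for ($i = 0; $i -lt $Rows.Count; $i++) {
        $rowNumber = $i + 2   # +1 for the header row, +1 for 1-based numbering
        $uin = [string]$Rows[$i].UIN
        if ($uin -notmatch $UinPattern) {
            "Row ${rowNumber}: UIN is blank or not in the expected format"
        } elseif ($seen.ContainsKey($uin)) {
            "Row ${rowNumber}: duplicate UIN (first seen on row $($seen[$uin]))"
        } else {
            $seen[$uin] = $rowNumber
        }
    }
}
```

Run it on the secured system that holds the export, e.g. `Test-EdmSourceCsv -Rows (Import-Csv .\students.csv)`; the file name is a placeholder.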

Data Security Posture Management (DSPM) {#dspm}

Labels Make DSPM Meaningful

With labels deployed in Classification, DSPM now shows protection gaps, not just data locations. You can track:

- How much sensitive data is labeled (adoption)
- How much labeled data is encrypted (protection level)
- How much sensitive data is overshared (access risk)

DSPM Dashboard Components

| Element | What It Shows | Action |
|---|---|---|
| Posture Score | Data security health (0-100) | Track trend, set targets |
| Data at Risk | Sensitive content with inadequate protection | Prioritize for labeling |
| Overshared Data | Files with excessive permissions | Review and restrict |
| Recommendations | AI-generated remediation steps | Work through them in priority order |

Configure DSPM Dashboard

Click-Ops:

1. Navigate to Microsoft Purview portal → Solutions → Data Security Posture Management
2. Allow 24-48 hours for DSPM to aggregate data from the other tools
3. Configure posture score goals:
   - Click Settings
   - Set a target score (current + 10 points)
   - Define priority data types (FERPA, Research)
4. Review Data at Risk:
   - Filter by sensitivity label status
   - Export for remediation planning
5. Analyze oversharing:
   - Review files shared with "Everyone"
   - Flag them for access review

Key Metrics:

| Metric | Target | Remediation |
|---|---|---|
| % sensitive data labeled | >80% | Auto-labeling (Phase 4) |
| Overshared sensitive files | <5% | Access reviews (Phase 6) |
| Unprotected high-sensitivity | 0 | DLP blocking (Phase 5) |
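The three targets reduce to simple ratios and counts. This hypothetical `Get-PostureScorecard` function computes them from counts you pull out of DSPM or Content Explorer; the parameter names are assumptions, and the thresholds are the ones in the table.

```powershell
# Hypothetical scorecard for the Key Metrics table; thresholds mirror the table above.
function Get-PostureScorecard {
    param(
        [Parameter(Mandatory)][int]$SensitiveItems,            # all items containing sensitive info
        [Parameter(Mandatory)][int]$LabeledSensitiveItems,     # of those, items with a sensitivity label
        [Parameter(Mandatory)][int]$OversharedSensitiveItems,  # of those, items with overly broad access
        [Parameter(Mandatory)][int]$UnprotectedHighSensitivity
    )
    if ($SensitiveItems -le 0) { throw 'SensitiveItems must be greater than zero' }
    $labeledPct    = 100.0 * $LabeledSensitiveItems / $SensitiveItems
    $oversharedPct = 100.0 * $OversharedSensitiveItems / $SensitiveItems

    [pscustomobject]@{ Metric = '% sensitive data labeled';     Value = [Math]::Round($labeledPct, 1);    Target = '>80%'; Met = ($labeledPct -gt 80) }
    [pscustomobject]@{ Metric = 'Overshared sensitive files';   Value = [Math]::Round($oversharedPct, 1); Target = '<5%';  Met = ($oversharedPct -lt 5) }
    [pscustomobject]@{ Metric = 'Unprotected high-sensitivity'; Value = $UnprotectedHighSensitivity;      Target = '0';    Met = ($UnprotectedHighSensitivity -eq 0) }
}
```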

DSPM for Copilot Readiness

Goal: Assess Copilot exposure risk before enabling M365 Copilot.

Pre-Copilot Checklist:

- DSPM posture score ≥ 70
- Fewer than 100 overshared sensitive files
- "Highly Confidential" content has restricted access
- Research data excluded from general access
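The checklist can be expressed as a go/no-go gate. The two numeric thresholds come straight from the list; the two access checks are yes/no answers you record after reviewing the content yourself. The function and parameter names are hypothetical.

```powershell
# Hypothetical go/no-go gate for the Pre-Copilot Checklist.
function Test-CopilotReadiness {
    param(
        [Parameter(Mandatory)][int]$PostureScore,
        [Parameter(Mandatory)][int]$OversharedSensitiveFiles,
        [Parameter(Mandatory)][bool]$HighlyConfidentialRestricted,
        [Parameter(Mandatory)][bool]$ResearchDataExcluded
    )
    $failures = @()
    if ($PostureScore -lt 70)               { $failures += "Posture score $PostureScore is below 70" }
    if ($OversharedSensitiveFiles -ge 100)  { $failures += "$OversharedSensitiveFiles overshared sensitive files (need fewer than 100)" }
    if (-not $HighlyConfidentialRestricted) { $failures += 'Highly Confidential content is not yet access-restricted' }
    if (-not $ResearchDataExcluded)         { $failures += 'Research data is still in general access' }

    [pscustomobject]@{
        Ready    = ($failures.Count -eq 0)
        Failures = $failures
    }
}
```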

Integration with SAM

DSPM findings feed into SharePoint Advanced Management for remediation. When DSPM identifies overshared content, use SAM's Data Access Governance for access reviews (Phase 6).

Validation Checklist

| # | Item | Test Method | Success Criteria |
|---|---|---|---|
| 1 | Built-in SITs Available | Purview > Classifiers > SITs | 300+ types visible |
| 2 | Custom SITs Created | Search SIT list | Custom UIN/Grant SITs appear |
| 3 | SIT Detection Test | Use "Test" function | High confidence match returned |
| 4 | Content Explorer Populated | Check after 24-48 hours | Data locations visible |
| 5 | DSPM Dashboard Active | View posture score | Score calculated |
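Checks 1 and 2 can also be scripted, which is handy when re-validating after changes. A minimal sketch, assuming an existing `Connect-IPPSSession` session and the custom SIT names from Steps 1 and 2; it uses only the `Name` property of `Get-DlpSensitiveInformationType` output, and the total it compares against 300 includes your custom SITs.

```powershell
# Sketch for validation items 1 and 2. Assumes Connect-IPPSSession has been run.
function Test-SitInventory {
    param(
        [string[]]$CustomSitNames = @('TAMU Employee/Student UIN', 'Research Grant ID'),
        [int]$MinimumTotal = 300
    )
    # Total count includes custom SITs as well as built-in ones
    $names = @(Get-DlpSensitiveInformationType | ForEach-Object { $_.Name })
    [pscustomobject]@{
        Check  = '1. Built-in SITs available'
        Pass   = ($names.Count -ge $MinimumTotal)
        Detail = "$($names.Count) SITs visible"
    }
    foreach ($custom in $CustomSitNames) {
        [pscustomobject]@{
            Check  = "2. Custom SIT present: $custom"
            Pass   = ($names -contains $custom)
            Detail = ''
        }
    }
}
```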

Next Steps

With discovery complete and DSPM showing your actual security posture, proceed to Access Control to address oversharing and prepare for Copilot.

Continue to Access Control →