Texas A&M University (Work In Progress)

Discover sensitive data with Sensitive Information Types (SITs) and Trainable Classifiers, then measure your security posture with Data Security Posture Management (DSPM).

Discovery & Posture Management

With sensitivity labels already deployed, you can now discover where sensitive data lives and measure your security posture with meaningful metrics.

Why Discovery Comes After Labels

Because you deployed sensitivity labels and auto-labeling in Classification, DSPM can now report:

  • Labeled vs. unlabeled sensitive content (adoption metric)
  • Under-protected data (detected but not labeled/encrypted)
  • Overshared sensitive content (permissions too broad)
  • Copilot exposure risk (sensitive data accessible to AI)

Without labels, DSPM shows only where sensitive data exists, not how well it is protected.


Sensitive Information Types & Classifiers {#sits-classifiers}

Objective

Refine your detection by adding custom Sensitive Information Types (SITs) and Trainable Classifiers. These feed into auto-labeling, DLP, and DSPM dashboards.

Why Discovery Matters for Higher Ed

| Challenge | Impact | Discovery Solution |
|---|---|---|
| Diverse Data Types | Student records (FERPA), research data (CUI/ITAR), health info (HIPAA) | Custom SITs for each regulatory domain |
| Distributed Ownership | Faculty/staff manage data across departments | Content Explorer visualizes distribution |
| Historical Accumulation | Years of ungoverned storage | DSPM identifies risk hotspots |

Prerequisites

| Requirement | Details |
|---|---|
| Phase 1 Complete | Audit logging enabled, roles assigned |
| Role / Permission | Compliance Administrator or Information Protection Admin |
| Time Allowance | 24-48 hours for Content Explorer initial scan |

Step 0 – Review Built-in Sensitive Information Types

Goal: Understand the 300+ built-in detectors before creating custom ones.

Key SITs for Higher Education:

| Built-in SIT | Regulation | Use Case |
|---|---|---|
| U.S. Social Security Number (SSN) | Multiple | Employee/student records |
| All Full Names | FERPA/HIPAA | Combined with other SITs |
| Credit Card Number | PCI-DSS | Bursar, payments |
| U.S. Passport Number | ITAR | International research |
| All Medical Terms and Conditions | HIPAA | Student health services |

Click-Ops:

  1. Navigate to Microsoft Purview portal > Solutions > Information Protection > Classifiers > Sensitive info types
  2. Review the list of built-in SITs
  3. Test a Built-in SIT:
    • Click on U.S. Social Security Number (SSN)
    • Click Test and enter: 123-45-6789
    • Verify detection works as expected

Step 1 – Create Custom SIT for UIN

Goal: Create a classifier for your institution's unique identifier format.

Click-Ops (Purview Portal):

  1. Go to Information Protection > Classifiers > Sensitive info types
  2. Click + Create sensitive info type
  3. Name: TAMU Employee/Student UIN
  4. Description: Detects TAMU UINs with supporting keywords
  5. Patterns: Click + Create pattern
    • Confidence level: High confidence
    • Primary element: Regular Expression
      • Value: \b\d{3}-?00-?\d{4}\b
    • Supporting elements: Keyword list
      • Add: "UIN", "Universal ID", "Student ID", "TAMU ID"
    • Character proximity: Anywhere in the document
  6. Click Create
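
Before creating the SIT, it can help to sanity-check the regular expression offline. The sketch below is a minimal example that runs the same pattern through the .NET regex engine; the helper name `Test-UinMatch` and all sample values are made up for illustration, and the Purview engine may behave differently in edge cases.

```powershell
# Same regex as the Primary element above; all sample values are made up.
$uinPattern = '\b\d{3}-?00-?\d{4}\b'

function Test-UinMatch {
    param([string]$Text)
    # Uses .NET regex semantics; the Purview engine may differ in edge cases.
    return [regex]::IsMatch($Text, $uinPattern)
}

$samples = @(
    'UIN: 123-00-4567',      # hyphenated form
    'Student ID 123004567',  # compact form
    'SSN 123-45-6789',       # middle group is not 00
    'Order 9123004567'       # ten digits, so \b prevents a partial match
)
foreach ($text in $samples) {
    '{0,-22} -> {1}' -f $text, (Test-UinMatch $text)
}
```

The first two samples should report True and the last two False. Because any nine-digit number with 00 in the fourth and fifth positions also matches, the supporting keywords carry much of the precision.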

PowerShell:

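# Connect to Security & Compliance PowerShell (requires the ExchangeOnlineManagement module)
Connect-IPPSSession

# Generate the GUIDs once so the Entity and its LocalizedStrings resource share the same ID
$RulePackId  = New-Guid
$PublisherId = New-Guid
$EntityId    = New-Guid

$RulePackageXML = @"
<?xml version="1.0" encoding="utf-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="$RulePackId">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="$PublisherId"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>Your University</PublisherName>
        <Name>Custom SIT Package</Name>
        <Description>Custom sensitive information types</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>

  <Rules>
    <!-- High-confidence pattern (85): UIN regex plus at least one supporting keyword -->
    <Entity id="$EntityId" patternsProximity="300" recommendedConfidence="85" relaxProximity="true">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Regex_UIN"/>
        <Any minMatches="1">
          <Match idRef="Keyword_UIN"/>
        </Any>
      </Pattern>
    </Entity>

    <Regex id="Regex_UIN">\b\d{3}-?00-?\d{4}\b</Regex>

    <Keyword id="Keyword_UIN">
      <Group matchStyle="word">
        <Term>UIN</Term>
        <Term>Universal ID</Term>
        <Term>Student ID</Term>
        <Term>TAMU ID</Term>
      </Group>
    </Keyword>

    <!-- Display name and description for the SIT; idRef must equal the Entity id -->
    <LocalizedStrings>
      <Resource idRef="$EntityId">
        <Name default="true" langcode="en-us">TAMU Employee/Student UIN</Name>
        <Description default="true" langcode="en-us">Detects TAMU UINs with supporting keywords</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
"@

# Write the package to a uniquely named temporary .xml file, upload it, then clean up
$TempFile = Join-Path ([System.IO.Path]::GetTempPath()) ("CustomSIT-{0}.xml" -f (New-Guid))
$RulePackageXML | Out-File -FilePath $TempFile -Encoding UTF8
New-DlpSensitiveInformationTypeRulePackage -FileData ([System.IO.File]::ReadAllBytes($TempFile))
Remove-Item $TempFile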

Step 2 – Create Custom SIT for Research Grants

Goal: Detect documents related to specific high-risk grants (CUI/ITAR).

Click-Ops:

  1. Navigate to Microsoft Purview portal > Solutions > Information Protection > Classifiers > Sensitive info types
  2. Click + Create sensitive info type
  3. Name: Research Grant ID
  4. Patterns:
    • Primary element: Regular Expression
      • Value: \b[A-Z]{2,5}-\d{4}-\d{4,6}\b
    • Supporting elements: Keyword list
      • Add: "Grant", "Award", "Principal Investigator", "PI", "NSF", "NIH", "DOD", "DOE", "DARPA"
  5. Click Create

Expanding for Export Control

For grants involving CUI or ITAR, add keywords like: "ITAR", "EAR", "Export Controlled", "CUI", "Controlled Unclassified", "NOFORN"
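
The grant pattern and keyword lists can be tried out offline before they go into the portal. The following is a minimal sketch, assuming that a supporting keyword counts only as a whole word and that keyword matching ignores case; the function name `Find-GrantId` and the sample strings are hypothetical, and the real Purview matching rules (including proximity) are not modeled here.

```powershell
$grantPattern = '\b[A-Z]{2,5}-\d{4}-\d{4,6}\b'
$keywords = 'Grant', 'Award', 'Principal Investigator', 'PI', 'NSF', 'NIH', 'DOD', 'DOE', 'DARPA',
            'ITAR', 'EAR', 'Export Controlled', 'CUI', 'Controlled Unclassified', 'NOFORN'

function Find-GrantId {
    param([string]$Text)
    # [regex]::Matches is case-sensitive by default, unlike PowerShell's -match operator.
    $ids = [regex]::Matches($Text, $grantPattern) | ForEach-Object { $_.Value }
    if (-not $ids) { return @() }

    # Assumption: one supporting keyword anywhere in the text, matched as a whole word, ignoring case.
    $alternation = ($keywords | ForEach-Object { [regex]::Escape($_) }) -join '|'
    $hasKeyword = [regex]::IsMatch($Text, "\b(?:$alternation)\b", 'IgnoreCase')
    if ($hasKeyword) { return @($ids) }
    return @()
}

Find-GrantId 'Award TAMU-2024-00123 to the PI'   # ID plus keyword: returned
Find-GrantId 'Ticket ABC-2024-0001 closed'       # ID without keyword: nothing returned
'abc-2024-0001' -match $grantPattern             # True: -match ignores case, so prefer [regex] when testing
```

One thing this sketch surfaces: an ID whose prefix is itself a keyword (for example one beginning with NSF) supplies its own supporting evidence under these assumptions, and short keywords such as "PI" or "EAR" can match ordinary words, so it is worth testing against real documents before relying on the SIT.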


Content Explorer & Activity {#content-explorer}

With SITs configured, use Content Explorer to visualize data distribution and Activity Explorer to monitor labeling activity.

Step 3 – Content Explorer Data Visualization

Goal: Visualize where sensitive data lives across M365.

Click-Ops:

  1. Navigate to Microsoft Purview portal > Solutions > Information Protection > Content explorer
  2. Locate your custom sensitive information types
  3. Click on each SIT to analyze distribution:
    • SharePoint: Are UINs appearing in "Public" sites?
    • OneDrive: Are faculty storing student rosters personally?
    • Exchange: Are UINs being emailed externally?
  4. Export summary report for leadership

Step 4 – Configure Trainable Classifiers

Goal: Use machine learning to detect document types that patterns can't identify.

Built-in Classifiers (A5):

| Classifier | Use Case |
|---|---|
| Resumes | HR, Career Services |
| Source Code | Research IP protection |
| Harassment | Communication Compliance |
| Threat | Communication Compliance |

Creating a Custom Classifier (e.g., Academic Transcripts):

  1. Navigate to Microsoft Purview portal > Solutions > Information Protection > Classifiers > Trainable classifiers
  2. Click + Create trainable classifier
  3. Name: Academic Transcript
  4. Seed Content:
    • Create SharePoint site: Trainable Classifier Seed Content
    • Upload 50+ positive examples (redacted transcripts)
  5. Training:
    • Wait 24-48 hours for processing
    • Review items and mark as Match or Not a match
    • Provide 200+ feedback responses
  6. Publish when accuracy reaches >80%

Step 5 – Enable Activity Explorer

Goal: Monitor what's happening to sensitive data in real-time.

What Activity Explorer Tracks:

| Activity | Example | Why It Matters |
|---|---|---|
| Label applied | User labeled file "Confidential - FERPA" | Tracks labeling adoption |
| Label changed | Downgrade from "Restricted" to "General" | Potential exfiltration |
| DLP policy matched | Email with SSN detected | Policy effectiveness |
| File copied to USB | Endpoint DLP detection | Data exfiltration |

Click-Ops:

  1. Navigate to Microsoft Purview portal > Solutions > Information Protection > Activity explorer
  2. Set date range (default: last 30 days)
  3. Filter by Activity: LabelApplied, DLPRuleMatch
  4. Filter by SIT: Your custom UIN type
  5. Export for compliance reporting
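
Once activity data is exported, label downgrades such as "Restricted" to "General" can be picked out with a small filter. The sketch below is illustrative only: the label ranking, the `OldLabel`/`NewLabel` property names, and the sample events are hypothetical and need to be mapped to whatever fields your export actually contains.

```powershell
# Hypothetical label ranking; replace with your tenant's label names and order.
$labelRank = @{ 'Public' = 0; 'General' = 1; 'Confidential - FERPA' = 2; 'Restricted' = 3 }

function Get-LabelDowngrade {
    param([object[]]$Events)
    # OldLabel/NewLabel are hypothetical property names; map them to the fields in your export.
    $Events | Where-Object {
        $_.OldLabel -and $_.NewLabel -and
        $labelRank.ContainsKey($_.OldLabel) -and
        $labelRank.ContainsKey($_.NewLabel) -and
        $labelRank[$_.NewLabel] -lt $labelRank[$_.OldLabel]
    }
}

# Made-up sample events. In a connected session, events could instead come from something like:
#   $res = Export-ActivityExplorerData -StartTime $start -EndTime $end -OutputFormat Json -PageSize 5000
#   $events = $res.ResultData | ConvertFrom-Json
$sample = @(
    [pscustomobject]@{ User = 'a@example.edu'; File = 'roster.xlsx';   OldLabel = 'Restricted'; NewLabel = 'General' }
    [pscustomobject]@{ User = 'b@example.edu'; File = 'syllabus.docx'; OldLabel = 'General';    NewLabel = 'Confidential - FERPA' }
)
Get-LabelDowngrade -Events $sample | Format-Table User, File, OldLabel, NewLabel
```

Only the first sample event is a downgrade under this ranking; the second is an upgrade and is filtered out.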

Step 6 (Advanced) – Exact Data Match for Student Records

Goal: Detect data by matching against an actual database of known values.

Why EDM for Higher Ed:

  • SITs detect any pattern match
  • EDM detects only values in your student database
  • Dramatically reduces false positives

High-Level Process:

  1. Create EDM Schema: Define columns (UIN, Name, Email)
  2. Prepare and Hash Data: Export from Banner, hash with EDM Upload Agent
  3. Create EDM-based SIT: Link to your schema
  4. Test and Deploy: Use in DLP for highest accuracy
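
As an illustration of the first step, the sketch below drafts an EDM schema for the three columns named above and checks locally that it is well-formed. The data store name, description, and the choice of UIN as the only searchable field are assumptions for this example; the upload line is left commented out because it needs a connected session and appropriate permissions.

```powershell
# Hypothetical schema for the UIN, Name, and Email columns; names and attributes are illustrative.
$schemaXml = @'
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="StudentRecords" description="Hypothetical student record schema" version="1">
    <Field name="UIN" searchable="true" />
    <Field name="Name" />
    <Field name="Email" />
  </DataStore>
</EdmSchema>
'@

# Parse locally to catch XML mistakes before uploading anything.
$doc = [xml]$schemaXml
$fields = @($doc.EdmSchema.DataStore.Field)
$searchable = @($fields | Where-Object { $_.GetAttribute('searchable') -eq 'true' })
"Fields: $($fields.Count), searchable: $($searchable.Count)"

# Upload (commented out; requires Connect-IPPSSession):
# New-DlpEdmSchema -FileData ([System.Text.Encoding]::UTF8.GetBytes($schemaXml)) -Confirm:$true
```

Keeping only the UIN searchable reflects the idea that the identifier is the primary match and the other columns act as corroborating evidence; adjust this to your own data.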

Data Security Posture Management (DSPM) {#dspm}

Labels Make DSPM Meaningful

With labels deployed in Classification, DSPM now shows protection gaps, not just data locations. You can track:

  • How much sensitive data is labeled (adoption)
  • How much labeled data is encrypted (protection level)
  • How much sensitive data is overshared (access risk)

DSPM Dashboard Components

| Element | What It Shows | Action |
|---|---|---|
| Posture Score | Data security health (0-100) | Track trend, set targets |
| Data at Risk | Sensitive content with inadequate protection | Prioritize for labeling |
| Overshared Data | Files with excessive permissions | Review and restrict |
| Recommendations | AI-generated remediation steps | Work in priority order |

Configure DSPM Dashboard

Click-Ops:

  1. Navigate to Microsoft Purview portal > Solutions > Data Security Posture Management
  2. Allow 24-48 hours for DSPM to aggregate data from the other tools
  3. Configure Posture Score Goals:
    • Click Settings
    • Set target score (current + 10 points)
    • Define priority data types (FERPA, Research)
  4. Review Data at Risk:
    • Filter by sensitivity label status
    • Export for remediation planning
  5. Analyze Oversharing:
    • Review files shared with "Everyone"
    • Flag for access review

Key Metrics:

| Metric | Target | Remediation |
|---|---|---|
| % sensitive data labeled | >80% | Auto-labeling (Phase 4) |
| Overshared sensitive files | <5% | Access reviews (Phase 6) |
| Unprotected high-sensitivity | 0 | DLP blocking (Phase 5) |
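
To track these targets over time, the percentages can be recomputed from exported counts. The sketch below is a minimal example; the function name `Get-PostureMetric`, its parameters, and the sample counts are hypothetical, and it assumes the overshared percentage is measured against all sensitive files.

```powershell
function Get-PostureMetric {
    param(
        [int]$SensitiveFiles,             # files where a SIT or classifier matched
        [int]$LabeledSensitiveFiles,      # of those, how many carry a sensitivity label
        [int]$OversharedSensitiveFiles,   # of those, how many are shared too broadly
        [int]$UnprotectedHighSensitivity  # high-sensitivity files with no protection
    )
    if ($SensitiveFiles -le 0) { throw 'SensitiveFiles must be greater than zero.' }

    $pctLabeled    = [math]::Round(100.0 * $LabeledSensitiveFiles / $SensitiveFiles, 1)
    $pctOvershared = [math]::Round(100.0 * $OversharedSensitiveFiles / $SensitiveFiles, 1)

    # Targets come from the Key Metrics table: >80% labeled, <5% overshared, 0 unprotected.
    [pscustomobject]@{
        PercentLabeled       = $pctLabeled
        PercentOvershared    = $pctOvershared
        LabeledTargetMet     = ($pctLabeled -gt 80)
        OversharedTargetMet  = ($pctOvershared -lt 5)
        UnprotectedTargetMet = ($UnprotectedHighSensitivity -eq 0)
    }
}

# Made-up counts for illustration
Get-PostureMetric -SensitiveFiles 12000 -LabeledSensitiveFiles 9900 -OversharedSensitiveFiles 480 -UnprotectedHighSensitivity 3
```

With these sample counts, 82.5% of sensitive files are labeled and 4% are overshared, so the first two targets are met while the three unprotected high-sensitivity files still need attention.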

DSPM for Copilot Readiness

Goal: Assess Copilot exposure risk before enabling M365 Copilot.

Pre-Copilot Checklist:

  • DSPM posture score ≥ 70
  • <100 overshared sensitive files
  • "Highly Confidential" content has restricted access
  • Research data excluded from general access
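
The checklist can also be expressed as a simple gate that reports which items still fail. This is a sketch under stated assumptions: the function name `Test-CopilotReadiness` and its inputs are hypothetical, and the values would be read manually from DSPM and your access reviews rather than fetched by this code.

```powershell
function Test-CopilotReadiness {
    param(
        [int]$PostureScore,
        [int]$OversharedSensitiveFiles,
        [bool]$HighlyConfidentialRestricted,
        [bool]$ResearchDataExcluded
    )
    # Thresholds mirror the Pre-Copilot Checklist above.
    $checks = [ordered]@{
        'Posture score >= 70'                      = ($PostureScore -ge 70)
        'Fewer than 100 overshared files'          = ($OversharedSensitiveFiles -lt 100)
        'Highly Confidential access restricted'    = $HighlyConfidentialRestricted
        'Research data excluded from general access' = $ResearchDataExcluded
    }
    $failed = @($checks.Keys | Where-Object { -not $checks[$_] })
    [pscustomobject]@{ Ready = ($failed.Count -eq 0); Failed = $failed }
}

# Made-up inputs: the score passes, but 140 overshared files exceeds the limit.
Test-CopilotReadiness -PostureScore 72 -OversharedSensitiveFiles 140 -HighlyConfidentialRestricted $true -ResearchDataExcluded $true
```

Returning the failed items rather than a bare pass/fail makes the result easier to hand to whoever owns the remediation.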

Integration with SAM

DSPM findings feed into SharePoint Advanced Management for remediation. When DSPM identifies overshared content, use SAM's Data Access Governance for access reviews (Phase 6).


Validation Checklist

| # | Item | Test Method | Success Criteria |
|---|---|---|---|
| 1 | Built-in SITs Available | Purview > Classifiers > SITs | 300+ types visible |
| 2 | Custom SITs Created | Search SIT list | Custom UIN/Grant SITs appear |
| 3 | SIT Detection Test | Use "Test" function | High confidence match returned |
| 4 | Content Explorer Populated | Check after 24-48 hours | Data locations visible |
| 5 | DSPM Dashboard Active | View posture score | Score calculated |

Next Steps

With discovery complete and DSPM showing your actual security posture, proceed to Access Control to address oversharing and prepare for Copilot.
