Use Context-Aware Data Classification for a Robust Data Security Posture

Nov 21, 2023
9 minutes

Enterprises often interpret a data security mandate as identifying configuration issues or vulnerabilities in their data infrastructure. To genuinely improve security posture, though, data security activities must extend to protecting sensitive data assets, such as customer information, trade secrets, financial records and patents. DSPM-based data classification offers a granular view that helps define policies suited to the type, context and sensitivity of the data.

Typical labeling practices (public, internal, confidential, secret) fail to capture the differences and nuances between types of data, such as the difference between R&D documents and customer payment information. In this blog post, we’ll present a set of data classification categories to help you gain context from your data for richer and more accurate labeling.

Understanding Data Classification

Classification is the process of labeling and categorizing data based on the type of information it holds. Data classification helps you to understand the value and sensitivity of your data, as well as the impact on your business if that data were exposed. From this information, you can set effective security policies.

Why Classification Is Key to Cloud Security

In addition to playing a major part in improving an organization’s security posture, data classification is explicitly required by HIPAA, SOC 2, ISO 27001, and other compliance frameworks. It also helps organizations to streamline governance, risk, and compliance (GRC) efforts in numerous ways:

  • Granular security policies: Data classification helps organizations define security policies (such as access controls) specific to the data they need to secure.
  • Incident management: Classification helps businesses prioritize incidents that involve sensitive or valuable data over issues that involve non-sensitive data.
  • Compliance and regulation: Classification allows organizations to identify, categorize and apply appropriate controls around regulated data like PII, PHI and credit card details (PCI) to meet compliance requirements. During audits and regulatory reviews, classification provides the ability to demonstrate compliance by showing how regulated data is handled.
  • Data detection and response (DDR) accuracy: Once data is classified, organizations can implement more effective real-time monitoring for data incidents, highlighting cases where sensitive data is put at risk and requires immediate response from security teams.
  • Reduced attack surface: Organizations can reduce their attack surface by consolidating duplicated data and ensuring data is accessible in accordance with least-privilege principles.
  • Prioritization: Not all data is created equal. Classification enables overworked security teams to focus their efforts on the data assets that would have the largest impact in the event of a data breach or compliance violation.

Typical Data Classification Challenges in the Modern Enterprise

Data classification is only effective if carried out consistently at a company level. Today’s complex data infrastructure means that data often remains unclassified or inadequately classified, rendering downstream policies ineffective.

Data Fragmentation

Discovering and monitoring every repository where data needs classification becomes challenging when data spans services in hybrid environments, such as cloud-based or on-premises databases, big data platforms, data lakes, and collaboration systems.

Use of Unstructured Data

While structured data is queryable, its unstructured counterpart (documents, media files, PDFs and emails) requires more resources and frequent manual intervention to classify.

Shadow Data

The cloud’s elasticity, which enables developers to spin services up and down with minimal friction, also results in unknown, undiscovered and, by extension, unclassified data.

Mergers and Acquisitions

Differences in security policies, classification practices and IT architectures between distinct business entities mean inconsistent classification and inadequate policy enforcement.

How to Classify? 5 Categories for Data Classification

To define rich and comprehensive security policies, data must be classified by type, context, subject, sensitivity and any existing sensitivity labels.

1. Data Types

Data types are the most granular building blocks of classification, enabling policy definition and enforcement. Examples of data types include email addresses, Social Security numbers, country codes and payment card information. Data security posture management (DSPM) solutions usually ship with prebuilt classifiers, or data types, and support custom data types defined to match business needs.

Using data types can correctly classify data that would otherwise be difficult to identify with simple techniques like regular expressions alone. Not every nine-digit string is a Social Security number (SSN), for example, so a regular expression that flags every nine-digit string as an SSN will produce false positives. More advanced classification engines use context analysis, validation functions and ML/AI models to confirm matches, and they should do so with low resource consumption and high performance, without compromising accuracy.
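
To make the idea concrete, here is a minimal Python sketch: a regular expression finds nine-digit candidates, and a validation step based on the published SSA format rules filters out strings that cannot be SSNs. The looks_like_ssn helper and the sample text are illustrative only and are not part of any particular DSPM product.

    import re

    # Candidate pattern: nine digits, optionally separated as AAA-GG-SSSS.
    SSN_PATTERN = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")

    def looks_like_ssn(text: str) -> list[str]:
        """Return candidates that pass structural validation, not just the regex.

        Validation follows the published SSA format constraints: the area
        number cannot be 000, 666 or 900-999, the group number cannot be 00,
        and the serial number cannot be 0000.
        """
        valid = []
        for area, group, serial in SSN_PATTERN.findall(text):
            if area in ("000", "666") or area.startswith("9"):
                continue  # invalid area number, so likely not an SSN
            if group == "00" or serial == "0000":
                continue  # invalid group or serial, so likely not an SSN
            valid.append(f"{area}-{group}-{serial}")
        return valid

    # The order number matches the nine-digit shape but fails validation.
    print(looks_like_ssn("Order 999-12-3456 shipped; contact SSN 123-45-6789"))
    # -> ['123-45-6789']

A production classifier would layer further checks (surrounding keywords, column names, ML models) on top of this kind of validation, but the principle is the same: the regex proposes candidates, and the validator confirms them.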

2. Context

Simply labeling data by its type isn’t enough to derive appropriate policies. Some data types, after all, require different policies based on the business context. An email address, for example, requires different policies depending on who it belongs to and how it’s used. It can be associated with an employee or a customer, belong to someone from the US or the EU, or have a generic domain name such as @gmail.com or a sensitive one such as @gov.us.

Organizations can determine the context surrounding a data point by identifying metadata (e.g., timestamps, format, location) and by enriching the data — for example, by comparing it against other sources such as CRM or ERP.

Enrichment can also provide context by associating disparate data points to establish their value and level of sensitivity. For example, a name and address together qualify as personally identifiable information (PII) and are subject to regulations such as GDPR, while a name, address and credit card number additionally fall under the Payment Card Industry Data Security Standard (PCI DSS).

DSPM tools can automate the data classification process to identify and enrich data points with business, privacy and security attributes such as location, how the data was generated, modifications, residency, retention period and applicable laws.
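
As a rough illustration, the Python sketch below combines detected data types and residency metadata to decide which regulations apply to a record. The Record structure, field names and mapping rules are simplified assumptions made for the example, not the behavior of any specific DSPM engine.

    from dataclasses import dataclass, field

    @dataclass
    class Record:
        data_types: set[str]            # e.g. {"name", "address", "credit_card"}
        subject_residency: str = ""     # e.g. "EU", "US"
        applicable_regs: set[str] = field(default_factory=set)

    def enrich(record: Record) -> Record:
        # A name plus an address identifies a person, so treat it as PII.
        if {"name", "address"} <= record.data_types:
            record.applicable_regs.add("PII")
            if record.subject_residency == "EU":
                record.applicable_regs.add("GDPR")
        # A card number pulls the record into PCI DSS scope as well.
        if "credit_card" in record.data_types:
            record.applicable_regs.add("PCI DSS")
        return record

    r = enrich(Record({"name", "address", "credit_card"}, subject_residency="EU"))
    print(r.applicable_regs)   # {'PII', 'GDPR', 'PCI DSS'}

The same pattern extends to the other attributes mentioned above, such as residency or retention period, each of which can add or remove applicable policies.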

3. Subject

Some instances of sensitive data can't be accurately identified by predefined data types. For example, a contract might not match a specific PII pattern but still be considered sensitive because it contains trade secrets or intellectual property.

Sensitive data may be created and stored in a variety of file formats, and a file's subject offers a great deal of information about the type of data it holds. Subjects can include contracts, resumes, hospital discharge forms, patents, IT architecture documents and even database tables.

Defining policies according to file subjects is both intuitive and expressive. IT architecture documents, for example, can be reserved for senior IT staff such as architects; they are also highly sensitive, and a leak would pose major cybersecurity concerns.

One challenge in using file subjects to define security policies is the inconsistency of naming conventions. For example, job applications may have associated files that can take multiple forms, such as ‘FirstName-LastName-Resume’ or ‘FirstName-LastName-CV,’ or even just ‘FirstName-LastName.’ Mature DSPM solutions can accurately classify these types of data across inconsistent naming conventions.
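
The sketch below illustrates one simplified way to handle this: check the filename for hints first, then fall back to phrases found in the content. The patterns, keyword lists and classify_subject function are illustrative assumptions, not a production classifier.

    import re

    # Filename hints catch the easy cases; content phrases catch the rest.
    FILENAME_HINTS = {
        "resume": re.compile(r"(resume|\bcv\b)", re.IGNORECASE),
        "contract": re.compile(r"(contract|agreement|\bnda\b)", re.IGNORECASE),
    }
    CONTENT_HINTS = {
        "resume": ("work experience", "education", "references available"),
        "contract": ("hereinafter", "terms and conditions", "governed by the laws of"),
    }

    def classify_subject(filename: str, text: str) -> str:
        for subject, pattern in FILENAME_HINTS.items():
            if pattern.search(filename):
                return subject
        lowered = text.lower()
        for subject, phrases in CONTENT_HINTS.items():
            if sum(p in lowered for p in phrases) >= 2:  # require two hits to cut noise
                return subject
        return "unclassified"

    # 'Jane-Doe.pdf' carries no filename hint, but its content still classifies it.
    print(classify_subject("Jane-Doe.pdf",
                           "Work Experience ... Education ... References available on request"))
    # -> 'resume'

In practice, mature tools replace hand-written phrase lists with trained models, but the two-stage structure (a cheap filename check followed by deeper content analysis) is a common pattern.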

4. Sensitivity

Standards bodies, such as the International Organization for Standardization (ISO) and the National Institute of Standards and Technology (NIST), advise against practices that treat all data equally, and many regulations require organizations to classify data and label its sensitivity based on its contents. The risk related to a specific dataset or record is then determined by its sensitivity and level of exposure.

Classifying data helps organizations determine the sensitivity level of each data asset, which is often driven by the consequences of its exposure:

  • Regulatory fines: A leak of customer data may result in a GDPR breach fine.
  • Disruption to business operations: Failing to adhere to regulations such as PCI DSS can mean losing the ability to accept credit and debit card payments.
  • Reputational damage: Customers and partners lose trust in the organization following a breach.
  • Commercial interests: Losing trade secrets or other confidential documents.

Additionally, sensitivity is determined by the breadth and depth of the affected data. A shallow, narrow dataset might include just a list of first and last names. While this is considered PII, the impact of having it compromised is low, and so its sensitivity is also low. As the information gets richer (a billing address, card number, transactions and transaction locations), the impact and associated sensitivity become much higher.
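
As a rough illustration, the sketch below scores sensitivity by combining the richness (depth) of each record with the scale (breadth) of the dataset. The field weights, thresholds and sensitivity function are invented for illustration and would need tuning in a real environment.

    # Hypothetical sensitivity scoring; weights and thresholds are illustrative.
    FIELD_WEIGHTS = {
        "name": 1,
        "billing_address": 2,
        "card_number": 5,
        "transaction_history": 4,
        "transaction_location": 3,
    }

    def sensitivity(fields: set[str], record_count: int) -> str:
        depth = sum(FIELD_WEIGHTS.get(f, 0) for f in fields)   # richness of each record
        breadth = min(record_count / 10_000, 3)                # scale of exposure, capped
        score = depth * (1 + breadth)
        if score >= 15:
            return "high"
        if score >= 5:
            return "medium"
        return "low"

    print(sensitivity({"name"}, 500))                          # 'low': names alone
    print(sensitivity({"name", "billing_address", "card_number",
                       "transaction_history"}, 250_000))       # 'high': rich records at scale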

5. Microsoft Information Protection (MIP) Labels

Microsoft Information Protection (MIP) is a labeling system that spans the Microsoft estate (as well as some non-Microsoft resources) and assigns sensitivity labels to items such as emails, Word documents and spreadsheets. The labels are customizable by each customer but default to the following:

  • Non-business: User personal data
  • Public: Business data freely available and approved for public consumption
  • General: Business data for internal use and not meant for a public audience
  • Confidential: Business data that can cause harm if overshared
  • Highly confidential: Sensitive business data reserved for specific people

Each label carries additional security measures, such as encryption and read access controls, as well as restrictions on sharing files via email or uploading them to file servers and storage services. Of the labels above, ‘General’ is the default assigned whenever a document is created.

Beyond that default assignment at creation, MIP labels are static: changes are typically made manually or through limited automation, without adequate consideration of the document’s contents. This becomes a problem when confidential information is added to a collaborative document labeled ‘General’ and the label is never updated.

A mature DSPM solution can read and interpret the contents of an MIP-labeled document, alert security teams to mislabeled files and suggest an adequate sensitivity level.
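
The sketch below illustrates the idea of detecting this kind of label drift: compare the label a content scan would suggest against the label currently applied. The suggested_label mapping is a simplified assumption, and the detected data types stand in for the output of a real content classifier; neither reflects how MIP or any specific DSPM product works internally.

    # Default MIP label taxonomy, ordered from least to most sensitive.
    LABEL_ORDER = ["Non-business", "Public", "General", "Confidential", "Highly confidential"]

    def suggested_label(detected_types: set[str]) -> str:
        # Toy mapping from detected data types to a minimum acceptable label.
        if detected_types & {"credit_card", "ssn", "trade_secret"}:
            return "Highly confidential"
        if detected_types & {"customer_email", "employee_record"}:
            return "Confidential"
        return "General"

    def check_label_drift(current_label: str, detected_types: set[str]) -> str | None:
        target = suggested_label(detected_types)
        if LABEL_ORDER.index(target) > LABEL_ORDER.index(current_label):
            return f"Mislabeled: '{current_label}' should be at least '{target}'"
        return None

    # A collaborative doc still labeled 'General' after card numbers were pasted in:
    print(check_label_drift("General", {"customer_email", "credit_card"}))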

Learn More

Using Prisma Cloud DSPM, organizations can conduct data discovery to identify the content and context of data stored in the cloud. Prisma Cloud analyzes the data contents, creating a highly accurate classification that allows organizations to prioritize risks effectively. With risk analysis for sensitive data, organizations can enforce policies and practices across the enterprise and multicloud infrastructure.

But understanding the posture of the data is only the beginning. Prisma Cloud also delivers Data Detection and Response (DDR) to detect changes in your cloud data security landscape as they happen, identifying risky behaviors and data exfiltration attempts.

Get the status of your cloud data with a free security assessment — and see first-hand how Prisma Cloud can help you protect your most valuable data assets. To learn more, download Securing the Data Landscape with DSPM and DDR.

