Data Discovery and Classification by Prince Rastogi

This article first appeared in the SQLServerGeeks Magazine.
Author: Prince Rastogi
Subscribe to get your copy.

Data Security is one of the major important area of any business now days. We know, data size is growing rapidly in almost every type of business which makes it more important to protect data. There is various type of features in SQL Server which help us to apply security layers to protect data for example: Transparent Data Encryption, Backups Encryption, Always Encryption, Row level Security etc. Before implementing these features, it is also important to identity what type of data is stored in databases then we can decide what security layer is required to protect that data. This is the area where Data discovery and classifications comes into picture.

Data Discovery and classification is the process of scanning your data and classify that data by apply proper labels based on predefined set of patterns, keywords, or rules.

As a Database Administrator/Database Security Professional/Architect, it is also important to keep good documentation of what type of data is stored in databases and how critical it is. Once you have all this information then it will be easy for you to decide what type of security control needs to be placed around that data to achieve compliance. Data Discovery and Classification feature in SQL Server Management Studio can help us to achieve this. Microsoft Initially launched Data Discovery and Classification in SQL Server Management Studio 17.5 and this can be used for databases running on SQL Server 2012 and later.

To use this feature, you can right click on your database in SSMS -> Tasks -> Data Discovery and Classification.

Classify Data:
This option will show you all classified columns in the database (if any). This will also show you some classification recommendations based on patterns or rules defined to scan columns based on their names. Below is the screen shot for Classify Data on Adventure Works database:

Here we don’t have any column classified but we have some recommendations. We can get details of all recommendations by clicking on top or bottom message.

Predefined default Information types are as mention below. Using these information types you can categorize your data easily. if you want to change Information Type for any column in recommendation list then you can change that from drop down.

Once you have proper category for column then you can place a proper sensitivity label on that. Predefined sensitivity levels are as mention below. You can change this label using drop down if not properly identified under recommendation.

Once you are done by accepting/modifying recommendations then click on “Accept selected recommendations” button.

After Accepting Recommendations, save button will be enabled and click on “Save” to save all these recommendations.

If there is some column which is not in recommendation list, then can be added manually by clicking on “Add Classification” option next to “Save”.

Generate Report:
This is the second option under Data Discovery and Classification. Using this option report can be generated which can provide information about – How many columns have been classified out of total number of columns. How many tables contain sensitive data out of total number of tables etc.

Export Information Protection Policy:
Using this option, you can export default protection policy JSON file where all rules are defined. You can modify any existing labels or can add new labels based on your organization standards.

Same way information types can also be modified or add based on organization standard. Here under Information type, patterns are defined to provide recommendations. You can modify or add new patterns. This is something easy to manage.

Once all changes are done to this file then you can upload this file back using “Set Information Protection Policy File..” to overwrite default information protection file. If you want to go back to default rules then use “Reset Information Protection Policy to default..” under Data Discovery and Classification.

If you are using SQL Server 2019 then you can use sys.sensitivity_classifications system catalog view to get classified column details. You can also add new classified column(s) using ADD SENSITIVITY CLASSIFICATION syntax.

Conclusion:
Data Discovery & Classification in SQL Server Management Studio and SQL Server can play a very important role to categorize all your data inside databases. This will help in various ways to database security and policies.

This article first appeared in the SQLServerGeeks Magazine.
Author: Prince Rastogi
Subscribe to get your copy.

Leave a Reply

Your email address will not be published. Required fields are marked *