The first step to comply with the GDPR is to evaluate if this applies to your organization and if so, what data should be adjusted to meet it. To perform this analysis, it is necessary to know what data you have and where they are located. Having a data classification scheme can help answer requests for access to personal data in a faster and more controlled way.

SQL Server has several tools that can help you discover and classify this personal data:

  • You can execute queries to system tables and metadata to quickly identify if a column has personal data or not.
  • If you have free text fields, such as the typical “Observations”, you can use Full-Text Search to find personal data.
  • We can use the Extended Properties feature to tag and add a description to fields or tables that contain personal data. As of version 17.5 of SQL Server Management Studio, a functionality has been added that will help us to carry out this classification in a very easy way.

Discover

The best way to detect which columns can contain personal information is to have a good documentation of the database to know in a fast way the tables that are involved, unfortunately we are in the real world and 90% of the time this documentation does not exist or it is obsolete, so here comes SQL Server and its sys.columns table to help us.

If you did not know the sys.columns table, a quick description would be “It is a system table that holds information about all the columns of the objects in a database, such as tables or views”, if you need more information, here you have the link to the official documentation  sys.columns (Transact-SQL).

For example, to be able to know all the columns of a database that contain the word “name” we could execute the following query:

From this, we have created a small script that starts with a temporary table as a dictionary in which we will put all the words we want to search (in this case, candidate words to host personal information) and then using cursors will search for those words in all the database and save it in another temporary table that will be displayed at the end.

It is important to note that in the script you can add new words you want to search, no matter if they are uppercase or lowercase or with or without accents, it will find them.

Here we can see the results that the AdventureWorks2016 database would return:

Classify

Once we have discovered the columns that contain personal data, the next step we must take is to classify them according to the type of data they store, for this we have the Extended Properties of the columns.

The Extended Properties are available from SQL Server 2008 and it is a way to add a description or a classification to each column, in order to access those features we will only have to press the right button on a column and click on properties.

Now you are at that moment that you think that if you have to do this column by column you will take a century, but do not be alarmed, there is a solution that will make this process much faster and even a little automated and that is available from version 17.5 of SQL Server Management Studio.

We can access this option by right clicking on a database->Tasks->Classify Data…

Just open it, and automatically we will make a recommendation for the classification of certain columns.

When clicking, the list of columns will appear.

We can select the columns that we want and click on “Accept selected recommendations”. This action will do is add the Information Type and the Sensibility Label to the Extended Properties of the column.

Once we finish with the suggested columns, we will rely on the list that we will have extracted with our script in the discovery phase to classify other columns.

The classification will be done by choosing an Information type and a Sensitivity Label, inside each one we will find the following types:

Information Type:

  • Banking
  • Contact Info
  • Credentials
  • Credit Card
  • Date Of Birth
  • Financial
  • Health
  • Name
  • National ID
  • Networking
  • SSN
  • Other
  • [n/a]

Sensitivity Label:

  • Public
  • General
  • Confidential
  • Confidential – GDPR
  • Highly Confidential
  • Highly Confidential – GDPR
  • [n/a]

Now that we have discovered and classified our personal data, we can launch a report that shows us statistical data of this classification in a fast way.

We access this report by right clicking on a database->Reports->Standard Reports->Data Classification

In our case, we will only see in the report 4 columns, which are the ones we had previously marked with the Classify Data tool.

Now that we have all our columns perfectly classified, it’s time to go to the next point of the GDPR and see how to Manage this data.

GDPR Series:

  1. How to adapt SQL Server to the GDPR
  2. Discovering and classifying personal data with SQL Server
  3. Managing access and use of data with SQL Server

If you want us to help your business or company to comply with the GDPR contact us at info@aleson-itc.com or call us at +34 962 681 242

Recommended Posts

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.