Economize Space with the Index ColumnStore Compression

By Miguel Ángel Motos Posted Tuesday March 26th, 2019 In SQL Server Spanish 0 0

We are going to do a small study and comparison of the indexes oriented in columns for SQL Server (Index ColumnStore).

These indexes can be up to 10 times faster than traditional indexes oriented to rows (Index RowStore) and, speaking of the size they occupy, data compression can be up to 7 times smaller in relation to uncompressed data.

1.- First analysis.

By focusing on one type of storage per column, many of the data in the columns are repeated, resulting in a very high compression rate. In fact, the storage indexes based on columns have a default compression, this makes the queries on the tables go faster because they need less memory to execute them.

A problem that we can find is a greater number of locks when trying to perform the INSERT, UPDATE and DELETE functions in the tables with ColumnStore Index. Hence, these indexes should be used in tables of type OLAP and not OLTP as we saw in the previous entry.

2.- Real Scenario.

To perform the following tests, we will use the ContosoBi database, downloadable from here. Once downloaded, we will execute the following command to see how much space the dbo.FactSales table occupies.

exec sp_spaceused '[dbo].[FactSales]', true;

Here is the result:

OK, so let’s create the ColumnStore Index.

create clustered columnstore index [PK_FactSales_SalesKey]
on dbo.[FactSales]
with ( DROP_EXISTING = ON );

ups.. What’s happened?

Msg 1907, Level 16, State 1, Line 1
Cannot recreate index ‘PK_FactSales_SalesKey’. The new index definition does not match the constraint being enforced by the existing index.

Simple, the creation of the index has failed since we must remove all the Foreign Keys (FK) and the Primary Key (PK) of the table. For this we will use the following script:

ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimCurrency]
GO
ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimChannel]
GO
ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimDate]
GO
ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimProduct]
GO
ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimPromotion]
GO
ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [FK_FactSales_DimStore]
GO

ALTER TABLE [dbo].[FactSales] DROP CONSTRAINT [PK_FactSales_FactSalesKey] WITH ( ONLINE = OFF )
GO

Once deleted, we proceed again to create the Index ColumnStore on the FactSales table.

First of all, during the creation of the index, we must consider that blockages may occur, which would slow parallel queries thrown against the database. We strongly recommend creating indexes in less active hours on the server to minimize risks. Also with the following query we will control the possible blockages that may exist in the database:

SELECT * FROM (
SELECT 
[Process ID] = p.spid
,[HostName] = p.hostname
,[User] = p.loginame 
,[Database] = ISNULL(db_name(p.dbid),N'')
,[Query] = sql1.text
,[Execution Plan] = dmeqp.query_plan
,[Command] = p.cmd 
,[Blocked By] = p.blocked
,[Blocked By Query] = sp.text
,[Blocking] = CASE 
WHEN (SELECT count(*) FROM master.dbo.sysprocesses pp WHERE p.spid=pp.blocked)>0 THEN 1
ELSE 0
END 
,[Wait Time Miliseconds] = p.waittime 
,[Wait Type] = CASE 
WHEN p.waittype = 0 THEN N'' 
ELSE p.lastwaittype 
END
,[Kill Command] = 'kill ' + CONVERT(varchar(10), p.spid)
FROM master.dbo.sysprocesses p
LEFT JOIN sys.dm_exec_requests dmer on dmer.session_id = p.spid
CROSS APPLY sys.dm_exec_sql_text(p.sql_handle) sql1
OUTER APPLY sys.dm_exec_query_plan(dmer.plan_handle) dmeqp
LEFT JOIN (SELECT mp.spid,sql2.text FROM master.dbo.sysprocesses mp
CROSS APPLY sys.dm_exec_sql_text(mp.sql_handle) sql2 ) sp ON p.blocked=sp.spid
) AS tabla
WHERE [Blocking] <> 0 or [Blocked By] <> 0 

ORDER BY [Process ID]

Ahora sí, teniendo en cuenta los posibles bloqueos y estando en una franja horaria de menor actividad en el servidor, creemos el índice:

Now, if we are in a time slot with less activity on the server and controlling the possible blockages, we create the index:

create clustered columnstore index [PK_FactSales_SalesKey]
on dbo.[FactSales];
GO

With this we already have our ColumnStore index created.

Let’s see now how much space it occupies with the same command used at the beginning:

exec sp_spaceused '[dbo].[FactSales]', true;

Awesome! the table has gone from occupying 151368KB to occupying 46856KB. We are talking about a reduction of approximately 66%.

3.- The maximun compression.

If you think that the compression given by ColumnStore index is not enough, we will make a bigger compression. This type of compression is called “archive” and, principally, it is useful for tables where the data is inserted, even less frequently. An example of a table suitable for us to perform the compression type “archive” would be a sales history, since the data in them could be inserted once a week.

In our case, we are going to perform this type of compression on the same table [dbo]. [FactSales], so we can see the difference between the default compression of the ColumnStore index and the archive compression. We execute the following command:

alter index all on dbo.[FactSales]
rebuild with ( DATA_COMPRESSION = COLUMNSTORE_ARCHIVE );

Una vez establecida la compresión archive del índice ColumnStore, de nuevo vamos a comprobar el espacio ocupado de la tabla:

Once the archive compression of the ColumnStore index, let’s check the occupied space of the table again:

exec sp_spaceused '[dbo].[FactSales]', true;

We see that the size of the table has gone from 46856KB to 42056KB. It does not seem like a big change, even so, if we think of an extremely large table, this compression would be significant.

If you want us to help your business or company contact us at info@aleson-itc.com or call us at +34 962 681 242