Managing 4+ Terabytes DW - A story about indexes and execution times
April 15, 2020
Alex Puiu

Working with a data warehouse holding more than 4 TB of data for the last couple of years has been very challenging, but also highly rewarding.

I work best when challenged and when dealing with new things, so I am always up for a task that requires a lot of perseverance.

This project was my first contact with clustered columnstore indexes, a very interesting concept that got me engaged, so I started reading about it. Since I did not join the project from the beginning, there were already tables using this kind of index, and I have to admit, the performance and compression are really impressive.

Of course, the time came and I got the chance to work with this technology myself. It was me, an 800M-record source table, and an SSIS package meant to move the data from the source table to the DW table, which, of course, used a clustered columnstore index.

After the initial load, the recurrent process started running, and then a bigger volume of data arrived in the source table, around 25 million records. Nothing too odd: other processes moving hundreds of thousands to millions of records were running concurrently without exceeding 30-40 minutes per execution, including lookups and updates, so I did not expect this one to take more than 1.5 or 2 hours.

Oh, I was wrong. After a 16-hour execution, I was thinking of all the possible scenarios that could explain such a really odd run, taking into consideration that it was a pretty simple ETL, without much logic happening in the process. So I started investigating the execution logs, and then I understood one of the downsides of the compression gains and got a better idea of how the clustered columnstore index works. Of course, I reached Niko Neugebauer's blog posts about columnstore indexes, which are a great read; I always recommend them to anyone starting to get into this topic.

After a couple of hours of reading and checking logs, I noticed that once the row groups were filled, the compression of the row groups and the extension of the file started, and this showed really strange behavior: compressing the row groups and extending the file took longer than the data movement itself.
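As a side note, on SQL Server 2016 and later you do not have to wait for the background Tuple Mover to compress the delta stores; a minimal sketch, assuming a hypothetical fact table `dbo.FactSales` with a clustered columnstore index:

```sql
-- Force all delta-store row groups (closed and, with this option, open ones)
-- to be compressed into columnstore segments immediately,
-- instead of waiting for the background Tuple Mover.
ALTER INDEX ALL ON dbo.FactSales
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```

Running this right after a large load is one way to pay the compression cost on your own schedule rather than during the next execution.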

One of the issues was that I was using a not-so-good batch size, I think around 100K records, meaning the engine would not store the data directly into compressed segments, but rather put it into open delta stores, which then took a lot of time for the engine to compress into row groups.
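You can check where a load actually landed by querying the row-group physical stats DMV (available from SQL Server 2016; the table name `dbo.FactSales` is just a placeholder):

```sql
-- state_desc distinguishes OPEN/CLOSED (delta stores, uncompressed)
-- from COMPRESSED (proper columnstore segments);
-- trim_reason_desc explains why a row group was closed short of capacity.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.FactSales')
ORDER BY row_group_id;
```

A load that went through delta stores will show up here as OPEN or CLOSED row groups rather than COMPRESSED ones.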

Reading more about it, I found that there is a "magic number" for the bulk insert API that lets the engine store the data directly into a compressed segment, rather than into open delta stores that would need to be compressed at the next run of the Tuple Mover.
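That threshold is documented as 102,400 rows: a bulk insert batch of at least that many rows is compressed straight into a columnstore row group, while smaller batches go to a delta store. A minimal sketch (file path and table name are hypothetical); in an SSIS OLE DB Destination with fast load, the equivalent knob is "Maximum insert commit size":

```sql
-- With BATCHSIZE >= 102,400, each committed batch bypasses the delta store
-- and goes directly into a compressed row group; 1,048,576 rows is the
-- maximum (and ideal) row-group size.
BULK INSERT dbo.FactSales
FROM 'C:\loads\sales_extract.csv'
WITH (BATCHSIZE = 1048576, TABLOCK);
```

With a 100K batch size, every batch falls just under the threshold, so the entire load trickles through delta stores, which is exactly the behavior I was seeing.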

Niko has performed some interesting tests demonstrating the "magic number" used for storing data into a compressed segment via the bulk insert API, and you can read more about those tests at the following link:

As a closing note, this was one of the experiences that really made me understand that no performance gain comes "for free", and it made me more cautious about technologies and concepts I have not used before. Another big gain from this experience was discovering Niko's blogs, which helped me a lot in understanding the internals of columnstore indexes, with their pros and cons.
