SQL Server Statistics Density - What is the purpose?

Hi Friends,

There is a table in AdventureWorks2008R2 called SalesOrderHeader which has a column called TotalDue. Let us create statistics dennsity for this column

USE [AdventureWorks2008R2]
GO
 
CREATE STATISTICS [st_TotalDue] ON [Sales].[SalesOrderHeader]([TotalDue])
GO

Now, let us display the statistics that we have just created:

DBCC Show_Statistics('Sales.SalesOrderHeader',st_TotalDue)
GO

We all know the purpose of statistics, right? It helps the SQL Query Optimizer in cardinality estimation. In other words, it helps the optimizer to produce an efficient query execution plan.

What exactly is the purpose of DENSITY factor here?

1_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

Density can be calculated as 1/ distinct values and can be cross verified using the below query.

Select [DistinctValue]=Count(Distinct TotalDue),
[Density]=1/CAST(Count(Distinct TotalDue) AS Float) 
from Sales.SalesOrderHeader 
GO

2_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

But what does this mean and where does it help really?
Density, just like other items in statistics describes the data. It is a calculation of uniqueness of data in that column. It answers the question: “How often duplicate values occur in TotalDue column?”

High density would mean less unique data. And low density would mean greater uniqueness. Density can only be between 0 and 1. So what if density is 1? It means 100% duplicate records.

Density information is vital to improve the Query Optimizer’s estimates for GROUP BY operations, and that’s just one of the uses.

Let us consider the same example as above, the number of distinct values in TotalDue column of Sales.SalesOrderHeader table is 4754. Density can be computed, as mentioned earlier, as 1 / “number of distinct values,” which will be 1 / 4754, which is 0.000210349179638199 as shown in the above image.

The Query Optimizer can benefit from already available density information to estimate the cardinality of GROUP BY queries (estimated number of distinct values). Since this density information is already available, now all the optimizer needs to do is to calculate the estimated number of distinct values by calculating the reciprocal of the density value. As shown earlier in the verification code, to estimate the cardinality of the following query using GROUP BY TotalDue, we can calculate the reciprocal of the TotalDue density which will be 1 / 0.000210349179638199, that gives us 4754, and that is the estimated number of rows as shown below:

[Turn on Actual Execution Plan]

SELECT TotalDue from Sales.SalesOrderHeader
GROUP BY TotalDue

3_SQL_Server_What_is_the_purpose_of_DENSITY_in_STATISTICS

In my next post I shall show how density information can be used to estimate the cardinality of a query using local variables.

2 Comments on “SQL Server Statistics Density – What is the purpose?”

Kelly Fitzgerald says:

July 18, 2014 at 11:53 am

So density is equl to cardinality?!?! Why change the name? Microsoft and Oracle need to use common terms very frustrating when they change it.

Daniel D says:

July 18, 2014 at 11:54 am

Thank you Amit. Great write up on this aspect of SQL Server. Very clear and concise. Excellent visuals and examples.

SQL Server Statistics Density – What is the purpose?

About Amit Bansal

2 Comments on “SQL Server Statistics Density – What is the purpose?”

Leave a Reply Cancel reply

Related Posts

SQL Server Query Cost, Memory Grant & SQLRESERVATIONS Clerk

SQL Server Memory Troubleshooting

Performance Monitor Counters

About Amit Bansal

2 Comments on “SQL Server Statistics Density – What is the purpose?”

Leave a Reply Cancel reply