SQL Server – The Concept of Density

In this blog, we are going to be diving into The Concept of Density.

What is Density?
Density is defined as the measure of the uniqueness of the data in a column. This implies that density is measured on a per column basis. In other words, density can also be defined based on how often we encounter duplicate values.
The value for Density ranges from 0 to 1.0, which can be obtained using the following formula
Density = 1/(number of distinct values in a column)
(Or)
Density = Average number of duplicates for a certain value/Total row count.
High density would imply that there is less unique data in the column, whereas low density would imply that the data present in the column is more unique.
We will be using AdventureWorks2014 database for this demo.
The table named SalesOrderHeader has two particular columns of special interest to us that are RevisionNumber & TotalDue.

USE AdventureWorks2014
GO

SELECT RevisionNumber
      ,TotalDue 
FROM Sales.SalesOrderHeader

density

RevisionNumber has multiple duplicate values, while TotalDue contains a variety of values thus making it the more unique of the two.

First, let’s check the total number of rows in the table. There are 31465 records in the table.

SELECT COUNT(*) NoOfRecords FROM Sales.SalesOrderHeader

density

Next, a SELECT statement is executed (as shown below), to count the number of distinct values present in the column TotalDue.

SELECT COUNT( DISTINCT TotalDue) AS DistinctTotalDue 
FROM Sales.SalesOrderHeader

density

There are 4754 unique entries in the column TotalDue. Having obtained the value, it can be applied in the formula to get the Density of that specific column.

Density = 1/4754
= 0.000210

0.000210 is the density factor for the column TotalDue.

Next, repeat the same process on RevisionNumber, it is seen that there are only 2 distinct values in the entire column and thus the density factor will be greater than the previous case.

SELECT COUNT( DISTINCT RevisionNumber) AS DistinctRevisionNumber
FROM Sales.SalesOrderHeader

Density = 1/2
= 0.50

0.50 is the density factor for the column RevisionNumber.

Density is used by the query optimizer when it is evaluating multiple plans and of course also in the case of cardinality estimation. For this blog, we will take the column TotalDue as an example and attempt to understand where exactly is the value of density stored.

Let’s manually create STATISTICS for the column TotalDue.

CREATE STATISTICS TotalDue ON Sales.SalesOrderHeader(TotalDue)
GO

Next, execute a statement to display the statistics for TotalDue as shown below.

DBCC SHOW_STATISTICS (N'Sales.SalesOrderHeader', TotalDue)
GO

density

There are multiple columns of information here, including a particular record that provides us with the density, which is 0.0002103492, the same number obtained from the calculations earlier.

How the query optimizer uses this piece of information to calculate the Cardinality Estimation is beyond the scope of this blog, but rest assured there will be a blog soon addressing this particular topic. So, stay tuned!

Let’s drop the statistics we created earlier for this demo.

DROP STATISTICS Sales.SalesOrderHeader.TotalDue

SQL Server – The Concept of Density

About Amit Bansal

Leave a Reply Cancel reply

Related Posts

SQL Server Query Cost, Memory Grant & SQLRESERVATIONS Clerk

SQL Server Memory Troubleshooting

Performance Monitor Counters

About Amit Bansal

Leave a Reply Cancel reply