In today’s blog, we will discuss SQL Server Statistics. This is a broad category comprising of multiple concepts which will be covered in future blogs as a series. This blog will be focusing on the fundamentals of SQL Server Statistics before diving into the more complex concepts.
Statistics is an object in SQL Server that contains a large amount of data about the contents present in a table. When a query is submitted to SQL Server, the optimizer generates an execution plan and provides the result. This execution plan is heavily dependent on statistics. In other words, incorrect statistics lead to an inefficient execution plan. The execution plan is comprised of multiple iterators, each dealing with multiple rows. If the optimizer knows the number of rows it has to deal with, then it can choose the most appropriate iterator and thus create the most efficient execution plan. To know the estimated number of rows, it relies on statistical data, which is termed as Cardinality Estimation (Estimated No.of Rows vs Actual No.of Rows).
For this demo, we will be using Adventureworks2012 database that contains a table named Person.Person and a copy, Person.Person2.
Running a SELECT* statement without a WHERE clause returns all the data from the table by scanning all the data pages (Table Scan operation).
SP_HELPSTATS statement confirms that there are no indexes currently present on this table.
USE AdventureWorks2012 GO SELECT * INTO Person.Person2 FROM Person.Person GO SELECT * FROM Person.Person2 SP_HELPSTATS N'Person.Person2', 'ALL' GO
From 2012 onwards, the Microsoft SQL Server team introduced another DMV – sys.dm_db_stats_properties. Joining this DMV with the Sys.Stats system catalog gives more insights on statistical data. Execute the below query and we get an empty result set since there are no statistics on Person.Person2 table.
SELECT OBJECT_NAME([sp].[object_id]) AS "Table", [sp].[stats_id] AS "Statistic ID", [s].[name] AS "Statistic", [sp].[last_updated] AS "Last Updated", [sp].[rows], [sp].[rows_sampled], [sp].[unfiltered_rows], [sp].[modification_counter] AS "Modifications" FROM [sys].[stats] AS [s] OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp] WHERE [s].[object_id] = OBJECT_ID(N'Person.Person2');
Now, Let’s run a simple SELECT query again, but this time with a WHERE predicate which returns an empty result set.
SELECT * FROM Person.Person2 WHERE LastName = 'bansal'
Now let’s run SP_HELPSTATS to check whether SQL Server has created any statistics object on LastName column. Yes, it is.
This happened because SQL Server wanted to estimate the number of matching records with LastName as ‘Bansal’ to develop an efficient execution plan. In case the statistics object does not exist, then SQL Server tends to create one automatically provided Auto_Create_Statistics is ON which is ON by default.
Let’s turn off the Auto_Create_Statistics
ALTER DATABASE AdventureWorks2012 SET AUTO_CREATE_STATISTICS OFF
Now, let’s execute the below query and then check whether SQL Server automatically created any statistics or not on MiddleName and LastName columns. Turn ON the actual execution plan before executing the query.
SELECT * FROM Person.Person2 WHERE FirstName = 'Terri' AND MiddleName = 'Lee' AND LastName = 'Duffy'
From the execution plan, observe that SQL Server is performing a Table Scan operation. Also, observe that there is a warning symbol on the table scan operator indicating “Columns With No Statistics”.
Also, the Estimated Number of Rows is 775.079 whereas the Actual Number of Rows is 1. Incorrect estimates are due to a lack of statistics.
After executing the SELECT statement, check for the statistics. Since Auto_Create_Statistics is OFF. SQL Server does not create statistics on FirstName & MiddleName columns. The existing stats is on LastName column.
SELECT OBJECT_NAME([sp].[object_id]) AS "Table", [sp].[stats_id] AS "Statistic ID", [s].[name] AS "Statistic", [sp].[last_updated] AS "Last Updated", [sp].[rows], [sp].[rows_sampled], [sp].[unfiltered_rows], [sp].[modification_counter] AS "Modifications" FROM [sys].[stats] AS [s] OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp] WHERE [s].[object_id] = OBJECT_ID(N'Person.Person2');
Next, Auto_Create_Statistics is turned on and the same operation is performed once again.
Look at the execution plan, parallelism is taking place along with a Table Scan. Also, the Estimated Number of Rows and Actual Number of Rows both have a value of 1. Since SQL Server optimizer has statistical data to make an accurate calculation, it achieves a perfect Cardinality Estimation.
ALTER DATABASE AdventureWorks2012 SET AUTO_CREATE_STATISTICS ON SELECT * FROM Person.Person2 WHERE FirstName = 'Terri' AND MiddleName = 'Lee' AND LastName = 'Duffy'
Execute the below query and observe that two more statistics objects are created as a result of the above SELECT query, one for each column.
SELECT OBJECT_NAME([sp].[object_id]) AS "Table", [sp].[stats_id] AS "Statistic ID", [s].[name] AS "Statistic", [sp].[last_updated] AS "Last Updated", [sp].[rows], [sp].[rows_sampled], [sp].[unfiltered_rows], [sp].[modification_counter] AS "Modifications" FROM [sys].[stats] AS [s] OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp] WHERE [s].[object_id] = OBJECT_ID(N'Person.Person2');
SQL Server can automatically create single-column statistics only. To create multi-column statistics, we either need to create a multi-column index or manually create them using CREATE STATISTICS statement. Creating an index automatically creates statistics on those columns.
The next section of this blog will be dealing with the concept of updating statistics.
The statistics object gets created before the execution plan. Naturally, over some time the data may change and when it does, SQL Server needs to update statistics automatically. For this to take place, at least 20% of the data and a minimum of 500 rows need to change.
In the table Person.Person2, there are a total of 19972 rows. Applying the above metrics, a total of 4994 rows need to change for SQL Server to automatically perform an update on statistics (Only if Auto_Update_Statistics is ON). Before this takes place, it is important to take note of the initial state of the statistics object as shown in the image below.
All values in the Modifications column hold a value of 0, stating that no updates have taken place in the data on any of the columns so far.
Execute the below query to make changes to less than 20% of the data and observe the statistics.
UPDATE Person.Person2 SET FirstName = 'Amit' WHERE BusinessEntityID <=4490
Two things can be noticed from the above screenshot. Firstly, the value of one of the entries in the Modifications column has changed. Secondly, a fourth statistics object has been created, since the query that was executed has a predicate on BusinessEntityID.
Execute the SELECT query again and observe the statistics. The query yields no results because there is no matching record because of the UPDATE statement we ran.
SELECT * FROM Person.Person2 WHERE FirstName = 'Terri' AND MiddleName = 'Lee' AND LastName = 'Duffy'
Check the statistics now. The Last Updated column still shows the old date which confirms that the statistics are not updated. This is because the criteria to trigger auto-update stats are not met.
SELECT OBJECT_NAME([sp].[object_id]) AS "Table", [sp].[stats_id] AS "Statistic ID", [s].[name] AS "Statistic", [sp].[last_updated] AS "Last Updated", [sp].[rows], [sp].[rows_sampled], [sp].[unfiltered_rows], [sp].[modification_counter] AS "Modifications" FROM [sys].[stats] AS [s] OUTER APPLY sys.dm_db_stats_properties ([s].[object_id],[s].[stats_id]) AS [sp] WHERE [s].[object_id] = OBJECT_ID(N'Person.Person2');
When auto-update stats are triggered, the statistics will be automatically updated and a new execution plan will be created based on the new data.
Let’s UPDATE a few more records to trigger an auto-update.
UPDATE Person.Person2 SET FirstName = 'Amit2' WHERE BusinessEntityID <=2000
Now, Check the stats again. The Modifications column shows that 4975 rows got modified but still the Last Updated column shows the old date. This is where most of the SQL professionals get confused. Even though the 20% + 500 criteria are met, statistics are not updated.
The statistics will only get updated if we execute the SELECT query once again (Any query with FirstName column in the predicate). Let’s execute the below query and then check the statistics. See the change in Last Updated column and Modification is also 0 now.
SELECT * FROM Person.Person2 WHERE FirstName = 'Terri' AND MiddleName = 'Lee' AND LastName = 'Duffy'
This sums up our first blog in the series. SQL Server Statistics – dealing with creating and updating statistics. There is surely more to come, given how vast this category is. So, stay tuned, as we continue to deliver quality content on this topic and many more.