Updating very large tables can be time consuming, and sometimes it takes hours to finish. It can also cause blocking issues.
Here are a few tips for optimizing updates on large data volumes in SQL Server.
- Removing the index on the column to be updated.
- Executing the update in smaller batches.
- Disabling update triggers.
- Replacing Update statement with a Bulk-Insert operation.
With that being said, let’s apply the above points to optimize an update query.
The code below creates a dummy table with 200,000 rows and required indexes.
CREATE TABLE tblverylargetable
(
    sno  INT IDENTITY,
    col1 CHAR(800),
    col2 CHAR(800),
    col3 CHAR(800)
)
GO

DECLARE @i INT=0
WHILE( @i < 200000 )
BEGIN
    INSERT INTO tblverylargetable
    VALUES ('Dummy',
            Replicate('Dummy', 160),
            Replicate('Dummy', 160))
    SET @i=@i + 1
END
GO

CREATE INDEX ix_col1 ON tblverylargetable(col1)
GO

CREATE INDEX ix_col2_col3 ON tblverylargetable(col2) INCLUDE(col3)
Consider the following update query, which is to be optimized. It's a very straightforward query that updates a single column.
UPDATE tblverylargetable SET col1 = 'D' WHERE col1 = 'Dummy'
The query takes 2:19 minutes to execute.
Let’s look at the execution plan of the query shown below. In addition to the clustered index update, the index ix_col1 is also updated. The index update and Sort operation together take 64% of the execution cost.
1. Removing index on the column to be updated
The same query takes 14-18 seconds when there isn’t any index on col1. Thus, an update query runs faster if the column to be updated is not an index key column. The index can always be created once the update completes.
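For example, a minimal sketch of this approach, using the ix_col1 index created above, might look like the following:

-- Drop the index on the column being updated
DROP INDEX ix_col1 ON tblverylargetable
GO

UPDATE tblverylargetable
SET    col1 = 'D'
WHERE  col1 = 'Dummy'
GO

-- Recreate the index once the update completes
CREATE INDEX ix_col1 ON tblverylargetable(col1)
GO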
2. Executing the update in smaller batches
The query can be further optimized by executing it in smaller batches, which is generally faster. The code below updates the records in batches of 20,000.
DECLARE @i INT=1
WHILE( @i <= 10 )
BEGIN
    UPDATE TOP(20000) tblverylargetable
    SET    col1 = 'D'
    WHERE  col1 = 'Dummy'

    SET @i=@i + 1
END
The above query takes 6-8 seconds to execute. When updating in batches, even if the update fails or it needs to be stopped, only rows from the current batch are rolled back.
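If the number of batches isn't known in advance, a common variation of the same idea is to loop until an iteration updates no rows. This is just a sketch, not part of the measured example above:

WHILE 1 = 1
BEGIN
    UPDATE TOP(20000) tblverylargetable
    SET    col1 = 'D'
    WHERE  col1 = 'Dummy'

    -- Stop once a batch updates no rows
    IF @@ROWCOUNT = 0
        BREAK
END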
3. Disabling update triggers
Triggers, especially those containing cursors, can severely slow down an update query. Disabling the after-update triggers for the duration of the update will considerably improve query performance.
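As a sketch, assuming a hypothetical after-update trigger named trg_tblverylargetable_upd on the table, the trigger can be disabled for the duration of the update and re-enabled afterwards:

-- trg_tblverylargetable_upd is a hypothetical trigger name
DISABLE TRIGGER trg_tblverylargetable_upd ON tblverylargetable
GO

UPDATE tblverylargetable
SET    col1 = 'D'
WHERE  col1 = 'Dummy'
GO

ENABLE TRIGGER trg_tblverylargetable_upd ON tblverylargetable
GO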
4. Replacing Update statement with a Bulk-Insert operation
An update statement is a fully logged operation, so it will certainly take a considerable amount of time if millions of rows are to be updated. The fastest way to speed up the update query is to replace it with a bulk-insert operation, which is minimally logged under the simple and bulk-logged recovery models. This can be done by bulk-inserting the data into a new table and then renaming the new table to the original name. The required indexes and constraints can then be created on the new table as needed.
The code below shows how the update can be converted to a bulk-insert operation. It takes 4 seconds to execute.
SELECT sno,
       CASE col1 WHEN 'Dummy' THEN 'D' ELSE col1 END AS col1,
       col2,
       col3
INTO   tblverylargetabletemp
FROM   tblverylargetable
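The swap itself isn't shown above; once the new table has been populated and verified, a sketch of the remaining steps could be:

-- Drop the original table and rename the new one in its place
DROP TABLE tblverylargetable
GO

EXEC sp_rename 'tblverylargetabletemp', 'tblverylargetable'
GO

-- Recreate the required indexes on the renamed table
CREATE INDEX ix_col1 ON tblverylargetable(col1)
GO

CREATE INDEX ix_col2_col3 ON tblverylargetable(col2) INCLUDE(col3)
GO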
The bulk-insert can then be further optimized to get an additional performance boost.
Hope this helps!!!
Good General practices…
I don’t think you need to delete the index in question, just DISABLE it. This way you do not have to back up the definition.
Also consider the effect of the transaction log with large data modifications or inserts. Consider the performance impact of autocommit vs. explicit commit transaction models. Have a look at http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx. Even though he's talking about a bottleneck on WRITELOG, it still makes interesting points on the effects of autocommit on large inserts.
Hi Mohit,
A clustered index can't really be disabled; if it is disabled, the underlying table won't be accessible.
Br,
Anil
Thanks Mohit – yes, disable will be a better option, I missed it. I didn't find a significant performance gain with explicit commit in the case of updates, although it works great for inserts.
Thanks,
Ahmad
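For reference, disabling and rebuilding the nonclustered index from the article's example (rather than dropping and recreating it) would look roughly like this sketch:

-- Disable the nonclustered index instead of dropping it
ALTER INDEX ix_col1 ON tblverylargetable DISABLE
GO

UPDATE tblverylargetable
SET    col1 = 'D'
WHERE  col1 = 'Dummy'
GO

-- Rebuilding the index enables it again
ALTER INDEX ix_col1 ON tblverylargetable REBUILD
GO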
I believe you will see a gain in performance if you sort the incoming data by the primary key of the data being updated. I hope this helps.
Hi Dennis – if you are referring to bulk-insert, yes it will increase performance.
Thanks,
Ahmad
Thanks for this post.
For the batch approach, if I need to update a couple of million rows, I then:
– store only the PRIMARY KEY of the table to get UPDATEd, T1, in another table, T2
– cluster-index that column in T2
– then proceed by taking the TOP X thousand rows ORDER BY the T2 PRIMARY KEY column, UPDATE those in T1, then DELETE those from T2.
This way I can get a constant speed for the UPDATEs 😉
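A rough sketch of this pattern, assuming T1 has an INT primary key named id and the same col1 = 'Dummy' update as in the article (the table and column names here are illustrative):

-- T2 stores the keys of the rows still to be updated, clustered on the key
CREATE TABLE T2 (id INT NOT NULL PRIMARY KEY CLUSTERED)

INSERT INTO T2 (id)
SELECT id FROM T1 WHERE col1 = 'Dummy'

DECLARE @batch TABLE (id INT PRIMARY KEY)

WHILE 1 = 1
BEGIN
    DELETE FROM @batch

    -- Take the next batch of keys in clustered-key order
    INSERT INTO @batch (id)
    SELECT TOP (20000) id
    FROM   T2
    ORDER  BY id

    IF @@ROWCOUNT = 0
        BREAK

    -- Update the matching rows in T1
    UPDATE t
    SET    t.col1 = 'D'
    FROM   T1 AS t
           INNER JOIN @batch AS b ON b.id = t.id

    -- Remove the processed keys from the staging table
    DELETE t2
    FROM   T2 AS t2
           INNER JOIN @batch AS b ON b.id = t2.id
END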
You're welcome Nicolas.. only for a couple of million rows, though.. the update+delete will generate a lot of t-log..
I want to update a huge table with more than 2,000,000 rows.
Here is my code:
DECLARE @i INTEGER
DECLARE @TotalNum INT
DECLARE @Step INT

SELECT @Step = ROUND((SELECT COUNT(*) FROM T1) / 500000, 0)
SET @i = 1

WHILE( @i <= @Step )
BEGIN
    UPDATE T2
    SET    SellingPrice = A.maxprice
    FROM   T2
           INNER JOIN (SELECT MAX(SellingPrice) AS maxprice, Product, Color, Size, DimensionCode, WarehouseCode
                       FROM T1
                       GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) AS A
               ON  A.Product = T2.Product
               AND A.Color = T2.Color
               AND A.Size = T2.Size
               AND A.DimensionCode = T2.DimensionCode
               AND A.WarehouseCode = T2.WarehouseCode
    WHERE  T2.SellingPrice = 0 OR T2.SellingPrice IS NULL

    SET @i = @i + 1
END
but this code is still slow.
Do you have any idea?
Hi Nadia – please share the execution plan for the given query.. also check how much time it takes to update a single record using the query you mentioned; if that takes time, then the query itself needs to be optimized.
Thanks,
Ahmad Osama
Hi Ahmad,
I need to update two columns in three tables, and the approximate record count will be 20K.
So, which approach is better? To use a stored procedure or a batch process?
Hi Megha – you can use any of the above methods, whether in a procedure or as an ad hoc parameterised SQL query. The batch process can easily be wrapped into a procedure.
Thanks! That was indeed quite informative.
I need to update a target table that has around 200 million rows. The source table has around 2 million rows. Both tables are matched on the primary key and then all columns are updated. The total number of columns being updated is 75.
What can be the best approach to optimize this operation?