2/5/2012 10:50:23 AM
Ahmad Osama -
Updating very large tables can be a time-consuming task, sometimes taking hours to finish. In addition, it can also cause blocking issues.
Here are a few tips to optimize updates on large data volumes. Let's apply these tips to optimize an update query.
The code below creates a dummy table with 200,000 rows and the required indexes (the exact column definitions and sample values are assumptions, flagged in the comments).
CREATE TABLE tblverylargetable
(
    sno INT IDENTITY,
    -- col1-col3 definitions assumed; the original DDL was truncated
    col1 VARCHAR(50), col2 VARCHAR(50), col3 VARCHAR(50)
)
DECLARE @i INT=0
WHILE( @i < 200000 )
BEGIN
    -- sample values; col1 starts out as 'Dummy' for the update below
    INSERT INTO tblverylargetable VALUES( 'Dummy', 'Value2', 'Value3' )
    SET @i=@i + 1
END
CREATE INDEX ix_col1 ON tblverylargetable(col1)
CREATE INDEX ix_col2_col3 ON tblverylargetable(col2, col3)
Consider the following update query, which is to be optimized. It's a very straightforward query that updates a single column.
UPDATE tblverylargetable
SET col1 = 'D'
WHERE col1 = 'Dummy'
The query takes 2:19 minutes to execute.
Looking at the execution plan of the query, in addition to the clustered index update, the index ix_col1 is also updated. The index update and the Sort operation together account for 64% of the execution cost.
1. Removing the index on the column to be updated
The same query takes 14-18 seconds when there isn't any index on col1. Thus, an update query runs faster if the column being updated is not an index key column. The index can always be recreated once the update completes, as sketched below.
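A minimal sketch of that workflow against the demo table, using either drop/recreate or (as a commenter suggests below) disable/rebuild:

-- Option 1: drop the index, run the update, then recreate the index
DROP INDEX ix_col1 ON tblverylargetable

UPDATE tblverylargetable
SET col1 = 'D'
WHERE col1 = 'Dummy'

CREATE INDEX ix_col1 ON tblverylargetable(col1)

-- Option 2: disable and rebuild instead, so the index definition is preserved
-- ALTER INDEX ix_col1 ON tblverylargetable DISABLE
-- ALTER INDEX ix_col1 ON tblverylargetable REBUILD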
2. Executing the update in smaller batches
The query can be optimized further by executing it in smaller batches, which keeps each transaction short and is generally faster. The code below updates the records in batches of 20,000.
DECLARE @i INT=1
WHILE( @i <= 10 )
BEGIN
    UPDATE TOP(20000) tblverylargetable
    SET col1 = 'D'
    WHERE col1 = 'Dummy'

    SET @i=@i + 1
END
The above query takes 6-8 seconds to execute. Another benefit of batching: if the update fails or needs to be stopped, only the rows from the current batch are rolled back.
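If the number of rows to update isn't known up front, a variation of the same idea (a sketch, not from the original post) loops until a batch affects no rows:

-- @@ROWCOUNT returns the number of rows affected by the last statement
WHILE (1 = 1)
BEGIN
    UPDATE TOP(20000) tblverylargetable
    SET col1 = 'D'
    WHERE col1 = 'Dummy'

    IF @@ROWCOUNT = 0 BREAK
END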
3. Disabling update triggers
Triggers with cursors can drastically slow down the performance of an update query. Disabling the AFTER UPDATE triggers before the update, and re-enabling them afterwards, will considerably improve query performance.
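As an illustration, assuming a hypothetical AFTER UPDATE trigger named trg_audit on the demo table:

-- trg_audit is a hypothetical trigger name used for illustration only
DISABLE TRIGGER trg_audit ON tblverylargetable

UPDATE tblverylargetable
SET col1 = 'D'
WHERE col1 = 'Dummy'

ENABLE TRIGGER trg_audit ON tblverylargetable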
4. Replacing the update statement with a bulk-insert operation
An update statement is a fully logged operation, so updating millions of rows will take a considerable amount of time. The fastest way to speed up the update is to replace it with a bulk-insert operation, which is minimally logged under the simple and bulk-logged recovery models. This can be done by bulk-inserting into a new table and then renaming the new table to the original name. The required indexes and constraints can be created on the new table as needed.
The code below sketches how the update can be converted to a bulk-insert operation using SELECT…INTO (the new table name is illustrative). It takes 4 seconds to execute.
SELECT sno,
       CASE col1
           WHEN 'Dummy' THEN 'D'
           ELSE col1
       END AS col1,    -- rewrite the value during the copy instead of updating in place
       col2,
       col3
INTO   tblverylargetable_new   -- new table name chosen for illustration
FROM   tblverylargetable

-- Swap the new table in for the original
DROP TABLE tblverylargetable
EXEC sp_rename 'tblverylargetable_new', 'tblverylargetable'
The bulk-insert can then be tuned further for an additional performance boost.
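One such option (my assumption, not spelled out in the post) is to pre-create the new table and use INSERT…SELECT with a TABLOCK hint, which also qualifies for minimal logging under the simple and bulk-logged recovery models on SQL Server 2008 and later:

-- Assumes tblverylargetable_new was pre-created as a heap,
-- with sno defined without the IDENTITY property
INSERT INTO tblverylargetable_new WITH (TABLOCK)
SELECT sno,
       CASE col1 WHEN 'Dummy' THEN 'D' ELSE col1 END AS col1,
       col2,
       col3
FROM   tblverylargetable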
Hope this helps!!!
If you liked the posts, do like us on Facebook at http://www.FaceBook.com/SQLServerGeeks.
Ahmad Osama (Member since: 1/25/2012 1:03:07 PM)
Ahmad Osama is an MCP Database Administrator/Developer, an avid gamer and a chicken lover. Ahmad started his career in the sales industry, working as a database executive responsible for report writing, application development and basic database administration. In 2008 he joined World Fashion Exchange as a Database Administrator, a role in which he focused on troubleshooting and performance tuning. In 2010 he joined The Perfect Future Technologies, where he has been named best newcomer and expert of the year, working as a database administrator on one of the largest political databases in the world.
Ahmad has experience in database administration, SSIS, performance tuning, VLDBs and web development.
When not working on SQL Server, he can be found glued to his Xbox.
Good General practices...
I don't think you need to delete the index in question, just DISABLE it. This way you do not have to back up the definition.
Also consider the effect of the transaction log with large data modifications or inserts, and the performance impact of autocommit vs. explicit commit transaction models. Have a look at http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx. Even though he's talking about a bottleneck on WRITELOG, it still makes interesting points about the effects of autocommit on large inserts.
Thanks Mohit - yes, disable is the better option; I missed it. I didn't find a significant performance gain with explicit commit for updates, although it works great for inserts.
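For inserts, a minimal sketch of the explicit-commit pattern (using the demo table's assumed sample values) commits once per batch rather than relying on autocommit for every row:

DECLARE @i INT = 0
BEGIN TRANSACTION
WHILE( @i < 200000 )
BEGIN
    INSERT INTO tblverylargetable VALUES( 'Dummy', 'Value2', 'Value3' )
    SET @i = @i + 1

    IF @i % 20000 = 0   -- commit every 20,000 rows to keep the open transaction bounded
    BEGIN
        COMMIT TRANSACTION
        BEGIN TRANSACTION
    END
END
COMMIT TRANSACTION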
I believe you will see a gain in performance if you sort the incoming data by the primary key of the data being updated. I hope this helps.
Hi Dennis - if you are referring to bulk-insert, yes it will increase performance.
Thanks for this post.
For the batch approach, if I need to update a couple of million rows, I then:
- store only the PRIMARY KEY of the table to get UPDATEd, T1, in another table, T2
- cluster-index that column in T2
- then proceed by taking the TOP X thousand rows ORDER BY the T2 PRIMARY KEY column, UPDATE those in T1, then DELETE those from T2.
This way I can get a constant speed for the UPDATEs ;)
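A minimal sketch of that pattern (using the demo table as T1 with its sno key; the T2 name and batch size are illustrative):

-- T2 holds the primary keys of the rows still to be updated in T1
SELECT sno
INTO   T2
FROM   tblverylargetable
WHERE  col1 = 'Dummy';

CREATE CLUSTERED INDEX ix_T2_sno ON T2(sno);

WHILE (1 = 1)
BEGIN
    -- Update one key-ordered batch of rows in T1
    ;WITH batch AS ( SELECT TOP (20000) sno FROM T2 ORDER BY sno )
    UPDATE t1
    SET    t1.col1 = 'D'
    FROM   tblverylargetable t1
           JOIN batch b ON t1.sno = b.sno;

    IF @@ROWCOUNT = 0 BREAK;

    -- Remove the processed keys from T2
    ;WITH batch AS ( SELECT TOP (20000) sno FROM T2 ORDER BY sno )
    DELETE FROM batch;
END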
You're welcome, Nicolas.. only for a couple of million rows though.. the update+delete will generate a lot of t-log..
Parallelization for queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large volume data modifications (UPDATE, INSERT, DELETE), the standard engine does not parallelize towards best use of all available resources (disk, multiple CPU cores, etc.).
Next to SSIS, there also exists a lightweight, more generic and engine-focused approach ;-) The concept of SQL Parallel Boost is a native T-SQL based solution and allows a performance gain of up to a factor of 10 (!) in an 8-16 CPU core environment.
This purely SQL engine related parallelisation solution takes advantage of minimized complexity and has no 'external' components like SSIS involved. Furthermore, it's the best performing solution regarding task splitting and synchronization, as it has no potential connection and communication overhead. The solution can also be used to execute multiple SQL statements in parallel initiated through the same connection.
SQL Parallel Boost improves large volume data modification operations by up to a factor of 10 (!). The free 'Community Edition' of SQL Parallel Boost can be downloaded at http://sqlparallelboost.codeplex.com
Hi Nadia - Please share the execution plan for the given query.. also check how much time it takes to update a single record using the query you mentioned; if that takes time, then the query itself needs to be optimized.
I need to update two columns in three tables, with approximately 20K records.
So, which approach is better? To use a stored procedure or a batch process?
Thanks! that was indeed quite informative.