SQL Server: Optimizing Update Queries for Large Data Volumes


Author

Ahmad Osama (Member since: 1/25/2012 1:03:07 PM)
Ahmad Osama is an MCP Database Administrator/Developer, an avid gamer and a chicken lover. Ahmad started his career in the sales industry, working as a database executive responsible for report writing, application development and basic database administration. In 2008 he joined World Fashion Exchange as a Database Administrator, focusing on troubleshooting and performance tuning. In 2010 he joined The Perfect Future Technologies, where he was named best newcomer and expert of the year while working as a database administrator on one of the largest political databases in the world. Ahmad has experience in database administration, SSIS, performance tuning, VLDBs and web development. When not working on SQL Server, he can be found glued to his Xbox.


Comments (12)

Mohit
2/6/2012 4:53:13 PM Mohit said:

Good general practices...

I don't think you need to delete the index in question; just DISABLE it. This way you do not have to back up the definition.

Also consider the effect of the transaction log with large data modifications or inserts, and the performance impact of the autocommit vs. explicit-commit transaction models. Have a look at http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx. Even though he is talking about a bottleneck on the WRITELOG wait type, it still makes interesting points about the effects of autocommit on large inserts.
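
For illustration, a minimal sketch of the DISABLE/REBUILD pattern combined with an explicit transaction; the table, index, and column names here are assumptions:

ALTER INDEX ix_Orders_Status ON dbo.Orders DISABLE;  -- definition stays in metadata, nothing to script out

-- One explicit transaction instead of one autocommit per statement.
BEGIN TRANSACTION;
UPDATE dbo.Orders SET Status = 'ARCHIVED' WHERE OrderDate < '2011-01-01';
COMMIT TRANSACTION;

ALTER INDEX ix_Orders_Status ON dbo.Orders REBUILD;  -- re-enables the index and rebuilds its data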

by
AhmadOsama
2/7/2012 7:11:09 AM Ahmad Osama said:

Thanks Mohit - yes, disable is the better option; I missed it. I didn't find a significant performance gain with explicit commit in the case of updates, although it works great for inserts.

Thanks,

Ahmad

by
Dennis Namovice
2/7/2012 3:28:15 PM Dennis Namovice said:

I believe you will see a gain in performance if you sort the incoming data by the primary key of the data being updated.  I hope this helps.

by
AhmadOsama
2/7/2012 4:32:44 PM Ahmad Osama said:

Hi Dennis - if you are referring to bulk insert, then yes, it will increase performance.

Thanks,

Ahmad
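
For illustration, a sketch of Dennis's point applied to a bulk load; the file path, table, and key column are assumptions. The ORDER hint tells BULK INSERT that the file is already sorted by the clustered primary key, so the engine can skip an extra sort:

-- dbo.Orders is assumed to be clustered on OrderId.
BULK INSERT dbo.Orders
FROM 'C:\data\orders.dat'
WITH (
    ORDER (OrderId ASC),  -- incoming data pre-sorted by the clustered key
    TABLOCK
);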

by
Nicolas Souquet
2/18/2012 6:54:14 PM Nicolas Souquet said:

Thanks for this post.

For the batch approach, if I need to update a couple of million rows, I then:

- store only the PRIMARY KEY values of the table to get UPDATEd, T1, in another table, T2
- cluster-index that column in T2
- then proceed by taking the TOP X thousand rows ORDER BY the T2 PRIMARY KEY column, UPDATE those in T1, then DELETE them from T2

This way I can get a constant speed for the UPDATEs ;)
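
A minimal sketch of that loop; the key column (Id), the work-list predicate, and the SellingPrice payload are assumptions:

-- T2 holds only the PRIMARY KEY values of the T1 rows left to update,
-- clustered so that TOP ... ORDER BY is a cheap range scan.
CREATE TABLE dbo.T2 (Id int NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO dbo.T2 (Id)
SELECT Id FROM dbo.T1 WHERE SellingPrice IS NULL;  -- hypothetical predicate

DECLARE @BatchSize int = 50000;
DECLARE @batch TABLE (Id int NOT NULL PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Take the next slice of keys in clustered-key order.
    INSERT INTO @batch (Id)
    SELECT TOP (@BatchSize) Id FROM dbo.T2 ORDER BY Id;

    IF @@ROWCOUNT = 0 BREAK;  -- work list is empty, done

    UPDATE t1
    SET    t1.SellingPrice = 0  -- hypothetical payload
    FROM   dbo.T1 AS t1
    INNER JOIN @batch AS b ON b.Id = t1.Id;

    -- Shrink the work list so every iteration costs about the same.
    DELETE t2
    FROM   dbo.T2 AS t2
    INNER JOIN @batch AS b ON b.Id = t2.Id;
END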

by
AhmadOsama
2/19/2012 8:34:16 AM Ahmad Osama said:

You're welcome, Nicolas. That works for a couple of million rows; note, however, that the update+delete combination will generate a lot of transaction log.

by
Michael Brönnimann
3/22/2012 6:40:06 AM Michael Brönnimann said:

Parallelization of queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large-volume data modifications (UPDATE, INSERT, DELETE), the standard engine does not parallelize towards the best use of all available resources (disk, multiple CPU cores, etc.).

Next to SSIS, there is also a lightweight, more generic, engine-focused approach ;-)
SQL Parallel Boost is a native T-SQL based solution and allows a performance gain of up to a factor of 10 (!) in an 8-16 CPU core environment.

This purely SQL-engine-based parallelization keeps complexity to a minimum and involves no 'external' components like SSIS. It is also the best-performing option for task splitting and synchronization, as it avoids potential connection and communication overhead. The solution can likewise be used to execute multiple SQL statements in parallel, initiated through the same connection.

The free 'Community Edition' of SQL Parallel Boost can be downloaded at http://sqlparallelboost.codeplex.com

by
nadia
6/5/2013 7:42:43 PM nadia said:
I want to update a huge table with more than 2,000,000 rows.
Here is my code:
 
DECLARE @i int
DECLARE @Step int

SELECT @Step = ROUND((SELECT COUNT(*) FROM T1) / 500000, 0)

SET @i = 1

WHILE (@i <= @Step)
BEGIN
    UPDATE T2
    SET    SellingPrice = A.maxprice
    FROM   T2
    INNER JOIN (SELECT MAX(SellingPrice) AS maxprice,
                       Product, Color, Size, DimensionCode, WarehouseCode
                FROM   T1
                GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) AS A
           ON  A.Product       = T2.Product
           AND A.Color         = T2.Color
           AND A.Size          = T2.Size
           AND A.DimensionCode = T2.DimensionCode
           AND A.WarehouseCode = T2.WarehouseCode
    WHERE  T2.SellingPrice = 0 OR T2.SellingPrice IS NULL

    SET @i = @i + 1
END
 
 
But this code is still slow. Do you have any ideas?
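
Note that each pass of this loop runs the identical, unrestricted UPDATE, so every iteration touches the full qualifying set. A minimal sketch of a variant where each pass limits its work via UPDATE TOP and the loop stops when nothing is left (same assumed T1/T2 schema; the maxprice > 0 guard is an assumption to keep updated rows from re-qualifying):

DECLARE @BatchSize int = 500000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) T2
    SET    SellingPrice = A.maxprice
    FROM   T2
    INNER JOIN (SELECT MAX(SellingPrice) AS maxprice,
                       Product, Color, Size, DimensionCode, WarehouseCode
                FROM   T1
                GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) AS A
           ON  A.Product       = T2.Product
           AND A.Color         = T2.Color
           AND A.Size          = T2.Size
           AND A.DimensionCode = T2.DimensionCode
           AND A.WarehouseCode = T2.WarehouseCode
    WHERE (T2.SellingPrice = 0 OR T2.SellingPrice IS NULL)
      AND A.maxprice > 0;  -- otherwise such rows would re-qualify forever

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to update
END
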
by
AhmadOsama
6/5/2013 10:34:43 PM Ahmad Osama said:

Hi Nadia - please share the execution plan for the query. Also check how long the query takes to update a single record; if even that is slow, the query itself needs to be optimized.

Thanks,

Ahmad Osama

by
megha
6/24/2013 7:20:31 AM megha said:

Hi Ahmad,

I need to update two columns in three tables; the approximate record count will be 20K.

So, which approach is better: a stored procedure or a batch process?

by
AhmadOsama
6/24/2013 5:20:56 PM Ahmad Osama said:
Hi Megha - you can use either of the above methods, whether in a procedure or as an ad hoc parameterised SQL query; the batch process can easily be wrapped into a procedure.
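
For illustration, a minimal sketch of a batch loop wrapped in a procedure; every name here is an assumption:

CREATE PROCEDURE dbo.usp_UpdateInBatches
    @BatchSize int = 5000
AS
BEGIN
    SET NOCOUNT ON;

    WHILE 1 = 1
    BEGIN
        -- Hypothetical payload: normalise NULL prices, batch by batch.
        UPDATE TOP (@BatchSize) dbo.T2
        SET    SellingPrice = 0
        WHERE  SellingPrice IS NULL;

        IF @@ROWCOUNT = 0 BREAK;
    END
END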
by
ankushchauhan
8/8/2013 1:30:08 PM Ankush Chauhan said:

Thanks! That was indeed quite informative.
