SQL Server: Optimizing Update Queries for Large Data Volumes


Author

Ahmad Osama (Member since: 1/25/2012 1:03:07 PM)
Ahmad Osama is an MCP Database Administrator/Developer, an avid gamer and a chicken lover. Ahmad started his career in the sales industry, working as a database executive responsible for report writing, application development and basic database administration. In 2008 he joined World Fashion Exchange as a Database Administrator, focusing on troubleshooting and performance tuning. In 2010 he joined The Perfect Future Technologies, where he was awarded Best Newcomer and Expert of the Year while working as a database administrator on one of the largest political databases in the world. Ahmad has experience in database administration, SSIS, performance tuning, VLDBs and web development. When not working on SQL Server, he can be found glued to his Xbox.


Comments (13)

2/6/2012 4:53:13 PM Mohit said:

Good general practices...

I don't think you need to delete the index in question, just DISABLE it. This way you do not have to back up the definition.

Also consider the effect of the transaction log with large data modifications or inserts, and the performance impact of the autocommit vs. explicit-commit transaction models. Have a look at http://blogs.msdn.com/b/sqlsakthi/archive/2011/04/17/what-is-writelog-waittype-and-how-to-troubleshoot-and-fix-this-wait-in-sql-server.aspx. Even though he is talking about a bottleneck on WRITELOG, it still makes an interesting point about the effects of autocommit on large inserts.
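For reference, the disable/rebuild pattern looks like this (the index and table names here are placeholders for illustration, not from the post):

```sql
-- Disable the nonclustered index: its definition stays in the metadata,
-- so there is nothing to script out and re-create afterwards.
ALTER INDEX IX_T2_SellingPrice ON dbo.T2 DISABLE;

-- ... run the large UPDATE here ...

-- REBUILD re-enables the index and rebuilds its data in one step.
ALTER INDEX IX_T2_SellingPrice ON dbo.T2 REBUILD;
```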

2/7/2012 7:11:09 AM Ahmad Osama said:

Thanks Mohit - yes, disable is the better option; I missed it. I didn't find a significant performance gain with explicit commit in the case of updates, although it works great for inserts.
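For anyone comparing the two models Mohit mentions: under autocommit every statement is its own transaction and forces its own log flush, while an explicit transaction commits (and flushes) once for the whole batch. A minimal sketch, with an illustrative table name:

```sql
-- Autocommit: each INSERT below would be a separate transaction.
-- Explicit commit: one transaction, one log flush, for the whole batch.
BEGIN TRANSACTION;

INSERT INTO dbo.T1 (Product, SellingPrice) VALUES ('A', 10);
INSERT INTO dbo.T1 (Product, SellingPrice) VALUES ('B', 20);
-- ... many more inserts ...

COMMIT TRANSACTION;
```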

Thanks,

Ahmad

2/7/2012 3:28:15 PM Dennis Namovice said:

I believe you will see a gain in performance if you sort the incoming data by the primary key of the data being updated.  I hope this helps.

2/7/2012 4:32:44 PM Ahmad Osama said:

Hi Dennis - if you are referring to bulk insert, yes, it will increase performance.

Thanks,

Ahmad

2/18/2012 6:54:14 PM Nicolas Souquet said:

Thanks for this post.

For the batch approach, if I need to update a couple of million rows, I then:

- store only the PRIMARY KEY of the table to be UPDATEd, T1, in another table, T2

- cluster-index that column in T2

- then proceed by taking the TOP X thousand rows ORDER BY the T2 PRIMARY KEY column, UPDATE those in T1, then DELETE them from T2.

This way I can get a constant speed for the UPDATEs ;)
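A sketch of that pattern, assuming T1 has an integer primary key Id; the batch size, filter, and SET clause are illustrative placeholders:

```sql
-- T2 holds only the primary keys of the rows still to be updated,
-- clustered on that key so each TOP (N) ... ORDER BY is a cheap range scan.
CREATE TABLE T2 (Id INT NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO T2 (Id)
SELECT Id FROM T1 WHERE SellingPrice = 0 OR SellingPrice IS NULL;

WHILE EXISTS (SELECT 1 FROM T2)
BEGIN
    -- Update the next batch of rows in T1, in T2's key order.
    UPDATE T1
    SET SellingPrice = 1                       -- placeholder new value
    WHERE T1.Id IN (SELECT TOP (10000) Id FROM T2 ORDER BY Id);

    -- Remove the same batch of keys from the work table.
    DELETE FROM T2
    WHERE Id IN (SELECT TOP (10000) Id FROM T2 ORDER BY Id);
END
```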

2/19/2012 8:34:16 AM Ahmad Osama said:

You're welcome, Nicolas. That works for a couple of million rows; however, the update+delete will generate a lot of t-log activity.

3/22/2012 6:40:06 AM Michael Brönnimann said:

Parallelization of queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large-volume data modifications (UPDATE, INSERT, DELETE), the standard engine does not parallelize towards the best use of all available resources (disk, multiple CPU cores, etc.).

Besides SSIS, there is also a lightweight, more generic, engine-focused approach ;-)
The concept of SQL Parallel Boost is a native T-SQL based solution and allows a performance gain of up to a factor of 10 (!) in an 8-16 CPU core environment.

This purely SQL-engine-based parallelisation solution benefits from minimized complexity and has no 'external' components like SSIS involved. Furthermore, it's the best-performing solution regarding task splitting and synchronization, as it has no potential connection and communication overhead. The solution can also be used to execute multiple SQL statements in parallel initiated through the same connection.

The free 'Community Edition' of SQL Parallel Boost can be downloaded at http://sqlparallelboost.codeplex.com

6/5/2013 7:42:43 PM nadia said:
I want to update a huge table with more than 2,000,000 rows.
Here is my code:
 
DECLARE @i INT
DECLARE @Step INT

SELECT @Step = ROUND((SELECT COUNT(*) FROM T1) / 500000, 0)

SET @i = 1

WHILE (@i <= @Step)
BEGIN
    UPDATE T2
    SET SellingPrice = A.maxprice
    FROM T2
    INNER JOIN
        (SELECT MAX(SellingPrice) AS maxprice, Product, Color, Size, DimensionCode, WarehouseCode
         FROM T1
         GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) AS A
        ON  A.Product = T2.Product
        AND A.Color = T2.Color
        AND A.Size = T2.Size
        AND A.DimensionCode = T2.DimensionCode
        AND A.WarehouseCode = T2.WarehouseCode
    WHERE T2.SellingPrice = 0 OR T2.SellingPrice IS NULL

    SET @i = @i + 1
END
But this code is still slow.
Do you have any ideas?
6/5/2013 10:34:43 PM Ahmad Osama said:

Hi Nadia - please share the execution plan for the query. Also check how long it takes to update a single record using the query you mentioned; if even that takes time, then the query itself needs to be optimized.
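One thing worth pointing out: the WHILE loop above runs the same full-table UPDATE on every pass, so the loop adds no batching at all. A genuinely batched variant might look like the sketch below; it assumes maxprice is always greater than zero, so that updated rows drop out of the WHERE filter and the loop terminates:

```sql
WHILE 1 = 1
BEGIN
    -- Update at most 500,000 rows per iteration; each iteration is its
    -- own autocommitted transaction, which keeps log growth in check.
    UPDATE TOP (500000) T2
    SET SellingPrice = A.maxprice
    FROM T2
    INNER JOIN
        (SELECT MAX(SellingPrice) AS maxprice, Product, Color, Size, DimensionCode, WarehouseCode
         FROM T1
         GROUP BY Product, Color, Size, DimensionCode, WarehouseCode) AS A
        ON  A.Product = T2.Product
        AND A.Color = T2.Color
        AND A.Size = T2.Size
        AND A.DimensionCode = T2.DimensionCode
        AND A.WarehouseCode = T2.WarehouseCode
    WHERE T2.SellingPrice = 0 OR T2.SellingPrice IS NULL

    -- Stop once no qualifying rows remain.
    IF @@ROWCOUNT = 0 BREAK
END
```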

Thanks,

Ahmad Osama

6/24/2013 7:20:31 AM megha said:

Hi Ahmad,

I need to update two columns in three tables; there will be approximately 20K records.

So, which approach is better: a stored procedure or the batch process?

6/24/2013 5:20:56 PM Ahmad Osama said:
Hi Megha - you can use any of the above methods, whether in a procedure or as an ad hoc parameterised SQL query; the batch process can easily be wrapped in a procedure.
8/8/2013 1:30:08 PM Ankush Chauhan said:

Thanks! That was indeed quite informative.

