SQL Server Job Performing Completion Actions Blocked: An Interesting Case

Hello,

Recently we had an issue with chain blocking where most of the jobs had completed but were not finishing. When observed in Job Activity Monitor, most of the jobs showed the status “SQL Server Job Performing Completion Actions”. This doesn’t give you any idea of what the job is doing in the background to finish. When we checked the sys.sysprocesses table, we observed that both the blocked and the blocking statements were coming from the program “SQLAgent – Job Manager”. Again, this doesn’t tell you which job is blocked and which is blocking. The interesting thing to note is the sql_handle from sys.sysprocesses while the blocking is happening.

So let’s start by querying sys.sysprocesses. You will observe that the lastwaittype is LCK_M_X and the waitresource is TAB: 4:xxxxxxxxx:xx. When you query for this object name in the msdb database (DB ID 4 from the wait resource), it returns ‘sysjobhistory’.
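
If you want to reproduce these checks, a query like the one below lists the blocked sessions and resolves the wait resource to a table name. This is a sketch: the object id is a placeholder, so substitute the one from your own waitresource value.

-- Sessions that are currently blocked, and what they are waiting on
SELECT spid, blocked, lastwaittype, waitresource, program_name, sql_handle
FROM sys.sysprocesses
WHERE blocked <> 0

-- Resolve the object id from a wait resource such as TAB: 4:xxxxxxxxx
-- 4 is the database id (msdb); replace 1234567890 with the object id you see
SELECT OBJECT_NAME(1234567890, 4) AS wait_resource_object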

[Screenshot: sys.sysprocesses output showing the blocked and blocking sessions]

[Screenshot: wait resource resolved to the sysjobhistory table in msdb]

It is evident that the blocking is happening while trying to clear the history of a job once it has completed. When we take the sql_handle and get the statement text, it is actually running the procedure “sp_jobhistory_row_limiter”.
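
If you do not want to copy the sql_handle around manually, you can pull the statement text for the waiting sessions in one go using sys.dm_exec_sql_text. A minimal sketch of the check we did:

-- Statement text for every session currently waiting on an exclusive lock
SELECT p.spid, p.blocked, p.lastwaittype, t.text
FROM sys.sysprocesses AS p
CROSS APPLY sys.dm_exec_sql_text(p.sql_handle) AS t
WHERE p.lastwaittype = 'LCK_M_X'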

[Screenshot: statement text from the sql_handle showing sp_jobhistory_row_limiter]

Interesting? Not yet. When we check the code of this procedure, it actually runs the statement below, where it holds an exclusive table lock (TABLOCKX) inside a transaction, which is more concerning.

BEGIN TRANSACTION

  SELECT @current_rows_per_job = COUNT(*)
  FROM msdb.dbo.sysjobhistory WITH (TABLOCKX)
  WHERE (job_id = @job_id)

  -- Delete the oldest history row(s) for the job being inserted if the new row has
  -- pushed us over the per-job row limit (MaxJobHistoryRows)
  SELECT @rows_to_delete = @current_rows_per_job - @max_rows_per_job

  IF (@rows_to_delete > 0)
  BEGIN
    WITH RowsToDelete AS (
      SELECT TOP (@rows_to_delete) *
      FROM msdb.dbo.sysjobhistory
      WHERE (job_id = @job_id)
      ORDER BY instance_id
    )
    DELETE FROM RowsToDelete;
  END

  -- Delete the oldest history row(s) if inserting the new row has pushed us over the
  -- global MaxJobHistoryTableRows limit.
  SELECT @current_rows = COUNT(*)
  FROM msdb.dbo.sysjobhistory

  SELECT @rows_to_delete = @current_rows - @max_total_rows

  IF (@rows_to_delete > 0)
  BEGIN
    WITH RowsToDelete AS (
      SELECT TOP (@rows_to_delete) *
      FROM msdb.dbo.sysjobhistory
      ORDER BY instance_id
    )
    DELETE FROM RowsToDelete;
  END

IF (@@trancount > 0)
  COMMIT TRANSACTION

Because every completing job runs this procedure and takes an exclusive table lock on sysjobhistory, anything that slows this block down holds up the completion of every other job. If we look at the code above, there are two parameters it needs:

  1. @max_rows_per_job
  2. @max_total_rows

Both of these values are what we set in the SQL Server Agent properties -> History tab.

[Screenshot: SQL Server Agent properties, History tab]

You can also check these values in the registry under the keys below. This is actually where the procedure reads the values from.

[Screenshot: SQL Server Agent history settings in the registry]
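
If you prefer T-SQL over regedit, you can read the same values the way the procedure does, via xp_instance_regread. A sketch, assuming you have the rights to read the registry from SQL Server:

DECLARE @max_total_rows   INT,
        @max_rows_per_job INT

-- Global limit for all jobs
EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
    N'JobHistoryMaxRows', @max_total_rows OUTPUT, N'no_output'

-- Per-job limit
EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
    N'JobHistoryMaxRowsPerJob', @max_rows_per_job OUTPUT, N'no_output'

SELECT @max_total_rows   AS JobHistoryMaxRows,
       @max_rows_per_job AS JobHistoryMaxRowsPerJob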

Now we understand what is actually happening in the background when the job is performing completion actions.

But our question is still not answered. Why are we observing huge blocking only now, if this is the default behavior? So let’s go back to the table on which this is happening, sysjobhistory. By default there are two indexes on this table.

[Screenshot: default indexes on msdb.dbo.sysjobhistory]
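
You can list them yourself with a standard catalog query:

USE msdb
GO
EXEC sp_helpindex 'dbo.sysjobhistory'

-- or
SELECT name, type_desc
FROM msdb.sys.indexes
WHERE object_id = OBJECT_ID('msdb.dbo.sysjobhistory')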

When we checked the index fragmentation on this table, the non-clustered index nc1 was 98% fragmented. The solution is simple: an ONLINE REBUILD of the index. But how did we get there? On further analysis we came to the root cause. This SQL Server Agent was not set with any history limits when the system was initially set up. So sysjobhistory had piled up, and when we changed the row limit settings per job and in total, it started cleaning out old data, which led to the fragmentation. So before I end this blog post, let’s summarize a few points that can help us avoid such problems.

  1. Always limit the rows per job and the total rows for all jobs.
  2. Have a maintenance job to check and rebuild/reorganize the indexes on the sysjobhistory table (see the sketch below).
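
A minimal sketch covering both points is below. It uses the undocumented msdb.dbo.sp_set_sqlagent_properties procedure to set the history limits (the same values can be set through the Agent properties -> History tab shown earlier), the limit values are only examples, and a real maintenance job would rebuild or reorganize based on a fragmentation threshold rather than unconditionally.

-- 1. Limit the rows per job and the total rows for all jobs
--    (example values, not recommendations)
EXEC msdb.dbo.sp_set_sqlagent_properties
     @jobhistory_max_rows         = 10000,
     @jobhistory_max_rows_per_job = 100

-- 2. Check fragmentation on the sysjobhistory indexes ...
SELECT i.name, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('msdb'),
         OBJECT_ID('msdb.dbo.sysjobhistory'), NULL, NULL, 'LIMITED') AS ps
JOIN msdb.sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id

-- ... and rebuild online when fragmentation is high (ONLINE = ON needs Enterprise Edition)
ALTER INDEX ALL ON msdb.dbo.sysjobhistory REBUILD WITH (ONLINE = ON)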

Happy Learning 🙂

Regards

Manohar Punna

About Manohar Punna

Manohar Punna is a Microsoft Data Platform Consultant and a Data Platform MVP. In his day-to-day job he works on building database and BI systems, automation, building monitoring systems, and helping customers derive value from their data while exploring and learning. By passion, Manohar is a blogger, speaker and Vice President of DataPlatformGeeks. He is a community enthusiast and believes strongly in the concept of giving back to the community. Manohar is a speaker at various Data Platform events, from SQL Server Day and various user groups to SQLSaturdays, SQLBits, MS Ignite, SSGAS & DPS. His One DMV a Day series is the longest daily series on any topic related to SQL Server available so far.
