I was recently asked to set up Transactional Replication comprising of 29 publishers as well as same number of subscribers. One interesting challenge was that the size of the distribution database suddenly increased from few MB’s to around 62 GB and as a result the disk on which the production database was hosted quickly filled up. The DML operations on the tables on the replicated database was continuous. After investigating this, I noticed there was a table named MSRepl_Commands present in the distribution database consisting of a large number of records which was the major culprit for the drastic increase in size. Going deeper, I also found that a job named Distribution clean up: distribution was taking a considerably long time to execute (i.e. around 4 hrs and 35 minutes) which was very strange.
There are 2 jobs in SQL Server which are responsible for the cleanup of tables present in the distribution database:
- Agent history cleanup: distribution
- Distribution clean up:distribution
These two jobs can be found by expanding the Jobs folder present under SQL Server Agent node in the SQL Server Management Studio.
The job named Distribution clean up: distribution itself has one stored procedure namedsp_MSdistribution_cleanup executing internally which had two parameters , namely @min_distretentionand @max_distretention.
I decided to manually execute this stored procedure against the distribution database by specifying the values of @min_distretention and @max_distretention using the syntax below.
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72
Before executing the stored procedure, I manually disabled both the above jobs otherwise there would be a possibility of some deadlock occurring.
The stored procedure took about 40 minutes to execute, after executing it output the below message:
Removed 0 replicated transactions consisting of 8771 statements in 2171 seconds (4 rows/sec)
The main issue why the distribution database was getting larger and larger in size was that the clean up job was not removing the replicated commands from MSRepl_Commands table. This table was around 26 GB in size with around 6.5 million records in it.
I then had a look at the retention period of the publisher and identified that it was set to Never Expire, as per the below screen capture:
This was applicable to all the 29 publishers.
I then changed the retention period as 480 hours i.e. 20 days and then executed the stored procedure namedsp_MSdistribution_cleanup against the distribution database as shown below:
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 480
Once the above query was executed, the output was as follows:
I then checked the MSRepl_Commands table and found that the number of rows decreased from 6.45 million to 2.5 million. I then manually shrunk the mdf file of the database involved in replication by around 28 GB and the free space available on the disk grew to around 54 GB. I also changed the recovery model of the database to Simple to minimize the log file growth. The cleanup job now takes a few minutes to execute. After performing these tasks, to date there hasn’t been any issue with the size of the distribution database and the replication as a whole is working fine.
Many Thanks to all the readers for giving their valuable time in reading the blog post.My sincere thanks to Amit Bansal and Sarabpreet Singh for always motivating me to present some exciting scenarios to the SQL Server community. My sincere thanks to the entire iPlus Support team at L&T infotech, their support and motivation have always helped me to progress a lot.
Regards
Satnam Singh
Like us on FaceBook | Follow us on Twitter
Follow me on FaceBook| Join the fastest growing SQL Server group on FaceBook
Admirable one …
Good reverse Engineered troubleshooting, Especially when the DB is a production one!