SQL Server – SQL Server has detected an unreported OS/hardware level read or write problem on page (1:74098) of database 10

Dear Friends,

Sometime back while auditing error logs in one of the SQL Server instances I came across following message;

SQL Server has detected an unreported OS/hardware level read or write problem on page (1:74098) of database 10

LSN returned (59991:12571:175), LSN expected (59991:12571:600)

Contact the hardware vendor and consider disabling caching mechanisms to correct the problem

As can be seen on error message itself there is an indication that the LSN value stored in memory isn’t matching with the LSN returned by OS i.e. either data read is not as per what is expected or somehow data written to disk is lost or not written at all. On this note, let’s have a quick look at what Stale Read and Lost Writes are?

A Stale read occurs when SQL Server writes a modified page to storage but storage system returns a different value may be a different version of the page from hardware cache. A Lost write is something when SQL Server modifies a page and writes to disk but this value never stored in storage system so you get to see previous version while read operation. At times you may observe that issue is resolved when system is rebooted as this clears the cache. For us it worked and we involved HW vendor for appropriate checks but should not be considered to be generic solution across all environments.

This may not be detected by CHECKSUM as page is valid based on checksum value but hardware is retuning a different version. To enable additional diagnostics for this type of problems, SQL Server has added TRACE FLAG 818. You can specify TRACE FLAG 818 either as a startup parameter -T818 for the computer that is running SQL Server or by running the following statement:

DBCC TRACEON (818, -1)

   

TRACE FLAG 818 enables an in-memory ring buffer that is used for tracking the last 2,048 successful write operations that are performed by the computer on which you are running SQL Server not including sort and workfile I/Os. When errors such as Error 605, 823 or 3448 occur the incoming buffer’s LSN value is compared to the recent write list. If the LSN that is retrieved during the read operation is older than the one specified during the write operation a new error message is logged in the SQL Server error log. You can refer http://support.microsoft.com/kb/826433 to know more details on this.

From SQL Server 2000 SP4, SQL Server 2005 this logic enhanced the behavior of this trace flag to perform LSN check on every read of a page and to store the LSN in hash table design. This is not default and you need to enable TRACE FLAG 818 explicitly when SQL starts up.

Leave a comment if you faced similar issue in your environments.

 

Regards

Kanchan Bhattacharyya

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on TwitterFollow me on FaceBook

   

About Kanchan Bhattacharyya

Kanchan is an astute IT professional, a seasoned SQL Database Administrator with 13+ years of industry experience. A calculated risk taker with deep technical knowledge and has gained proficiency in database consulting and solutions across different environments. Kanchan holds MBA degree in IT and an International Executive MBA in Project Management. He is very passionate about SQL Server and holds MCSE (Data Platform), MCSA – SQL 2012, MCITP – SQL 2008 certifications. Currently he is focusing on cloud and latest releases of SQL Server. When not working, Kanchan likes to spend his time reading on new technical developments specifically on SQL Server and other related technologies.

View all posts by Kanchan Bhattacharyya →

Leave a Reply

Your email address will not be published.