Sunday, March 29, 2015

Production Support Incident 1. SQL Server Suspended Transaction And IO Wait Issue

This is the most critical findings when there is issue with application downtime.

Application: CMS System- Content Management System

Technology: Custom Asp.net

Scenario: For any CMS system , the caching plays a very essential role. To improve overall user experience and responsive of the system , as a thumbrule and architecure design norms the CMS system should be always initialized by caching. The system content is cached one time so that there is no more chatty communication with SQL server or for that matter with database. This is important as the content in CMS website public facing internet website most of the content is global and applicable for all users. In such scenario the best practice is to cache the content and most common element one time during overall website lifecyle throughout a day.

So when we consider caching below set of design principle must be taken care:-
Life cycle of caching-Age of caching
Frequency and timeline when Business user changing content so that the changes reflects during business as usual.
Warm up caching option in IIS to reduce overall users impact on cache expiration.
Importantly the amount of data cached .Impact on w3p process in IIS, CPU utilization and heap memory fot which sql query is executed.
Mission critical application keep logic outside of application layer..keep it in database for quick fix and resolution . If logic is embedded in business logic with linq query within application layer, Consider hugh business impact and application downtime.

1. Quick Checks:

USE master;
GO
EXEC sp_who2 'active';
GO

If there is suspended transaction SPID then there is serious problem. if suspended transaction is not getting clear within 10 secs then there is potential issue with memory or execution completion of query
2. Quick Checks async_network_io wait in sql server
http://blogs.msdn.com/b/joesack/archive/2009/01/09/troubleshooting-async-network-io-networkio.aspx

3. Quick Checks Page latch above 20
http://blogs.msdn.com/b/askjay/archive/2011/07/08/troubleshooting-slow-disk-i-o-in-sql-server.aspx
http://www.jasonstrate.com/2010/09/index-black-ops-part-2-page-io-latch-page-latch/

SELECT session_id, wait_type, resource_description FROM sys.dm_os_waiting_tasksWHERE wait_type LIKE 'PAGELATCH

Resolution:
Either Optimize query
Or Increase RAM of Sql server OS box.

No comments :