SQL Server 2005 Blocking Problem (ASYNC_NETWORK_IO)

Posted by ivankolo on Stack Overflow See other posts from Stack Overflow or by ivankolo
Published on 2010-03-12T18:46:36Z Indexed on 2010/03/12 18:57 UTC
Read the original article Hit count: 346

Filed under:

I am responsible for a third-party application (no access to source) running on IIS and SQL Server 2005 (500 concurrent users, 1TB data, 8 IIS servers). We have recently started to see significant blocking on the database (after months of running this application in production with no problems). This occurs at random intervals during the day, approximately every 30 minutes, and affects between 20 and 100 sessions each time. All of the sessions eventually hit the application time out and the sessions abort.

The problem disappears and then gradually re-emerges. The SPID responsible for the blocking always has the following features:

  • WAIT TYPE = ASYNC_NETWORK_IO
  • The SQL being run is “(@claimid varchar(15))SELECT claimid, enrollid, status, orgclaimid, resubclaimid, primaryclaimid FROM claim WHERE primaryclaimid = @claimid AND primaryclaimid <> claimid)”. This is relatively innocuous SQL that should only return one or two records, not a large dataset.
  • NO OTHER SQL statements have been implicated in the blocking, only this SQL statement.
  • This is parameterized SQL for which an execution plan is cached in sys.dm_exec_cached_plans.
  • This SPID has an object-level S lock on the claim table, so all UPDATEs/INSERTs to the claim table are also blocked.
  • HOST ID varies. Different web servers are responsible for the blocking sessions. E.g., sometimes we trace back to web server 1, sometimes web server 2.

When we trace back to the web server implicated in the blocking, we see the following:

  • There is always some sort of application related error in the Event Log on the web server, linked to the Host ID and Host Process ID from the SQL Session.
  • The error messages vary, usually some sort of SystemOutofMemory. (These error messages seem to be similar to error messages that we have seen in the past without such dramatic consequences. We think was happening before, but didn’t lead to blocking. Why now?)
  • No known problems with the network adapters on either the web servers or the SQL server.

(In any event the record set returned by the offending query would be small.)

Things ruled out:

  • Indexes are regularly defragmented.
  • Statistics regularly updated.
  • Increased sample size of statistics on claim.primaryclaimid.
  • Forced recompilation of the cached execution plan.
  • Created a compound index with primaryclaimid, claimid.
  • No networking problems.
  • No known issues on the web server.
  • No changes to application software on web servers.

We hypothesize that the chain of events goes something like this:

  1. Web server process submits SQL above.
  2. SQL server executes the SQL, during which it acquires a lock on the claim table.
  3. Web server process gets an error and dies.
  4. SQL server session is hung waiting for the web server process to read the data set.
  5. SQL Server sessions that need to get X locks on parts of the claim table (anyone processing claims) are blocked by the lock on the claim table and remain blocked until they all hit the application time out.

Any suggestions for troubleshooting while waiting for the vendor's assistance would be most welcome.

Is there a way to force SQL Server to lock at the row/page level for this particular SQL statement only? Is there a way to set a threshold on ASYNC_NETWORK_IO waits only?

© Stack Overflow or respective owner

Related posts about sql-server-2005