Suddenly getting lock timeouts with MySQL
- by Marc Hughes
We've got a web app hosted on Amazon Web services. Our database is a multi-az RDS MySQL server running 5.1.57 and 3-4 app servers talk to it.
Today, we started seeing a lot of errors along the lines of "Lock wait timeout exceeded; try restarting transaction" - almost 1% of POST requests are seeing this.
There have been no modifications to the code running on the site. There have been no schema changes. We haven't had a big spike in traffic. I've been looking at the processes running, and none seem out of control.
I tried scaling our RDS instance from a small to a large, with no effect.
Two days ago, Amazon had some outages. As part of the recovery from that, our RDS server, and our app servers ended up in different availability zones, but all within the same region. But yesterday, everything was fine so I'm not convinced that's related.
The lock timeouts are in different types of requests and occur in different InnoDB tables.
I have noticed the number of open connections jumped when we started seeing problems, but they may be a symptom and not a cause.
What are my next steps in debugging this?