Sporadic crash of master-slave MySQL replication process
Posted
by obarshay
on Server Fault
See other posts from Server Fault
or by obarshay
Published on 2010-05-14T17:39:10Z
Indexed on
2010/05/14
17:44 UTC
Read the original article
Hit count: 441
mysql
|mysql-replication
Hello,
I was wondering if someone has experienced this and can perhaps provide some insight into this issue.
We have a plan-vanilla MySQL master-slave replication set up. The tables are MyISAM and the master can get quite read/write active. We use the slave instance to perform full daily backups in order to avoid bringing down the master server. The backup process does the following:
STOP SLAVE SQL_THREAD
mysqlhotcopy all tables
START SLAVE SQL_THREAD
Every once in a while (once a month or so) the replication breaks with varying error messages indicating a corrupt query or log file. Here's one that happened last night:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: server8.propreports.com
Master_User: nexus8
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin.000045
Read_Master_Log_Pos: 581644327
Relay_Log_File: relay.000086
Relay_Log_Pos: 94131
Relay_Master_Log_File: bin.000045
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1064
Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '138070603'£' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE fill SET clearing_fee='0.0E id='138070603'£'
Skip_Counter: 0
Exec_Master_Log_Pos: 4164743
Relay_Log_Space: 577574251
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
I follow the following procedure to recover from above error and resume replication:
stop slave;
change master to MASTER_LOG_POS = 4164743, MASTER_LOG_FILE = 'bin.000045';
start slave;
We have multiple servers set up this way and they all sporadically stop replicating with a similar error. Any advice on how to resolve this would be greatly appreciated.
© Server Fault or respective owner