Zato "MySQL Server has gone away"

I ran into the same problem as the one discussed in this topic:

These are my logs from the morning on one of the nodes, after trying to deploy a service:

2017-05-24 23:38:55,914 - ERROR - 39792:Dummy-1 - gunicorn.main:22 - Exception in worker process:
Traceback (most recent call last):
  File "/opt/zato/2.0.7/code/eggs/gunicorn-18.0-py2.7.egg/gunicorn/arbiter.py", line 494, in spawn_worker
    self.cfg.post_fork(self, worker)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 864, in post_fork
    ParallelServer.start_server(worker.app.zato_wsgi_app, arbiter.zato_deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 810, in start_server
    is_first, locally_deployed = parallel_server._after_init_common(server, zato_deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 373, in _after_init_common
    is_first, locally_deployed = self.maybe_on_first_worker(server, self.kvdb.conn, deployment_key)
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/base/parallel.py", line 269, in maybe_on_first_worker
    with Lock(lock_name, self.deployment_lock_expires, self.deployment_lock_timeout, redis_conn):
  File "/opt/zato/2.0.7/code/eggs/retools-0.3-py2.7.egg/retools/lock.py", line 55, in __enter__
    if redis.setnx(self.key, expires):
  File "/opt/zato/2.0.7/code/eggs/redis-2.9.1-py2.7.egg/redis/client.py", line 922, in setnx
    return self.execute_command('SETNX', name, value)
  File "/opt/zato/2.0.7/code/eggs/redis-2.9.1-py2.7.egg/redis/client.py", line 461, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/opt/zato/2.0.7/code/eggs/redis-2.9.1-py2.7.egg/redis/client.py", line 471, in parse_response
    response = connection.read_response()
  File "/opt/zato/2.0.7/code/eggs/redis-2.9.1-py2.7.egg/redis/connection.py", line 348, in read_response
    raise response
ResponseError: READONLY You can't write against a read only slave.
Further down in the same server.log, the MySQL-related part (truncated at the top):

[...]
    result = self._query(query)
  File "/opt/zato/2.0.7/code/eggs/PyMySQL-0.6.2-py2.7.egg/pymysql/cursors.py", line 271, in _query
    conn.query(q)
  File "/opt/zato/2.0.7/code/eggs/PyMySQL-0.6.2-py2.7.egg/pymysql/connections.py", line 725, in query
    self._execute_command(COM_QUERY, sql)
  File "/opt/zato/2.0.7/code/eggs/PyMySQL-0.6.2-py2.7.egg/pymysql/connections.py", line 888, in _execute_command
    self._write_bytes(prelude + sql[:chunk_size-1])
  File "/opt/zato/2.0.7/code/eggs/PyMySQL-0.6.2-py2.7.egg/pymysql/connections.py", line 848, in _write_bytes
    raise OperationalError(2006, "MySQL server has gone away (%r)" % (e,))
OperationalError: (OperationalError) (2006, "MySQL server has gone away (error(32, 'Broken pipe'))") 'SELECT cluster.id AS cluster_id, cluster.name AS cluster_name, cluster.description AS cluster_description, cluster.odb_type AS cluster_odb_type, cluster.odb_host AS cluster_odb_host, cluster.odb_port AS cluster_odb_port, cluster.odb_user AS cluster_odb_user, cluster.odb_db_name AS cluster_odb_db_name, cluster.odb_schema AS cluster_odb_schema, cluster.broker_host AS cluster_broker_host, cluster.broker_port AS cluster_broker_port, cluster.lb_host AS cluster_lb_host, cluster.lb_port AS cluster_lb_port, cluster.lb_agent_port AS cluster_lb_agent_port, cluster.cw_srv_id AS cluster_cw_srv_id, cluster.cw_srv_keep_alive_dt AS cluster_cw_srv_keep_alive_dt \nFROM cluster \nWHERE cluster.id = %s' (1,) ]

2017-05-31 18:58:59,627 - ERROR - 65303:Dummy-36684 - zato.server.service.store:22 - Exception while visit mod:[<module 'general_reload_zato_config' from '/opt/zato/PROD1/server/work/hot-deploy/current/general_reload_zato_config.py'>], is_internal:[False], fs_location:[/opt/zato/PROD1/server/work/hot-deploy/current/general_reload_zato_config.py], e:[Traceback (most recent call last):
  File "/opt/zato/2.0.7/code/zato-server/src/zato/server/service/store.py", line 261, in _visit_module
    name, impl_name, is_internal, timestamp, dumps(str(depl_info)), si)
TypeError: 'NoneType' object is not iterable
]

In my case I have 3 machines, each running one server with 2 workers. The ODB is a single MySQL server on machine3.

I have 6 workers in total and 12 connections to MySQL (I was expecting only 6, since the pool size is set to 1, so maybe I misunderstood something; there is a small pool-status sketch after the process list below):

+------+------+----------------------------+--------+---------+------+----------+-----------------------+
| Id   | User | Host                       | db     | Command | Time | State    | Info                  |
+------+------+----------------------------+--------+---------+------+----------+-----------------------+
| 4581 | zato | localhost:56319            | zatodb | Sleep   | 2709 |          | NULL                  |
| 4769 | zato | machine3.localdomain:37176 | zatodb | Sleep   | 4613 |          | NULL                  |
| 4770 | zato | machine1.localdomain:11296 | zatodb | Sleep   | 4612 |          | NULL                  |
| 4771 | zato | machine3.localdomain:37177 | zatodb | Sleep   | 4612 |          | NULL                  |
| 4790 | zato | machine1.localdomain:12243 | zatodb | Sleep   | 4613 |          | NULL                  |
| 4791 | zato | machine2.localdomain:40568 | zatodb | Sleep   | 4613 |          | NULL                  |
| 4793 | zato | machine2.localdomain:40573 | zatodb | Sleep   | 4612 |          | NULL                  |
| 4843 | zato | machine3.localdomain:44888 | zatodb | Sleep   |  445 |          | NULL                  |
| 4846 | zato | machine1.localdomain:25336 | zatodb | Sleep   |   21 |          | NULL                  |
| 4847 | zato | machine1.localdomain:25431 | zatodb | Sleep   |  113 |          | NULL                  |
| 4848 | zato | machine2.localdomain:46295 | zatodb | Sleep   |   25 |          | NULL                  |
| 4849 | zato | machine3.localdomain:45448 | zatodb | Sleep   |   25 |          | NULL                  |
| 4850 | zato | machine2.localdomain:46376 | zatodb | Sleep   |   21 |          | NULL                  |
| 4851 | root | localhost                  | NULL   | Query   |    0 | starting | show full processlist |
+------+------+----------------------------+--------+---------+------+----------+-----------------------+
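To sanity-check the workers x pool_size arithmetic, I have been looking at the pool state directly. This is only a rough sketch against a plain SQLAlchemy engine, with a placeholder connection string rather than the real ODB configuration, since the actual engine is created internally by the Zato server:

    from sqlalchemy import create_engine

    def dump_pool_status(engine):
        # Reports how many connections this particular process's pool holds;
        # every forked gunicorn worker has its own pool, so the totals seen on
        # the MySQL side are roughly workers x pool_size, plus any extra processes.
        pool = engine.pool
        print('size=%s checked_in=%s checked_out=%s overflow=%s' % (
            pool.size(), pool.checkedin(), pool.checkedout(), pool.overflow()))

    # Placeholder DSN - the real ODB engine is built by the Zato server itself.
    engine = create_engine('mysql+pymysql://zato:password@machine3/zatodb', pool_size=1)
    dump_pool_status(engine)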

Is the autoping issue still present in the 2.0.7 release? Am I doing something wrong? Besides the workaround from the other topic, is there anything else I can do to avoid it? The autoping fix provided in the other topic helps, but if my service auto-restarts (or the cluster is restarted for any reason), I need to execute it again, which is not ideal.
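For reference, my understanding is that the autoping workaround boils down to SQLAlchemy's standard pessimistic-disconnect recipe, i.e. pinging each connection on checkout and discarding dead ones. Roughly something like this (a sketch of the general pattern, not the exact code from the other topic):

    from sqlalchemy import event, exc
    from sqlalchemy.pool import Pool

    @event.listens_for(Pool, 'checkout')
    def ping_connection(dbapi_connection, connection_record, connection_proxy):
        # Ping the connection before the pool hands it out; if MySQL has
        # already closed it (wait_timeout), raise DisconnectionError so the
        # pool discards the stale connection and retries with a fresh one.
        cursor = dbapi_connection.cursor()
        try:
            cursor.execute('SELECT 1')
        except Exception:
            raise exc.DisconnectionError()
        finally:
            cursor.close()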

It seems to affect the singleton machine the most, and my low execution volume does not help mitigate the issue. It is not a major problem, but it would be good to fix it anyway.
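Given the low traffic, I suspect the connections simply sit idle past MySQL's wait_timeout before anything uses them again. If there were a way to pass engine options through to the ODB pool, setting pool_recycle below wait_timeout would be the usual fix. This is only an illustration with a placeholder DSN, not actual Zato configuration:

    from sqlalchemy import create_engine

    # Assumes MySQL's wait_timeout is at its default of 28800 seconds (8 hours);
    # recycling connections well before that keeps idle ones from going stale.
    engine = create_engine(
        'mysql+pymysql://zato:password@machine3/zatodb',  # placeholder DSN
        pool_size=1,
        pool_recycle=3600,  # re-open any pooled connection older than one hour
    )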

Thanks!
Regards,