(Migrated) About hot-deploy problem

(This message has been automatically imported from the retired mailing list)

Hi all,

     Its me again, we usually use https://zato.io/docs/admin/guide/installing-services.html#admin-installing-services-service-sources-txt  (Hot-deploying from command line) to hot deploy our services. But recently we found it didnt work ok. When we copied the services to /opt/zato/genscript_esb_nj_prod/server1/pickup-dir, nothing happened. I dont know whats wrong with our system, have you all met this problem?

Thanks,
wangxi

=====================================================================
GenScript -
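(For context, "hot-deploying" here just means dropping a plain Python module into the server's pickup-dir. A minimal sketch of such a module, assuming Zato's standard Service base class - the file and class names are made up:)

    # my_service.py - a trivial hot-deployable service (names made up)
    from zato.server.service import Service

    class MyHelloService(Service):
        def handle(self):
            # Will show up in server.log once the service is invoked
            self.logger.info('Hello from a hot-deployed service')
            self.response.payload = 'OK'

    # Deploying is just a copy:
    #   cp my_service.py /opt/zato/genscript_esb_nj_prod/server1/pickup-dir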

On 08/03/2016 09:02, Xi Wang wrote:

     It's me again, we usually use https://zato.io/docs/admin/guide/installing-services.html#admin-installing-services-service-sources-txt ("Hot-deploying from command line") to hot deploy our services. But recently we found it didn't work ok. When we copied the services to /opt/zato/genscript_esb_nj_prod/server1/pickup-dir, nothing happened. I don't know what's wrong with our system, have you all met this problem?

I found it doesn't work if you try to hot-deploy while the server is restarting (or if you stop the server, copy to pickup-dir, then start the server). I work around this by looking at "top" and waiting for the gunicorn processes to settle down first.
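(An alternative to watching "top" is to poll the server until it answers before copying anything in. A rough sketch, assuming the default server port 17010 and the built-in /zato/ping channel - adjust both to your environment; Python 2, as used by Zato 2.0:)

    # wait_then_deploy.py - hot-deploy only once the server responds
    import shutil, time, urllib2

    def wait_for_server(url='http://localhost:17010/zato/ping', timeout=120):
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                urllib2.urlopen(url, timeout=5)
                return True    # Server is up - safe to deploy
            except Exception:
                time.sleep(2)  # Still starting - try again shortly
        return False

    if wait_for_server():
        shutil.copy('my_service.py',
            '/opt/zato/genscript_esb_nj_prod/server1/pickup-dir')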

But otherwise, you should get some logs. Do you see anything in:

/opt/zato/genscript_esb_nj_prod/server1/logs/servers1.log
?

We set the log level to "INFO", and I've seen nothing in the server's logs.

Because that system is a production system, I didn't want to restart Zato at will before. I restarted Zato just now and found that hot-deploying is OK now.

I suspect some process was hanging and caused that problem.

Thanks,
wangxi

From: Brian Candler [mailto:b.candler@pobox.com]
Sent: 8 March 2016 17:35
To: Xi Wang xi.wang@genscript.com; zato-discuss@lists.zato.io
Subject: Re: [Zato-discuss] About hot-deploy problem


On Tue, 8 Mar 2016 09:34:32 +0000, Brian Candler b.candler@pobox.com wrote:

I found it doesn't work if you try to hot-deploy while the server is restarting (or if you stop the server, copy to pickup-dir, then start the server). I work around this by looking at "top" and waiting for the gunicorn processes to settle down first.

Indeed. I assumed until now that hot-deploy always picks up new files, but my own test also shows it only does so when the server is running properly. The same goes for synchronizing. I guess that's why it's called "hot". So when one of the servers is down for maintenance, you have to remember to deploy to it when you bring it back up, if it has fallen behind the other servers.

On 09/03/16 23:06, Sam Geeraerts wrote:

Hello Sam,

there are two cases here.

I assumed until now that hot-deploy always picks up new files,
but my own test also shows it only does so when the server is running
properly.

One is that if a server actually is running then hot-deploying against
it is obviously fine and safe.

The same goes for synchronizing. I guess that’s why it’s
called “hot”. So when one of the servers is down for maintenance you
have to remember to deploy to it when you get it back up if it’s behind
on the other servers.

In this case, if a server is fully down then hot-deploying services to
it is actually not needed.

The next time that server boots up it will synchronize its set of
services from other members of the cluster - you don’t need to remember
about it yourself.

On Thu, 10 Mar 2016 14:29:30 +0100, Dariusz Suchojad dsuch@zato.io wrote:

The next time that server boots up it will synchronize its set of
services from other members of the cluster - you don’t need to
remember about it yourself.

I did another test: take 1 server down, hot-deploy to the running one,
bring the other back up and check active services. See output attached.
I find that the server that was down keeps the older version of the
service. Or am I interpreting these results incorrectly?

On 16/03/16 00:16, Sam Geeraerts wrote:

I did another test: take 1 server down, hot-deploy to the running one,
bring the other back up and check active services. See output attached.
I find that the server that was down keeps the older version of the
service. Or am I interpreting these results incorrectly?

Hi Sam,

are you essentially deploying the same service while server2 is down?

If so, then indeed this won't work - they need to be new services for the synchronization mechanism to kick in.

The reason is: what should server3 do if, while it was shut down, server1 and server2 both deployed different versions of the same service?

On 16/03/2016 21:06, Sam Geeraerts wrote:

If we can assume that the version with the latest timestamp is the one
that should be active, then it doesn’t matter if a server is down. When
a server comes up it can negotiate service timestamps with the other
servers and then get the latest version of everything.

Is there a good reason to not make that assumption?

“Most recently deployed” is probably a sane approach here, but it means
when a server comes up it would have to query (at least) one other
server to see if there is newer code available for all deployed services.

You can break this by, e.g.

  • shutting down S2
  • deploying new code to S1
  • shutting down S1
  • restarting S2

The full-fat way of doing this would be a consensus protocol like Raft, but that has its own consequences (e.g. in a 2-server cluster, you can't deploy code when one of the servers is down).
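(To make that failure concrete, here is a self-contained, purely hypothetical sketch of "latest deployment timestamp wins" - none of this is Zato API:)

    # Hypothetical "latest deployment timestamp wins" sync (not Zato API)
    class Module(object):
        def __init__(self, version, deployed_at):
            self.version = version
            self.deployed_at = deployed_at  # e.g. a Unix timestamp

    def sync_from_peers(local, peers):
        # On startup, take any module a live peer holds in a newer version
        for peer in peers:
            for name, remote in peer.items():
                mine = local.get(name)
                if mine is None or remote.deployed_at > mine.deployed_at:
                    local[name] = remote

    # The breaking sequence above: S2 down, deploy to S1, S1 down, S2 up
    s1 = {'m1.py': Module('v2', 200)}  # has the newer code, but is offline
    s2 = {'m1.py': Module('v1', 100)}
    sync_from_peers(s2, peers=[])  # S1 is down, so there is no one to ask
    print(s2['m1.py'].version)     # -> 'v1': the v2 deployment is lost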

On Wed, 16 Mar 2016 00:29:34 +0100, Dariusz Suchojad dsuch@zato.io wrote:

The reason is: what should server3 do if, while it was shut down, server1 and server2 both deployed different versions of the same service?

I don’t know how Zato syncs services between servers, but I imagine
that currently it’s just triggered when a new file is found in
pickup-dir. If all 3 servers are up and I deploy version 5 of my
service to server 1, then S1 sends v5 to the 2 other servers. If I then
deploy v6 to S2, then S2 sends v6 to S1 and S3.

If S3 is down and I deploy v7 to S1, then only S2 gets v7. Then I
deploy v8 to S2 and S1 also gets it. So in either case, S1 and S2 end
up with the same version. If I bring S3 back up, then the other servers
only have v8, so that’s what S3 should get. You could say that they all
have the version with the latest timestamp.

If we can assume that the version with the latest timestamp is the one
that should be active, then it doesn’t matter if a server is down. When
a server comes up it can negotiate service timestamps with the other
servers and then get the latest version of everything.

Is there a good reason to not make that assumption?

On 16/03/16 22:06, Sam Geeraerts wrote:

If we can assume that the version with the latest timestamp is the one
that should be active, then it doesn’t matter if a server is down. When
a server comes up it can negotiate service timestamps with the other
servers and then get the latest version of everything.

Is there a good reason to not make that assumption?

You are right, except that currently what you deploy is not actually a service but a Python module, and there is no notion of synchronizing based on a timestamp - though that could definitely be used.

The problematic scenario I had in mind was:

  • There is a cluster of s1, s2 and s3
  • s2 and s3 are down
  • s1 receives m1.py in v1
  • s1 goes down, s2 is up
  • s2 receives m1.py in v2
  • s3 boots up - what should it synchronize to now?

As things stand today, without timestamps, there is no good answer to
that question.

But let's assume that s3 gets m1.py v2. In that case, s1 on startup should also check whether its own m1.py is current, since it may have changed while s1 was down, and if it has, m1.py v2 should be deployed.

This is all nice and can be added, but it needs to wait for the next major release, which will introduce quite a few changes to the internal architecture to better control the process of server startup.

Right now we are using gunicorn and it works very nicely, but we need a much tighter grip on how server processes are started, how they report it to the coordinator (the process which spawns the actual server processes), upon which events, and so on. This means we will have to use our own process controller instead of gunicorn - gevent will stay, though; it is a great library.

This is needed anyway to cut down on the time needed to start servers. Right now, if you have 2 servers, each with 2 processes, then you perform a bunch of actions 4 times, once for each process.

For instance, the same service is read and parsed from the file system 4 times. That can be sped up, cached and made reusable after the first process does what it needs to do. This can definitely save a lot - consider that there are 250+ internal services alone.
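(Purely as an illustration of the kind of caching meant here - parse each module once, keyed by its contents, so subsequent loads reuse the result. This is not how Zato works today, and a real cross-process version would need the cache in shared storage:)

    # Hypothetical parse-once cache for service modules (not Zato code)
    import hashlib

    _cache = {}  # sha256 of source -> compiled code object

    def load_service_module(path):
        with open(path, 'rb') as f:
            source = f.read()
        key = hashlib.sha256(source).hexdigest()
        if key not in _cache:
            # Only the first load pays the parse cost; a cross-process
            # version would keep this cache in shared storage instead
            _cache[key] = compile(source, path, 'exec')
        return _cache[key]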

Then you will likely have something akin to:

$ zato start /path/to/server1 /path/to/server2 /path/to/server3

And now Zato will know that anything that was accomplished by server1
should carry over to server2 and server3.

On Wed, 16 Mar 2016 22:29:35 +0100, Dariusz Suchojad dsuch@zato.io wrote:

This is all nice and can be added, but it needs to wait for the next major release, which will introduce quite a few changes to the internal architecture to better control the process of server startup.

Cool. Our usage of Zato is still a long way off from this being a problem for us. I'm just happy to contribute to the discussion. :)

On 08/03/16 10:02, Xi Wang wrote:

     It’s me again, we usually use https://zato.io/docs/admin/guide/installing-services.html#admin-installing-services-service-sources-txt

(“Hot-deploying from command line”) to hot deploy our services. But recently we found it didn’t work ok.
When we copied the services to /opt/zato/genscript_esb_nj_prod/server1/pickup-dir, nothing happened.

Hello,

I have managed to reproduce it.

This can happen if a previous configuration message exchanged internally
by servers could not be handled.

Such messages are sent and received using Redis pub/sub (which is
independent of our own REST pub/sub https://zato.io/docs/pubsub/index.html).

Now, if any of the messages using Redis pub/sub could not be, for instance, parsed as JSON, the connection to Redis is silently closed and ultimately, from the user's perspective, it looks as though hot-deployment stopped working.
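(The gist of the fix is to stop a single malformed message from killing the subscriber. A minimal sketch of that idea with redis-py - the channel name and handler below are made up, and this is not the actual patch:)

    # Sketch: keep consuming internal config messages even if one is bad
    import json, logging, redis

    logger = logging.getLogger(__name__)

    def handle_config(data):
        pass  # stand-in for the real internal handler

    pubsub = redis.StrictRedis().pubsub()
    pubsub.subscribe('zato-config')  # channel name is made up

    for msg in pubsub.listen():
        if msg['type'] != 'message':
            continue
        try:
            handle_config(json.loads(msg['data']))
        except Exception:
            # Log and carry on instead of silently losing the connection
            logger.exception('Could not handle config message `%r`', msg)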

These two commits upgrade the Redis libraries to a new version and catch any possible exceptions during the handling of said messages.

They will be released in the upcoming 2.0.8 patch release and will let one consult server.log for the reasons why internal config messages fail.

thanks,