(Migrated) Scheduling at Scale with Zato

(This message has been automatically imported from the retired mailing list)

Hello,

I have recently begun developing services for Zato at my organization to
meet our growing ESB and scheduling needs. The framework suits our needs
perfectly, but we have hit an obstacle when scheduling at scale. A simple
service we have developed takes a JSON payload over AMQP or HTTP and
schedules a one-time task. When the task executes, a separate service
transmits over an outgoing AMQP channel. This all works fine, but during
load testing, when we attempt to schedule a large number of jobs, the
server or servers become unresponsive (they stop logging) and can no
longer receive or execute tasks.

The test case I am using to reproduce this behavior is attempting to
schedule 1000 tasks either via HTTP or via AMQP. These tasks are set to
execute 60 seconds from their creation. A few hundred tasks in, the
behavior described above occurs. I have not yet attempted to trace this
behavior in the application code, figuring it might be wiser to post my
findings in this regard here first.

If the current approach is not viable, please let me know whether there is
an alternative method of scheduling that would support a large number of
jobs. If it should be viable, please let me know whether you have seen
this behavior before. I am able to reproduce it when installing from
source and from package on Ubuntu, and from package on RHEL, our current
environment.

The virtual machine used in the test does not have great specs, but I
would like to find a way to at least serve a 503 and keep the servers up
if the load becomes too great. Any insights into this issue would be
greatly appreciated.

Thanks for all your hard work on this excellent piece of code, and also for
opening it to the community to explore. Please let me know if I can do
anything to aid in the resolution of this matter or if you have any
questions.

Thanks and Regards,
Brandon (tlspan)

Hi Brandon,

it’s great to hear you find Zato interesting!

Please check out this gist:

https://gist.github.com/dsuch/6d28136741357205f3ef

This uses an alternative approach, that of gevent-based scheduling.

Basically, one service accepts requests and tells gevent to invoke
another one at a later time.

This way you can easily schedule thousands of tasks.
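
For readers who cannot open the gist, here is a minimal sketch of the
general idea, not the gist's actual code. It assumes plain gevent outside
of Zato; `send_message`, `schedule_once` and `demo` are hypothetical names
made up for illustration, and `send_message` merely stands in for the
service that would publish to the outgoing AMQP channel.

```python
import gevent

# Collected payloads, so the demo can show that every task actually ran.
results = []


def send_message(payload):
    # Hypothetical stand-in for the service that transmits over the
    # outgoing AMQP channel; here it only records the payload.
    results.append(payload)


def schedule_once(delay_seconds, payload):
    # gevent.spawn_later schedules a greenlet to start after the delay.
    # No OS thread is created per task, so thousands of pending tasks
    # are cheap compared to one real thread each.
    return gevent.spawn_later(delay_seconds, send_message, payload)


def demo(count=1000, delay_seconds=0.05):
    # Schedule `count` one-time tasks, wait for all of them to finish,
    # and report how many ran.
    results.clear()
    jobs = [schedule_once(delay_seconds, {"task_id": i}) for i in range(count)]
    gevent.joinall(jobs)
    return len(results)


if __name__ == "__main__":
    print(demo())
```

Inside a Zato service the scheduled function would presumably invoke the
second service rather than append to a list, but that part depends on
your own service names and is left out here.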

Why not the GUI-based scheduler? Because it uses real OS threads to
start tasks - this works fine as long as you don’t execute a lot of
them. Note that the thread is used only to start a task, not to run it
in, but still in your tests 1000 real threads on a VM could easily bring
the virtual OS to its knees.

Having said all that, the GUI scheduler is built on top of apscheduler …

… and if you could please open a ticket on GH, I’ll investigate, though
not right now, whether it’s possible to make apscheduler co-operate with
gevent. That would give us the best of both worlds.
