This is a new post to discuss this reply in another thread:
My main use of Zato is a mini-workflow for file pre-processing and queue management. The workflow is as follows:
- An external provider service places files on a remote SFTP server;
- My Service1 periodically polls this folder using an external library (today it's ssh2-python) and, using Zato's FTP feature, moves the files to an internal folder chosen by their origin folder and file type;
- My Service2 periodically polls some of the folders populated by the previous service and converts each file in them to another format;
- My Service3 periodically polls for every file already processed by Services 1 and 2 and adds each one to a queue that feeds an external system;
- My Service4 runs periodically and manages the queue created by Service3: it moves files waiting to be processed into the external system's folder and checks whether files from previous runs have finished processing, in which case they are removed from the queue.
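The queue bookkeeping in the last step can be sketched as below. This is a minimal, hypothetical illustration, not the author's implementation and not a Zato API: `FileQueue`, `enqueue`, `tick` and `batch_size` are invented names, the state is kept in memory, and it assumes the external system deletes a file from its folder once it has finished with it. The actual transfer is left out; `tick` only decides which names to move.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileQueue:
    """Hypothetical in-memory queue state; a real service would persist it."""

    pending: list[str] = field(default_factory=list)  # waiting to be moved
    in_flight: set[str] = field(default_factory=set)  # moved, not yet done

    def enqueue(self, name: str) -> None:
        """Service3's role: add a processed file to the queue only once."""
        if name not in self.in_flight and name not in self.pending:
            self.pending.append(name)

    def tick(self, external_listing: set[str], batch_size: int) -> list[str]:
        """One periodic run of Service4's role; returns the files to move now."""
        # Assumption: a file the external system has finished with is no
        # longer listed in its folder, so in-flight files missing from the
        # listing are considered done and drop out of the queue.
        self.in_flight.intersection_update(external_listing)
        # Hand over at most `batch_size` pending files on this run.
        to_move = self.pending[:batch_size]
        del self.pending[:batch_size]
        self.in_flight.update(to_move)
        return to_move
```

Keeping the decision logic free of I/O like this makes it testable on its own, independently of whichever FTP or SFTP library performs the moves.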
None of the serviced folders are local to the Zato machines, so I always used a polling system to find files, track which files had already been processed, and so on (I also found no way of using something like inotify remotely).
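Without inotify, detecting new remote files comes down to comparing each listing with the previous one. A minimal sketch of that comparison, assuming the listing is just an iterable of file names returned by whatever FTP/SFTP client is in use; `poll_once` and the `seen` snapshot are hypothetical names, and a real service would persist `seen` between runs rather than keep it in memory.

```python
from __future__ import annotations

from typing import Iterable


def poll_once(listing: Iterable[str], seen: set[str]) -> list[str]:
    """Return the names in `listing` that were not present on the previous poll.

    `listing` stands in for a remote directory listing; `seen` is the
    snapshot from the previous run and is updated in place.
    """
    current = set(listing)
    fresh = sorted(current - seen)
    # Replace the snapshot with the current listing: names that have vanished
    # are forgotten, so `seen` mirrors the folder and cannot grow without bound.
    seen.clear()
    seen.update(current)
    return fresh
```

One consequence of this design is that a file removed and later re-uploaded under the same name is reported as new again; it also cannot tell whether a file is still being written, which would need an extra check.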
If Zato could notify a service, or a pub/sub topic, directly when new remote files appear, it would be an awesome addition. A cleaner, integrated remote file system library would also be good. The current version of pyfs (at least in 2.0.7) is problematic: it does not support SFTP natively, and its FTP implementation handles operations in large folders poorly. Once a folder approaches 15,000 to 20,000 files, each simple operation takes far longer than normal (which led me to implement an external purge system that cleans up the end folders to avoid bigger issues).
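The selection step of such a purge can be sketched as a pure function. This assumes the end folders can be listed with a modification time for each file and that anything older than a fixed retention period is safe to delete; `files_to_purge`, the retention policy and the optional `limit` are hypothetical and not part of the original setup or of any Zato API.

```python
from __future__ import annotations

import time
from typing import Iterable


def files_to_purge(
    entries: Iterable[tuple[str, float]],
    max_age_seconds: float,
    now: float | None = None,
    limit: int | None = None,
) -> list[str]:
    """Pick files older than `max_age_seconds`, oldest first.

    `entries` are (name, mtime) pairs taken from a listing of an end folder;
    `limit` optionally caps how many files a single run deletes.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    # Oldest first, so a capped run removes the stalest files.
    expired = sorted((e for e in entries if e[1] < cutoff), key=lambda e: e[1])
    if limit is not None:
        expired = expired[:limit]
    return [name for name, _ in expired]
```

Capping each run keeps a single purge from issuing thousands of deletes at once against a folder that is already slow to operate on.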
I did not manage to upgrade pyfs without breaking Zato; paramiko is not gevent-friendly, and ssh2-python was the closest I got to a stable system. But if I use it in all my services (1-4), we start getting gunicorn timeouts and restarts, and eventually the cluster stops working and needs a full restart. To keep my system stable, I use ssh2-python only in Service1 (the one for which the customer required SFTP) and the local Zato FTP feature for all the rest. It's not ideal (since ssh2-python is faster), but it works.
Thanks for the question and if you need more clarification, let me know.