Storing listings in the built-in cache


#1

Hey there!

Following up on this issue, I am trying to store the results of a SELECT with JOINs in the cache, i.e. a list of items, or listing, and I am having trouble with it. This is the error in the log when trying to store the value in the cache:

2019-02-14 12:25:15,494 DEBG 'zato-server1' stdout output:
2019-02-14 12:25:15,493 - ERROR - 134:DummyThread-7 - zato.broker:57 - Could not handle broker msg:`Bunch(action='106420', cache_name='availability', expires_at=0.0, expiry=0.0, is_key_pickled=False, is_value_pickled=True, key='check_in:2017-07-01|check_out:2017-07-10|guests:2|rooms:[]', msg_type='0002', orig_now=1550147115.485663, source_worker_id='1.2.135.ebdfed681387770699df7802', value="(lp1\nccopy_reg\n_reconstructor\np2\n(czato.common.odb.api\nWritableKeyedTuple\np3\nc__builtin__\nobject\np4\nNtRp5\n(dp6\nS'_elem'\np7\ncsqlalchemy.util._collections\nKeyedTuple\np8\n((lp9\nI6\naI1\naI6\naVNormal bedroom with one double bed\np10\naI0\naI1\naVbrown\np11\naV106\np12\naI2\naI9\naF1152\na(S'id'\np13\nS'floor_no'\np14\nS'room_no'\np15\nS'name'\np16\nS'sgl_beds'\np17\nS'dbl_beds'\np18\nS'code'\np19\nS'number'\np20\nS'accommodates'\np21\nVnights\np22\nVtotal_price\np23\ntp24\ntRp25\nsbag2\n(g3\ng4\nNtRp26\n(dp27\nS'_elem'\np28\ng8\n((lp29\nI2\naI1\naI2\naVLarge bedroom with two single and one double beds\np30\naI2\naI1\naVblack\np31\naV102\np32\naI4\naI9\naF1332\nag24\ntRp33\nsbag2\n(g3\ng4\nNtRp34\n(dp35\nS'_elem'\np36\ng8\n((lp37\nI3\naI1\naI3\naVVery large bedroom with three single and one double beds\np38\naI3\naI1\naVwhite\np39\naV103\np40\naI5\naI9\naF1422\nag24\ntRp41\nsba.")`, e:`Traceback (most recent call last):
  File "/opt/zato/3.0/code/zato-broker/src/zato/broker/__init__.py", line 52, in on_broker_msg
    getattr(self, handler)(msg)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/base/worker/cache_builtin.py", line 60, in on_broker_msg_CACHE_BUILTIN_STATE_CHANGED_SET
    self._unpickle_msg(msg)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/base/worker/cache_builtin.py", line 53, in _unpickle_msg
    msg['value'] = _pickle_loads(msg['value'])
  File "/opt/zato/3.0/code/zato-common/src/zato/common/odb/api.py", line 67, in __getattr__
    return getattr(self._elem, key)

[..] A gazillion times the same line

  File "/opt/zato/3.0/code/zato-common/src/zato/common/odb/api.py", line 67, in __getattr__
    return getattr(self._elem, key)
RuntimeError: maximum recursion depth exceeded

These are the contents of the variable (the output of the query) that I want to store in the cache:

2019-02-14 12:25:15,485 DEBG 'zato-server2' stdout output:
2019-02-14 12:25:15,484 - INFO - 135:DummyThread-6 - availability.search:155 - Result is: [WritableKeyedTuple('id'=6, 'floor_no'=1, 'room_no'=6, 'name'=u'Normal bedroom with one double bed', 'sgl_beds'=0, 'dbl_beds'=1, 'code'=u'brown', 'number'=u'106', 'accommodates'=2, u'nights'=9, u'total_price'=1152.0), WritableKeyedTuple('id'=2, 'floor_no'=1, 'room_no'=2, 'name'=u'Large bedroom with two single and one double beds', 'sgl_beds'=2, 'dbl_beds'=1, 'code'=u'black', 'number'=u'102', 'accommodates'=4, u'nights'=9, u'total_price'=1332.0), WritableKeyedTuple('id'=3, 'floor_no'=1, 'room_no'=3, 'name'=u'Very large bedroom with three single and one double beds', 'sgl_beds'=3, 'dbl_beds'=1, 'code'=u'white', 'number'=u'103', 'accommodates'=5, u'nights'=9, u'total_price'=1422.0)]

So it’s a list of WritableKeyedTuple objects.

This is an excerpt of the service:

class Search(Service):

    class SimpleIO(object):
        input_required = (Date('check_in'), Date('check_out'),
                          Integer('guests'))
        input_optional = (List('rooms'),)
        output_optional = ('id', 'number', 'name', 'sbl_beds', 'dbl_beds',
                           'accommodates', 'code', 'nights', 'total_price')
        skip_empty_keys = True
        output_repeated = True

    def handle(self):
        conn = self.user_config.genesisng.database.connection

        # [..] A lot of code to build and execute the query

            if result:

                self.logger.info('Result is: %s' % result)

                # Store results in the cache
                cache = self.cache.get_cache('builtin', 'availability')
                cache_key = 'check_in:%s|check_out:%s|guests:%s|rooms:%s' % (
                    check_in.strftime('%Y-%m-%d'),
                    check_out.strftime('%Y-%m-%d'),
                    guests, str(rooms))
                cache_data = cache.set(cache_key, result, details=True)

                if cache_data:
                    self.response.headers['Cache-Control'] = cache_control
                    self.response.headers['Last-Modified'] = cache_data.\
                        last_write_http
                    self.response.headers['ETag'] = cache_data.hash
                else:
                    self.response.headers['Cache-Control'] = 'no-cache'

                # Return the result
                self.response.payload[:] = result
                self.response.status_code = OK
            else:
                self.response.status_code = NO_CONTENT
                self.response.headers['Cache-Control'] = 'no-cache'

The service is working fine, returning the result and the OK status code, which leads me to believe that one way to go would be to build a list of dicts the same way the code handling the payload attribute does.

But I wanted to get some feedback before having the application do the same job twice.

Thanks.


#2

Hi @jsabater,

I think what you encountered may be related to the fact that you are trying to store live, complex Python objects, ones that potentially have inner objects with their own objects, and so on.

In principle, this is doable. But at some point you will hit an obstacle; perhaps this is it.

For instance, your query results may hold references to SQLAlchemy-based objects that, indirectly, may be still holding references to their SQL session.

The caveat at this point is that, during the cache synchronization, such objects are transported to other servers to populate their caches too.

What happens then is that on the other server some of the inner references may not be available at all. Consider trying to store a live TCP connection in the cache: this would have to fail because other servers are not aware of any such connection.

I do not know if this is precisely the case here; I will need to analyse it further. I am just signalling that at some point there are Python-level or OS-level limitations to what you can store in your caches, and Zato simply cannot overcome them.
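
One possible mechanism, consistent with the traceback in the first post but not confirmed anywhere in this thread: a wrapper whose `__getattr__` forwards to `self._elem` recurses forever if an attribute is looked up before `_elem` exists, which is the state an object is in while it is being unpickled (the instance is created without `__init__` being called). The sketch below uses a hypothetical `Row` class as a stand-in; it is not Zato's `WritableKeyedTuple`, and `SafeRow` is only one way such a wrapper could guard itself.

```python
class Row(object):
    """Hypothetical stand-in: forwards unknown attributes to ``_elem``."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, key):
        # Called only when normal lookup fails. If ``_elem`` itself is
        # missing, evaluating ``self._elem`` re-enters this method.
        return getattr(self._elem, key)


class SafeRow(Row):
    """Same idea, but never delegates the lookup of ``_elem`` or dunders."""

    def __getattr__(self, key):
        if key == '_elem' or key.startswith('__'):
            raise AttributeError(key)
        return getattr(self._elem, key)


def lookup_error(cls):
    """Return the exception type raised by a lookup on a bare instance."""
    # Build the instance the way unpickling does: no __init__, so no _elem.
    obj = cls.__new__(cls)
    try:
        getattr(obj, 'name')
    except (RuntimeError, AttributeError) as e:
        return type(e)
    return None


if __name__ == '__main__':
    print(lookup_error(Row).__name__)
    print(lookup_error(SafeRow).__name__)
```

On Python 2, which the paths in the traceback suggest, the first case surfaces as `RuntimeError: maximum recursion depth exceeded`; on Python 3 it is a `RecursionError`, which is a subclass of `RuntimeError`.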


#3

Hi, @dsuch!

I totally agree with you. I may not have explained myself properly in this thread: in my opinion, not every result set that comes from a SQLAlchemy query can be stored in the cache (e.g. because you are using lazy loading of a column via deferred(), or for any of the reasons you explained above).

So, as per my previous post, I think that it is inevitable that the result set will have to be transformed into a list (list type) of dictionaries (dict type) by looping the result and extracting the data.

But it seems that this would mean doing the same job twice (maybe), as Zato will be doing that again once the result set is assigned to self.response.payload. Thus this thread, to get some feedback.

I wonder:

  • If I assign my hand-made list of dicts to the payload, will I be saving Zato some or all of the work, especially when I am using SimpleIO?
  • Do you see a way out to avoid doing the same job twice?
  • Am I worrying too much? :laughing:

Thanks.


#4

When you assign something to self.response.payload the rule is that if it is a string object then it will never be inspected, even if you use SimpleIO.

However, if you have SimpleIO and your response is a complex object (list, dict, etc.), then it will be compared to your SimpleIO definition regardless of what your service is doing, i.e. the output parsing part does not know that your service has already serialized some objects for other purposes.

If I understand it correctly, you create a cache key then you place in cache the same data that goes into the response, is that right?

If so, then the same can be achieved simply by enabling caching in your channel’s definition. The difference is that then the response-building pieces of Zato know that there is some data already serialized, just about to be sent as a response, but it needs to be cached too.

You could then implement the get_request_hash method in the service to dictate what business value the request hashes to and, consequently, whether it can be served from the cache or not, akin to how you check it using your cache_key object.
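
The idea of hashing a request to a business value can be illustrated outside of Zato with the cache key from the first post. The sketch below is only that: `make_cache_key` is a hypothetical helper, it says nothing about the signature of get_request_hash, and sorting the room list is an assumption added here so that `rooms=3&rooms=6` and `rooms=6&rooms=3` map to the same entry.

```python
from datetime import date


def make_cache_key(check_in, check_out, guests, rooms):
    """Build the same key format as the service excerpt in this thread."""
    # Assumption, not in the original code: sort the rooms so that the
    # order of the query string parameters does not change the key.
    return 'check_in:%s|check_out:%s|guests:%s|rooms:%s' % (
        check_in.strftime('%Y-%m-%d'),
        check_out.strftime('%Y-%m-%d'),
        guests, str(sorted(rooms)))


if __name__ == '__main__':
    print(make_cache_key(date(2017, 7, 1), date(2017, 7, 10), 2, [6, 3]))
```

With an empty room list this yields the key seen in the broker message of the first post.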


#5

So assigning a list of dicts to the payload does not prevent Zato from comparing it to the SimpleIO definition, got it.

Regarding the cache, yes, I suppose I could just use the interface, and it would even be more efficient, but since this is a prototype, I wanted to use the cache API myself, hence all the hassle.

This is the code that is working for me right now (an excerpt of the service, right after the query has been executed):

            self.logger.info('Result is: %s' % result)

            if result:
                # TODO: Use self.SimpleIO.output_optional to build the dict automatically
                lod = []
                for r in result:
                    d = {
                        'id': r.id,
                        'number': r.number,
                        'name': r.name,
                        'sbl_beds': r.sgl_beds,
                        'dbl_beds': r.dbl_beds,
                        'accommodates': r.accommodates,
                        'code': r.code,
                        'nights': r.nights,
                        'total_price': r.total_price
                    }
                    lod.append(d)

                # Store results in the cache
                cache = self.cache.get_cache('builtin', 'availability')
                cache_key = 'check_in:%s|check_out:%s|guests:%s|rooms:%s' % (
                    check_in.strftime('%Y-%m-%d'),
                    check_out.strftime('%Y-%m-%d'),
                    guests, str(rooms))
                cache_data = cache.set(cache_key, lod, details=True)

                if cache_data:
                    self.response.headers['Cache-Control'] = cache_control
                    self.response.headers['Last-Modified'] = cache_data.\
                        last_write_http
                    self.response.headers['ETag'] = cache_data.hash
                else:
                    self.response.headers['Cache-Control'] = 'no-cache'

                # Return the result
                self.response.payload[:] = lod
                self.response.status_code = OK
            else:
                self.response.status_code = NO_CONTENT
                self.response.headers['Cache-Control'] = 'no-cache'

Where result in the log appears as:

[WritableKeyedTuple('id'=6, 'floor_no'=1, 'room_no'=6, 'name'=u'Normal bedroom with one double bed', 'sgl_beds'=0, 'dbl_beds'=1, 'code'=u'brown', 'number'=u'106', 'accommodates'=2, u'nights'=9, u'total_price'=1152.0), WritableKeyedTuple('id'=3, 'floor_no'=1, 'room_no'=3, 'name'=u'Very large bedroom with three single and one double beds', 'sgl_beds'=3, 'dbl_beds'=1, 'code'=u'white', 'number'=u'103', 'accommodates'=5, u'nights'=9, u'total_price'=1422.0)]
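
Regarding the TODO in the excerpt above, one way the dicts could be built from a list of names instead of by hand is sketched below. `rows_to_dicts` and its `renamed` parameter are hypothetical, and a namedtuple stands in for the query rows. The `renamed` mapping is there because the excerpt emits the key `sbl_beds` while the row attribute is `sgl_beds`.

```python
from collections import namedtuple


def rows_to_dicts(rows, names, renamed=None):
    """Copy the given attributes of each row into a plain dict."""
    # ``renamed`` maps an output key to the row attribute it is read from,
    # for the cases where the two differ.
    renamed = renamed or {}
    return [
        {name: getattr(row, renamed.get(name, name)) for name in names}
        for row in rows
    ]


# Stand-in for the query result; the real rows come from SQLAlchemy.
Row = namedtuple('Row', 'id number name sgl_beds dbl_beds accommodates '
                        'code nights total_price')

# Same names as output_optional in the service excerpt.
OUTPUT = ('id', 'number', 'name', 'sbl_beds', 'dbl_beds',
          'accommodates', 'code', 'nights', 'total_price')

if __name__ == '__main__':
    result = [Row(6, u'106', u'Normal bedroom with one double bed',
                  0, 1, 2, u'brown', 9, 1152.0)]
    lod = rows_to_dicts(result, OUTPUT, renamed={'sbl_beds': 'sgl_beds'})
    print(lod)
```

Plain dicts holding only numbers and strings carry no reference back to the rows they were copied from, which is the property the cache needs here.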

#6

This bit seems to be the actual answer to the opening post, doesn’t it? It is the way in which the job is not done twice, if I have understood correctly. :slight_smile:


#7

Yes, that would be the answer - there is already functionality that works in pretty much the same way as you are doing it.

But I know that you are doing other useful things like setting the relevant HTTP headers which also makes sense to add to core Zato.

Please open a ticket on GitHub if you would like to have it added, I mean building HTTP headers automatically. It will be most productive to sit down to all such cache-related matters in one go before the next release.


#8

So I just now realised that the code above saves the list of dicts in the cache in this form:

[Bunch(accommodates=2, code=u'brown', dbl_beds=1, id=6, name=u'Normal bedroom with one double bed', nights=9, number=u'106', sbl_beds=0, total_price=1152.0), Bunch(accommodates=5, code=u'white', dbl_beds=1, id=3, name=u'Very large bedroom with three single and one double beds', nights=9, number=u'103', sbl_beds=3, total_price=1422.0)]

This is after the following call using curl:

$ curl -v -g "http://127.0.0.1:11223/genesisng/availability/search?guests=2&check_in=2017-07-01&check_out=2017-07-10&rooms=3&rooms=6"; echo ""
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 11223 (#0)
> GET /genesisng/availability/search?guests=2&check_in=2017-07-01&check_out=2017-07-10&rooms=3&rooms=6 HTTP/1.1
> Host: 127.0.0.1:11223
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Zato
< Date: Fri, 15 Feb 2019 09:46:06 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: application/json
< ETag: b9d39f31a15d93e4d56a13d3388623c9bbf016132cfadefc202929c54ef9a4dc
< X-Zato-CID: d2ac5db1c91666adfdfa963c
< Cache-Control: public,max-age=300
< Last-Modified: Fri, 15 Feb 2019 09:46:06 GMT
< 
* Closing connection 0
{"response": [{"code": "brown", "name": "Normal bedroom with one double bed", "number": "106", "accommodates": 2, "nights": 9, "sbl_beds": 0, "dbl_beds": 1, "total_price": 1152.0, "id": 6}, {"code": "white", "name": "Very large bedroom with three single and one double beds", "number": "103", "accommodates": 5, "nights": 9, "sbl_beds": 3, "dbl_beds": 1, "total_price": 1422.0, "id": 3}]}

Now if I make the same call again:

$ curl -v -g "http://127.0.0.1:11223/genesisng/availability/search?guests=2&check_in=2017-07-01&check_out=2017-07-10&rooms=3&rooms=6"; echo ""
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 11223 (#0)
> GET /genesisng/availability/search?guests=2&check_in=2017-07-01&check_out=2017-07-10&rooms=3&rooms=6 HTTP/1.1
> Host: 127.0.0.1:11223
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Server: Zato
< Date: Fri, 15 Feb 2019 09:48:02 GMT
< Connection: close
< Transfer-Encoding: chunked
< X-Zato-CID: 90259f3a760dcb48ceab7258
< 
* Closing connection 0
{"zato_env": {"details": "Traceback (most recent call last):\n  File \"/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py\", line 268, in dispatch\n    payload, worker_store, self.simple_io_config, post_data, path_info, soap_action)\n  File \"/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py\", line 502, in handle\n    params_priority=channel_item.params_pri)\n  File \"/opt/zato/3.0/code/zato-server/src/zato/server/service/__init__.py\", line 524, in update_handle\n    raise resp_e\nAttributeError: 'list' object has no attribute 'getvalue'\n", "result": "ZATO_ERROR", "cid": "90259f3a760dcb48ceab7258"}}

This is the exception in the log:

2019-02-15 09:48:02,426 DEBG 'zato-server1' stdout output:
2019-02-15 09:48:02,425 - WARNING - 134:DummyThread-14 - zato.server.service:517 - Exception in service `availability.search`, e:`Traceback (most recent call last):
  File "/opt/zato/3.0/code/zato-server/src/zato/server/service/__init__.py", line 506, in update_handle
    response = set_response_func(service, data_format=data_format, transport=transport, **kwargs)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py", line 366, in _set_response_data
    self.set_payload(service.response, data_format, transport, service)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py", line 564, in set_payload
    response.payload = response.payload.getvalue() if response.payload else ''
AttributeError: 'list' object has no attribute 'getvalue'
`

2019-02-15 09:48:02,427 DEBG 'zato-server1' stdout output:
2019-02-15 09:48:02,426 - ERROR - 134:DummyThread-14 - zato.server.connection.http_soap.channel:324 - Caught an exception, cid:`90259f3a760dcb48ceab7258`, status_code:`500`, _format_exc:`Traceback (most recent call last):
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py", line 268, in dispatch
    payload, worker_store, self.simple_io_config, post_data, path_info, soap_action)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/http_soap/channel.py", line 502, in handle
    params_priority=channel_item.params_pri)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/service/__init__.py", line 524, in update_handle
    raise resp_e
AttributeError: 'list' object has no attribute 'getvalue'

I take it that this happens when I try to get the key from the cache collection:

        # Check whether a copy exists in the cache
        cache_key = 'check_in:%s|check_out:%s|guests:%s|rooms:%s' % (
            check_in.strftime('%Y-%m-%d'), check_out.strftime('%Y-%m-%d'),
            guests, str(rooms))
        cache = self.cache.get_cache('builtin', 'availability')
        cache_data = cache.get(cache_key, details=True)
        if cache_data:
            self.response.status_code = OK
            self.response.headers['Cache-Control'] = cache_control
            self.response.headers['Last-Modified'] = cache_data.last_write_http
            self.response.headers['ETag'] = cache_data.hash
            self.response.headers['Content-Language'] = 'en'
            self.response.payload = cache_data.value
            return

Any hints?


#9

This would need to be self.response.payload[:] = cache_data.value - note the slice operator.

But I can see that Zato should apply it itself in such situations if it recognizes that it is a list and it is SimpleIO - would you open a ticket, please?
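
To make the difference concrete, here is a minimal sketch with stand-in classes. `FakePayload` and `FakeResponse` are hypothetical and are not Zato's implementation; they only mimic the one detail visible in the traceback, namely that the channel later calls `getvalue()` on the payload.

```python
class FakePayload(object):
    """Hypothetical stand-in for a payload object that exposes getvalue()."""

    def __init__(self):
        self._items = []

    def __setitem__(self, index, value):
        # ``payload[:] = rows`` arrives here with index == slice(None, None, None),
        # so the rows are copied into the existing payload object.
        self._items[index] = value

    def getvalue(self):
        return list(self._items)


class FakeResponse(object):
    def __init__(self):
        self.payload = FakePayload()


rows = [{'id': 6}, {'id': 3}]

# Slice assignment fills the payload object and keeps it in place.
filled = FakeResponse()
filled.payload[:] = rows

# Plain assignment rebinds the attribute to the list itself.
replaced = FakeResponse()
replaced.payload = rows

if __name__ == '__main__':
    print(filled.payload.getvalue())
    print(hasattr(replaced.payload, 'getvalue'))
```

In the second case the attribute is simply a list, which matches the `'list' object has no attribute 'getvalue'` error reported above.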


#10

Doh! Sorry! My fault when copy-pasting! :sob:


#11

Sure thing. Here it is. Not sure if I explained myself properly, though :laughing:

Incidentally, from the documentation I always believed that by setting output_repeated = True one did not need to use the slice operator when assigning a variable to the payload. It does not seem to be the case. So, if one uses the slice operator self.response.payload[:], is output_repeated = True still necessary?