About the metadata of a cache entry


#1

Hey there!

When using the cache API I noticed there’s metadata for each entry of a collection, such as the last time the entry was read or written. From the examples using the Cache REST API I deduced that one had to get a cache key like this:

entry = cache.get('key', details=True)

Now I would like to return the following headers when a user gets an entity, say, a login:

  • Last-Modified: HTTP-date
  • ETag: md5 sum of the entry

Two questions arise:

  1. When the entry already exists in the cache, I can get the last_write attribute from the details and use it to fill in the Last-Modified header:
from wsgiref.handlers import format_date_time
from datetime import datetime
from time import mktime
from hashlib import md5
from httplib import OK  # Python 2 module; on Python 3 this lives in http.client

# Check whether a copy exists in the cache
cache = self.cache.get_cache('builtin', 'logins')
cache_data = cache.get(cache_key, details=True)
if cache_data:
    self.response.status_code = OK
    self.response.headers['Cache-Control'] = 'public,max-age=300'
    self.response.headers['Last-Modified'] = format_date_time(cache_data.last_write)
    self.response.headers['ETag'] = md5(str(cache_data.value)).hexdigest()
    self.response.payload = cache_data.value

But when I am setting the cache entry, I do not have access to the metadata yet, so I have to generate the date manually:

cache.set(cache_key, result)
self.response.status_code = OK
self.response.headers['Cache-Control'] = 'public,max-age=300'
self.response.headers['Last-Modified'] = format_date_time(mktime(datetime.now().timetuple()))
self.response.headers['ETag'] = md5(str(result)).hexdigest()
self.response.payload = result

Usually the two dates do not differ, but they could. So I was wondering whether it would be useful for cache.set() to return the metadata. I take it that, if it is an asynchronous call, this is not possible…

  2. Would it make sense to store the md5 sum of the entry as part of the metadata, so that it could be used to create the ETag header?
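As a side note on the second question, here is a minimal sketch of how such a checksum could be computed so that it does not depend on the order of the keys in the entry. The snippets in this post hash str(value), whose output can vary with key order for dictionaries; the sketch swaps in a JSON serialization with sorted keys instead. The helper name make_etag is hypothetical and not part of the cache API.

```python
import json
from hashlib import md5


def make_etag(value):
    """Return a hex MD5 digest of a JSON-serializable value.

    Hypothetical helper, not part of any cache API.
    """
    # sort_keys=True makes the serialized form independent of key order,
    # so equal dictionaries always produce the same digest.
    serialized = json.dumps(value, sort_keys=True, separators=(',', ':'))
    return md5(serialized.encode('utf-8')).hexdigest()


login = {'id': 1, 'username': 'jsabater', 'is_admin': True}
etag = make_etag(login)
```

The resulting string could then be assigned to the ETag header in the same place where md5(str(...)).hexdigest() is used in this post.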

Thanks.


#2

Returning the newly added entry on .set is a good idea and I have just implemented it. You can install the updates to get it.

Note, however, that this returns the entry from the server on which your service is invoked, i.e. the one where you add the entry. Other servers may still record a slightly different time, but at least you do not need a .get right after .set.

value = cache.set('my-key', 'my-value') # This returns 'my-value'

# This returns a cache entry
entry = cache.set('my-key', 'my-value', details=True) 

# Returns 'my-value'
value = entry.value

# Data and metadata as a dictionary
as_dict = entry.to_dict() 

As for returning ETag and Last-Modified, this also sounds good, but I would need some time to think about it more closely. Would you open a ticket, please? Thanks.


#3

Hi, @dsuch! Thanks for the change. I am afraid something is not working properly, though. I am getting the following error in the logs:

 2018-11-11 08:42:57,491 - ERROR - 230:DummyThread-22 - zato.broker:57 - Could not handle broker msg:`Bunch(action='106420', cache_name='logins', expires_at=0.0, expiry=0.0, key='id-1', msg_type='0002', source_worker_id='1.1.136.5da98b49698360508416a62b', value={'username': 'jsabater', 'surname': 'Sabater', 'name': 'Jaume', 'id': 1, 'is_admin': True, 'email': 'jsabater@gmail.com'})`, e:`Traceback (most recent call last):
  File "/opt/zato/3.0/code/zato-broker/src/zato/broker/__init__.py", line 52, in on_broker_msg
    getattr(self, handler)(msg)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/base/worker/cache_builtin.py", line 46, in on_broker_msg_CACHE_BUILTIN_STATE_CHANGED_SET
    self.cache_api.sync_after_set(_BUILTIN, msg)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/cache.py", line 863, in sync_after_set
    self.caches[cache_type][data.cache_name].sync_after_set(data)
  File "/opt/zato/3.0/code/zato-server/src/zato/server/connection/cache.py", line 542, in sync_after_set
    self.impl.set(data.key, data.value, data.expiry, None)
  File "src/zato/cy/cache.pyx", line 624, in src.zato.cy.cache.Cache.set
TypeError: set() takes exactly 5 positional arguments (4 given)

This is the current code being executed:

class Get(Service):
    """Service class to get a login by id.

    Channel /genesisng/logins/{id}/get.
    """

    class SimpleIO:
        input_required = (Integer('id'),)
        # Passwords never travel back to the client side
        output_optional = ('id', 'username', 'name', 'surname', 'email',
                           'is_admin')
        skip_empty_keys = True

    def handle(self):
        conn = self.user_config.genesisng.database.connection
        cache_control = self.user_config.genesisng.cache.default_cache_control
        id_ = self.request.input.id

        # Check whether a copy exists in the cache
        cache_key = 'id-%s' % id_
        cache = self.cache.get_cache('builtin', 'logins')
        cache_data = cache.get(cache_key, details=True)
        if cache_data:
            self.response.status_code = OK
            self.response.headers['Cache-Control'] = cache_control
            self.response.headers['Last-Modified'] = format_date_time(
                cache_data.last_write)
            self.response.headers['ETag'] = md5(str(
                cache_data.value)).hexdigest()
            self.response.payload = cache_data.value
            return

        with closing(self.outgoing.sql.get(conn).session()) as session:
            result = session.query(Login).filter(Login.id == id_).one_or_none()

            if result:
                # Save the record in the cache, minus the password
                result = result.asdict()
                del (result['password'])
                cache_data = cache.set(cache_key, result, details=True)

                # TODO: Check for cache_data or return result, not value

                # Return the result
                self.response.status_code = OK
                self.response.headers['Cache-Control'] = cache_control
                self.response.headers['Last-Modified'] = format_date_time(
                    cache_data.last_write)
                self.response.headers['ETag'] = md5(str(
                    cache_data.value)).hexdigest()
                self.response.payload = cache_data.value
            else:
                self.response.status_code = NOT_FOUND
                self.response.headers['Cache-Control'] = 'no-cache'

There is something weird with the Last-Modified headers, as the datetimes differ too much:

$ curl -v -g "http://127.0.0.1:11223/genesisng/logins/1/get"; echo ""
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 11223 (#0)
> GET /genesisng/logins/1/get HTTP/1.1
> Host: 127.0.0.1:11223
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Zato
< Date: Sun, 11 Nov 2018 08:43:22 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: application/json
< ETag: ca38b9699e2ebad80e2dd98a1b287018
< X-Zato-CID: cd8d2a817c5038b3688e695b
< Cache-Control: public,max-age=300
< Last-Modified: Sun, 11 Nov 2018 08:43:11 GMT
< 
* Closing connection 0
{"response": {"username": "jsabater", "surname": "Sabater", "name": "Jaume", "id": 1, "is_admin": true, "email": "jsabater@gmail.com"}}

$ curl -v -g "http://127.0.0.1:11223/genesisng/logins/1/get"; echo ""
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 11223 (#0)
> GET /genesisng/logins/1/get HTTP/1.1
> Host: 127.0.0.1:11223
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Zato
< Date: Sun, 11 Nov 2018 08:43:23 GMT
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: application/json
< ETag: ca38b9699e2ebad80e2dd98a1b287018
< X-Zato-CID: 2bf6eb3d6d2cbd459fed843d
< Cache-Control: public,max-age=300
< Last-Modified: Sun, 11 Nov 2018 08:42:57 GMT
< 
* Closing connection 0
{"response": {"username": "jsabater", "surname": "Sabater", "name": "Jaume", "id": 1, "is_admin": true, "email": "jsabater@gmail.com"}}

Unless I’m missing something, this code produced correct Last-Modified headers before I started using details=True when setting the cache entry.

Or maybe the Zato updates were not installed correctly. I deleted all entries in the cache through the web-admin panel, but didn’t delete the cache itself (the collection).


#4

My apologies, this is fixed now.

As for the Last-Modified header - the timestamp returned from the entry is time since epoch in milliseconds as returned by gettimeofday(2).
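To illustrate the conversion, here is a minimal sketch of turning such a timestamp into an HTTP-date. format_date_time from the standard library expects seconds since the epoch, which is how the snippets in this thread pass last_write to it. The helper to_http_date and its in_milliseconds flag are hypothetical, added only to cover the case where a value really is expressed in milliseconds.

```python
from wsgiref.handlers import format_date_time


def to_http_date(timestamp, in_milliseconds=False):
    """Format an epoch timestamp as an HTTP-date string in GMT.

    Hypothetical helper; in_milliseconds is an assumption for values
    that are not already expressed in seconds.
    """
    if in_milliseconds:
        # Scale down to the seconds that format_date_time expects
        timestamp = timestamp / 1000.0
    return format_date_time(timestamp)


# The epoch itself, given once in seconds and once in milliseconds
epoch_from_seconds = to_http_date(0)
epoch_from_millis = to_http_date(0, in_milliseconds=True)
```

Because the output is always in GMT, the result does not depend on the timezone of the server that formats it.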

I would clean the cache completely before trying out the latest change, maybe it was related to the exception above.

I’m also not clear on what time range you expected the last write time to fall in, e.g. is it perhaps a difference in timezones?


#5

It’s working flawlessly right now. Cache entries are being read and stored properly, and last_write has the exact same datetime (down to the second) when reading as it had when it was generated on writing.

No, no, last_write is fine as it is. I think that, due to the error that was happening, the entries were not in sync. No problem whatsoever here right now :slight_smile:

Regarding Github, here is the ticket and here’s the summary of the two requested attributes:

  1. md5sum: MD5 sum of the data, that is hashlib.md5(str(entry.value)).hexdigest(). Useful to generate an ETag header at the service.

  2. last_modified: HTTP-date version of last_write but generated once and synchronized as part of the metadata, so it is exactly the same value across the instances. Useful to generate a Last-Modified header at the service. Format is RFC2616 but it could also be stored as a timestamp. There are different ways to generate it. The one I am using right now in the services is this one:

from wsgiref.handlers import format_date_time
self.response.headers['Last-Modified'] = format_date_time(cache_entry.last_write)

Thanks!

P.S. Somehow the cd …/code turned back into its original, faulty form cd ./code in the installing updates doc.