Using SimpleIO to return multilevel JSON


#1

Hello everyone!

Using SQLAlchemy, I have two schema classes: guests and bookings. Among others, I have two independent services that return the details of a guest and the details of a booking. Now I’d like to have a service that returns the details of a guest and a list of (the details of) all his/her bookings.

There is a similar example in the docs. What I’d like to have is, for instance, something like:

{
"id": 1,
"name": "Isaac",
"surname": "Newton",
"gender": 1,
"email": "inewton@gmail.com",
"birthdate": "1643-01-04",
[..]
"deleted": null
"reservations": [
    {
        "id": 30,
        "id_room": 1
        "reserved": "2016-03-01",
        "guests": 2,
        "check_in": "2016-07-11",
        "check_out": "2016-07-19",
        "base_price": 110.70,
        [..]
        "cancelled": null
    },
    {
        "id": 956,
        "id_room": 1
        "reserved": "2018-08-21",
        "guests": 2,
        "check_in": "2018-09-14",
        "check_out": "2018-09-21",
        "base_price": 120.60,
        [..]
        "cancelled": null
    },
    {
        "id": 2,
        "id_room": 1
        "reserved": "2018-08-21",
        "guests": 2,
        "check_in": "2018-09-14",
        "check_out": "2018-09-21",
        "base_price": 120.60,
        [..]
        "cancelled": null
    }
],
"rooms": [
    {
        "id": 2,
        "number": "101",
        "name": "Double bedroom with sea views",
        [..]
    },
    {
        "id": 9,
        "number": "205",
        "name": "Double bedroom",
        [..]
    }
]
}

So far I have been able to implement of the services in my test application using SimpleIO, but I am not sure I can use it with this one (return a multilevel JSON document with lists of items inside).

Is it possible? If so, how?

Thanks.


#2

It is not possible today though this question appears from time to time and I recognize that it would be good to have it.

The closest we have to it is the ListOfDicts SimpleIO datatype. However, it only indicates that an element, like ‘reservations’ in your example, will be a list of dicts, without saying anything about what keys the dicts will have.

In your example, what is the origin of ‘reservations’ or ‘rooms’ elements? They are not part of one and the same query as the basic data that you return?

I’m just thinking about extending ListOfDicts to make it able to express ideas such as:

class SimpleIO:
  input_required = 'id',
  output_required = ('id', 'name', ListOfDicts('reservations', 'id', 'id_room', 'reserved'))

#3

Hi, @dsuch, and thanks for your reply.

The ListOfDicts approach you suggest looks like a good idea. Please let me know if I can be of any help testing or implementing it. I (we) would certainly be using it a lot.

I have three possible approaches in this service:

One single query, then parse the results manually:

    with closing(self.outgoing.sql.get(conn).session()) as session:
        query = session.query(func.count().over().label('count'))
        query = query.add_entity(Booking)
        query = query.add_entity(Guest)
        query = query.add_entity(Room)
        query = query.filter(
            and_(Guest.id == Booking.id_guest, Guest.deleted.is_(None),
                 Booking.id_room = Room.id, Booking.id_guest == id_))
        result = query.all()
        guest = extract_guest_fields(result)
        bookings = extract_bookings(result)
        rooms = extract_rooms(result)
        response = create_guest_with_bookings_dict(guest, bookings, rooms)

This is pseudo-code I just wrote, so please excuse the inaccuracies. No checks of whatever for the sake of simplicity.

Multiple queries, build the result manually

    with closing(self.outgoing.sql.get(conn).session()) as session:
        guest = session.query(Guest).\
            filter(and_(Guest.id == id_, Guest.deleted.is_(None))).\
            one_or_none()
        query = session.query(func.count().over().label('count'))
        query = query.add_entity(Booking)
        bookings = query.filter(
            and_(Guest.id == Booking.id_guest, Guest.deleted.is_(None),
                 Booking.id_guest == id_))
        bookings = query.all()
        room_ids = extract_room_ids(bookings)
        rooms = session.query(Room).\
            filter(Room.id.in_(room_ids)).one_or_none()
        response = create_guest_with_bookings_dict(guest, bookings, rooms)

Or I might just invoke the room.get() and guest.get() services, which already exist.

Just return the bookings and let the client call guest.get() and room.get()

class Bookings(Service):
    """Service class to get a list of all bookings from a guest."""
    """Channel /genesisng/guests/{id}/bookings."""

    class SimpleIO:
        input_required = ('id')
        output_optional = ('count', 'id', 'id_guest', 'id_room',
                           DateTime('reserved'), 'guests', Date('check_in'),
                           Date('check_out'), DateTime('checked_in'),
                           DateTime('checked_out'), DateTime('cancelled'),
                           'base_price', 'taxes_percentage', 'taxes_value',
                           'total_price', 'locator', 'pin', 'status',
                           'meal_plan', 'additional_services', 'uuid',
                           'deleted')
        output_repeated = True

    def handle(self):
        conn = self.user_config.genesisng.database.connection
        id_ = self.request.input.id

        # Execute query
        with closing(self.outgoing.sql.get(conn).session()) as session:
            query = session.query(func.count().over().label('count'))
            query = query.add_entity(Booking)
            query = query.filter(
                and_(Guest.id == Booking.id_guest, Guest.deleted.is_(None),
                     Booking.id_guest == id_))
            result = query.all()
            self.response.payload[:] = result if result else []

Ideally I’d like to return all necessary information in this service, so that the client does not have to make additional calls. And then there is this balance between just one query to get all the information vs invoking existing services to make more but smaller, simpler queries.

Thanks.


#4

I came up with the syntax as below. I can see two cases - returning a single object that happens to have a ListOfDicts, or returning a list objects, each with a ListOfDicts (naturally, there can be multiple ListOfDicts for each object but I am showing only one for clarity).

Essentially, it would be up to the user to produce input data for ListOfDicts and Zato would extract the correct keys - either from a dictionary-like object, from an SQLAlchemy query result or from anything else that SimpleIO already supports.

Is this something that you had in mind?


from zato.server.service import ListOfDicts, Service

# #################################################################################################

class MyService1(Service):

    class SimpleIO:
        input_required = 'id',
        output_required = ('username', ListOfDicts('addresses', 'street', 'number'))

    def handle(self):

        addresses = [
            {'street':'Street 1', 'number':'1', 'settlement':'Town 1'},
            {'street':'Street 2', 'number':'2', 'settlement':'Town 2'},
            {'street':'Street 3', 'number':'3', 'settlement':'Town 3'},
        ]

        # As before ..
        self.response.payload.username = 'myusername1'

        # .. but this will extract only the relevant keys.
        self.response.payload.addresses = addresses

# #################################################################################################

class MyService2(Service):

    class SimpleIO:
        input_required = 'id',
        output_required = ('username', ListOfDicts('addresses', 'street', 'number'))
        output_repeated = True

    def handle(self):

        response = []

        user1 = {
            'username':'myusername1',
        }

        user1['addresses'] = [
            {'street':'Street 1', 'number':'1', 'settlement':'Town 1'},
            {'street':'Street 2', 'number':'2', 'settlement':'Town 2'},
            {'street':'Street 3', 'number':'3', 'settlement':'Town 3'},
        ]

        user2 = {
            'username':'myusername2',
            'addresses1': []
        }

        user2['addresses'].extend(session.query(UserAddress).filter(UserAddress.user_id=123)).all()

        response.append(user1)
        response.append(user2)

        self.response.payload[:] = response

# #################################################################################################


#5

Hi, @dsuch.

Yes, I believe this case scenario would perfectly suit my needs. You stated that there could me multiple ListOfDicts on each object, which I understand (in the case above, a list of bookings and a list of rooms), but what about more than one level, that is, ListOfDicts with ListOfDicts inside?

For instance, channel /app/guests/{id}/invoices/payments to get a list of all invoices, with their payments, from the guest with id {id}. So we’d have something like:

class InvoicesWithPayments(Service):

    class SimpleIO:
        input_required = ('id')
        output_required = ('name', 'surname',
                           ListOfDicts('invoices', 'number', 'amount',
                                       ListOfDicts('payments', 'date', 'method', 'amount')))

Or would this just be too much? :slight_smile:

Also, should assigning each part of the dictionary to their key inside the payload be the way to go or could we build the dictionary and assign it all to the parent key (in your example, username)?

Incidentally, I see that you often use the output_required attribute whereas I always use the output_optional attribute. I use output_optional because, you know, the id received as input parameter may not exist, or whatever, thus I’d be returning a 404 Not Found. Example:

from httplib import OK, NO_CONTENT, CREATED, NOT_FOUND, CONFLICT

class Get(Service):
    """Service class to get a guest by id."""
    """Channel /genesisng/guests/{id}/details."""

    class SimpleIO(object):
        input_required = ('id')
        output_optional = ('id', 'name',
                           'surname', 'gender', 'email', 'passport',
                           Date('birthdate'), 'address1', 'address2',
                           'locality', 'postcode', 'province', 'country',
                           'home_phone', 'mobile_phone', DateTime('deleted'))

    def handle(self):
        conn = self.user_config.genesisng.database.connection
        id_ = self.request.input.id

        with closing(self.outgoing.sql.get(conn).session()) as session:
            result = session.query(Guest).\
                filter(and_(Guest.id == id_, Guest.deleted.is_(None))).\
                one_or_none()

            if result:
                self.response.status_code = OK
                self.response.payload = result
            else:
                self.response.status_code = NOT_FOUND
                self.response.payload = ''

More here.

So, is it okay to use output_required even if you may end up returning NOT_FOUND?


#6

For now, there would not be any sort of recursive structures. I plan to rewrite SIO before the next release in June. This will make it possible for users to define their own datatypes. But it will be also a good opportunity to introduce several changes, like precompilation of SIO definitions in Cython. This would be also a good time to add means for recursive definitions.

As for assigning to a parent key, yes, this would be possible - just to confirm it, is it not the same as assigning to ‘addresses’ in my example?

Regarding output_required vs. output_optional, my usage of the former does not signify anything, one is not better than the other. They both have their place depending on what is about to be returned.

You need to keep in mind that output_optional or output_required do not know anything about HTTP. Remember that SimpleIO can be used with protocols that are unrelated to HTTP, e.g. AMQP or WebSockets. This means that there is nothing preventing you from returning NOT_FOUND even if you return something or the other way around. Come to think of it, I can see room for a kind of a RESTChannel base service that would actually populate such headers or metadata automatically, but it is not the case today.


#7

Sounds good to me. Phase one with ListOfDicts, phase two add recursive ListOfDicts :+1:

What would be your plans on implementing the former?

I mean creating a dictionary such as:

my_dict = {
    'username': 'myusername1',
    'addresses': [
        {'street':'Street 1', 'number':'1', 'settlement':'Town 1'},
        {'street':'Street 2', 'number':'2', 'settlement':'Town 2'},
        {'street':'Street 3', 'number':'3', 'settlement':'Town 3'},
    ]
}

Then assigning my_dict to self.response.payload. Not that your proposal above seems wrong or anything to me, but just wondering :slight_smile:

I understand that one has the purpose of throwing an exception (required) whereas the other not (optional), but I am not sure I understand what you mean beyond that.


#8

I hope to be able to find a few days, possibly in the next several weeks, to implement what we have discussed, though I cannot commit to an exact timeline.

Assigning a dictionary to self.response.payload will continue to work, as today. We are only talking about extensions to ListOfDicts.

As for output_optional and output_required, this is how I think about it.

  • I start with a data model, be it in UML, XSD, SQL, with a pen and paper or anything else that is used in a given project.

  • Let us say that in our system we have users. Each user has its unique ID and username. Each user also has a sort of a name meant to be used for humans (display name), but each person’s first, middle and last names are optional because users may belong to all sorts of cultures and the very ideas of first, middle and last names may make no sense in a given one. Yet, we can assume that each person has a preferred name of one sort or another that can be used for display purposes.

  • We can express these concepts using a made up notation as below, where - denotes an optional attribute and + is a required one:

  class User:
    + id (uq)
    + username (uq)
    + display_nane
    - first_name
    - middle_name
    - last_name
  • That was a bespoke notation, but we can express it in SQL as such:
CREATE TABLE user (
    id INT NOT NULL UNIQUE,
    username VARCHAR NOT NULL UNIQUE,
    display_name VARCHAR NOT NULL,
    first_name VARCHAR,
    middle_name VARCHAR,
    last_name VARCHAR
)
  • The point is that we start with observations about the real world and then we express it in various ways

  • Now, we start to design Zato services and the first one will be called GetUser. We want to use SimpleIO so we may end up with

class GetUser(Service):
    class SimpleIO:
      input_required = 'id',
      output_required = ('id', 'username', 'display_name')
      output_optional = ('first_name', 'middle_name', 'last_name')
  • This represents the idea of a Get/Read kind of service that, given a user ID, will for sure return this person’s ID (perhaps rewritten from input), followed by username and the display name. Then, optionally, it will also return first, middle and last name. This is optional because we know from our analysis phase that in our model these fields will not always exist.

  • We can also observe that various data declaration formats may possibly lose or add some information - for instance, in SimpleIO there is no way to express it that id and username will be unique among all the others. On the other hand, SQL does not express it that to get a user from our system we need to know the person’s ID first. In other words, they are mostly identical and related but still, each serves a different purpose.

  • This is really the difference between and purpose of output_required and output_optional, to be able to signal to your service’s consumers, and to you as a programmer, what is required to exist in your model or API and what is not.

  • SimpleIO as such does not know what kind of protocol it serves data through (for instance, HTTP or AMQP), it just sees what is required, validates that it exists or not and then serializes your response to a particular format, in your case this is JSON, but it could be XML too and in fact, the desire for the rewrite that I mentioned is driven in part because I would like for such output formats to be easily customizable, e.g. CSV or HL7 output messages are not hard to imagine

  • There is a command, for now undocumented, called zato apispec which can generate Swagger, WSDL or other files for external API clients based on SimpleIO definitions and services - I plan to rework it a bit and then it will be fully documented and supported


#9

That is a very nice explanation, @dsuch. Thank you very much for your time. Definitely worth adding to the docs.

What you explained makes sense, I just didn’t figure it out from the docs (maybe I didn’t read the right document, or part of a document). Anyway, output_required and output_optional do not refer to whether we’ll be returning zero or more tuples of data, but which attributes in each tuple will for sure be there and which ones may be there or not. :+1:

I’ll update my services according to this.


#10

Hi, @dsuch!

I just realised that the ListOfDicts type already appears in the SimpleIO online documentation, which means either that I missed it or that part of the above discussed functionality already exists (or was planned).

How is it?

Thanks.


#11

The element exists but it is essentially pass-through as far as JSON goes. Neither does it implement any of the features that we were talking about in this thread.


#12

So, at the moment, this:

class SimpleIO:
    output_optional = ('name', 'surname',
                       ListOfDicts('id', 'guests', 'locator'))

Or this:


class SimpleIO:
    output_optional = ('name', 'surname',
                       ListOfDicts('bookings', 'id', 'guests', 'locator'))

Is the same as:

class SimpleIO:
    output_optional = ('name', 'surname', 'id', 'guests', 'locator')

Well, no worries! I think I’ll keep using SimpleIO but I won’t be using output_required or output_optional and I’ll build myself the dict, then assign it to the payload of the response. So, when you have the chance to implement the functionality discussed before, I’ll just have to do some rearrangements :slight_smile: