How to design a high-level application protocol for metadata syncing between devices and server?
- by Jaanus
I am looking for guidance on how to best think about designing a high-level application protocol to sync metadata between end-user devices and a server.
My goal: the user can interact with the application data on any device, or on the web. The purpose of this protocol is to communicate changes made on one endpoint to the other endpoints through the server, and to ensure all devices maintain a consistent picture of the application data. If the user makes changes on one device or on the web, the protocol pushes the data to the central repository, from where other devices can pull it.
Some other design thoughts:
I call it "metadata syncing" because the payloads will be quite small, in the form of object IDs and small bits of metadata about those IDs. When client endpoints retrieve new metadata over this protocol, they fetch the actual object data from an external source based on that metadata. Fetching the "real" object data is out of scope; I'm only talking about metadata syncing here.
Using HTTP for transport and JSON for payload container. The question is basically about how to best design the JSON payload schema.
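To make this concrete, here is the kind of payload I have in mind, sketched as a Python dict that serializes to JSON. The field names (`object_id`, `name`, `parent_id`, `modified_at`) are placeholders to illustrate the shape, not a committed schema:

```python
import json

# Hypothetical metadata record for one synced object; the field names
# are illustrative placeholders, not a committed schema.
record = {
    "object_id": "f3a9c2",          # opaque ID of the object
    "name": "Quarterly report",     # human-readable metadata
    "parent_id": "root",            # position in the hierarchy
    "modified_at": "2013-05-04T10:21:00Z",  # same on every device
}

# A sync message would carry one or more such records.
payload = json.dumps({"changes": [record]})
print(payload)
```

The actual object contents would be fetched separately, keyed by `object_id`.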
I want this to be easy to implement and maintain on the web and across desktop and mobile devices. The best approach seems to be simple timer- or event-driven HTTP request/response without any persistent channels. Also, you should not need a PhD to read it: I want my spec to fit on 2 pages, not 200.
Authentication and security are out of scope for this question: assume that the requests are secure and authenticated.
The goal is eventual consistency of the data across devices; it is not entirely realtime. For example, the user can make changes on one device while offline. When going online again, the user would perform a "sync" operation to push local changes and retrieve remote changes.
Having said that, the protocol should support both of these modes of operation:
- Starting from scratch on a device, it should be able to pull the whole metadata picture.
- "Sync as you go": when looking at the data on two devices side by side and making changes, it should be easy to push those changes as short individual messages which the other device can receive in near-realtime (subject to when it decides to contact the server for a sync).
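One thought on how the two modes could collapse into a single mechanism: if the server keeps a monotonically increasing revision counter, a fresh device asks for everything since revision 0, and a returning device asks for everything since the last revision it saw. A minimal sketch, assuming a hypothetical in-memory server-side change log (all names are my own invention):

```python
# Sketch of a server-side change log supporting both modes through one
# parameter: since=0 pulls the whole picture, since=<last seen revision>
# pulls only newer changes. Hypothetical design, not an existing API.

class ChangeLog:
    def __init__(self):
        self.revision = 0
        self.entries = []  # list of (revision, change) tuples

    def record(self, change):
        """A push from some device appends a change and bumps the revision."""
        self.revision += 1
        self.entries.append((self.revision, change))
        return self.revision

    def changes_since(self, since):
        """A pull returns everything newer than the client's last revision."""
        return {
            "revision": self.revision,
            "changes": [c for rev, c in self.entries if rev > since],
        }

log = ChangeLog()
log.record({"object_id": "a1", "name": "Report"})
log.record({"object_id": "a2", "name": "Notes"})

full = log.changes_since(0)   # fresh device: whole picture, 2 changes
delta = log.changes_since(1)  # returning device: only the 1 new change
print(len(full["changes"]), len(delta["changes"]))  # 2 1
```

The client would then store the returned `revision` and pass it as `since` on its next sync.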
As a concrete example, you can think of Dropbox (it is not what I'm working on, but it helps to understand the model): on a range of devices, the user can manage files and folders (move them around, create new ones, remove old ones, and so on). In my context the "metadata" would be the file and folder structure, but not the actual file contents; the metadata fields would be something like the file/folder name and time of modification (all devices should see the same time of modification).
Another example is IMAP. I have not read the protocol, but my goals (minus actual message bodies) are the same.
It feels like there are two grand approaches to how this is done:
- Transactional messages: each change in the system is expressed as a delta, and endpoints communicate with those deltas. Example: DVCS changesets.
- REST: communicating the object graph as a whole or in part, without worrying so much about the individual atomic changes.
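For the transactional flavor, I imagine an individual change serialized roughly like a tiny changeset. Again, the operation vocabulary and field names here are hypothetical, just to show the granularity:

```python
import json

# Hypothetical delta message in the transactional style: each atomic
# change is its own small record, and endpoints exchange lists of these.
delta = {
    "op": "move",                 # e.g. create / move / rename / delete
    "object_id": "f3a9c2",
    "from_parent": "root",
    "to_parent": "archive",
    "modified_at": "2013-05-04T10:25:00Z",
}

print(json.dumps(delta, sort_keys=True))
```

The REST flavor would instead transfer the affected part of the object graph in its current state, leaving the server and clients to diff as needed.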
What I would like in the answers:
- Is there anything important I left out above? Constraints, goals?
- What is some good background reading on this? (I realize this is what many computer science courses cover at great length and in detail... I am hoping to short-circuit that with a crash course or some nuggets.)
- What are some good examples of such protocols that I could model after, or even use out of the box? (I mention Dropbox and IMAP above... I should probably read the IMAP RFC.)