API Message Localization
- by Jesse Taber
In my post, “Keep Localizable Strings Close To Your Users” I talked about the internationalization and localization difficulties that can arise when you sprinkle static localizable strings throughout the different logical layers of an application. The main point of that post is that you should have your localizable strings reside as close to the user-facing modules of your application as possible. For example, if you’re developing an ASP .NET web forms application all of the localizable strings should be kept in .resx files that are associated with the .aspx views of the application. In this post I want to talk about how this same concept can be applied when designing and developing APIs. An API Facilitates Machine-to-Machine Interaction You can typically think about a web, desktop, or mobile application as a collection “views” or “screens” through which users interact with the underlying logic and data. The application can be designed based on the assumption that there will be a human being on the other end of the screen working the controls. You are designing a machine-to-person interaction and the application should be built in a way that facilitates the user’s clear understanding of what is going on. Dates should be be formatted in a way that the user will be familiar with, messages should be presented in the user’s preferred language, etc. When building an API, however, there are no screens and you can’t make assumptions about who or what is on the other end of each call. An API is, by definition, a machine-to-machine interaction. A machine-to-machine interaction should be built in a way that facilitates a clear and unambiguous understanding of what is going on. Dates and numbers should be formatted in predictable and standard ways (e.g. ISO 8601 dates) and messages should be presented in machine-parseable formats. For example, consider an API for a time tracking system that exposes a resource for creating a new time entry. The JSON for creating a new time entry for a user might look like: 1: {
2: "userId": 4532,
3: "startDateUtc": "2012-10-22T14:01:54.98432Z",
4: "endDateUtc": "2012-10-22T11:34:45.29321Z"
5: }
Note how the parameters for start and end date are both expressed as ISO 8601 compliant dates in UTC. Using a date format like this in our API leaves little room for ambiguity. It’s also important to note that using ISO 8601 dates is a much, much saner thing than the \/Date(<milliseconds since epoch>)\/ nonsense that is sometimes used in JSON serialization.
Probably the most important thing to note about the JSON snippet above is the fact that the end date comes before the start date! The API should recognize that and disallow the time entry from being created, returning an error to the caller. You might inclined to send a response that looks something like this:
1: {
2: "errors": [ {"message" : "The end date must come after the start date"}]
3: }
While this may seem like an appropriate thing to do there are a few problems with this approach:
What if there is a user somewhere on the other end of the API call that doesn’t speak English?
What if the message provided here won’t fit properly within the UI of the application that made the API call?
What if the verbiage of the message isn’t consistent with the rest of the application that made the API call?
What if there is no user directly on the other end of the API call (e.g. this is a batch job uploading time entries once per night unattended)?
The API knows nothing about the context from which the call was made. There are steps you could take to given the API some context (e.g.allow the caller to send along a language code indicating the language that the end user speaks), but that will only get you so far. As the designer of the API you could make some assumptions about how the API will be called, but if we start making assumptions we could very easily make the wrong assumptions. In this situation it’s best to make no assumptions and simply design the API in such a way that the caller has the responsibility to convey error messages in a manner that is appropriate for the context in which the error was raised. You would work around some of these problems by allowing callers to add metadata to each request describing the context from which the call is being made (e.g. accepting a ‘locale’ parameter denoting the desired language), but that will add needless clutter and complexity. It’s better to keep the API simple and push those context-specific concerns down to the caller whenever possible. For our very simple time entry example, this can be done by simply changing our error message response to look like this:
1: {
2: "errors": [ {"code": 100}]
3: }
By changing our error error from exposing a string to a numeric code that is easily parseable by another application, we’ve placed all of the responsibility for conveying the actual meaning of the error message on the caller. It’s best to have the caller be responsible for conveying this meaning because the caller understands the context much better than the API does. Now the caller can see error code 100, know that it means that the end date submitted falls before the start date and take appropriate action. Now all of the problems listed out above are non-issues because the caller can simply translate the error code of ‘100’ into the proper action and message for the current context. The numeric code representation of the error is a much better way to facilitate the machine-to-machine interaction that the API is meant to facilitate.
An API Does Have Human Users
While APIs should be built for machine-to-machine interaction, people still need to wire these interactions together. As a programmer building a client application that will consume the time entry API I would find it frustrating to have to go dig through the API documentation every time I encounter a new error code (assuming the documentation exists and is accurate). The numeric error code approach hurts the discoverability of the API and makes it painful to integrate with. We can help ease this pain by merging our two approaches:
1: {
2: "errors": [ {"code": 100, "message" : "The end date must come after the start date"}]
3: }
Now we have an easily parseable numeric error code for the machine-to-machine interaction that the API is meant to facilitate and a human-readable message for programmers working with the API. The human-readable message here is not intended to be viewed by end-users of the API and as such is not really a “localizable string” in my opinion. We could opt to expose a locale parameter for all API methods and store translations for all error messages, but that’s a lot of extra effort and overhead that doesn’t add a lot real value to the API. I might be a bit of an “ugly American”, but I think it’s probably fine to have the API return English messages when the target for those messages is a programmer. When resources are limited (which they always are), I’d argue that you’re better off hard-coding these messages in English and putting more effort into building more useful features, improving security, tweaking performance, etc.