I'm trying to create a JSON schema for the results of doing statistical analysis based on disparate pieces of data.
The current schema I have looks something like this:
{
// Basic key information.
video : "http://www.youtube.com/watch?v=7uwfjpfK0jo",
start : "00:00:00",
end : null,
// For results of analysis, to be populated:
// *** This is where it gets interesting ***
analysis : {
game : {
value: "Super Street Fighter 4: Arcade Edition Ver. 2012",
confidence: 0.9725
}
teams : [
{
player : {
value : "Desk",
confidence: 0.95,
}
characters : [
{
value : "Hakan",
confidence: 0.80
}
]
}
]
}
}
The issue is the tuples that are used to store a value and the confidence related to that value (i.e. { value : "some value", confidence : 0.85 }), populated after the results of the analysis.
This leads to a creep of this tuple for every value. Take a fully-fleshed out value from the characters array:
{
name : {
value : "Hakan",
confidence: 0.80
}
ultra : {
value: 1,
confidence: 0.90
}
}
As the structures that represent the values become more and more detailed (and more analysis is done on them to try and determine the confidence behind that analysis), the nesting of the tuples adds great deal of noise to the overall structure, considering that the final result (when verified) will be:
{
name : "Hakan",
ultra : 1
}
(And recall that this is just a nested value)
In .NET (in which I'll be using to work with this data), I'd have a little helper like this:
public class UnknownValue<T>
{
T Value { get; set; }
double? Confidence { get; set; }
}
Which I'd then use like so:
public class Character
{
public UnknownValue<Character> Name { get; set; }
}
While the same as the JSON representation in code, it doesn't have the same creep because I don't have to redefine the tuple every time and property accessors hide the appearance of creep. Of course, this is an apples-to-oranges comparison, the above is code while the JSON is data.
Is there a more formalized/cleaner/best practice way of containing the creep of these tuples in JSON, or is the approach above an accepted approach for the type of data I'm trying to store (and I'm just perceiving it the wrong way)?
Note, this is being represented in JSON because this will ultimately go in a document database (something like RavenDB or elasticsearch). I'm not concerned about being able to serialize into the object above, because I can always use data transfer objects to facilitate getting data into/out of my underlying data store.