I have written the following code, which works but is painfully slow once I run it over thousands of records:
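library(RJSONIO)
json_data <- fromJSON(json_file)   # list with one element per person
# person_id values are strings such as "Amy123", so start with a character column
people_data <- data.frame(person_id = character(0))
n_people <- length(json_data)
for (person in seq_len(n_people)) {
  # flatten nested fields (e.g. location) into a one-row data frame
  person_dataframe <- as.data.frame(t(unlist(json_data[[person]])),
                                    stringsAsFactors = FALSE)
  # full outer join on the shared columns; missing attributes become NA
  people_data <- merge(people_data, person_dataframe, all = TRUE)
}
output_file <- paste0("people_data", ".csv")   # paste() would insert a space
write.csv(people_data, file = output_file)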
I am trying to build a single, unified data table from a series of JSON-formatted files. The fromJSON() function reads each file in as a list of lists: each element of the outer list is a person, which in turn contains a list of that person's attributes.
For example:
[[1]]
person_id
name
gender
hair_color
[[2]]
person_id
name
location
gender
height
[[...]]
structure(list(person_id = "Amy123", name = "Amy", gender = "F",
hair_color = "brown"),
.Names = c("person_id", "name", "gender", "hair_color"))
structure(list(person_id = "matt53", name = "Matt",
location = structure(c(47231, "IN"),
.Names = c("zip_code", "state")),
gender = "M", height = 172),
.Names = c("person_id", "name", "location", "gender", "height"))
The end result of the code above is a data frame whose columns are every person attribute that appears in the structure above and whose rows hold each person's values. As you can see, though, some attributes are missing for some people, so those cells need to show up as NA while everything else still lands in the right column. Further, location is itself a vector with two components, zip_code and state, so it has to be flattened to location.zip_code and location.state before it can be merged with another person's record; this is what I use unlist() for. I keep the running master table in people_data.
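A minimal sketch of the flattening step described above, using two records rebuilt by hand from the structures shown earlier (the names amy, matt, flat_amy and flat_matt are illustrative only, not part of the original code):

```r
# Two records rebuilt by hand from the structures shown above
# (illustrative stand-ins for elements of the list fromJSON() returns).
amy  <- list(person_id = "Amy123", name = "Amy", gender = "F",
             hair_color = "brown")
matt <- list(person_id = "matt53", name = "Matt",
             location = c(zip_code = "47231", state = "IN"),
             gender = "M", height = 172)

# unlist() joins nested names with "." and coerces all values to character,
# so each person becomes a single named character vector.
flat_amy  <- unlist(amy)
flat_matt <- unlist(matt)
print(names(flat_matt))
# names: person_id, name, location.zip_code, location.state, gender, height
```

The two flattened vectors have different lengths (4 and 6) and different names, so a plain rbind() would line them up by position rather than by name; that is why the columns have to be matched by name, whether through merge() or some other approach.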
The above code works, but is there a more efficient way to do this? It appears that merge() is slowing this to a crawl, and I have hundreds of files with hundreds of people in each file.
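One common way to avoid calling merge() once per person is to flatten every record first and then bind all the rows in a single step, filling absent attributes with NA. A minimal base-R sketch, assuming each record is a named list like the ones above; flatten_people() is a hypothetical helper, not part of RJSONIO, and "json_dir" in the commented usage is a placeholder path:

```r
# Hypothetical helper: flatten every record once, then build the table in a
# single step instead of merge()-ing one person at a time.
flatten_people <- function(records) {
  flat <- lapply(records, unlist)                  # one named character vector per person
  all_cols <- unique(unlist(lapply(flat, names)))  # union of every attribute name seen
  # Index each vector by the full column set; absent names come back as NA.
  rows <- lapply(flat, function(x) unname(x[all_cols]))
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- all_cols
  out
}

# Usage with the two example records, rebuilt by hand.
amy  <- list(person_id = "Amy123", name = "Amy", gender = "F",
             hair_color = "brown")
matt <- list(person_id = "matt53", name = "Matt",
             location = c(zip_code = "47231", state = "IN"),
             gender = "M", height = 172)
people_data <- flatten_people(list(amy, matt))
print(people_data)

# Across many files (RJSONIO assumed installed; "json_dir" is a placeholder):
# files   <- list.files("json_dir", pattern = "\\.json$", full.names = TRUE)
# records <- do.call(c, lapply(files, RJSONIO::fromJSON))
# write.csv(flatten_people(records), "people_data.csv")
```

Because unlist() coerces mixed types, every column comes back as character; convert the ones you need afterwards, e.g. as.numeric(people_data$height). Packages such as data.table also offer rbindlist(..., fill = TRUE) for the same kind of name-matched row binding.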
Thanks!
Bryan