Techniques for querying a set of object in-memory in a Java application
- by Edd Grant
Hi All,
We have a system which performs a 'coarse search' by invoking an interface on another system which returns a set of Java objects. Once we have received the search results I need to be able to further filter the resulting Java objects based on certain criteria describing the state of the attributes (e.g. from the initial objects return all objects where x.y z && a.b == c).
The criteria used to filter the set of objects each time is partially user configurable, by this I mean that users will be able to select the values and ranges to match on but the attributes they can pick from will be a fixed set.
The data sets are likely to contain <= 10,000 objects for each search. The search will be executed manually by the application user base probably no more than 2000 times a day (approx). It's probably worth mentioning that all the objects in the result set are known domain object classes which have Hibernate and JPA annotations describing their structure and relationship.
Off the top of my head I can think of 3 ways of doing this:
For each search persist the initial result set objects in our database, then use Hibernate to re-query them using the finer grained criteria.
Use an in-memory Database (such as hsqldb?) to query and refine the initial result set.
Write some custom code which iterates the initial result set and pulls out the desired records.
Option 1 seems to involve a lot of toing and froing across a network to a physical Database (Oracle 10g) which might result in a lot of network and disk activity. It would also require the results from each search to be isolated from other result sets to ensure that different searches don't interfere with each other.
Option 2 seems like a good idea in principle as it would allow me to do the finer query in memory and would not require the persistence of result data which would only be discarded after the search was complete. Gut feeling is that this could be pretty performant too but might result in larger memory overheads (which is fine as we can be pretty flexible on the amount of memory our JVM gets).
Option 3 could be very performant but is something I would like to avoid as any code we write would require such careful testing that the time taken to acheive something flexible and robust enough would probably be prohibitive.
I don't have time to prototype all 3 ideas so I am looking for comments people may have on the 3 options above, plus any further ideas I have not considered, to help me decide which idea might be most suitable. I'm currently leaning toward option 2 (in memory database) so would be keen to hear from people with experience of querying POJOs in memory too.
Hopefully I have described the situation in enough detail but don't hesitate to ask if any further information is required to better understand the scenario.
Cheers,
Edd