Asynchronously returning hierarchical data using the .NET TPL... what should my return object "look" like?
- by makerofthings7
I want to use the .NET TPL to asynchronously do the equivalent of a DIR /S, visiting each subdirectory on a hard drive and searching each file for a word. What should my API look like?
In this scenario I know that each subdirectory will have 0..10000 files and 0..10000 directories. I know the tree is unbalanced, and I want to return data (in relation to its position in the hierarchy) as soon as it's available. I am interested in getting data as quickly as possible, but I also want to update that result if "better" data is found ("better" means closer to the root of C:).
I may also be interested in finding all matches, each in relation to its position in the hierarchy (akin to a report).
Question:
How should I return data to my caller?
My first guess is that I need a shared object that maintains the current "status" of the traversal (started | notstarted | complete), possibly built on the types in System.Collections.Concurrent.
Another idea I'm considering is the producer/consumer pattern (which the concurrent collections can handle); however, I'm not sure what the objects should "look" like.
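To make the producer/consumer idea concrete, here is a minimal sketch of one possible shape for the objects. It assumes an in-memory tree in place of real directories, and `SearchHit`, `Node`, and `TreeSearch` are hypothetical names invented for illustration. The producer walks the tree with `Parallel.ForEach` and pushes each hit, tagged with its depth, into a `BlockingCollection<T>`; calling `CompleteAdding` doubles as the "complete" status signal.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical result type: one match plus its position in the hierarchy.
public sealed class SearchHit
{
    public SearchHit(string path, int depth) { Path = path; Depth = depth; }
    public string Path { get; }
    public int Depth { get; }   // 0 = root; a smaller depth is "better"
}

// Hypothetical in-memory node standing in for a directory (assumption).
public sealed class Node
{
    public Node(string name, bool hasMatch, params Node[] children)
    {
        Name = name; HasMatch = hasMatch; Children = children;
    }
    public string Name { get; }
    public bool HasMatch { get; }
    public IReadOnlyList<Node> Children { get; }
}

public static class TreeSearch
{
    // Starts the traversal on a background task and returns the collection
    // immediately, so the caller can consume hits while the walk is running.
    public static BlockingCollection<SearchHit> Start(Node root)
    {
        var results = new BlockingCollection<SearchHit>();
        Task.Run(() =>
        {
            try { Visit(root, root.Name, 0, results); }
            finally { results.CompleteAdding(); }   // signals "complete"
        });
        return results;
    }

    private static void Visit(Node node, string path, int depth,
                              BlockingCollection<SearchHit> results)
    {
        if (node.HasMatch) results.Add(new SearchHit(path, depth));

        // Children are visited in parallel, so hits arrive in no fixed order.
        Parallel.ForEach(node.Children,
            child => Visit(child, path + "\\" + child.Name, depth + 1, results));
    }
}
```

A consumer would then loop over `TreeSearch.Start(root).GetConsumingEnumerable()`, which blocks until the next hit arrives and ends once the producer calls `CompleteAdding`. Because every hit carries its own `Depth`, the same stream can serve both the "first result fast" case and the full report.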
Optional Logical Constraint: The API doesn't have to address this, but in my "real world" design, if a directory has files, then only one file will ever contain the word I'm looking for. If someone were to literally do a DIR /S as described above then they would need to account for more than one matching file per subdirectory.
More information:
I'm using Azure Tables to store a hierarchy of data using these TPL extension methods. A "node" is a table. Not only does each node in the hierarchy have a relation to any number of nodes, but each node may also have a reciprocal link back to any other node. These back-links can create cycles during recursion, but I'm addressing that with a shared object in my recursion loop.
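One common shape for such a shared object is a thread-safe "visited" set. The sketch below is an assumption about how that guard could look, not the code actually in use; `VisitedSet` and `TryEnter` are hypothetical names, and nodes are assumed to be identifiable by a string key such as a table name. It relies on `ConcurrentDictionary.TryAdd`, which succeeds for exactly one caller per key even under contention.

```csharp
using System.Collections.Concurrent;

// Thread-safe "have we been here before?" guard for a graph with back-links.
public sealed class VisitedSet
{
    // The byte value is a placeholder; only the keys matter.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>();

    // Returns true only for the first caller that presents a given node id,
    // so each node is expanded at most once even if many links point to it.
    public bool TryEnter(string nodeId) => _seen.TryAdd(nodeId, 0);
}
```

Each recursive step would call `TryEnter` before expanding a node and return immediately when it yields false. Note that with parallel traversal the first visit to a node is not guaranteed to be along the shortest path from the start, so the depth recorded for a node can be larger than its true distance.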
Note that each "node" also has the ability to store local data unique to that node. It is this information that I'm searching for. In other words, I'm searching for a specific fixed RowKey in a hierarchy of nodes.
When I search for the fixed RowKey in the hierarchy I'm interested in getting the results FAST (first node found) but prefer data that is "closer" to the starting point of the hierarchy.
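The "fast first, but prefer closer" requirement can be sketched as a small tracker that keeps only the shallowest hit seen so far and notifies the caller each time it improves. This is an illustrative sketch under assumptions: `BestMatchTracker`, `Offer`, and `Improved` are hypothetical names, and "closer" is assumed to be measured as an integer depth from the starting node.

```csharp
using System;

// Keeps the match closest to the starting point and reports improvements.
public sealed class BestMatchTracker
{
    private readonly object _gate = new object();
    private string _bestPath;
    private int _bestDepth = int.MaxValue;

    // Raised each time a strictly shallower match replaces the current best.
    // It is invoked while holding the lock so notifications arrive in order;
    // handlers should therefore return quickly.
    public event Action<string, int> Improved;

    // Called by traversal workers; returns true if the offer became the best.
    public bool Offer(string path, int depth)
    {
        lock (_gate)
        {
            if (depth >= _bestDepth) return false;   // not an improvement
            _bestPath = path;
            _bestDepth = depth;
            Improved?.Invoke(path, depth);
            return true;
        }
    }

    // Returns false until at least one match has been offered.
    public bool TryGetBest(out string path, out int depth)
    {
        lock (_gate)
        {
            path = _bestPath;
            depth = _bestDepth;
            return _bestPath != null;
        }
    }
}
```

The first `Offer` always wins, which gives the caller a result as early as possible; later offers only replace it when they are strictly closer to the start. A traversal could also stop descending below the current best depth, since nothing deeper can improve the answer.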
Since many nodes may have the particular RowKey I'm interested in, sometimes I may want to get a report of ALL the nodes that contain this RowKey.