Count unique visitors by group of visited places
- by Mathieu
I'm facing the problem of counting the unique visitors of groups of places.
Here is the situation:
I have visitors that can visit places. For example, that can be internet users visiting web pages, or customers going to restaurants. A visitor can visit as much places as he wishes, and a place can be visited by several visitors. A visitor can come to the same place several times.
The places belong to groups. A group can obviously contain several places, and places can belong to several groups.
Given that, for each visitor, we can have a list of visited places, how can I have the number of unique visitors per group of places?
Example: I have visitors A, B, C and D; and I have places x, y and z.
I have these visiting lists:
[
A -> [x,x,y,x],
B -> [],
C -> [z,z],
D -> [y,x,x,z]
]
Having these number of unique visitors per place is quite easy:
[
x -> 2, // A and D visited x
y -> 2, // A and D visited y
z -> 2 // C and D visited z
]
But if I have these groups:
[
G1 -> [x,y,z],
G2 -> [x,z],
G3 -> [x,y]
]
How can I have this information?
[
G1 -> 3, // A, C and D visited x or y or z
G2 -> 3, // A, C and D visited x or z
G3 -> 2 // A and D visited x or y
]
Additional notes :
There are so many places that it is not possible to store information about every possible group;
It's not a problem if approximation are made. I don't need 100% precision. Having a fast algorithm that tells me that there were 12345 visits in a group instead of 12543 is better than a slow algorithm telling the exact number. Let's say there can be ~5% deviation.
Is there an algorithm or class of algorithms that addresses this type of problem?