How to determine the source of a request in a distributed service system?
- by Kabumbus
Map/Reduce is a great concept for sorting large quantities of data at once. What to do if you have small parts of data and you need to reduce it all the time?
Simple example - choosing a service for request.
Imagine we have 10 services. Each provides services host with sets of request headers and post/get arguments. Each service declares it has 30 unique keys - 10 per set.
service A:
name
id
...
Now imagine we have a distributed services host. We have 200 machines with 10 services on each. Each service has 30 unique keys in there sets. but now to find to which service to map the incoming request we make our services post unique values that map to that sets. We can have up to or more than 10 000 such values sets on each machine per each service.
service A machine 1
name = Sam
id = 13245
...
service A machine 1
name = Ben
id = 33232
...
...
service A machine 100
name = Ron
id = 777888
...
So we get 200 * 10 * 30 * 30 * 10 000 == 18 000 000 000 and we get 500 requests per second on our gateway each containing 45 items 15 of which are just noise. And our task is to find a service for request (at least a machine it is running on).
On all machines all over cluster for same services we have same rules.
We can first select to which service came our request via rules filter 10 * 30. and we will have 200 * 30 * 10 000 == 60 000 000.
So... 60 mil is definitely a problem...
I hope to get on idea of mapping 30 * 10 000 onto some artificial neural network alike Perceptron that outputs 1 if 30 words (some hashes from words) from the request are correct or if less than Perceptron should return 0. And I’ll send each such Perceptron for each service from each machine to gateway. So I would have a map Perceptron <-> machine for each service.
Can any one tall me if my Perceptron idea is at least “sane”? Or normal people do it some other way? Or if there are better ANNs for such purposes?