How to determine the source of a request in a distributed service system?
Posted by Kabumbus on Programmers
Published on 2011-09-03T17:49:38Z
Indexed on 2011/11/15 2:10 UTC
Map/Reduce is a great concept for processing large quantities of data at once. But what should you do if you have small pieces of data and need to reduce them all the time?
A simple example: choosing a service for a request.
Imagine we have 10 services. Each one provides the services host with sets of request headers and POST/GET arguments. Each service declares 30 unique keys, 10 per set.
service A:
name
id
...
Now imagine we have a distributed services host: 200 machines with 10 services on each. Each service has 30 unique keys in its sets. To find which service an incoming request should be mapped to, we have our services post the unique values that map to those sets. Each service on each machine can have up to 10 000 or more such value sets.
service A machine 1
name = Sam
id = 13245
...
service A machine 1
name = Ben
id = 33232
...
...
service A machine 100
name = Ron
id = 777888
...
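If a request must match one published value set exactly on the declared keys, the gateway could keep a hash index from each value set to the machines that published it, so routing becomes one lookup instead of a scan. This is a minimal sketch under that assumption: the class `ValueSetIndex` and its methods are hypothetical, and the two-key value sets mirror the `name`/`id` example above rather than the full 30 keys.

```python
from collections import defaultdict


class ValueSetIndex:
    """Hypothetical gateway-side index: (service, value set) -> machines."""

    def __init__(self, declared_keys):
        # declared_keys: service name -> set of keys that service declared.
        self.declared_keys = declared_keys
        self._index = defaultdict(set)

    def _project(self, service, items):
        # Keep only the keys this service declared; anything else is noise.
        keys = self.declared_keys[service]
        return frozenset((k, v) for k, v in items.items() if k in keys)

    def publish(self, service, machine, value_set):
        # Called when a machine announces one of its value sets.
        self._index[(service, self._project(service, value_set))].add(machine)

    def lookup(self, service, request_items):
        # A single hash lookup per request; .get() avoids creating empty entries.
        key = (service, self._project(service, request_items))
        return self._index.get(key, set())


index = ValueSetIndex({"A": {"name", "id"}})
index.publish("A", "machine 1", {"name": "Sam", "id": "13245"})
index.publish("A", "machine 1", {"name": "Ben", "id": "33232"})
index.publish("A", "machine 100", {"name": "Ron", "id": "777888"})

request = {"name": "Ron", "id": "777888", "user-agent": "curl"}  # extra key is noise
print(index.lookup("A", request))  # {'machine 100'}
```

The design only covers exact matches on the declared keys; a request that matches a value set partially returns an empty set.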
So we get 200 * 10 * 30 * 30 * 10 000 == 18 000 000 000
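The two size estimates in this post can be checked by naming their factors. A small sketch: the factor names are my own labels, and because the post does not say what the second factor of 30 counts, it is kept as an unnamed factor.

```python
# Factors copied from the estimates in the post. The meaning of the second
# factor of 30 is not stated, so it stays unnamed here.
machines = 200
services_per_machine = 10
keys_per_service = 30
unnamed_factor = 30
value_sets_per_service = 10_000

total = (machines * services_per_machine * keys_per_service
         * unnamed_factor * value_sets_per_service)
print(f"{total:,}")  # 18,000,000,000

# Figure quoted later, once the rules filter has picked a single service.
after_filter = machines * keys_per_service * value_sets_per_service
print(f"{after_filter:,}")  # 60,000,000
```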
And we get 500 requests per second on our gateway, each containing 45 items, 15 of which are just noise. Our task is to find the service for each request (or at least the machine it is running on).
For a given service, all machines across the cluster share the same rules.
We can first select which service the request came for by using the rules filter (10 * 30), which leaves us with 200 * 30 * 10 000 == 60 000 000.
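The rules filter step (10 services, 30 declared keys each) can be sketched as comparing the request's keys against each service's declared keys. This is a minimal sketch assuming "best overlap wins": the function `select_service` and the tie-break by service name are my own choices, not something the post specifies. Keys that no service declared, such as the 15 noise items, simply never count toward any service.

```python
def select_service(declared_keys, request_items):
    """Hypothetical rules filter: return the service whose declared keys
    cover the most keys present in the request, or None if none overlap.
    Ties go to the alphabetically first service name."""
    request_keys = set(request_items)
    best_service, best_hits = None, 0
    for service in sorted(declared_keys):
        # Undeclared (noise) keys never appear in the intersection.
        hits = len(request_keys & declared_keys[service])
        if hits > best_hits:
            best_service, best_hits = service, hits
    return best_service


declared = {"A": {"name", "id"}, "B": {"order", "sku"}}
print(select_service(declared, {"name": "Ron", "id": "777888", "ua": "curl"}))  # A
```

Only after this step does the search narrow to one service's value sets across the 200 machines.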
So... 60 million is definitely a problem...
My idea is to map the 30 * 10 000 values onto some artificial neural network, such as a perceptron, that outputs 1 if the 30 words (or some hashes of the words) from the request are all correct, and 0 if fewer of them are. I would then send one such perceptron per service from each machine to the gateway, so the gateway would have a Perceptron <-> machine map for each service.
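The perceptron idea can be sketched for the simplest case, a single published value set. This is a minimal sketch under stated assumptions: it uses the (key, value) pairs themselves as binary inputs instead of hashes of words, sets the weights by hand instead of training them, and the class name `ValueSetPerceptron` is hypothetical.

```python
class ValueSetPerceptron:
    """Hypothetical single perceptron that outputs 1 only when every
    (key, value) pair of one published value set appears in the request."""

    def __init__(self, value_set):
        # One input per expected (key, value) pair, each with weight 1.0.
        self.weights = {(k, v): 1.0 for k, v in value_set.items()}
        # Threshold just below the number of inputs, so all must be active.
        self.threshold = len(self.weights) - 0.5

    def predict(self, request_items):
        # Pairs with no weight (noise keys, other values) contribute 0.
        activation = sum(
            self.weights.get((k, v), 0.0) for k, v in request_items.items()
        )
        return 1 if activation > self.threshold else 0


p = ValueSetPerceptron({"name": "Sam", "id": "13245"})
print(p.predict({"name": "Sam", "id": "13245", "ua": "curl"}))  # 1
print(p.predict({"name": "Sam", "id": "33232"}))  # 0
```

A perceptron is a single weighted sum compared against a threshold, so it can represent the exact AND over one value set, as here. Covering many value sets with one perceptron is harder: with the sets {name=Sam, id=13245} and {name=Ben, id=33232}, no weights accept both while rejecting {name=Sam, id=33232} and {name=Ben, id=13245}, because the two accepted sums and the two rejected sums add up to the same total. A union of such sets is therefore not linearly separable in general, and a single perceptron per service per machine may not encode all 10 000 sets exactly.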
Can anyone tell me whether my perceptron idea is at least “sane”? Or do people normally do this some other way? Or are there better ANNs for this purpose?
© Programmers or respective owner