best way to statistically detect anomalies in data
- by reinier
Hi,
our webapp collects huge amount of data about user actions, network business, database load, etc etc etc
All data is stored in warehouses and we have quite a lot of interesting views on this data.
if something odd happens chances are, it shows up somewhere in the data.
However, to manually detect if something out of the ordinary is going on, one has to continually look through this data, and look for oddities.
My question: what is the best way to detect changes in dynamic data which can be seen as 'out of the ordinary'.
Are bayesan filters (I've seen these mentioned when reading about spam detection) the way to go?
Any pointers would be great!
EDIT:
To clarify the data for example shows a daily curve of database load.
This curve typically looks similar to the curve from yesterday
In time this curve might change slowly.
It would be nice that if the curve from day to day changes say within some perimeters, a warning could go off.
R