A good machine learning technique to weed out good URLs from bad
Posted by git-noob on Stack Overflow
Published on 2010-03-11T14:11:30Z
machine-learning | svm
Hi,
I have an application that needs to discriminate between good HTTP GET requests and bad.
For example:
http://somesite.com?passes=dodgy+parameter # BAD
http://anothersite.com?passes=a+good+parameter # GOOD
My system can already make a binary decision about whether a known URL is good or bad, but ideally I would like it to predict whether a previously unseen URL is good or bad.
http://some-new-site.com?passes=a+really+dodgy+parameter # BAD
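For what it's worth, here is how I imagine a raw URL could be turned into a fixed-length feature vector without hand-picking features: character 3-grams plus the hashing trick. The n-gram length and the 2**16 dimensionality are just guesses on my part, not anything I know to be right.

```python
import zlib

def char_ngrams(url, n=3):
    """Overlapping character n-grams of a URL string."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]

def hashed_features(url, n=3, dim=2 ** 16):
    """Map a raw URL to a sparse {index: count} vector via the hashing
    trick. CRC32 keeps indices stable across runs (Python's built-in
    hash() is salted per process)."""
    vec = {}
    for gram in char_ngrams(url, n):
        idx = zlib.crc32(gram.encode("utf-8")) % dim
        vec[idx] = vec.get(idx, 0) + 1
    return vec

bad_vec = hashed_features("http://somesite.com?passes=dodgy+parameter")
good_vec = hashed_features("http://anothersite.com?passes=a+good+parameter")
```

Is this roughly the kind of representation an SVM would want, or is there a better way?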
I feel the need for a support vector machine (SVM) ... but I am new to machine learning. Some questions:
1) Is an SVM appropriate for this task?
2) Can I train it on the raw URLs, without explicitly specifying 'features'?
3) How many URLs will I need before it makes good predictions?
4) What kind of SVM kernel should I use?
5) After I train it, how do I keep it up to date?
6) How do I test unseen URLs against the SVM to decide whether they are good or bad?
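To make the workflow I have in mind concrete, here is a toy end-to-end sketch. It uses a perceptron, a simpler linear cousin of the SVM, only because it fits in a few lines; for a real SVM I assume I would use libsvm or a similar library. Every choice here (n-gram features, epoch count, thresholds) is an illustrative guess.

```python
import zlib

def features(url, n=3, dim=2 ** 16):
    """Raw URL -> sparse {index: count} vector of hashed character n-grams."""
    vec = {}
    for i in range(len(url) - n + 1):
        idx = zlib.crc32(url[i:i + n].encode("utf-8")) % dim
        vec[idx] = vec.get(idx, 0) + 1
    return vec

def dot(w, x):
    """Sparse dot product between weight dict and feature dict."""
    return sum(w.get(i, 0.0) * c for i, c in x.items())

def train(urls, labels, epochs=50):
    """Perceptron training; labels are +1 (good) / -1 (bad).
    Stops early once an epoch passes with no mistakes."""
    data = [(features(u), y) for u, y in zip(urls, labels)]
    w = {}
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * dot(w, x) <= 0:  # misclassified (or on the boundary)
                for i, c in x.items():
                    w[i] = w.get(i, 0.0) + y * c
                mistakes += 1
        if mistakes == 0:
            break
    return w

def predict(w, url):
    """+1 if the URL looks good, -1 if it looks bad."""
    return 1 if dot(w, features(url)) > 0 else -1

train_urls = [
    "http://somesite.com?passes=dodgy+parameter",      # BAD
    "http://anothersite.com?passes=a+good+parameter",  # GOOD
]
w = train(train_urls, [-1, +1])
```

Obviously two labeled examples prove nothing about unseen URLs, which is exactly why I am asking question 3; I assume many thousands of labeled URLs would be needed in practice.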
© Stack Overflow or respective owner