Search Results

Search found 151 results on 7 pages for 'similarity'.

Page 2/7

  • Effective way to calculate a similarity percentage between data sets

    - by UltraVi01
    I am currently working with User objects, each of which has many Goal objects. The Goal objects are not user-specific; that is, Users can share the same Goal. I am trying to find a way to calculate a "similarity percentage" between two Users, taking into account both how many Goals they share and how many Goals they do not share. Does anyone have experience with this type of situation? I am using Grails with MySQL, if that is helpful. Thanks

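    One common way to get such a percentage is the Jaccard index: the number of shared Goals divided by the number of distinct Goals either User has, so Goals held by only one of them lower the score. A minimal Python sketch, assuming each User's Goals can be loaded as a collection of Goal ids; the function name and the sample ids are hypothetical:

```python
def similarity_percentage(goals_a, goals_b):
    """Jaccard similarity of two goal-id collections, as a percentage.

    Shared goals raise the score; goals held by only one user enlarge the
    union and therefore lower it.
    """
    a, b = set(goals_a), set(goals_b)
    union = a | b
    if not union:
        # Neither user has any goals: treat as no measurable overlap.
        return 0.0
    return 100.0 * len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical goal ids: the two users share goals 2 and 3.
    user1 = {1, 2, 3}
    user2 = {2, 3, 4}
    print(similarity_percentage(user1, user2))  # 2 shared / 4 distinct -> 50.0
```

    If the question is instead "how much of User A's list does User B cover", dividing by len(a) rather than by the union gives an asymmetric score.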

  • Converting Python collaborative filtering code to use MapReduce

    - by Neil Kodner
    Using Python, I'm computing cosine similarity across items. Given event data that represents a purchase (user, item), I have a list of all items 'bought' by my users. Given this input data (user, item):

        X,1
        X,2
        Y,1
        Y,2
        Z,2
        Z,3

    I build a Python dictionary:

        {1: ['X', 'Y'], 2: ['X', 'Y', 'Z'], 3: ['Z']}

    From that dictionary, I generate a bought/not-bought matrix, stored as another dictionary (bnb):

        {1: [1, 1, 0], 2: [1, 1, 1], 3: [0, 0, 1]}

    From there, I compute the similarity between items 1 and 2 by calculating the cosine between (1,1,0) and (1,1,1), yielding 0.816496. I'm doing this by:

        items = [1, 2, 3]
        for item in items:
            for sub in items:
                if sub >= item:  # so as not to calculate similarity on the inverse pair
                    sim = coSim(bnb[item], bnb[sub])

    I think the brute-force approach is killing me, and it only gets slower as the data grows. On my trusty laptop, this calculation runs for hours with 8,500 users and 3,500 items. I'm trying to compute the similarity for all items in my dict, and it's taking longer than I'd like. I think this is a good candidate for MapReduce, but I'm having trouble 'thinking' in terms of key/value pairs. Alternatively, is the issue with my approach, rather than this necessarily being a candidate for MapReduce?

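    The nested loop scores every pair of items, including the many pairs that no user bought together. A sketch of an alternative, assuming binary bought/not-bought data: for 0/1 vectors, cosine(i, j) = co(i, j) / sqrt(n_i * n_j), where co(i, j) counts users who bought both items and n_i counts users who bought item i, so only pairs that co-occur in some user's basket need any work. Emitting item pairs per user and summing them per pair is exactly a MapReduce mapper (key: item pair) and reducer (sum); here both steps run in memory, and the function name is my own:

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def item_cosine_similarities(events):
    """Cosine similarity between items from (user, item) purchase events.

    For binary bought/not-bought vectors, cosine(i, j) equals
    co_count(i, j) / sqrt(count(i) * count(j)).
    """
    baskets = defaultdict(set)          # user -> set of items bought
    for user, item in events:
        baskets[user].add(item)

    item_counts = Counter()             # item -> number of users who bought it
    pair_counts = Counter()             # (i, j) with i < j -> users who bought both
    for items in baskets.values():
        item_counts.update(items)
        # "Map": each user emits every pair of items in their basket.
        for i, j in combinations(sorted(items), 2):
            pair_counts[(i, j)] += 1    # "Reduce": sum the counts per pair

    return {
        (i, j): co / sqrt(item_counts[i] * item_counts[j])
        for (i, j), co in pair_counts.items()
    }


if __name__ == "__main__":
    data = [("X", 1), ("X", 2), ("Y", 1), ("Y", 2), ("Z", 2), ("Z", 3)]
    sims = item_cosine_similarities(data)
    print(round(sims[(1, 2)], 6))  # 0.816497, i.e. 2 / sqrt(2 * 3)
```

    Pairs missing from the result, such as (1, 3) here, have similarity 0. The cost now grows with the square of each user's basket size rather than with the square of the catalogue, so a few very heavy users can still dominate the running time.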

  • Computer Science taxonomy

    - by Bakhtiyor
    I am developing a web application where users have a collection of tags. I need to create a suggestion list for users based on the similarity of their tags. For example, when a user logs in, the system gets their tags, searches for these tags in the database of users, and shows users who have similar tags. For instance, if User 1 has the tags [Linux, Apache, MySQL, PHP] and User 2 has [Windows, IIS, PHP, MySQL], the system says that User 2 matches User 1 with a weight of 50%, because they share 2 of User 1's 4 tags (PHP and MySQL). But imagine a situation where User 1 has [ASP, IIS, MS Access] and User 2 has [PHP, Apache, MySQL]. In this situation my system doesn't suggest User 2 as a "friend" to User 1, or vice versa. But we know that these two users are similar in their field of work: both work on web technology (or web programming, etc.). That is why I need some kind of taxonomy of computer science (and probably, later, taxonomies of other fields such as medicine, physics, and mathematics) in which these concepts are categorized, so that when I search for the similarity of ASP and PHP, for example, it can tell me that they are similar and belong to one group (or category). I hope I have described my problem clearly; if anything is explained wrongly, I would be happy to receive corrections. Thanks

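    Once some taxonomy is available, one option is to score users twice: on the raw tags and on the categories those tags map to. The sketch below keeps the question's measure (shared items divided by User 1's items); the TAXONOMY dictionary is a tiny hypothetical stand-in for whatever real taxonomy is chosen, not an existing resource, and the function names are my own:

```python
# Hypothetical tag -> category mapping; a real taxonomy would be much larger
# and possibly hierarchical.
TAXONOMY = {
    "ASP": "Web Programming",
    "PHP": "Web Programming",
    "IIS": "Web Servers",
    "Apache": "Web Servers",
    "MS Access": "Databases",
    "MySQL": "Databases",
    "Linux": "Operating Systems",
    "Windows": "Operating Systems",
}


def overlap_percentage(tags1, tags2):
    """Share of the first user's items that the second user also has."""
    first, second = set(tags1), set(tags2)
    if not first:
        return 0.0
    return 100.0 * len(first & second) / len(first)


def category_overlap_percentage(tags1, tags2, taxonomy=TAXONOMY):
    """Same measure on categories; tags missing from the taxonomy keep their own name."""
    def to_categories(tags):
        return {taxonomy.get(tag, tag) for tag in tags}

    return overlap_percentage(to_categories(tags1), to_categories(tags2))


if __name__ == "__main__":
    print(overlap_percentage(["Linux", "Apache", "MySQL", "PHP"],
                             ["Windows", "IIS", "PHP", "MySQL"]))  # 50.0
    user1, user2 = ["ASP", "IIS", "MS Access"], ["PHP", "Apache", "MySQL"]
    print(overlap_percentage(user1, user2))           # 0.0: no tag in common
    print(category_overlap_percentage(user1, user2))  # 100.0: same categories
```

    A final score could blend the two, for example weighting exact tag overlap more heavily than category overlap.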

  • How to sort a LINQ result by similarity/equality

    - by aNui
    I want to search for musical instruments, each of which has a Name, Category, and Origin, as I asked in my other post. But now I want to sort/group the results by similarity/equality to the keyword. For example, if I have the list { Drum, Grand Piano, Guitar, Guitarrón, Harp, Piano } (sorted by name) and I query "p", the result should be { Piano, Grand Piano, Harp }, but it shows Harp first because of the source list's order. If I query "piano", the result should be { Piano, Grand Piano }, and if I query "guitar", it should be { Guitar, Guitarrón }. Here's my code:

        static IEnumerable<MInstrument> InstrumentsSearch(IEnumerable<MInstrument> InstrumentsList, string query, MInstrument.Category[] SelectedCategories, MInstrument.Origin[] SelectedOrigins)
        {
            var result = InstrumentsList
                .Where(item => SelectedCategories.Contains(item.category))
                .Where(item => SelectedOrigins.Contains(item.origin))
                .Where(item =>
                {
                    if ((" " + item.Name.ToLower()).Contains(" " + query.ToLower())
                        || item.Name.IndexOf(query) != -1)
                    {
                        return true;
                    }
                    return false;
                })
                .Take(30);
            return result.ToList<MInstrument>();
        }

    Alternatively, the result could follow my old self-invented ordering that I called "by order of occurrence"; that would be OK with me. The next thing I need is to search the Name, Category, or Origin: if I type "Italy", it should find Piano or something else from Italy, and if I type "string", it should find Guitar. Is there any way to do these things? Thanks in advance.

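    One way to get that ordering is to sort the filtered matches by an explicit ranking key instead of relying on the source order: names that start with the query first, then names in which a later word starts with the query, then names that merely contain it, with ties broken by match position and then by length. This Python sketch shows only the ranking logic (in LINQ it would correspond to an OrderBy/ThenBy chain over the same key); the function name and tier scheme are my own choices, not an established algorithm:

```python
def rank_matches(names, query, limit=30):
    """Return names containing `query`, best matches first.

    Ranking key, compared left to right:
      1. tier: 0 = name starts with the query, 1 = a later word starts with it,
         2 = the query only appears inside a word
      2. position of the first occurrence ("by order of occurrence")
      3. name length, so shorter (closer) names win ties
    """
    q = query.casefold()
    ranked = []
    for name in names:
        lowered = name.casefold()
        position = lowered.find(q)
        if position == -1:
            continue  # no match at all
        if lowered.startswith(q):
            tier = 0
        elif any(word.startswith(q) for word in lowered.split()):
            tier = 1
        else:
            tier = 2
        ranked.append(((tier, position, len(name), name), name))
    ranked.sort()
    return [name for _, name in ranked[:limit]]


if __name__ == "__main__":
    instruments = ["Drum", "Grand Piano", "Guitar", "Guitarrón", "Harp", "Piano"]
    print(rank_matches(instruments, "p"))       # ['Piano', 'Grand Piano', 'Harp']
    print(rank_matches(instruments, "piano"))   # ['Piano', 'Grand Piano']
    print(rank_matches(instruments, "guitar"))  # ['Guitar', 'Guitarrón']
```

    Searching Category and Origin as well could reuse the same idea by adding a leading field-priority component to the key, so that Name matches outrank Origin matches.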

  • LINQ: How to sort a query result by similarity/equality

    - by aNui
    I want to search for musical instruments, each of which has a Name, Category, and Origin, as I asked in my other post. But now I want to sort/group the results by similarity/equality to the keyword. For example, if I have the list { Harp, Piano, Drum, Guitar, Guitarrón } and I query "p", the result should be { Piano, Harp }, but it shows Harp first because of the list's order. If I add Grand Piano to the list and query "piano", the result should be { Piano, Grand Piano }. Here's my code:

        static IEnumerable<MInstrument> InstrumentsSearch(IEnumerable<MInstrument> InstrumentsList, string query, MInstrument.Category[] SelectedCategories, MInstrument.Origin[] SelectedOrigins)
        {
            var result = InstrumentsList
                .Where(item => SelectedCategories.Contains(item.category))
                .Where(item => SelectedOrigins.Contains(item.origin))
                .Where(item =>
                {
                    if ((" " + item.Name.ToLower()).Contains(" " + query.ToLower())
                        || item.Name.IndexOf(query) != -1)
                    {
                        return true;
                    }
                    return false;
                })
                .Take(30);
            return result.ToList<MInstrument>();
        }

    Alternatively, the result could follow my old self-invented ordering that I called "by order of occurrence"; that would be OK with me. Is there any way to do that? Thanks in advance.

  • CBIR isk-daemon software alternatives

    - by postgres
    Does the isk-daemon software permit setting feature parameters? I downloaded the desktop version, and apparently it does not. Are there alternative libraries or software that are not fully automatic? I have a set of images, and I only want to pick one image, extract some features, and compare them with this set of images using a similarity metric that gives me a percentage result, like isk-daemon, but with more freedom in the settings. Python would be best!

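    Whatever library ends up extracting the features, the comparison step can stay under your own control. A sketch of one configurable option, histogram intersection, which yields a percentage directly. It assumes grayscale pixel intensities (0-255) are already available as flat lists; the function names are hypothetical, and `bins` is the kind of setting you could tune. In practice, color, texture, or keypoint features would replace gray_histogram:

```python
def gray_histogram(pixels, bins=16):
    """Normalized histogram of 0-255 intensities over `bins` equal-width bins.

    Assumes `pixels` is a non-empty list of ints in 0..255.
    """
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in percent: 100 means identical normalized histograms."""
    return 100.0 * sum(min(a, b) for a, b in zip(h1, h2))


def rank_images(query_pixels, collection, bins=16):
    """Sort (name, pixels) pairs by similarity to the query, most similar first."""
    q = gray_histogram(query_pixels, bins)
    scored = [(histogram_intersection(q, gray_histogram(px, bins)), name)
              for name, px in collection]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    a = [0, 0, 255, 255]
    b = [0, 100, 255, 255]
    print(histogram_intersection(gray_histogram(a, 4), gray_histogram(b, 4)))  # 75.0
```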

  • How to determine which file in the previous version a text block comes from?

    - by Muhammad Asaduzzaman
    The problem is as follows. Suppose I have a list of files in one version (say A, B, C, D). In the next version I have the files A, E, F, G. There are some similarities in their contents. The files in the later version derive from those in the previous version through renaming, content addition, deletion, partial modification, or no change at all (for example, A is unchanged). I take a block of text from a file (E, in the second version) and check which files in the first version contain this text block. I found that B, C, and D all contain the text fragment. I want to determine which file (B, C, or D) this text block actually comes from. (I assume that E is a file whose name changed in the second version.) Since content may be changed, added, or deleted in the later version, I use the LCS algorithm to determine similarity. But I cannot map the file to its previous version. One possible approach might be to use the location information of the matching text blocks, but this heuristic does not always work. Is there any research or an existing algorithm for this? Any direction will be helpful. Thanks in advance.

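    One heuristic, offered as an assumption rather than an established answer: when several old files contain the block, compare the whole new file E against each candidate and attribute the block to the most similar one, since a renamed file usually keeps much of its other content as well. The sketch uses Python's difflib.SequenceMatcher on lists of lines; note that it implements Ratcliff/Obershelp-style matching rather than a strict LCS. The function name and file contents are hypothetical:

```python
from difflib import SequenceMatcher


def most_likely_origin(new_lines, candidates):
    """Return (name, score) of the candidate file most similar to new_lines.

    `candidates` maps file names to lists of lines. The score is
    SequenceMatcher.ratio(): 2 * matched_lines / (len(a) + len(b)).
    """
    scores = {
        name: SequenceMatcher(None, new_lines, old_lines).ratio()
        for name, old_lines in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical contents: every candidate contains the shared "block" line,
    # but only C also shares the surrounding context with E.
    e = ["header", "setup()", "block", "footer v2"]
    old_files = {
        "B": ["block", "unrelated"],
        "C": ["header", "setup()", "block", "footer"],
        "D": ["other", "block"],
    }
    print(most_likely_origin(e, old_files))  # ('C', 0.75)
```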

  • Java: JPQL search for similar strings

    - by bguiz
    What methods are there to get JPQL to match similar strings? By similar I mean:

        - Contains: the search string is found within the string of the matched entity
        - Case-insensitive
        - Small misspellings: e.g. "arow" matches "arrow"

    I suspect the first two will be easy; however, I would appreciate help with the last one. Thank you

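    The first two criteria map onto standard JPQL string functions (LOWER(...) combined with LIKE '%...%'). Misspellings are a different matter: standard JPQL has no edit-distance function, so one option is to narrow the candidates in the query and apply the fuzzy test in application code. A Python sketch of that filter using Levenshtein distance; the threshold of one edit and the function names are assumptions:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_similar(search, candidate, max_edits=1):
    """Case-insensitive: containment, or a whole-string match within max_edits."""
    s, c = search.casefold(), candidate.casefold()
    return s in c or levenshtein(s, c) <= max_edits


if __name__ == "__main__":
    print(levenshtein("arow", "arrow"))  # 1
    print(is_similar("arow", "Arrow"))   # True (one insertion)
    print(is_similar("row", "Narrow"))   # True (contains)
    print(is_similar("cat", "arrow"))    # False
```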

  • About the curse of dimensionality

    - by Dan
    My question is about a topic I've been reading about a bit. Basically, my understanding is that in high dimensions all points end up being very close to each other. My doubt is whether this means that calculating distances the usual way (Euclidean, for instance) is still valid. If it were, this would mean that when comparing vectors in high dimensions, the two most similar would not differ much from a third one, even if that third one were completely unrelated. Is this correct? In that case, how would you be able to tell whether you have a match or not?

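    Euclidean distance stays mathematically valid; what degrades is its contrast. When coordinates are independent and identically distributed, as with uniformly random data, the ratio between the farthest and the nearest neighbor distance tends toward 1 as dimension grows, which is why an unrelated third vector can look almost as close as the best match. A small Python experiment (the point count, seed, and dimensions are arbitrary choices) measures the relative contrast (max - min) / min of distances from one random query point:

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of distances from one random query to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

    Real data sets often have a much lower intrinsic dimension than their raw feature count, so the effect is usually weaker than in this uniform-random case. One common heuristic for deciding on a match is to compare the best distance with the second best rather than relying on an absolute threshold.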

  • MySQL: SQL query to get a single record for similar records, populating the other columns with the longest values

    - by Bujji
    Here is my case. I have a database table with the following fields: name, place_code, email, phone, address, details, estd, others. In my example data, the first three records are all about XYZ with place code 1020. I want to create a single record for these three records, based on substring(name, 1, 4) and place_code. (Luckily, all similar records satisfy this condition, and the combination is unique in the table.) For each of the other columns, I want the value from whichever record has the longest value in that column. For example, for those three records, email should be the longest of the three email values, phone should be 657890, and details should be "testdetails". This should be done for the whole table (some groups have a single record, and some have up to 10). Any help with a query that produces the desired result? Thank you. Regards, Kiran

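    One way to express "take each column from the record where it is longest" is a correlated subquery per column, ordered by LENGTH and limited to one row. The sketch runs the query through Python's built-in sqlite3 module so it is self-contained. The table name `places`, the reduced column list, and the sample rows are hypothetical; the SQL uses constructs MySQL also supports (SUBSTR, LENGTH, ORDER BY ... LIMIT 1 in a scalar subquery), but I have not verified it against MySQL:

```python
import sqlite3

# Scalar subquery returning the longest value of {col} within one group.
LONGEST_VALUE = """
    (SELECT p2.{col} FROM places AS p2
      WHERE SUBSTR(p2.name, 1, 4) = g.prefix AND p2.place_code = g.place_code
      ORDER BY LENGTH(p2.{col}) DESC
      LIMIT 1) AS {col}"""

# One output row per (first four characters of name, place_code) group.
MERGE_QUERY = f"""
SELECT g.prefix, g.place_code,
       {LONGEST_VALUE.format(col='email')},
       {LONGEST_VALUE.format(col='phone')},
       {LONGEST_VALUE.format(col='details')}
FROM (SELECT DISTINCT SUBSTR(name, 1, 4) AS prefix, place_code FROM places) AS g
ORDER BY g.prefix, g.place_code
"""


def merge_similar_records(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, place_code TEXT, "
                 "email TEXT, phone TEXT, details TEXT)")
    conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(MERGE_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [  # hypothetical rows
        ("XYZ1 Hotel", "1020", "a@xyz.example", "6578", "test"),
        ("XYZ1 Hotel Ltd", "1020", "info@xyz.example", "657890", "testdetails"),
        ("XYZ1", "1020", "", "65", ""),
        ("ABCD Inn", "2040", "desk@abcd.example", "1234", "single"),
    ]
    for row in merge_similar_records(sample):
        print(row)
```

    The address, estd, and others columns would follow the same pattern. If two values tie for the longest, the subquery returns either one.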

  • Calculating similarities between sentences

    - by codecreator
    I have a database with thousands of rows of error logs and their descriptions. The logs come from an application that runs 24/7. I want to create a dashboard/UI for production support that shows the current common errors. The problem is that even though there are a lot of common errors, the error descriptions differ by transaction ID, user ID, or other values that are unique to that single process. For example:

        Error transaction XYz failed for user 233
        Error transaction XYz failed for user 567

    I consider these two errors to be the same. So I want a program that goes through new error logs and classifies them into groups. I am trying to use "edit distance", but it's very slow. Since I already have old error logs, I am trying to think of solutions that use that information too. Any thoughts?

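    Since the messages differ only in their variable parts, one cheap alternative to pairwise edit distance is to normalize each message into a template by masking those parts, then group messages whose templates are identical; that costs one dictionary lookup per message instead of a comparison against every other message. A Python sketch; the masking patterns (long hex ids and digit runs) and function names are assumptions that would need to match your real log format:

```python
import re
from collections import defaultdict

# Hypothetical masking rules, applied in order: more specific patterns first.
MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8,}\b"), "<ID>"),  # long hex ids such as hashes
    (re.compile(r"\d+"), "<NUM>"),                  # any remaining run of digits
]


def template_of(message):
    """Replace the variable parts of a log message with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def group_errors(messages):
    """Map each template to the list of raw messages that produced it."""
    groups = defaultdict(list)
    for message in messages:
        groups[template_of(message)].append(message)
    return dict(groups)


if __name__ == "__main__":
    logs = [
        "Error transaction XYz failed for user 233",
        "Error transaction XYz failed for user 567",
        "Timeout calling service after 30 seconds",
    ]
    for template, members in group_errors(logs).items():
        print(len(members), template)
```

    Templates mined from the existing logs could seed the groups, and a message whose template has never been seen could be flagged for review.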

  • Simple NLP: How to use n-grams to compute word similarity?

    - by sadawd
    Dear Everyone, I hear that Google uses up to 7-grams for its own data. I am interested in finding words that are similar in context (e.g. cat and dog), and I was wondering how to compute the similarity of two words with an n-gram model, given that n > 2. Given a sample set like this, for example: (I, love, cats), (cats, loves, dogs), (dogs, hate, human), what is a good way to compute the similarity of the pair (I, cats)? Also, does anyone know of any way to model levels in NLP, like Army-Military-Soldier?

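    A common way to get "similar in context" out of n-gram data is distributional similarity: represent each word by counts of the words that appear with it in the same n-gram, then compare those context vectors with cosine similarity. A minimal sketch over the trigrams from the question, treating each n-gram as one context window (that windowing choice, and the function names, are assumptions):

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(ngrams):
    """word -> Counter of the other words seen in the same n-gram."""
    vectors = defaultdict(Counter)
    for gram in ngrams:
        for i, word in enumerate(gram):
            for j, other in enumerate(gram):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    grams = [("I", "love", "cats"), ("cats", "loves", "dogs"), ("dogs", "hate", "human")]
    vecs = context_vectors(grams)
    print(round(cosine(vecs["I"], vecs["cats"]), 4))  # 0.3536: they share "love"
```

    With only three n-grams the numbers are purely illustrative; the method needs a large corpus before the scores mean much.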

  • What's the best way to calculate similarity between rows in a table based on association?

    - by André Pena
    Suppose each Person has a collection of favorite Books. So I have tables for:

        - Person
        - Book
        - The association between Person and Book (a join table for the M:N relationship)

    I want to fetch the Persons who are similar to Person1 based on overlapping favorite Books; that is, the more books they have in common, the more similar they are. I don't have to use only SQL to solve this problem; I could use programming as well. I'm using SQL Server 2008 and C#. What solution would you experts use?

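    In SQL, the overlap count is a self-join of the association table on the book column, grouped by the other person. The sketch runs the query through Python's sqlite3 module so it is self-contained. The table and column names (person_book, person_id, book_id) are hypothetical, and apart from the `?` parameter placeholder (SQL Server would use a named parameter such as @personId) the query is plain SQL that I would expect to carry over, though I have not run it on SQL Server:

```python
import sqlite3

SIMILAR_PERSONS = """
SELECT other.person_id, COUNT(*) AS shared_books
FROM person_book AS mine
JOIN person_book AS other
  ON other.book_id = mine.book_id AND other.person_id <> mine.person_id
WHERE mine.person_id = ?
GROUP BY other.person_id
ORDER BY shared_books DESC, other.person_id
"""


def similar_persons(favorites, person_id):
    """favorites: iterable of (person_id, book_id) rows of the join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person_book (person_id INTEGER, book_id INTEGER)")
    conn.executemany("INSERT INTO person_book VALUES (?, ?)", favorites)
    rows = conn.execute(SIMILAR_PERSONS, (person_id,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical data: person 1 likes books 10, 11, 12.
    data = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 12), (4, 99)]
    print(similar_persons(data, 1))  # [(2, 2), (3, 1)]
```

    Dividing shared_books by the number of distinct books the two people like in total (the Jaccard index) would stop people with very long lists from dominating the ranking.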

  • NLP: any easy and good methods to find semantic similarity between words?

    - by sadawd
    Dear Everyone, I don't know whether Stack Overflow covers NLP, so I am going to give this a shot. I am interested in finding the semantic relatedness of two words from a specific domain, e.g. "image quality" and "noise". I am doing some research to determine whether camera reviews are positive or negative about a particular attribute of the camera (like image quality in each review). However, not everybody uses the exact wording "image quality" in their posts, so I want to see whether I can build something like an "image quality" concept that includes ("noise", "color", "sharpness", etc.), so that I can wrap everything under one big umbrella. I am doing this for another language, so WordNet is not necessarily helpful. And no, I do not work for Google or Microsoft, so I do not have data from people's clicking behavior as input either. However, I do have a lot of text, POS-tagged, segmented, etc. Thanks

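    Without WordNet, one corpus-only option is to rank candidate terms by pointwise mutual information (PMI) with a seed term such as "image quality": a positive PMI means two terms co-occur in the same sentence more often than chance would predict. A minimal sketch, assuming the corpus is already segmented into sentences of tokens and that multi-word terms have been joined into single tokens; the sample sentences are hypothetical English stand-ins, and the function name is mine:

```python
from collections import Counter
from itertools import combinations
from math import log2


def build_pmi(sentences):
    """Return a function pmi(w1, w2) based on sentence-level co-occurrence.

    PMI = log2(P(w1, w2) / (P(w1) * P(w2))), with probabilities estimated as
    the fraction of sentences containing the term(s). The returned function
    gives None when the pair never co-occurs.
    """
    n = len(sentences)
    word_counts, pair_counts = Counter(), Counter()
    for tokens in sentences:
        unique = set(tokens)
        word_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return None
        return log2((joint / n) / ((word_counts[w1] / n) * (word_counts[w2] / n)))

    return pmi


if __name__ == "__main__":
    corpus = [
        ["image_quality", "noise", "low"],
        ["image_quality", "sharpness", "good"],
        ["battery", "life", "good"],
        ["noise", "high", "image_quality"],
    ]
    pmi = build_pmi(corpus)
    scored = [(pmi("image_quality", w), w) for w in ["noise", "sharpness", "good", "battery"]]
    print(sorted(((s, w) for s, w in scored if s is not None), reverse=True))
```

    PMI is known to overrate rare terms, so a minimum co-occurrence count is usually applied before trusting the ranking.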

  • Mahout - Clustering - "naming" the cluster elements

    - by Mark Bramnik
    I'm doing some research and playing with Apache Mahout 0.6. My purpose is to build a system that will name different categories of documents based on user input. The documents are not known in advance, and I also don't know which categories I have while collecting these documents. But I do know that all the documents in the model should belong to one of the predefined categories. For example, let's say I've collected N documents that belong to 3 different groups:

        - Politics
        - Madonna (pop star)
        - Science fiction

    I don't know which document belongs to which category, but I know that each of my N documents belongs to one of those categories (e.g. there are no documents about, say, basketball among these N docs). So I came up with the following idea:

        1. Apply Mahout clustering (for example, k-means with k=3) to these documents. This should divide the N documents into 3 groups, which would be my model to learn from. I still don't know which document really belongs to which group, but at least the documents are now clustered by group.
        2. Ask the user to find any document on the web that is about 'Madonna' (I can't show the user any of my N documents; that's a restriction). Then I want to measure the 'similarity' between this document and each of the 3 groups. I expect the similarity between user_doc and the documents in the Madonna group to be higher than the similarity between user_doc and the documents about politics.

    I've managed to produce the document clusters using the 'Mahout in Action' book. But I don't understand how to use Mahout to measure the similarity between a 'ready' cluster of documents and one given document. I thought about rerunning the clustering with k=3 for N+1 documents with the same centroids (in terms of k-means clustering) and seeing where the new document falls, but maybe there is another way to do that? Is it possible to do with Mahout, or is my idea conceptually wrong? (An example in terms of the Mahout API would be really good.) Thanks a lot, and sorry for the long question (I couldn't describe it better). Any help is highly appreciated. P.S. This is not a homework project :)

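    Rather than re-clustering N+1 documents, a common approach is to keep the centroids from the k-means run and assign the new document to the nearest one (nearest-centroid classification), using the same vectorization and similarity measure as the clustering. The Python sketch below is not the Mahout API; it only illustrates the idea, assuming documents are already turned into equal-length term-weight vectors. The cluster ids and vectors are hypothetical, since real k-means clusters come out unnamed:

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_cluster(doc, clusters):
    """Return (cluster_id, similarity) of the centroid most similar to doc."""
    scores = {cid: cosine(doc, centroid(vecs)) for cid, vecs in clusters.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical 3-term vectors: [politics terms, music terms, sci-fi terms].
    clusters = {
        "cluster_0": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "cluster_1": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
        "cluster_2": [[0.0, 0.0, 1.0]],
    }
    user_doc = [0.2, 0.8, 0.0]
    print(nearest_cluster(user_doc, clusters))
```

    The per-cluster scores computed inside nearest_cluster are also the "similarity to each group" the question asks for.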

  • Using MinHash to find similarities between 2 images

    - by Sung Meister
    I am using the MinHash algorithm to find similar images. I came across the post "How can I recognize slightly modified images?", which pointed me to MinHash. Being a bit mathematically challenged, I used a C# implementation from the blog post "Set Similarity and Min Hash". But while trying to use that implementation, I ran into 2 problems:

        1. What value should I set the universe to? When I pass an image's byte array to a HashSet, it only contains distinct byte values, so I am comparing values from 0 to 255. What is this universe in MinHash?
        2. What can I do to improve the C# MinHash implementation? Since HashSet<byte> contains at most 256 values, the similarity always comes out as 1.

    Here is the source that uses the C# MinHash implementation from Set Similarity and Min Hash:

        class Program
        {
            static void Main(string[] args)
            {
                var imageSet1 = GetImageByte(@".\Images\01.JPG");
                var imageSet2 = GetImageByte(@".\Images\02.TIF");
                //var app = new MinHash(256);
                var app = new MinHash(Math.Min(imageSet1.Count, imageSet2.Count));
                double imageSimilarity = app.Similarity(imageSet1, imageSet2);
                Console.WriteLine("similarity = {0}", imageSimilarity);
            }

            private static HashSet<byte> GetImageByte(string imagePath)
            {
                using (var fs = new FileStream(imagePath, FileMode.Open, FileAccess.Read))
                using (var br = new BinaryReader(fs))
                {
                    //List<int> bytes = br.ReadBytes((int)fs.Length).Cast<int>().ToList();
                    var bytes = new List<byte>(br.ReadBytes((int) fs.Length).ToArray());
                    return new HashSet<byte>(bytes);
                }
            }
        }

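    The constant similarity of 1 most likely comes from the input rather than from the universe setting: typical image files each contain nearly all 256 byte values, so their sets of distinct bytes are almost identical, and MinHash faithfully estimates that near-1 Jaccard similarity. The sets need more distinctive elements, for example hashed k-byte windows ("shingles"). A Python sketch of that idea; k=4, 200 hash functions, the hash family, and all names are my assumptions, not the blog's implementation. Note also that compressed formats such as JPG and TIF can encode the same picture as very different bytes, so for image similarity the shingles would better come from decoded pixels or extracted features than from raw file bytes:

```python
import random
import zlib


def shingles(data, k=4):
    """Set of CRC32 hashes of every k-byte window (needs len(data) >= k).

    CRC32 is used instead of hash() because it is stable across runs.
    """
    return {zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)}


class MinHash:
    PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit shingle hash

    def __init__(self, num_hashes=200, seed=42):
        rng = random.Random(seed)
        # Each (a, b) pair defines one hash function h(x) = (a*x + b) mod PRIME.
        self.params = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(num_hashes)]

    def signature(self, items):
        """Minimum hash value of the (non-empty) set under each hash function."""
        return [min((a * x + b) % self.PRIME for x in items) for a, b in self.params]

    @staticmethod
    def similarity(sig1, sig2):
        """Fraction of positions where the minima agree; estimates Jaccard similarity."""
        return sum(s == t for s, t in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    s1, s2 = shingles(bytes(range(200))), shingles(bytes(range(50, 250)))
    mh = MinHash()
    estimate = mh.similarity(mh.signature(s1), mh.signature(s2))
    exact = len(s1 & s2) / len(s1 | s2)
    print(f"estimated {estimate:.3f}, exact {exact:.3f}")
```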

  • Collaborative Filtering Program: What to do for a Pearson Score When There Isn't Enough Data

    - by Mike
    I'm building a recommendation engine using collaborative filtering. For similarity scores, I use a Pearson correlation. This works well most of the time, but sometimes I have users who share only 1 or 2 fields. For example:

        User 1 { a: 4, b: 2 }
        User 2 { a: 4, b: 3 }

    Since there are only 2 data points, the Pearson correlation is always +1 or -1 (any two points lie on a straight line); here it is 1, a "perfect" correlation. This obviously isn't what I want, so what value should I use instead? I could just throw away all such instances (give them a correlation of 0), but my data is really sparse right now and I don't want to lose anything. Is there a similarity score I could use that would fit in with the rest of my similarity scores (all Pearson)?

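    One remedy used in collaborative filtering is significance weighting: compute Pearson as usual, then shrink it toward 0 in proportion to how few co-rated items it rests on, e.g. by multiplying by min(n, cutoff) / cutoff. The result stays on the Pearson scale, so it mixes with your other scores. A sketch; the cutoff of 50 is an assumption to tune, and the function names are my own:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of paired ratings; 0.0 when either side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def weighted_similarity(ratings1, ratings2, cutoff=50):
    """Pearson over co-rated items, scaled by min(n, cutoff) / cutoff."""
    common = sorted(set(ratings1) & set(ratings2))
    if len(common) < 2:
        return 0.0  # correlation is undefined for fewer than 2 points
    xs = [ratings1[item] for item in common]
    ys = [ratings2[item] for item in common]
    return pearson(xs, ys) * min(len(common), cutoff) / cutoff


if __name__ == "__main__":
    user1 = {"a": 4, "b": 2}
    user2 = {"a": 4, "b": 3}
    print(pearson([4, 2], [4, 3]))            # 1.0: two points always fit a line
    print(weighted_similarity(user1, user2))  # 0.04, i.e. 1.0 * 2 / 50
```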

  • Improving performance of fuzzy string matching against a dictionary [closed]

    - by Nathan Harmston
    Hi, I'm currently using SecondString for fuzzy string matching, where I have a large dictionary to compare against (each entry in the dictionary has an associated non-unique identifier). I am currently using a HashMap to store this dictionary. When I want to do fuzzy string matching, I first check whether the string is in the HashMap, and then I iterate through all of the other potential keys, calculating the string similarity and storing the k,v pair(s) with the highest similarity. Depending on which dictionary I am using, this can take a long time (the dictionaries range from 12,330 to 1,800,035 entries). Is there any way to speed this up? I am currently writing a memoization function/table as a way of speeding this up, but can anyone think of a better way to improve the speed? Maybe a different data structure, or something else I'm missing. Many thanks in advance, Nathan

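    Memoization only helps when the same query strings repeat. Another option is blocking: build an inverted index from character n-grams to dictionary keys once, and for each query compute the expensive similarity only for keys that share at least one n-gram with it. A Python sketch with padded bigrams; difflib's SequenceMatcher.ratio() stands in for whatever SecondString metric you use, and the class and method names are hypothetical. The trade-off is that a key sharing no bigram with the query is never scored:

```python
from collections import defaultdict
from difflib import SequenceMatcher


def bigrams(text):
    """Character bigrams with boundary markers, e.g. 'ab' -> {'$a', 'ab', 'b$'}."""
    padded = f"${text.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


class FuzzyIndex:
    def __init__(self, dictionary):
        self.dictionary = dictionary           # key -> identifier
        self.postings = defaultdict(set)       # bigram -> keys containing it
        for key in dictionary:
            for gram in bigrams(key):
                self.postings[gram].add(key)

    def candidates(self, query):
        """Keys sharing at least one bigram with the query."""
        found = set()
        for gram in bigrams(query):
            found |= self.postings.get(gram, set())
        return found

    def best_match(self, query):
        """Return (key, identifier, score), or None if there are no candidates."""
        if query in self.dictionary:           # exact hit: skip scoring entirely
            return query, self.dictionary[query], 1.0
        scored = [(SequenceMatcher(None, query, key).ratio(), key)
                  for key in self.candidates(query)]
        if not scored:
            return None
        score, key = max(scored)
        return key, self.dictionary[key], score


if __name__ == "__main__":
    index = FuzzyIndex({"arrow": 1, "narrow": 3, "apple": 2, "zebra": 4})
    print(index.best_match("arow"))
```

    Requiring several shared n-grams instead of one, or using longer n-grams, shrinks the candidate lists further at the cost of missing more distant matches.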

  • fileToString keeps returning ""

    - by karikari
    I managed to get this code to compile without errors, but somehow it does not return the strings that I wrote inside file1.txt and file.txt, whose paths I pass through str1 and str2. My objective is to use this open-source library to measure the similarity between the strings contained in 2 text files. Its Javadoc states:

        public static java.lang.StringBuffer fileToString(java.io.File f)
        private call to load a file and return its content as a string.
        Parameters: f - a file for which to load its content
        Returns: a string containing the files contents or "" if empty or not present

    Here is my modified code trying to use the FileLoader function, but it fails to return the strings inside the files. The end result keeps returning "". I do not know where my fault is:

        package uk.ac.shef.wit.simmetrics;

        import java.io.File;
        import uk.ac.shef.wit.simmetrics.similaritymetrics.*;
        import uk.ac.shef.wit.simmetrics.utils.*;

        public class SimpleExample {
            public static void main(final String[] args) {
                if (args.length != 2) {
                    usage();
                } else {
                    String str1 = "arg[0]";
                    String str2 = "arg[1]";
                    File objFile1 = new File(str1);
                    File objFile2 = new File(str2);
                    FileLoader obj1 = new FileLoader();
                    FileLoader obj2 = new FileLoader();
                    str1 = obj1.fileToString(objFile1).toString();
                    str2 = obj2.fileToString(objFile2).toString();
                    System.out.println(str1);
                    System.out.println(str2);
                    AbstractStringMetric metric = new MongeElkan();
                    //this single line performs the similarity test
                    float result = metric.getSimilarity(str1, str2);
                    //outputs the results
                    outputResult(result, metric, str1, str2);
                }
            }

            private static void outputResult(final float result, final AbstractStringMetric metric, final String str1, final String str2) {
                System.out.println("Using Metric " + metric.getShortDescriptionString() + " on strings \"" + str1 + "\" & \"" + str2 + "\" gives a similarity score of " + result);
            }

            private static void usage() {
                System.out.println("Performs a rudimentary string metric comparison from the arguments given.\n\tArgs:\n\t\t1) String1 to compare\n\t\t2)String2 to compare\n\n\tReturns:\n\t\tA standard output (command line of the similarity metric with the given test strings, for more details of this simple class please see the SimpleExample.java source file)");
            }
        }

  • Finding partial substrings within a string

    - by Peter Chang
    I have two strings that must be compared for similarity. The algorithm must find the maximal similarity. In this instance, the ordering matters, but intervening (or missing) characters do not. Edit distance cannot be used in this case, for various reasons. The situation is basically as follows:

        string 1: ABCDEFG
        string 2: AFENBCDGRDLFG

    The resulting algorithm would find the substrings A, BCD, and FG. I currently have a recursive solution, but because this must be run on massive amounts of data, any improvements would be greatly appreciated.

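    If "maximal similarity" is read as the longest common subsequence (order preserved, gaps allowed), a dynamic-programming table replaces the recursion with O(len(s1) * len(s2)) time, and the matched characters can then be grouped into runs that are contiguous in both strings. A Python sketch under that assumption; when several alignments tie, it prefers the earliest match in string 2, which yields A, BCD, FG for the example. The function name is my own:

```python
def common_runs(s1, s2):
    """Split a longest common subsequence of s1 and s2 into contiguous runs.

    lcs[i][j] holds the LCS length of the suffixes s1[i:] and s2[j:], so the
    alignment can be read off front to back.
    """
    n, m = len(s1), len(s2)
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk forward; taking an available match never shortens the LCS.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if s1[i] == s2[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif lcs[i][j + 1] >= lcs[i + 1][j]:
            j += 1  # skipping a character of s2 keeps the LCS length
        else:
            i += 1

    # Group matched positions that are adjacent in both strings.
    runs = []
    for k, (i, j) in enumerate(pairs):
        if k > 0 and pairs[k - 1] == (i - 1, j - 1):
            runs[-1] += s1[i]
        else:
            runs.append(s1[i])
    return runs


if __name__ == "__main__":
    print(common_runs("ABCDEFG", "AFENBCDGRDLFG"))  # ['A', 'BCD', 'FG']
```

    The table needs O(len(s1) * len(s2)) memory as well, so for very long strings a linear-space variant such as Hirschberg's algorithm would be worth considering.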