Search Results

Search found 151 results on 7 pages for 'similarity'.

Page 3/7 | < Previous Page | 1 2 3 4 5 6 7  | Next Page >

  • Detect similar sounding words in Ruby

    - by JP
    I'm aware of SOUNDEX and (double) Metaphone, but these don't let me test for the similarity of words as a whole - for example "Hi" sounds very similar to "Bye", but both of these methods will mark them as completely different. Are there any libraries in Ruby, or any methods you know of, that are capable of determining the similarity between two words? (Either a boolean is/isn't similar, or numerical 40% similar) edit: Extra bonus points if there is an easy method to 'drop in' a different dialect or language!

    Read the article

  • The algorithm used to generate recommendations in Google News?

    - by Siddhant
    Hi everyone. I'm study recommendation engines, and I went through the paper that defines how Google News generates recommendations to users for news items which might be of their interest, based on collaborative filtering. One interesting technique that they mention is Minhashing. I went through what it does, but I'm pretty sure that what I have is a fuzzy idea and there is a strong chance that I'm wrong. The following is what I could make out of it :- Collect a set of all news items. Define a hash function for a user. This hash function returns the index of the first item from the news items which this user viewed, in the list of all news items. Collect, say "n" number of such values, and represent a user with this list of values. Based on the similarity count between these lists, we can calculate the similarity between users as the number of common items. This reduces the number of comparisons a lot. Based on these similarity measures, group users into different clusters. This is just what I think it might be. In Step 2, instead of defining a constant hash function, it might be possible that we vary the hash function in a way that it returns the index of a different element. So one hash function could return the index of the first element from the user's list, another hash function could return the index of the second element from the user's list, and so on. So the nature of the hash function satisfying the minwise independent permutations condition, this does sound like a possible approach. Could anyone please confirm if what I think is correct? Or the minhashing portion of Google News Recommendations, functions in some other way? I'm new to internal implementations of recommendations. Any help is appreciated a lot. Thanks!

    Read the article

  • How can I evaluate the connectedness of my nodes?

    - by Travis Leleu
    I've got a space that has nodes that are all interconnected, based on a "similarity score". I would like to determine how "connected" a node is with the others. My purpose is to find nodes that are poorly connected to make sure that the backlink from the other node is prioritized. Perhaps an example would help. I've got a web page that links to my other pages based on a similarity score. Suppose I have the pages: A, B, C, ... A has a backlink from every other page, so it's very well connected. It also has links to all my other pages (each line in the graph is essentially bidirectional). B only has 1 backlink, from A. C has a link from A and D. I would like to make sure that the A-B link is prioritized over the A-C link (even if the similarity score between C and A is higher than B and A). In short, I would like to evaluate which nodes are least and best connected, so that I can mangle the results to my means. I believe this is Graph Connectedness, but I'm at a loss to develop a (simple) algorithm that will help me here. Simply counting the backlinks to a node may be a starting point -- but then how do I take the next step, which is to properly weight the links on the original node (A, in the example above)?

    Read the article

  • Approaches for Content-based Item Recommendations

    - by PartlyCloudy
    Hello, I'm currently developing an application where I want to group similar items. Items (like videos) can be created by users and also their attributes can be altered or extended later (like new tags). Instead of relying on users' preferences as most collaborative filtering mechanisms do, I want to compare item similarity based on the items' attributes (like similar length, similar colors, similar set of tags, etc.). The computation is necessary for two main purposes: Suggesting x similar items for a given item and for clustering into groups of similar items. My application so far is follows an asynchronous design and I want to decouple this clustering component as far as possible. The creation of new items or the addition of new attributes for an existing item will be advertised by publishing events the component can then consume. Computations can be provided best-effort and "snapshotted", which means that I'm okay with the best result possible at a given point in time, although result quality will eventually increase. So I am now searching for appropriate algorithms to compute both similar items and clusters. At important constraint is scalability. Initially the application has to handle a few thousand items, but later million items might be possible as well. Of course, computations will then be executed on additional nodes, but the algorithm itself should scale. It would also be nice if the algorithm supports some kind of incremental mode on partial changes of the data. My initial thought of comparing each item with each other and storing the numerical similarity sounds a little bit crude. Also, it requires n*(n-1)/2 entries for storing all similarities and any change or new item will eventually cause n similarity computations. Thanks in advance! UPDATE tl;dr To clarify what I want, here is my targeted scenario: User generate entries (think of documents) User edit entry meta data (think of tags) And here is what my system should provide: List of similar entries to a given item as recommendation Clusters of similar entries Both calculations should be based on: The meta data/attributes of entries (i.e. usage of similar tags) Thus, the distance of two entries using appropriate metrics NOT based on user votings, preferences or actions (unlike collaborative filtering). Although users may create entries and change attributes, the computation should only take into account the items and their attributes, and not the users associated with (just like a system where only items and no users exist). Ideally, the algorithm should support: permanent changes of attributes of an entry incrementally compute similar entries/clusters on changes scale something better than a simple distance table, if possible (because of the O(n²) space complexity)

    Read the article

  • SQL SERVER – Simple Explanation and Puzzle with SOUNDEX Function and DIFFERENCE Function

    - by pinaldave
    Earlier this week I asked a question where I asked how to Swap Values of the column without using CASE Statement. Read here: A Puzzle – Swap Value of Column Without Case Statement,there were more than 50 solutions proposed in the comment. There were many creative solutions. I have mentioned my personal favorite (different ones) here: Solution of Puzzle – Swap Value of Column Without Case Statement. However, I received lots of questions regarding one of the Solution by SIJIN KUMAR V P. He has used the function SOUNDEX in his solution. The request was to explain how SOUNDEX and DIFFERENCE works. Well, there are pretty decent documentations provided over here SOUNDEX function and DIFFERENCE over on MSDN and if I attempt to explain this function I will end up writing the same details which are available on MSDN. Instead of writing theory, we will try to learn this function by using a couple of simple puzzles. You try to solve the puzzles using the MSDN and see if you can learn something very quickly. In simple words - SOUNDEX converts an alphanumeric string to a four-character code to find similar-sounding words or names. The first character of the code is the first character of character_expression and the second through fourth characters of the code are numbers that represent the letters in the expression. Vowels incharacter_expression are ignored unless they are the first letter of the string. DIFFERENCE function returns an integer value. The  integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values. Learning Puzzle 1: Now let us run following four queries and observe its output. SELECT SOUNDEX('SQLAuthority') SdxValue SELECT SOUNDEX('SLTR') SdxValue SELECT SOUNDEX('SaLaTaRa') SdxValue SELECT SOUNDEX('SaLaTaRaM') SdxValue When you look at the result set all the four values are same. The reason for all the values to be same is as for SQL Server SOUNDEX function all the four strings are similarly sounding string. Learning Puzzle 2: Now let us run following five queries and observe its output. SELECT DIFFERENCE (SOUNDEX('SLTR'),SOUNDEX('SQLAuthority')) SELECT DIFFERENCE (SOUNDEX('TH'),SOUNDEX('SQLAuthority')) SELECT DIFFERENCE ('SQLAuthority',SOUNDEX('SQLAuthority')) SELECT DIFFERENCE ('SLTR',SOUNDEX('SQLAuthority')) SELECT DIFFERENCE ('SLTR','SQLAuthority') When you look at the result set you will get the result in the ranges from 1 to 4. Here is how it works if your result is 0 which means absolutely not relevant to each other and if your result is 1 which means the results are relevant to each other. Have you ever used above two functions in your business need or on production server? If yes, would you please leave a comment with use cases. I believe it will be beneficial to everyone. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Ignore "Bad: new and old password are too similar"

    - by user999
    I receive this message when trying to change my password: "Bad: new and old password are too similar" The passwords' "similarity" is irrelevant for my needs, so I'd like to bypass this. I tried sudo passwd $my_username I thought this had worked because I got a message: passwd: password updated successfully However, the password change has no effect after leaving the terminal, and my old password is still the only one recognized. Any ideas? thanks

    Read the article

  • How are Implicit-Heap dynamic Storage Binding and Dynamic type binding similar?

    - by Appy
    "Concepts of Programming languages" by Robert Sebesta says - Implicit Heap-Dynamic Storage Binding: Implicit Heap-Dynamic variables are bound to heap storage only when they are assigned values. It is similar to dynamic type binding. Can anyone explain the similarity with suitable examples. I understand the meaning of both the phrases, but I am an amateur when it comes to in-depth details.

    Read the article

  • Manipulate score/rank on query results from NHibernate.Search

    - by Fernando Figueiredo
    I've been working with NHibernate, NHibernate.Search and Lucene.Net to improve the search engine used on the website I develop. Basically, I use it to search contents of corporations specification documents. This is not to be confused with Lucene's notion of documents: in my case, a specification document (which I'll hereafter call a "specdoc") can contain many pages, and the content of these pages are the ones that are actually indexed (thus, the pages themselves are the ones that fall into Lucene's concept of documents). So, the pages belong to a specdoc, that in turn belong to a corporation (so, a corporation can have many specdocs). I'm using NHibernate.Search "IndexEmbedded" and "ContainedIn" attributes to associate the pages with their specdoc and the specdocs to their corporations, so I can query for terms in specdoc pages and have Lucene/NH.Search return either the pages themselves, the specdocs, or the corporations that match the query on the pages. I can query this way and get ranked results, thus presenting results (that is, corporations, specdocs or pages) by relevance, which is great. But now I need something more. Specifically in the case where I query terms and have NH.Search return the corporations that match, I need to manually/artificially tune the score of some of the results, because there are corporations that I want to show up on the top of the result set - think of "sponsored results". I'm thinking of doing it on my application, maybe creating an entity/database table that contain an association to the corporation entity, and a score boost value. But I don't know how to feed this to Lucene and have it boost the results accordingly at search time. Initially I thought about deriving a Similarity class to do this, but it doesn't look like Similarity can be used to modify result sets at search time. As per this page, it looks like what I need is to mess around with weight or scoring. But the docs are a little superficial in that there are no examples on how to implement a custom scoring, let alone integrate it with NH.Search. So, does anyone know how to do this, or point me to some documentation or working example on how to do something similar? Thanks!

    Read the article

  • In a digital photo, how can I detect if a mountain is obscured by clouds?

    - by Gavin Brock
    The problem I have a collection of digital photos of a mountain in Japan. However the mountain is often obscured by clouds or fog. What techniques can I use to detect that the mountain is visible in the image? I am currently using Perl with the Imager module, but open to alternatives. All the images are taken from the exact same position - these are some samples. My naïve solution I started by taking several horizontal pixel samples of the mountain cone and comparing the brightness values to other samples from the sky. This worked well for differentiating good image 1 and bad image 2. However in the autumn it snowed and the mountain became brighter than the sky, like image 3, and my simple brightness test started to fail. Image 4 is an example of an edge case. I would classify this as a good image since some of the mountain is clearly visible. UPDATE 1 Thank you for the suggestions - I am happy you all vastly over-estimated my competence. Based on the answers, I have started trying the ImageMagick edge-detect transform, which gives me a much simpler image to analyze. convert sample.jpg -edge 1 edge.jpg I assume I should use some kind of masking to get rid of the trees and most of the clouds. Once I have the masked image, what is the best way to compare the similarity to a 'good' image? I guess the "compare" command suited for this job? How do I get a numeric 'similarity' value from this?

    Read the article

  • Full list of top-like tool family for perfomance monitoring in linux: iftop iotop htop atop more? [closed]

    - by Yauhen Yakimovich
    Top-like utilities are extremely handful in my work and I want to make sure I am not missing any of them. Please extend the following list of performance monitoring (top-like) family of linux tools: top - original tool htop - adds support to multicore/cpu iotop - input/output monitoring iftop - network monitoring atop - merges previous elements into a single overview slabtop – displays a listing of the top caches ? The only criteria - maturity and similarity in function/style.

    Read the article

  • Configuring thouands of related products in Magento?

    - by Anonymous -
    I'm at a stage with a Magento store I'm developing where I've added all the products (all 6000 of them) and now would like to configure related products to up my conversion rate a bit. I was wondering if there was an extension anybody knew of that functions similarly to this one, with the most current version of Magento (Community Edition, 1.6.1). If not, would anyone be able to provide some pointers for writing a script that will run through each product and add 1-5 related products. I have a fairly basic idea of taking product title text and just doing a simple text similarity query between other product titles for now, just to get some related products up there, but the Magento database isn't making a terribly large amount of sense. Thanks to anyone who can shed some light on this. :)

    Read the article

  • Comparing a saved movement with other movement with Kinect

    - by Ewerton
    I need to develop an application where a user (physiotherapist) will perform a movement in front of the Kinect, I'll write the data movement in the database and then the patient will try to imitate this motion. The system will calculate the similarity between the movement recorded and executed. My first idea is, during recording (each 5 second, by example), to store the position (x, y, z) of the points and then compare them in the execution time(by patient). I know that this approach is too simple, because I imagine that in people of different sizes the skeleton is recognized differently, so the comparison is not reliable. My question is about the best way to compare a saved motion with a movement executed (on the fly). PS: Sorry by my English.

    Read the article

  • Google Webmaster Tools reports fake 404 errors

    - by Edgar Quintero
    I have a website where Google Webmaster Tools reports 15,000 links as 404 errors. However, all links return a 200 when I visit them. The problem is, that eventhough I can visit these pages and return a 200, all those 15,000 pages won't index in Google. They aren't appearing in search results. These are constant errors Google Webmaster Tools keeps reporting and I'm not sure what the problem is. We've thought of a DNS issue, but it shouldn't be a DNS issue, because if it were, no page would be indexed (I have 10,000 perfectly indexed). Regarding URL parameters, my pages do not share a similarity in URL parameters that can make it obvious to me what could be causing the error.

    Read the article

  • Is there any negative impact with similar page titles and descriptions on similar sites?

    - by ElHaix
    Currently we have Canadian versions of some websites. We are going to create some American versions, which essentially have everything the same, except the search results are geo-specific to the USA. The format for the results page title and descriptions will remain the same, ie {0} in {1} | Find more {0} etc etc etc... {1}. The search term will most-likely be the same between both sites. Will the relative similarity in the page titles and descriptions between the CDN and USA sites have any negative SEO impact, where the geo location would be the most significant difference?

    Read the article

  • Steam, launching Team Fortress 2: libGL.so.1: wrong ELF class: ELFCLASS64

    - by stefanhgm
    After I got Steam running with the workaround mentioned here, I've got nearly the same problem when launching Team Fortress 2. After starting it from Steam the "Launcher" pops up and after a few seconds it disappears with the following error in the terminal: /home/user/Steam/SteamApps/steamuser/Team Fortress 2/hl2_linux: error while loading shared libraries: libGL.so.1: wrong ELF class: ELFCLASS64 Game removed: AppID 440 "Team Fortress 2", ProcID 5299 saving roaming config store to 'sharedconfig.vdf' roaming config store 2 saved successfully Because of the similarity with the workaround I used before, I tried to execute: export LD_LIBRARY_PATH=/usr/lib32:$LD_LIBRARY_PATH directly before launching the game, but there is no difference.

    Read the article

  • How are design-by-contract and property-based testing (QuickCheck) related?

    - by Todd Owen
    Is their only similarity the fact that they are not xUnit (or more precisely, not based on enumerating specific test cases), or is it deeper than that? Property-based testing (using QuickCheck, ScalaCheck, etc) seem well-suited to a functional programming style where side-effects are avoided. On the other hand, Design by Contract (as implemented in Eiffel) is more suited to OOP languages: you can express post-conditions about the effects of methods, not just their return values. But both of them involve testing assertions that are true in general (rather than assertions that should be true for a specific test case). And both can be tested using randomly generated inputs (with QuickCheck this is the only way, whereas with Eiffel I believe it is an optional feature of the AutoTest tool). Is there an umbrella term to encompass both approaches? Or am I imagining a relationship that doesn't really exist.

    Read the article

  • What Design Pattern is seperating transform converters

    - by RevMoon
    For converting a Java object model into XML I am using the following design: For different types of objects (e.g. primitive types, collections, null, etc.) I define each its own converter, which acts appropriate with respect to the given type. This way it can easily extended without adding code to a huge if-else-then construct. The converters are chosen by a method which tests whether the object is convertable at all and by using a priority ordering. The priority ordering is important so let's say a List is not converted by the POJO converter, even though it is convertable as such it would be more appropriate to use the collection converter. What design pattern is that? I can only think of a similarity to the command pattern.

    Read the article

  • Is JScript dying? If so, where should I go? [closed]

    - by David Is Not Here
    I recently poked around Google for a little bit, looking for information about coding JScript. It's very sparse, which surprised me -- it took a link to a link to find Microsoft's own reference, which appears to omit most if not all references to console-based scripting that extends past Javascript. I'm working with the console here, not a webpage, so input and output seems very different than what Microsoft explains. If JScript is dying (and it appears to be so), where do I go from here? VBScript? My options are limited because the computers I'm using this on are carefully patrolled for new software. JScript's similarity to JavaScript was the biggest reason I had chosen it for porting over some of my prior work. I'm specifically looking for, at best, a console scripting language that doesn't need any extra software on Windows XP or higher, that at least supports standard input, output, pause, and file manipulation, little else.

    Read the article

  • What Design Pattern is separating transform converters

    - by RevMoon
    For converting a Java object model into XML I am using the following design: For different types of objects (e.g. primitive types, collections, null, etc.) I define each its own converter, which acts appropriate with respect to the given type. This way it can easily extended without adding code to a huge if-else-then construct. The converters are chosen by a method which tests whether the object is convertable at all and by using a priority ordering. The priority ordering is important so let's say a List is not converted by the POJO converter, even though it is convertable as such it would be more appropriate to use the collection converter. What design pattern is that? I can only think of a similarity to the command pattern.

    Read the article

  • image segmentation using pil/any package of python

    - by sag
    hi all., i need to segment an image into regions .i'm using pil.i found no module to segment image in pil. I need this segmented regions as a list or dictionary. Actually i'm trying to compare the images for similarity in content aware fashion.for that i need to segment the image. i tried segwin tool but it is drawing another image(which is not required and also time consuming) thans in advance

    Read the article

  • "Push" linq vs reactive framework

    - by Benjol
    (Once again exposing the depths of my ignorance here by combining two concepts which I haven't grokked) I read here about the Reactive framework being a 'Push' model compared to Linq's 'Pull' model. This reminded me of reading an article about 'Push' Linq. Is there really any similarity between these two 'frameworks'? UPDATE Since I asked this question, Jon Skeet has asked it too, here are his first and second impressions.

    Read the article

  • Java graph library for comparing 2 graphs

    - by user311909
    Hello, Does anyone know a good java library for graph comparing by searching maximal common subgraph isomorphism to get information about their similarity? I do not want to compare graphs based on node labels. Or is there any other way how to topologicaly compare graphs with good java library? Now I am using library SimPack and it is usefull but I need something more. Any suggestions will be very helpful. Thanks in advance

    Read the article

< Previous Page | 1 2 3 4 5 6 7  | Next Page >