Search Results

Search found 27 results on 2 pages for 'n gram'.

Page 1/2 | 1 2  | Next Page >

  • N-gram split function for string similarity comparison

    - by Michael
    As part of excersise to better understand F# which I am currently learning , I wrote function to split given string into n-grams. 1) I would like to receive feedback about my function : can this be written simpler or in more efficient way? 2) My overall goal is to write function that returns string similarity (on 0.0 .. 1.0 scale) based on n-gram similarity; Does this approach works well for short strings comparisons , or can this method reliably be used to compare large strings (like articles for example). 3) I am aware of the fact that n-gram comparisons ignore context of two strings. What method would you suggest to accomplish my goal? //s:string - target string to split into n-grams //n:int - n-gram size to split string into let ngram_split (s:string, n:int) = let ngram_count = s.Length - (s.Length % n) let ngram_list = List.init ngram_count (fun i -> if( i + n >= s.Length ) then s.Substring(i,s.Length - i) + String.init ((i + n) - s.Length) (fun i -> "#") else s.Substring(i,n) ) let ngram_array_unique = ngram_list |> Seq.ofList |> Seq.distinct |> Array.ofSeq //produce tuples of ngrams (ngram string,how much occurrences in original string) Seq.init ngram_array_unique.Length (fun i -> (ngram_array_unique.[i], ngram_list |> List.filter(fun item -> item = ngram_array_unique.[i]) |> List.length) )

    Read the article

  • Simple implementation of N-Gram, tf-idf and Cosine similarity in Python

    - by seanieb
    I need to compare documents stored in a DB and come up with a similarity score between 0 and 1. The method I need to use has to be very simple. Implementing a vanilla version of n-grams (where it possible to define how many grams to use), along with a simple implementation of tf-idf and Cosine similarity. Is there any program that can do this? Or should I start writing this from scratch?

    Read the article

  • Edit menu permission issue

    - by user3541568
    i'm an Lubuntu user, but i suppose it concerns everyone. For menu editing there are 3 GUIs: lxmed, menulibre, alacarte. great! Still the ISSUE is that if i start as administrator, for example: gram@gram-pc:~$ alacarte will edit menu, till the time i close app. it doesn't have permission for that... so nothing in menu has been changed... gram@gram-pc:~$ gksudo alacarte or root@gram-pc:/# alacarte will open completely different menu with completely different items... how can i grand permission to edit my-not-root-menu?

    Read the article

  • Storing n-grams in database in < n number of tables.

    - by kurige
    If I was writing a piece of software that attempted to predict what word a user was going to type next using the two previous words the user had typed, I would create two tables. Like so: == 1-gram table == Token | NextWord | Frequency ------+----------+----------- "I" | "like" | 15 "I" | "hate" | 20 == 2-gram table == Token | NextWord | Frequency ---------+------------+----------- "I like" | "apples" | 8 "I like" | "tomatoes" | 12 "I hate" | "tomatoes" | 20 "I hate" | "apples" | 2 Following this example implimentation the user types "I" and the software, using the above database, predicts that the next word the user is going to type is "hate". If the user does type "hate" then the software will then predict that the next word the user is going to type is "tomatoes". However, this implimentation would require a table for each additional n-gram that I choose to take into account. If I decided that I wanted to take the 5 or 6 preceding words into account when predicting the next word, then I would need 5-6 tables, and an exponentially increase in space per n-gram. What would be the best way to represent this in only one or two tables, that has no upper-limit on the number of n-grams I can support?

    Read the article

  • SpeechRecognition issue

    - by Leosa99 _
    I'm creating a Speech Recognition Application like Siri in vb.net. I have found a database of words (in a .txt file) and i want to insert them in my application but its not working . Here my code : Dim WithEvents reco As New Recognition.SpeechRecognitionEngine Dim IA_VOICE As New SpeechSynthesizer Dim List_Word As New Recognition.SrgsGrammar.SrgsOneOf("IN database.") Public Sub New() reco.SetInputToDefaultAudioDevice() Dim gram As New Recognition.SrgsGrammar.SrgsDocument Dim WORD_RULE As New Recognition.SrgsGrammar.SrgsRule("MOT") LOAD_DATABSE(Application.StartupPath & "\RECO_WORD\DataBase.txt") WORD_RULE.Add(List_Word) gram.Rules.Add(WORD_RULE) gram.Root = WORD_RULE reco.LoadGrammar(New Recognition.Grammar(gram)) reco.RecognizeAsync() End Sub Private Sub reco_RecognizeCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.RecognizeCompletedEventArgs) Handles reco.RecognizeCompleted reco.RecognizeAsync() End Sub Private Sub reco_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.RecognitionEventArgs) Handles reco.SpeechRecognized If e.Result.Text = "hi" Then MsgBox("HI!") End If End Sub Sub LOAD_DATABSE(Database_PATH As String) Dim lines() As String = File.ReadAllLines(Database_PATH) Dim numberLinesTotal = lignes.Length Dim numberlignedone As Integer = 0 Dim MOT As New StreamReader(BDD_PATH) While numberlignedone <> numberLinesTota numberlignedone += 1 Dim ITEM As New Recognition.SrgsGrammar.SrgsItem(MOT.ReadLine) Word_List.Items.Add(ITEMS) 'I think its here that its not working. End While MsgBox("END LOADING") End Sub</code> If you know why its not working... Thanks.

    Read the article

  • Volume Shadow Copy not starting

    - by Gram
    Hi We are running a Windows 2008 Small Business Server with Symantec Backup Exec. Last night the back up failed due to the Volume Shadow Copy service being unable to start. When I try to manually start it I get the following error "windows could not start the volume shadow copy on local computer" Has anybody any suggestions other than rebooting the server? many thanks Graham

    Read the article

  • fluent interface program in Ruby

    - by intern
    we have made the following code and trying to run it. class Numeric def gram self end alias_method :grams, :gram def of(name) ingredient = Ingredient.new(name) ingredient.quantity=self return ingredient end end class Ingredient def initialize(n) @@name= n end def quantity=(o) @@quantity = o return @@quantity end def name return @@name end def quantity return @@quantity end end e= 42.grams.of("Test") a= Ingredient.new("Testjio") puts e.quantity a.quantity=90 puts a.quantity puts e.quantity the problem which we are facing in it is that the output of puts a.quantity puts e.quantity is same even when the objects are different. what we observed is that second object i.e 'a' is replacing the value of the first object i.e. 'e'. the output is coming out to be 42 90 90 but the output required is 42 90 42 can anyone suggest why is it happening? it is not replacing the object as object id's are different..only the values of the objects are replaced.

    Read the article

  • fluent interface program in Ruby

    - by intern
    we have made the following code and trying to run it. class Numeric def gram self end alias_method :grams, :gram def of(name) ingredient = Ingredient.new(name) ingredient.quantity=self return ingredient end end class Ingredient def initialize(n) @@name= n end def quantity=(o) @@quantity = o return @@quantity end def name return @@name end def quantity return @@quantity end end e= 42.grams.of("Test") a= Ingredient.new("Testjio") puts e.quantity a.quantity=90 puts a.quantity puts e.quantity the problem which we are facing in it is that the output of puts a.quantity puts e.quantity is same even when the objects are different. what we observed is that second object i.e 'a' is replacing the value of the first object i.e. 'e'. the output is coming out to be 42 90 90 but the output required is 42 90 42 can anyone suggest why is it happening? it is not replacing the object as object id's are different..only the values of the objects are replaced.

    Read the article

  • Stuck in a loop

    - by Luke
    while (true) { //read in the file StreamReader convert = new StreamReader("../../convert.txt"); //define variables string line = convert.ReadLine(); double conversion; int numberIn; double conversionFactor; //ask for the conversion information Console.WriteLine("Enter the conversion in the form (Amount, Convert from, Convert to)"); String inputMeasurement = Console.ReadLine(); string[] inputMeasurementArray = inputMeasurement.Split(','); //loop through the lines looking for a match while (line != null) { string[] fileMeasurementArray = line.Split(','); if (fileMeasurementArray[0] == inputMeasurementArray[1]) { if (fileMeasurementArray[1] == inputMeasurementArray[2]) { Console.WriteLine("The conversion factor for {0} to {1} is {2}", inputMeasurementArray[1], inputMeasurementArray[2], fileMeasurementArray[2]); //convert to int numberIn = Convert.ToInt32(inputMeasurementArray[0]); conversionFactor = Convert.ToDouble(fileMeasurementArray[2]); conversion = (numberIn * conversionFactor); Console.WriteLine("{0} {1} is {2} {3} \n", inputMeasurementArray[0], inputMeasurementArray[1], conversion, inputMeasurementArray[2]); break; } } else { Console.WriteLine("Please enter two valid conversion types \n"); break; } line = convert.ReadLine(); } } The file consists of the following: ounce,gram,28.0 pound,ounce,16.0 pound,kilogram,0.454 pint,litre,0.568 inch,centimetre,2.5 mile,inch,63360.0 The user will input something like 6,ounce,gram The idea is that it finds the correct line by checking if the first and second words in the file are the same as the second and third the user enters. The problem is that if it checks the first line and it fails the if statement, if goes through to the else statement and stops. I am trying to find a way where it will stop after the it finds the correct line but not until. If someone types in a value that isn't in the file, then it should show an error.

    Read the article

  • NTFS Corruption: Files created in Linux corrupted when Windows Boots

    - by Logan Mayfield
    I'm getting some file loss and corruption on my Win7/Ubuntu 12.04 dual boot setup. I have a large shared NTFS partition. I have my Windows Docs/Music/etc. directories on that file and have the comparable directors in Linux setup as a sym. link. I'm using ntfs-3g on the linux side of things to manage the ntfs partition. The shared partition is on a logical partition along with my Linux /home / and /swap partitions. The ntfs partition is mounted at boot time via fstab with the following options: ntfs-3g users,nls=utf8,locale=en_US.UTF-8,exec,rw The problem seems to be confined to newly created and recently edited files. I have not see data loss or corruption when creating/editing files in Windows and then moving over to Ubuntu. I've been using the sync command aggressively in Ubuntu to try to ensure everything is getting written to the HDD. I do not use hibernate in Windows so I know it's not the usual missing files due to Hibernation problem. I'm not seeing any mount related issues on dmesg. Most recently I had a set of files related to a LaTeX document go bad. Some of them show up in Ubuntu but I am unable to delete them. In the GUI file browser they are given thumbnails associated with files I created on my last boot of Windows. To be more specific: I created a few png files in Windows. The files corrupted by that Windows boot are associated with running PdfLatex on a file and are not image files. However, two of the corrupted files show up with the thumbnail image of one of the previously mentioned png files. The png files are not in the same directory as the latex files but they are both win the Document Folder tree. I've had sucess with using NTFS for shared data in the past and am hoping there's some quirk here I'm missing and it's not just bad luck. On one hand this appears to be some kind of Windows problem as data loss occurs when I boot to Windows after having worked in Ubuntu for a while. However, I'm assuming it's more on the Ubuntu end as it requires the special NTFS drivers. Edit for more info: This is a Lenovo Thinkpad L430. Purchased new in the last month. So it's a fairly fresh install. Many of the files on the shared partition were copied over from a previous NTFS formatted shared partition on another HDD. As requested: here's a sample chkdsk log. Some of the files its mentioning were files that got deleted off the partition while in Ubuntu. Others were created/edited but not deleted. Checking file system on D: Volume dismounted. All opened handles to this volume are now invalid. Volume label is Files. CHKDSK is verifying files (stage 1 of 3)... Attribute record of type 0x80 and instance tag 0x2 is cross linked starting at 0x789f47 for possibly 0x21 clusters. Some clusters occupied by attribute of type 0x80 and instance tag 0x2 in file 0x42 is already in use. Deleting corrupt attribute record (128, "") from file record segment 66. 86496 file records processed. File verification completed. 385 large file records processed. 0 bad file records processed. 0 EA records processed. 0 reparse records processed. CHKDSK is verifying indexes (stage 2 of 3)... Deleted invalid filename Screenshot from 2012-09-09 09:51:27.png (72) in directory 46. The NTFS file name attribute in file 0x48 is incorrect. 53 00 63 00 72 00 65 00 65 00 6e 00 73 00 68 00 S.c.r.e.e.n.s.h. 6f 00 74 00 20 00 66 00 72 00 6f 00 6d 00 20 00 o.t. .f.r.o.m. . 32 00 30 00 31 00 32 00 2d 00 30 00 39 00 2d 00 2.0.1.2.-.0.9.-. 30 00 39 00 20 00 30 00 39 00 3a 00 35 00 31 00 0.9. .0.9.:.5.1. 3a 00 32 00 37 00 2e 00 70 00 6e 00 67 00 0d 00 :.2.7...p.n.g... 00 00 00 00 00 00 90 94 49 1f 5e 00 00 80 d4 00 ......I.^.... File 72 has been orphaned since all its filenames were invalid Windows will recover the file in the orphan recovery phase. Correcting minor file name errors in file 72. Index entry found.000 of index $I30 in file 0x5 points to unused file 0x11. Deleting index entry found.000 in index $I30 of file 5. Index entry found.001 of index $I30 in file 0x5 points to unused file 0x16. Deleting index entry found.001 in index $I30 of file 5. Index entry found.002 of index $I30 in file 0x5 points to unused file 0x15. Deleting index entry found.002 in index $I30 of file 5. Index entry DOWNLO~1 of index $I30 in file 0x28 points to unused file 0x2b6. Deleting index entry DOWNLO~1 in index $I30 of file 40. Unable to locate the file name attribute of index entry Screenshot from 2012-09-09 09:51:27.png of index $I30 with parent 0x2e in file 0x48. Deleting index entry Screenshot from 2012-09-09 09:51:27.png in index $I30 of file 46. An index entry of index $I30 in file 0x32 points to file 0x151e8 which is beyond the MFT. Deleting index entry latexsheet.tex in index $I30 of file 50. An index entry of index $I30 in file 0x58bc points to file 0x151eb which is beyond the MFT. Deleting index entry D8CZ82PK in index $I30 of file 22716. An index entry of index $I30 in file 0x58bc points to file 0x151f7 which is beyond the MFT. Deleting index entry EGA4QEAX in index $I30 of file 22716. An index entry of index $I30 in file 0x58bc points to file 0x151e9 which is beyond the MFT. Deleting index entry NGTB469M in index $I30 of file 22716. An index entry of index $I30 in file 0x58bc points to file 0x151fb which is beyond the MFT. Deleting index entry WU5RKXAB in index $I30 of file 22716. Index entry comp220-lab3.synctex.gz of index $I30 in file 0xda69 points to unused file 0xd098. Deleting index entry comp220-lab3.synctex.gz in index $I30 of file 55913. Unable to locate the file name attribute of index entry comp220-numberGrammars.aux of index $I30 with parent 0xda69 in file 0xa276. Deleting index entry comp220-numberGrammars.aux in index $I30 of file 55913. The file reference 0x500000000cd43 of index entry comp220-numberGrammars.out of index $I30 with parent 0xda69 is not the same as 0x600000000cd43. Deleting index entry comp220-numberGrammars.out in index $I30 of file 55913. The file reference 0x500000000cd45 of index entry comp220-numberGrammars.pdf of index $I30 with parent 0xda69 is not the same as 0xc00000000cd45. Deleting index entry comp220-numberGrammars.pdf in index $I30 of file 55913. An index entry of index $I30 in file 0xda69 points to file 0x15290 which is beyond the MFT. Deleting index entry gram.aux in index $I30 of file 55913. An index entry of index $I30 in file 0xda69 points to file 0x15291 which is beyond the MFT. Deleting index entry gram.out in index $I30 of file 55913. An index entry of index $I30 in file 0xda69 points to file 0x15292 which is beyond the MFT. Deleting index entry gram.pdf in index $I30 of file 55913. Unable to locate the file name attribute of index entry comp230-quiz1.synctex.gz of index $I30 with parent 0xda6f in file 0xd183. Deleting index entry comp230-quiz1.synctex.gz in index $I30 of file 55919. An index entry of index $I30 in file 0xf3cc points to file 0x15283 which is beyond the MFT. Deleting index entry require-transform.rkt in index $I30 of file 62412. An index entry of index $I30 in file 0xf3cc points to file 0x15284 which is beyond the MFT. Deleting index entry set.rkt in index $I30 of file 62412. An index entry of index $I30 in file 0xf497 points to file 0x15280 which is beyond the MFT. Deleting index entry logger.rkt in index $I30 of file 62615. An index entry of index $I30 in file 0xf497 points to file 0x15281 which is beyond the MFT. Deleting index entry misc.rkt in index $I30 of file 62615. An index entry of index $I30 in file 0xf497 points to file 0x15282 which is beyond the MFT. Deleting index entry more-scheme.rkt in index $I30 of file 62615. An index entry of index $I30 in file 0xf5bf points to file 0x15285 which is beyond the MFT. Deleting index entry core-layout.rkt in index $I30 of file 62911. An index entry of index $I30 in file 0xf5e0 points to file 0x15286 which is beyond the MFT. Deleting index entry ref.scrbl in index $I30 of file 62944. An index entry of index $I30 in file 0xf6f0 points to file 0x15287 which is beyond the MFT. Deleting index entry base-render.rkt in index $I30 of file 63216. An index entry of index $I30 in file 0xf6f0 points to file 0x15288 which is beyond the MFT. Deleting index entry html-properties.rkt in index $I30 of file 63216. An index entry of index $I30 in file 0xf6f0 points to file 0x15289 which is beyond the MFT. Deleting index entry html-render.rkt in index $I30 of file 63216. An index entry of index $I30 in file 0xf6f0 points to file 0x1528b which is beyond the MFT. Deleting index entry latex-prefix.rkt in index $I30 of file 63216. An index entry of index $I30 in file 0xf6f0 points to file 0x1528c which is beyond the MFT. Deleting index entry latex-render.rkt in index $I30 of file 63216. An index entry of index $I30 in file 0xf6f0 points to file 0x1528e which is beyond the MFT. Deleting index entry scribble.tex in index $I30 of file 63216. An index entry of index $I30 in file 0xf717 points to file 0x1528a which is beyond the MFT. Deleting index entry lang.rkt in index $I30 of file 63255. An index entry of index $I30 in file 0xf721 points to file 0x1528d which is beyond the MFT. Deleting index entry lang.rkt in index $I30 of file 63265. An index entry of index $I30 in file 0xf764 points to file 0x1528f which is beyond the MFT. Deleting index entry lang.rkt in index $I30 of file 63332. An index entry of index $I30 in file 0x14261 points to file 0x15270 which is beyond the MFT. Deleting index entry fddff3ae9ae2221207f144821d475c08ec3d05 in index $I30 of file 82529. An index entry of index $I30 in file 0x14621 points to file 0x15268 which is beyond the MFT. Deleting index entry FETCH_HEAD in index $I30 of file 83489. An index entry of index $I30 in file 0x14650 points to file 0x15272 which is beyond the MFT. Deleting index entry 86 in index $I30 of file 83536. An index entry of index $I30 in file 0x14651 points to file 0x15266 which is beyond the MFT. Deleting index entry pack-7f54ce9f8218d2cd8d6815b8c07461b50584027f.idx in index $I30 of file 83537. An index entry of index $I30 in file 0x14651 points to file 0x15265 which is beyond the MFT. Deleting index entry pack-7f54ce9f8218d2cd8d6815b8c07461b50584027f.pack in index $I30 of file 83537. An index entry of index $I30 in file 0x146f1 points to file 0x15275 which is beyond the MFT. Deleting index entry master in index $I30 of file 83697. An index entry of index $I30 in file 0x146f6 points to file 0x15276 which is beyond the MFT. Deleting index entry remotes in index $I30 of file 83702. An index entry of index $I30 in file 0x1477d points to file 0x15278 which is beyond the MFT. Deleting index entry pad.rkt in index $I30 of file 83837. An index entry of index $I30 in file 0x14797 points to file 0x1527c which is beyond the MFT. Deleting index entry pad1.rkt in index $I30 of file 83863. An index entry of index $I30 in file 0x14810 points to file 0x1527d which is beyond the MFT. Deleting index entry cm.rkt in index $I30 of file 83984. An index entry of index $I30 in file 0x14926 points to file 0x1527e which is beyond the MFT. Deleting index entry multi-file-search.rkt in index $I30 of file 84262. An index entry of index $I30 in file 0x149ef points to file 0x1527f which is beyond the MFT. Deleting index entry com.rkt in index $I30 of file 84463. An index entry of index $I30 in file 0x14b47 points to file 0x15202 which is beyond the MFT. Deleting index entry COMMIT_EDITMSG in index $I30 of file 84807. An index entry of index $I30 in file 0x14b47 points to file 0x15279 which is beyond the MFT. Deleting index entry index in index $I30 of file 84807. An index entry of index $I30 in file 0x14b4c points to file 0x15274 which is beyond the MFT. Deleting index entry master in index $I30 of file 84812. An index entry of index $I30 in file 0x14b61 points to file 0x1520b which is beyond the MFT. Deleting index entry 02 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x1525a which is beyond the MFT. Deleting index entry 28 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15208 which is beyond the MFT. Deleting index entry 29 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x1521f which is beyond the MFT. Deleting index entry 2c in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15261 which is beyond the MFT. Deleting index entry 2e in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x151f0 which is beyond the MFT. Deleting index entry 45 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x1523e which is beyond the MFT. Deleting index entry 47 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x151e5 which is beyond the MFT. Deleting index entry 49 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15214 which is beyond the MFT. Deleting index entry 58 in index $I30 of file 84833. Index entry 6e of index $I30 in file 0x14b61 points to unused file 0xd182. Deleting index entry 6e in index $I30 of file 84833. Unable to locate the file name attribute of index entry a0 of index $I30 with parent 0x14b61 in file 0xd29c. Deleting index entry a0 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x1521b which is beyond the MFT. Deleting index entry cd in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15249 which is beyond the MFT. Deleting index entry d6 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15242 which is beyond the MFT. Deleting index entry df in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x15227 which is beyond the MFT. Deleting index entry ea in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x1522e which is beyond the MFT. Deleting index entry f3 in index $I30 of file 84833. An index entry of index $I30 in file 0x14b61 points to file 0x151f2 which is beyond the MFT. Deleting index entry ff in index $I30 of file 84833. An index entry of index $I30 in file 0x14b62 points to file 0x15254 which is beyond the MFT. Deleting index entry 1ed39b36ad4bd48c91d22cbafd7390f1ea38da in index $I30 of file 84834. An index entry of index $I30 in file 0x14b75 points to file 0x15224 which is beyond the MFT. Deleting index entry 96260247010fe9811fea773c08c5f3a314df3f in index $I30 of file 84853. An index entry of index $I30 in file 0x14b79 points to file 0x15219 which is beyond the MFT. Deleting index entry 8f689724ca23528dd4f4ab8b475ace6edcb8f5 in index $I30 of file 84857. An index entry of index $I30 in file 0x14b7c points to file 0x15223 which is beyond the MFT. Deleting index entry 1df17cf850656be42c947cba6295d29c248d94 in index $I30 of file 84860. An index entry of index $I30 in file 0x14b7c points to file 0x15217 which is beyond the MFT. Deleting index entry 31db8a3c72a3e44769bbd8db58d36f8298242c in index $I30 of file 84860. An index entry of index $I30 in file 0x14b7c points to file 0x15267 which is beyond the MFT. Deleting index entry 8e1254d755ff1882d61c07011272bac3612f57 in index $I30 of file 84860. An index entry of index $I30 in file 0x14b82 points to file 0x15246 which is beyond the MFT. Deleting index entry f959bfaf9643c1b9e78d5ecf8f669133efdbf3 in index $I30 of file 84866. An index entry of index $I30 in file 0x14b88 points to file 0x151fe which is beyond the MFT. Deleting index entry 7e9aa15b1196b2c60116afa4ffa613397f2185 in index $I30 of file 84872. An index entry of index $I30 in file 0x14b8a points to file 0x151ea which is beyond the MFT. Deleting index entry 73cb0cd248e494bb508f41b55d862e84cdd6e0 in index $I30 of file 84874. An index entry of index $I30 in file 0x14b8e points to file 0x15264 which is beyond the MFT. Deleting index entry bd555d9f0383cc14c317120149e9376a8094c4 in index $I30 of file 84878. An index entry of index $I30 in file 0x14b96 points to file 0x15212 which is beyond the MFT. Deleting index entry 630dba40562d991bc6cbb6fed4ba638542e9c5 in index $I30 of file 84886. An index entry of index $I30 in file 0x14b99 points to file 0x151ec which is beyond the MFT. Deleting index entry 478be31ca8e538769246e22bba3330d81dc3c8 in index $I30 of file 84889. An index entry of index $I30 in file 0x14b99 points to file 0x15258 which is beyond the MFT. Deleting index entry 66c60c0a0f3253bc9a5112697e4cbb0dfc0c78 in index $I30 of file 84889. An index entry of index $I30 in file 0x14b9c points to file 0x15238 which is beyond the MFT. Deleting index entry 1c7ceeddc2953496f9ffbfc0b6fb28846e3fe3 in index $I30 of file 84892. An index entry of index $I30 in file 0x14b9c points to file 0x15247 which is beyond the MFT. Deleting index entry ae6e32ffc49d897d8f8aeced970a90d3653533 in index $I30 of file 84892. An index entry of index $I30 in file 0x14ba0 points to file 0x15233 which is beyond the MFT. Deleting index entry f71c7d874e45179a32e138b49bf007e5bbf514 in index $I30 of file 84896. Index entry 2e04fefbd794f050d45e7a717d009e39204431 of index $I30 in file 0x14ba7 points to unused file 0xd097. Deleting index entry 2e04fefbd794f050d45e7a717d009e39204431 in index $I30 of file 84903. An index entry of index $I30 in file 0x14baa points to file 0x15241 which is beyond the MFT. Deleting index entry 0dda7dec1c635cd646dfef308e403c2843d5dc in index $I30 of file 84906. An index entry of index $I30 in file 0x14baa points to file 0x151fc which is beyond the MFT. Deleting index entry 98151e654dd546edcfdec630bc82d90619ac8e in index $I30 of file 84906. An index entry of index $I30 in file 0x14bb1 points to file 0x151e9 which is beyond the MFT. Deleting index entry 1997c5be62ffeebc99253cced7608415e38e4e in index $I30 of file 84913. An index entry of index $I30 in file 0x14bb1 points to file 0x1521d which is beyond the MFT. Deleting index entry 6bf3aedefd3ac62d9c49cad72d05e8c0ad242c in index $I30 of file 84913. An index entry of index $I30 in file 0x14bb1 points to file 0x151f4 which is beyond the MFT. Deleting index entry 907b755afdca14c00be0010962d0861af29264 in index $I30 of file 84913. An index entry of index $I30 in file 0x14bb3 points to file 0x15218 which is beyond the MFT. Deleting index entry

    Read the article

  • How important is index size when searching?

    - by Michael K
    My company has recently began using Apache Solr to search its data. As we learn how to use it we have gone down the path of indexing multiple fields to get the results we need. Most of these are either N-Grammed or Edge-N-Grammed. Gramming by nature takes up a lot of space, which takes more time to search. Space is cheap, but time is less so. Index time is not too important, since a delta-import (only get the changes since last index) is extremely quick and you only pay a penalty on the first import. What we've not been able to determine is what effect the index size has on query times. Obviously a larger index takes longer to search, but the time added by n-gramming a field is difficult to predict. How do you determine whether a field is worth gramming? Can you predict how much longer a query will take when you gram a field?

    Read the article

  • Writing a spell checker similar to "did you mean"

    - by user888734
    I'm hoping to write a spellchecker for search queries in a web application - not unlike Google's "Did you mean?" The algorithm will be loosely based on this: http://catalog.ldc.upenn.edu/LDC2006T13 In short, it generates correction candidates and scores them on how often they appear (along with adjacent words in the search query) in an enormous dataset of known n-grams - Google Web 1T - which contains well over 1 billion 5-grams. I'm not using the Web 1T dataset, but building my n-gram sets from my own documents - about 200k docs, and I'm estimating tens or hundreds of millions of n-grams will be generated. This kind of process is pushing the limits of my understanding of basic computing performance - can I simply load my n-grams into memory in a hashtable or dictionary when the app starts? Is the only limiting factor the amount of memory on the machine? Or am I barking up the wrong tree? Perhaps putting all my n-grams in a graph database with some sort of tree query optimisation? Could that ever be fast enough?

    Read the article

  • Would You Swim Laps in Lake Baikal?

    - by rickramsey
    source This is the lake where Yuli Vasiliev's countrymen swim laps. Yuli is one of my favorite OTN writers not just because he really knows his stuff. Not just because his writing is clear and accurate. And not just because his English is better than the English of most native speakers. Yo, those are all good reasons. But it's the Lake Baikal thing. Yuli recently wrote two wicked good how-to's about Oracle VM Templates. You should read them. You might gain a gram of Yuli's respect. Two grams, if you can head butt icebergs while you swim. How to Use Oracle VM Templates How to prepare an Oracle VM environment to use Oracle VM Templates, how to obtain a template, and how to deploy the template to your Oracle VM environment. Also how to create a virtual machine based on that template and how you can clone the template and change the clone's configuration. How to Use Oracle VM VirtualBox Templates How to use Oracle VM VirtualBox Templates in Oracle VM VirtualBox. Similar to the article above, but it describes how to download, install, and configure the templates within Oracle VM VirtualBox, instead of on bare metal. Other OTN Technical Articles by Yuli Vasiliev Retrieving, Transforming, and Consolidating Web Data with Oracle Database Setting Up, Configuring, and Using a WebLogic Server Cluster Cube Development for Beginners How to XQuery Non-JDBC Sources from JDBC Advanced Dimensional Design with Oracle Warehouse Builder Using the JDBC Connectivity Layer in Oracle Warehouse Builder High Performance Oracle JDBC Programming Python Data Persistence with Oracle Querying JPA Entities with JPQL and Native SQL - Rick Website Newsletter Facebook Twitter

    Read the article

  • Simple NLP: How to use ngram to do word similarity?

    - by sadawd
    Dear Everyone, I Hear that google uses up to 7-grams for their own data. I am interested in finding words that are similar in context (i.e. cat and dog) and I was wondering how do I compute the similarity of two words on a n-gram model given that n 2. Given a sample set like this forexample: (I, love cats), (cats, loves, dogs), (dogs, hate, human) What is a good way to compare the similarity of this pair (I, cats)? Also does anyone know of anyway to do levels for NLP? like: Army-Military-Solider ?

    Read the article

  • Ngram IDF smoothing

    - by adi92
    I am trying to use IDF scores to find interesting phrases in my pretty huge corpus of documents. I basically need something like Amazon's Statistically Improbable Phrases, i.e. phrases that distinguish a document from all the others The problem that I am running into is that some (3,4)-grams in my data which have super-high idf actually consist of component unigrams and bigrams which have really low idf.. For example, "you've never tried" has a very high idf, while each of the component unigrams have very low idf.. I need to come up with a function that can take in document frequencies of an n-gram and all its component (n-k)-grams and return a more meaningful measure of how much this phrase will distinguish the parent document from the rest. If I were dealing with probabilities, I would try interpolation or backoff models.. I am not sure what assumptions/intuitions those models leverage to perform well, and so how well they would do for IDF scores. Anybody has any better ideas?

    Read the article

  • "Anagram solver" based on statistics rather than a dictionary/table?

    - by James M.
    My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words. I have created an N-gram model (for now, N=2) based on the letters in a bunch of text. Now, given a random sequence of letters, I would like to permute them into the most likely sequence according to the transition probabilities. I thought I would need the Viterbi algorithm when I started this, but as I look deeper, the Viterbi algorithm optimizes a sequence of hidden random variables based on the observed output. I am trying to optimize the output sequence. Is there a well-known algorithm for this that I can read about? Or am I on the right track with Viterbi and I'm just not seeing how to apply it?

    Read the article

  • sell skimmed dump+pin([email protected])wu transfer,bank transfer,paypal+mailpass

    - by bestseller
    http://megareserve.blogspot.com///////// [email protected]////gmail:[email protected] Sell, Cvv, Bank Logins, Tracks, PayPal, Transfers, WU, Credit Cards, Card, Hacks, Citi, Boa, Visa, MasterCard, Amex, American Express, Make Money Fast, Stolen, Cc, C++, Adder, Western, Union, Banks, Of, America, Wellsfargo, Liberty, Reserve, Gram, Mg, LR, AlertPay ..PAYMENT METHOD LIBERTYRESERVE AND WESTERNUNION ONLY........ Ccv EU is $ 6 per ccv (Visa + Master) Ccv EU is $ 7 per ccv (Amex + Discover) Ccv Au is $ 6 per ccv Ccv Italy is 15 $ per cc sweden 12$ spain 10$ france 12$ Ccv US is $ 1.5 per ccv (Visa) Ccv US is $ 2 per ccv (master) Ccv US is $ 3 per ccv (Amex + Discover) Ccv UK is $ 5 per ccv (Visa + Master) Ccv UK is $ 6 per ccv (Amex + swith) Ccv Ca is $ 6 per ccv (Visa+ Master) Ccv Ca is $ 9 per ccv (Visa Business + Visa Gold) Ccv Germany is 14$ Per Ccv Ccv DOB with US is 15 $ per ccv Ccv DOB with UK is 19 $ per ccv Ccv DOB + BIN with UK 25$ per ccv Ccv US full info is 40 $ per ccv Ccv UK full info is 60 $ per ccv 1 Uk check bins= 12.5$/1cvv 1 Sock live = 1$/1sock live 5day yahoo:[email protected] gmail:[email protected] Balance In Chase:.........70K To 155K ========300$ Balance In Wachovia:.........24K To 80K==========180$ Balance In Boa..........75K To 450K==========400$ Balance In Credit Union:.........Any Amount:=========420$ Balance In Hallifax..........ANY AMOUNT=========420$ Balance In Compass..........ANY AMOUNT=========400$ Balance In Wellsfargo..........ANY AMOUNT=========400$ Balance In Barclays..........80K To 100K=========550$ Balance In Abbey:.........82K ==========650$ Balance in Hsbc:.........50K========650$ and more 1 Paypal with pass email = 50$/paypal 1 Paypal don't have pass email = 20$/Paypal 1 Banklogin us or uk (personel)= 550$ yahoo :[email protected] gmail :[email protected] Track 1/2 Visa Classic, MasterCard Standart US - 13$ UK - 17$ EU - 24$ AU - 26$ Track 1/2 Visa Gold | Platinum | Business | Signature, MasterCard Gold | Platinum US - 17$ UK - 20$ EU - 28$ AU - 30$ Bank transfer Balance 71.000$ CITIBANK SOUTH DAKOTA, N.A. Balance 65.000$ Wachovia: 76.000$ Abbey: 65.000£ Hsbc : 87.000$ Hallifax : 97.000£ Barclays: 110.000£ AHLI UNITED BANK --- 80.000£ LLOYDSTSB ---------- 100.000£ BANK OF SCOTLAND --- 123.000£ BOA ----------210.000$ UBS ---------- 152.000$ RBC BANK ------ 245.000$ BANK OF CANADA -------- 78.000$ BDC ---------- 281.000$ BANK LAURENTIENNE ----- 241.000$ please no test yahoo: [email protected] gmail: [email protected] website ; http://megareserve.blogspot.com Prices For Bin and Its List: 5434, 5419, 4670,374288,545140,454634,3791 with d.o.b,4049,4462,4921.4929.46274547,5506,5569,5404,5031,4921, 5505,5506,4921,4550 ,4552,4988,5186,4462,4543,4567 ,4539,5301,4929,5521 , 4291,5051,4975,5413 5255 4563,4547 4505,4563 5413 5255,5521,5506,4921,4929,54609 7,5609,54609,4543, 4975,5432,5187 ,4973,4627,4049,4779,426565,55 05, 5549, 5404, 5434, 5419, 4670,456730,541361, 451105,4670,5505, 5549, 5404, UK Nomall NO BINS(Serial) with DOB is :10$ UK with BINS(Serial) with DOB is :15$ UK Nomall no BINS(Serial) no DOB is: 6$ UK with BINS(Serial) is :12$ Please do not request : cc for TEST and FREE. DON'T CONTACT ME IF YOU NOT READY NEED TO BUY .

    Read the article

  • What good technology podcasts are out there?

    - by Michael Stum
    Yes, Podcasts, those nice little Audiobooks I can listen to on the way to work. With the current amount of Podcasts, it's like searching a needle in a haystack, except that the haystack happens to be the Internet and is filled with too many of these "Hot new Gadgets" stuff :( Now, even though I am mainly a .NET developer nowadays, maybe anyone knows some good Podcasts from people regarding the whole software lifecycle? Unit Testing, Continous Integration, Documentation, Deployment... So - what are you guys and gals listening to? Please note that the categorizations are somewhat subjective and may not be 100% accurate as many podcasts cover several areas. Categorization is made against what is considered the "main" area. General Software Engineering / Productivity Stack Overflow TekPub (Requires Paid Subscription) SE Radio 43 Folders Perspectives Dr. Dobb's (now a video feed) The Pragmatic Podcast (Inactive) IT Matters Agile Toolkit Podcast The Stack Trace (Inactive) Parleys Techzing The Startup Success Podcast Berkeley CS class lectures FOSS Weekly .NET / Visual Studio / Microsoft Herding Code Hanselminutes .NET Rocks! Deep Fried Bytes Alt.Net Podcast Polymorphic Podcast Sparkling Client (The Silverlight Podcast) dnrTV! Spaghetti Code ASP.NET Podcast Channel 9 Radio TFS PowerScripting Podcast The Thirsty Developer Elegant Code ConnectedShow Crafty Coders Coding QA jQuery yayQuery The official jQuery podcast Java / Groovy The Java Posse Grails Podcast Java Technology Insider Ruby / Rails Railscasts Rails Envy The Ruby on Rails Podcast Rubiverse Web Design / JavaScript / Ajax WebDevRadio Boagworld The Rissington podcast Ajaxian YUI Theater Unix / Linux / Mac / iPhone Mac Developer Network Hacker Public Radio Linux Outlaws Mac OS Ken LugRadio Linux radio show (Inactive) The Linux Action Show! Linux Kernel Mailing List (LKML) Summary Podcast Stanford's iPhone programming class SysAdmin, Security or Infrastructure RunAs Radio Security Now! Crypto-Gram Security Podcast Hak5 VMWare VMTN Windows Weekly PaulDotCom Security The Register - Semi-Coherent Computing FeatherCast General Tech / Business Tekzilla This Week in Tech The Guardian Tech Weekly PCMag Radio Podcast Entrepreneurship Corner Manager Tools Other / Misc. / Podcast Networks IT Conversations Retrobits Podcast No Agenda Netcast Cranky Geeks The Command Line Freelance Radio IBM developerWorks The Register - Open Season Drunk and Retired Technometria Sod This Radio4Nerds Hacker Medley

    Read the article

  • Input string was not in correct format

    - by Luke
    using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.IO; namespace measurementConverter { class Program { static void Main(string[] args) { //read in the file StreamReader convert = new StreamReader("../../convert.txt"); //define variables string line = convert.ReadLine(); int conversion; int numberIn; float conversionFactor; Console.WriteLine("Enter the conversion in the form (amount,from,to)"); String inputMeasurement = Console.ReadLine(); string[] inputMeasurementArray = inputMeasurement.Split(','); while (line != null) { string[] fileMeasurementArray = line.Split(','); if (fileMeasurementArray[0] == inputMeasurementArray[1]) { if (fileMeasurementArray[1] == inputMeasurementArray[2]) { Console.WriteLine("{0}", fileMeasurementArray[2]); } } line = convert.ReadLine(); //convert to int numberIn = Convert.ToInt32(inputMeasurementArray[0]); conversionFactor = Convert.ToInt32(fileMeasurementArray[2]); conversion = (numberIn * conversionFactor); } Console.ReadKey(); } } } Hello, I am trying to get the calculating going. On the line conversionFactor = Convert.ToInt32(fileMeasurementArray[2]);, I am getting an error saying "Input string was not in correct format". Please help! The text file consists of the following: ounce,gram,28.0 pound,ounce,16.0 pound,kilogram,0.454 pint,litre,0.568 inch,centimetre,2.5 mile,inch,63360.0

    Read the article

  • Spaces and Parenthesis in windows PATH variable screws up batch files.

    - by NoName
    So, my path variable (System-Adv Settings-Env Vars-System-PATH) is set to: C:\Python26\Lib\site-packages\PyQt4\bin; %SystemRoot%\system32; %SystemRoot%; %SystemRoot%\System32\Wbem; %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\; C:\Python26\; C:\Python26\Scripts\; C:\cygwin\bin; "C:\PathWithSpaces\What_is_this_bullshit"; "C:\PathWithSpaces 1.5\What_is_this_bullshit_1.5"; "C:\PathWithSpaces (2.0)\What_is_this_bullshit_2.0"; "C:\Program Files (x86)\IronPython 2.6"; "C:\Program Files (x86)\Subversion\bin"; "C:\Program Files (x86)\Git\cmd"; "C:\Program Files (x86)\PuTTY"; "C:\Program Files (x86)\Mercurial"; Z:\droid\android-sdk-windows\tools; Although, obviously, without the newlines. Notice the lines containing PathWithSpaces - the first has no spaces, the second has a space, and the third has a space followed by a parenthesis. Now, notice the output of this batch file: C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\>vcvars32.bat C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin>"C:\Program Files (x86 )\Microsoft Visual Studio 9.0\Common7\Tools\vsvars32.bat" Setting environment for using Microsoft Visual Studio 2008 x86 tools. \What_is_this_bullshit_2.0";"C:\Program was unexpected at this time. C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin> set "PATH=C:\Pro gram Files\Microsoft SDKs\Windows\v6.0A\bin;C:\Python26\Lib\site-packages\PyQt4\ bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\ WindowsPowerShell\v1.0\;C:\Python26\;C:\Python26\Scripts\;C:\cygwin\bin;"C:\Path WithSpaces\What_is_this_bullshit";"C:\PathWithSpaces 1.5\What_is_this_bullshit_1 .5";"C:\PathWithSpaces (2.0)\What_is_this_bullshit_2.0";"C:\Program Files (x86)\ IronPython 2.6";"C:\Program Files (x86)\Subversion\bin";"C:\Program Files (x86)\ Git\cmd";"C:\Program Files (x86)\PuTTY";"C:\Program Files (x86)\Mercurial";Z:\dr oid\android-sdk-windows\tools;" or specifically the line: \What_is_this_bullshit_2.0";"C:\Program was unexpected at this time. So, what is this bullshit? Specifically: Directory in path that is properly escaped with quotes, but with no spaces = fine Directory in path that is properly escaped with quotes, and has spaces but no parenthesis = fine Directory in path that is properly escaped with quotes, and has spaces and has a parenthesis = ERROR Whats going on here? How can I fix this? I'll probably resort to a junction point to let my tools still work as workaround, but if you have any insight into this, please let me know :)

    Read the article

  • My Feelings About Microsoft Surface

    - by Valter Minute
    Advice: read the title carefully, I’m talking about “feelings” and not about advanced technical points proved in a scientific and objective way I still haven’t had a chance to play with a MS Surface tablet (I would love to, of course) and so my ideas just came from reading different articles on the net and MS official statements. Remember also that the MVP motto begins with “Independent” (“Independent Experts. Real World Answers.”) and this is just my humble opinion about a product and a technology. I know that, being an MS MVP you can be called an “MS-fanboy”, I don’t care, I hope that people can appreciate my opinion, even if it doesn’t match theirs. The “Surface” brand can be confusing for techies that knew the “original” surface concept but I think that will be a fresh new brand name for most of the people out there. But marketing department are here to confuse people… so I can understand this “recycle” of an existing name. So Microsoft is entering the hardware arena… for me this is good news. Microsoft developed some nice hardware in the past: the xbox, zune (even if the commercial success was quite limited) and, last but not least, the two arc mices (old and new model) that I use and appreciate. In the past Microsoft worked with OEMs and that model lead to good and bad things. Good thing (for microsoft, at least) is market domination by windows-based PCs that only in the last years has been reduced by the return of the Mac and tablets. Google is also moving in the hardware business with its acquisition of Motorola, and Apple leveraged his control of both the hardware and software sides to develop innovative products. Microsoft can scare OEMs and make them fly away from windows (but where?) or just lead the pack, showing how devices should be designed to compete in the market and bring back some of the innovation that disappeared from recent PC products (look at the shelves of your favorite electronics store and try to distinguish a laptop between the huge mass of anonymous PCs on displays… only Macs shine out there…). Having to compete with MS “official” hardware will force OEMs to develop better product and bring back some real competition in a market that was ruled only by prices (the lower the better even when that means low quality) and no innovative features at all (when it was the last time that a new PC surprised you?). Moving into a new market is a big and risky move, but with Windows 8 Microsoft is playing a crucial move for its future, trying to be back in the innovation run against apple and google. MS can’t afford to fail this time. I saw the new devices (the WinRT and Pro) and the specifications are scarce, misleading and confusing. The first impression is that the device looks like an iPad with a nice keyboard cover… Using “HD” and “full HD” to define display resolution instead of using the real figures and reviving the “ClearType” brand (now dead on Win8 as reported here and missed by people who hate to read text on displays, like myself) without providing clear figures (couldn’t you count those damned pixels?) seems to imply that MS was caught by surprise by apple recent “retina” displays that brought very high definition screens on tablets.Also there are no specifications about the processors used (even if some sources report NVidia Tegra for the ARM tablet and i5 for the x86 one) and expected battery life (a critical point for tablets and the point that killed Windows7 x86 based tablets). Also nothing about the price, and this will be another critical point because other platform out there already provide lots of applications and have a good user base, if MS want to enter this market tablets pricing must be competitive. There are some expansion ports (SD and USB), so no fixed storage model (even if the specs talks about 32-64GB for RT and 128-256GB for pro). I like this and don’t like the apple model where flash memory (that it’s dirt cheap used in thumdrives or SD cards) is as expensive as gold (or cocaine to have a more accurate per gram measurement) when mounted inside a tablet/phone. For big files you’ll be able to use external media and an SD card could be used to store files that don’t require super-fast SSD-like access times, I hope. To be honest I really don’t like the marketplace model and the limitation of Windows RT APIs (no local database? from a company that based a good share of its success on VB6+Access!) and lack of desktop support on the ARM (even if the support is here and has been used to port office). It’s a step toward the consumer market (where competitors are making big money), but may impact enterprise (and embedded) users that may not appreciate Windows 8 new UI or the limitations of the new app model (if you aren’t connected you are dead ). Not having compatibility with the desktop will require brand new applications and honestly made all the CPU cycles spent to convert .NET IL into real machine code in the past like a huge waste of time… as soon as a new processor architecture is supported by Windows you still have to rewrite part of your application (and MS is pushing HTML5+JS and native code more than .NET in my perception). On the other side I believe that the development experience provided by Visual Studio is still miles (or kilometres) ahead of the competition and even the all-uppercase menu of VS2012 hasn’t changed this situation. The new metro UI got mixed reviews. On my side I should say that is very pleasant to use on a touch screen, I like the minimalist design (even if sometimes is too minimal and hides stuff that, in my opinion, should be visible) but I should also say that using it with mouse and keyboard is like trying to pick your nose with boxing gloves… Metro is also very interesting for embedded devices where touch screen usage is quite common and where having an application taking all the screen is the norm. For devices like kiosks, vending machines etc. this kind of UI can be a great selling point. I don’t need a new tablet (to be honest I’m pretty happy with my wife’s iPad and with my PC), but I may change my opinion after having a chance to play a little bit with those new devices and understand what’s hidden under all this mysterious and generic announcements and specifications!

    Read the article

  • Are there any working implementations of the rolling hash function used in the Rabin-Karp string sea

    - by c14ppy
    I'm looking to use a rolling hash function so I can take hashes of n-grams of a very large string. For example: "stackoverflow", broken up into 5 grams would be: "stack", "tacko", "ackov", "ckove", "kover", "overf", "verfl", "erflo", "rflow" This is ideal for a rolling hash function because after I calculate the first n-gram hash, the following ones are relatively cheap to calculate because I simply have to drop the first letter of the first hash and add the new last letter of the second hash. I know that in general this hash function is generated as: H = c1ak - 1 + c2ak - 2 + c3ak - 3 + ... + cka0 where a is a constant and c1,...,ck are the input characters. If you follow this link on the Rabin-Karp string search algorithm , it states that "a" is usually some large prime. I want my hashes to be stored in 32 bit integers, so how large of a prime should "a" be, such that I don't overflow my integer? Does there exist an existing implementation of this hash function somewhere that I could already use? Here is an implementation I created: public class hash2 { public int prime = 101; public int hash(String text) { int hash = 0; for(int i = 0; i < text.length(); i++) { char c = text.charAt(i); hash += c * (int) (Math.pow(prime, text.length() - 1 - i)); } return hash; } public int rollHash(int previousHash, String previousText, String currentText) { char firstChar = previousText.charAt(0); char lastChar = currentText.charAt(currentText.length() - 1); int firstCharHash = firstChar * (int) (Math.pow(prime, previousText.length() - 1)); int hash = (previousHash - firstCharHash) * prime + lastChar; return hash; } public static void main(String[] args) { hash2 hashify = new hash2(); int firstHash = hashify.hash("mydog"); System.out.println(firstHash); System.out.println(hashify.hash("ydogr")); System.out.println(hashify.rollHash(firstHash, "mydog", "ydogr")); } } I'm using 101 as my prime. Does it matter if my hashes will overflow? I think this is desirable but I'm not sure. Does this seem like the right way to go about this?

    Read the article

1 2  | Next Page >