Search Results

Search found 3758 results on 151 pages for 'efficient'.

  • Java: split a List into two sub-Lists?

    - by Chris Conway
    What's the simplest, most standard, and/or most efficient way to split a List into two sub-Lists in Java? It's OK to mutate the original List, so no copying should be necessary. The method signature could be /** Split a list into two sublists. The original list will be modified to * have size i and will contain exactly the same elements at indices 0 * through i-1 as it had originally; the returned list will have size * len-i (where len is the size of the original list before the call) * and will have the same elements at indices 0 through len-(i+1) as * the original list had at indices i through len-1. */ <T> List<T> split(List<T> list, int i); [EDIT] List.subList returns a view on the original list, which becomes invalid if the original is modified. So split can't use subList unless it also dispenses with the original reference (or, as in Marc Novakowski's answer, uses subList but immediately copies the result).
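
A minimal sketch of the contract described in the doc comment, written in Python rather than Java so it can be run as-is. The function name `split` mirrors the question; everything else is illustrative. In Java, one commonly suggested equivalent is to copy `list.subList(i, list.size())` into a new `ArrayList` and then call `clear()` on that same sub-list view, which truncates the original.

```python
def split(items, i):
    """Truncate `items` to its first i elements in place; return the rest as a new list."""
    if not 0 <= i <= len(items):
        raise IndexError("split index out of range")
    tail = items[i:]   # independent copy of the elements from index i onward
    del items[i:]      # shrink the original list in place
    return tail


original = ["a", "b", "c", "d", "e"]
rest = split(original, 2)
```

Because the tail is copied before the original is truncated, neither list is a view of the other, which sidesteps the invalidation problem mentioned in the edit.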

  • Automatic tracking algorithm

    - by nico
    Hi everyone, I'm trying to write a simple tracking routine to track some points in a movie. Essentially I have a series of 100-frame movies showing bright spots on a dark background. I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for an efficient (but not overly complicated to implement) routine to do that. Some more information: the spots are a few pixels in size (e.g. 5x5); the movements are not big, a spot generally does not move more than 5-10 pixels from its original position, and the movements are generally smooth; the "shape" of the spots is generally fixed, they don't grow or shrink, BUT they become less bright as the movie progresses; the spots don't move in a particular direction, they can move right, then left, then right again; the user will select a region around each spot and this region will then be tracked, so I do not need to find the points automatically. As the videos are b/w, I thought I should rely on brightness. For instance, I thought I could move the region around and calculate the correlation of the region's area in the previous frame with that at the various positions in the next frame. I understand that this is quite a naïve solution, but do you think it may work? Does anyone know specific algorithms that do this? It doesn't need to be superfast; as long as it is accurate I'm happy. Thank you, nico
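
The correlation idea in the question can be sketched as a small template search. This is only an illustration under assumptions: frames are plain lists of pixel rows, the names `ncc`, `crop`, `track` and `make_frame` are made up here, and the search radius is a parameter the question does not specify. Using a zero-mean normalized correlation means a uniformly dimmer spot still scores highly, which matters because the spots fade over time.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length pixel lists."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    den = math.sqrt(sum((a - mean_a) ** 2 for a in patch_a)
                    * sum((b - mean_b) ** 2 for b in patch_b))
    return num / den if den else 0.0  # flat patches carry no information


def crop(frame, top, left, size):
    """Flatten the size x size window whose top-left corner is (top, left)."""
    return [frame[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]


def track(prev_frame, next_frame, top, left, size, radius):
    """Return the (top, left) in next_frame that best matches the old window."""
    template = crop(prev_frame, top, left, size)
    height, width = len(next_frame), len(next_frame[0])
    best_score, best_pos = -2.0, (top, left)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = top + dr, left + dc
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # window would leave the frame
            score = ncc(template, crop(next_frame, r, c, size))
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


def make_frame(size, top, left, spot):
    """Hypothetical test helper: a dark frame with one spot pasted in."""
    frame = [[0] * size for _ in range(size)]
    for r, row in enumerate(spot):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame


spot = [[50, 80, 50], [80, 200, 80], [50, 80, 50]]
dim_spot = [[v // 2 for v in row] for row in spot]
frame_1 = make_frame(12, 4, 4, spot)
frame_2 = make_frame(12, 6, 5, dim_spot)       # moved and half as bright
new_position = track(frame_1, frame_2, top=3, left=3, size=5, radius=3)
```

With ~150 spots, a 5x5 window and a search radius of about 10 pixels, the exhaustive search is small enough that speed is unlikely to be the limiting factor.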

  • tips for fixing bad coding/dev habits ?

    - by dfafa
    i want to become a better coder....so i have decided to sign up for computing science program...maybe a formal education can assist me. i started working on smaller projects to learn but currently i have really bad coding/dev habits which is hindering my productivity as the codebase increases.... i have highlighted them and perhaps someone could make suggestions (or redirect to resources) or a more efficient method. most stuff that i made in the past were web apps. i usually develop with putty + nano...i just love the minimalist feel i use winscp and develop directly on my private web server...too lazy to do it on localhost and upload it later. i dont use subversion control...which one do i need ? sometimes ctrl +z doesn't work well. when i run out of ideas for naming variable, i use swear words instead. i swear a lot when i get stuck....how to deal with anger issue ? my codes look ugly with comments everywhere. would rather use procedural coding finds "thinking" in OO difficult and time consuming i "write first think later". refactors code only if i am getting paid for it. dislikes configuring linux distro, Apache, MySQL, scaling, designing graphics and layouts. does not like writing tests likes working alone. does not like sharing codes. has an econ degree dislikes reading other people's code would rather write it on my own it seems my only true desire is to translate my ideas to a working prototype as fast as possible....it seems like i am very uninterested in the other details...could it be that i am not cut out to be a coder after all ? is going back to study comp sci a bad idea ?

  • Reversing permutation of an array in Java efficiently

    - by HansDampf
    Okay, here is my problem: I'm implementing an algorithm in Java, and part of it is the following. The question is how to do what I am about to explain in an efficient way. Given: an array a of length n and an integer array perm, which is a permutation of [1..n]. Now I want to shuffle the array a using the order determined by perm, i.e. a=[a,b,c,d], perm=[2,3,4,1] ------ shuffledA[b,c,d,a]. I figured out I can do that by iterating over the array with: shuffledA[i]=a[perm[i-1]], (-1 because the permutation indexes in perm start with 1 not 0). Now I want to do some operations on shuffledA... and then I want to reverse the shuffle operation. This is where I am not sure how to do it. Note that a can hold an item more than once, i.e. a=[a,a,a,a]. If that were not the case, I could iterate over perm and find the corresponding indexes for the values. I thought that using a HashMap instead of the perm array might help, but I am not sure if this is the best way to do it.
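
A sketch of one way to undo the shuffle without looking at the values at all, so duplicates in `a` do not matter. It is written in Python for brevity (the same loop translates directly to Java arrays), and it assumes the reading of the example in which position i of the shuffled array receives the element that `perm[i]` (1-based) points to. The function names are illustrative.

```python
def shuffle(a, perm):
    """shuffled[i] takes the element of a at 1-based position perm[i]."""
    return [a[p - 1] for p in perm]


def unshuffle(shuffled, perm):
    """Put every element back where perm says it came from."""
    restored = [None] * len(shuffled)
    for i, p in enumerate(perm):
        restored[p - 1] = shuffled[i]
    return restored


perm = [2, 3, 4, 1]
shuffled = shuffle(["a", "b", "c", "d"], perm)
restored = unshuffle(shuffled, perm)
```

Both directions are a single O(n) pass over `perm`, and no hash map is needed because the permutation itself records where each element came from.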

  • Help making userscript work in chrome

    - by Vishal Shah
    I've written a userscript for Gmail Pimp.my.Gmail & i'd like it to be compatible with Google Chrome too. Now i have tried a couple of things, to the best of my Javascript knowledge (which is very weak) & have been successful up-to a certain extent, though im not sure if it's the right way. Here's what i tried, to make it work in Chrome: The very first thing i found is that contentWindow.document doesn't work in chrome, so i tried contentDocument, which works. BUT i noticed one thing, checking the console messages in Firefox and Chrome, i saw that the script gets executed multiple times in Firefox whereas in Chrome it just executes once! So i had to abandon the window.addEventListener('load', init, false); line and replace it with window.setTimeout(init, 5000); and i'm not sure if this is a good idea. The other thing i tried is keeping the window.addEventListener('load', init, false); line and using window.setTimeout(init, 1000); inside init() in case the canvasframe is not found. So please do lemme know what would be the best way to make this script cross-browser compatible. Oh and im all ears for making this script better/efficient code wise (which is sure there is)

  • How to avoid multiple, unused has_many associations when using multiple models for the same entity (

    - by mikep
    Hello, I'm looking for a nice, Ruby/Rails-esque solution for something. I'm trying to split up some data using multiple tables, rather than just using one gigantic table. My reasoning is pretty much to try and avoid the performance drop that would come with having a big table. So, rather than have one table called books, I have multiple tables: books1, books2, books3, etc. (I know that I could use a partition, but, for now, I've decided to go the 'multiple tables' route.) Each user has their books placed into a specific table. The actual book table is chosen when the user is created, and all of their books go into the same table. The goal is to try and keep each table pretty much even -- but that's a different issue. One thing I don't particularly want to have is a bunch of unused associations in the User class. Right now, it looks like I'd have to do the following: class User < ActiveRecord::Base has_many :books1, :books2, :books3, :books4, :books5 end class Books1 < ActiveRecord::Base belongs_to :user end class Books2 < ActiveRecord::Base belongs_to :user end First off, for each specific user, only one of the book tables would be usable/applicable, since all of a user's books are stored in the same table. So, only one of the associations would be in use at any time and any other has_many :bookX association that was loaded would be a waste. I don't really know Ruby/Rails does internally with all of those has_many associations though, so maybe it's not so bad. But right now I'm thinking that it's really wasteful, and that there may just be a better, more efficient way of doing this. Is there's some sort of special Ruby/Rails methodology that could be applied here to avoid having to have all of those has_many associations? Also, does anyone have any advice on how to abstract the fact that there's multiple book tables behind a single books model/class?

  • Running an existing LINQ query against a dynamic object (DataTable like)

    - by TomTom
    Hello, I am working on a generic OData provider to go against a custom data provider that we have here. This is fully dynamic in that I query the data provider for the tables it knows. I have a basic storage structure in place so far, based on the OData sample code. My problem is: OData supports queries and expects me to hand in an IQueryable implementation. On the lower side, I don't have any query support. Not a joke - the provider returns tables and the WHERE clause is not supported. Performance is not an issue here - the tables are small. It is OK to sort them in the OData provider. My main problem is this: I submit a SQL statement to get the data out of a table. The result is some sort of ADO.NET data reader. I need to expose an IQueryable implementation for this data to potentially allow later filtering. Any idea how best to approach that? .NET 3.5 only (no 4.0 planned for some time). I was seriously thinking of creating dynamic DTO classes for every table (emitting bytecode) so I can use standard LINQ. Right now I am using a dictionary per entry (not too efficient), but I see no real way to filter / sort based on them.

  • How to do this Python / MySQL manipulation (match) more efficiently?

    - by NJTechie
    Following is my data : Company Table : ID Company Address City State Zip Phone 1 ABC 123 Oak St Philly PA 17542 7329878901 2 CDE 111 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 KLE 87 Ilk St Plains NY 07654 7376578901 6 AB 1 W.House SField PA 87656 7329878901 Branch Office Table : ID Address City State Zip Phone 1 323 Alk St Philly PA 17542 7329832221 1 171 Joe St Newark NJ 08654 3 287 Foe St Brick NJ 07740 7321178901 3 700 Wall Ocean NJ 07764 7322278901 1 89 Blk St Surrey NY 07154 7376222901 File to be Matched (In MySQL): ID Company Address City State Zip Phone 1 ABC 123 Oak St Philly PA 17542 7329878901 2 AB 171 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 K 87 Ilk St Plains NY 07654 7376578901 Resulting File : ID Company Address City State Zip Phone appendedID 1 ABC 123 Oak St Philly PA 17542 7329878901 [Original record, field always empty] 1 ABC 171 Joe St Newark NJ 08654 1 [Company Table] 1 ABC 323 Alk St Philly PA 17542 7329832221 1 [Branch Office Table] 1 AB 1 W.House SField PA 87656 7329878901 6 [Partial firm and State, Zip match] 2 CDE 111 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 3 GHI 700 Wall Ocean NJ 07764 7322278901 3 3 GHI 287 Foe St Brick NJ 07740 7321178901 3 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 KLE 87 Ilk St Surrey NY 07654 7376578901 5 KLE 89 Blk St Surrey NY 07154 7376222901 5 Requirement : 1) I have to match each firm on the 'File to be Matched' to that of Company and Branch Office tables (MySQL). 2) If there are multiple exact/partial matches, then the ID from Company, Branch Office table is inserted as a new row in the resulting file. 3) Not all the firms will be matched perfectly, in that case I have to match on partial Company names (like 5/8th of the company name) and any of the address fields and insert them in the resulting file. 
Please help me out in the most efficient solution for this problem.

  • Best way to randomly select columns from random rows of SQL results.

    - by LesterDove
    A search of SO yields many results describing how to select random rows of data from a database table. My requirement is a bit different, though, in that I'd like to select individual columns from across random rows in the most efficient/random/interesting way possible. To better illustrate: I have a large Customers table, and from that I'd like to generate a bunch of fictitious demo Customer records that aren't real people. I'm thinking of just querying randomly from the Customers table, and then randomly pairing FirstNames with LastNames, Address, City, State, etc. So if this is my real Customer data (simplified):

    FirstName LastName State
    ==========================
    Sally     Simpson  SD
    Will      Warren   WI
    Mike      Malone   MN
    Kelly     Kline    KS

    Then I'd generate several records that look like this:

    FirstName LastName State
    ==========================
    Sally     Warren   MN
    Kelly     Malone   SD

    Etc. My initial approach works, but it lacks the elegance that I'm hoping the final answer will provide. (I'm particularly unhappy with the repetitiveness of the subqueries, and the fact that this solution requires a known/fixed number of fields and therefore isn't reusable.)

    SELECT FirstName = (SELECT TOP 1 FirstName FROM Customer ORDER BY newid()),
           LastName = (SELECT TOP 1 LastName FROM Customer ORDER BY newid()),
           State = (SELECT TOP 1 State FROM Customer ORDER BY newid())

    Thanks!
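
If the mixing is allowed to happen in application code rather than in SQL (an assumption, since the question asks about a query), the column-by-column pairing can be sketched without fixing the number of fields. The function name `fake_records` and the in-memory `customers` list are illustrative only.

```python
import random


def fake_records(rows, count, rng=random):
    """Build `count` records, drawing each column from an independently chosen row."""
    columns = list(rows[0].keys())
    return [{col: rng.choice(rows)[col] for col in columns} for _ in range(count)]


customers = [
    {"FirstName": "Sally", "LastName": "Simpson", "State": "SD"},
    {"FirstName": "Will", "LastName": "Warren", "State": "WI"},
    {"FirstName": "Mike", "LastName": "Malone", "State": "MN"},
    {"FirstName": "Kelly", "LastName": "Kline", "State": "KS"},
]
demo = fake_records(customers, 3, random.Random(0))
```

The column list is taken from the first row, so the same function works for any table shape. Note that independent draws can occasionally reproduce a real record by chance.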

  • SSIS - Bulk Update at Database Field Level

    - by Adam
    Hello, Here's our mission: Receive files from clients. Each file contains anywhere from 1 to 1,000,000 records. Records are loaded to a staging area and business-rule validation is applied. Valid records are then pumped into an OLTP database in a batch fashion, with the following rules: If record does not exist (we have a key, so this isn't an issue), create it. If record exists, optionally update each database field. The decision is made based on one of 3 factors...I don't believe it's important what those factors are. Our main problem is finding an efficient method of optionally updating the data at a field level. This is applicable across ~12 different database tables, with anywhere from 10 to 150 fields in each table (original DB design leaves much to be desired, but it is what it is). Our first attempt has been to introduce a table that mirrors the staging environment (1 field in staging for each system field) and contains a masking flag. The value of the masking flag represents the 3 factors. We've then put an UPDATE similar to... UPDATE OLTPTable1 SET Field1 = CASE WHEN Mask.Field1 = 0 THEN Staging.Field1 WHEN Mask.Field1 = 1 THEN COALESCE( Staging.Field1 , OLTPTable1.Field1 ) WHEN Mask.Field1 = 2 THEN COALESCE( OLTPTable1.Field1 , Staging.Field1 ) ... As you can imagine, the performance is rather horrendous. Has anyone tackled a similar requirement? We're a MS shop using a Windows Service to launch SSIS packages that handle the data processing. Unfortunately, we're pretty much novices at this stuff.

  • Save memory in Python. How to iterate over the lines and save them efficiently with a 2million line

    - by skyl
    I have a tab-separated data file with a little over 2 million lines and 19 columns. You can find it, in US.zip: http://download.geonames.org/export/dump/. I started to run the following but with for l in f.readlines(). I understand that just iterating over the file is supposed to be more efficient so I'm posting that below. Still, with this small optimization, I'm using 10% of my memory on the process and have only done about 3% of the records. It looks like, at this pace, it will run out of memory like it did before. Also, the function I have is very slow. Is there anything obvious I can do to speed it up? Would it help to del the objects with each pass of the for loop? def run(): from geonames.models import POI f = file('data/US.txt') for l in f: li = l.split('\t') try: p = POI() p.geonameid = li[0] p.name = li[1] p.asciiname = li[2] p.alternatenames = li[3] p.point = "POINT(%s %s)" % (li[5], li[4]) p.feature_class = li[6] p.feature_code = li[7] p.country_code = li[8] p.ccs2 = li[9] p.admin1_code = li[10] p.admin2_code = li[11] p.admin3_code = li[12] p.admin4_code = li[13] p.population = li[14] p.elevation = li[15] p.gtopo30 = li[16] p.timezone = li[17] p.modification_date = li[18] p.save() except IndexError: pass if __name__ == "__main__": run()
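
A sketch of the file-handling side only: stream the rows lazily and hand them on in fixed-size batches, so that no more than one batch is alive at a time. The helper names `read_rows` and `batches` are made up for illustration, and only a few of the 19 columns are mapped. If the ORM is Django (an assumption based on the `models` import and `save()`), two things are worth checking separately: with `DEBUG = True` Django keeps a log of every executed query, which grows for the life of the process, and saving a batch with `bulk_create` is usually much faster than one `save()` per row.

```python
import csv
import io
from itertools import islice


def read_rows(lines):
    """Lazily yield one dict per well-formed 19-column line."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 19:
            continue  # mirrors the IndexError guard in the question
        yield {
            "geonameid": row[0],
            "name": row[1],
            "point": "POINT(%s %s)" % (row[5], row[4]),  # longitude first
            "timezone": row[17],
        }


def batches(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Tiny in-memory stand-in for data/US.txt: three good lines and one short one.
good_line = "\t".join(str(i) for i in range(19))
sample = io.StringIO("\n".join([good_line, "too\tshort", good_line, good_line]) + "\n")
chunks = list(batches(read_rows(sample), 2))
```

In real use the `list(...)` around `batches` would be replaced by a loop that saves each chunk and then lets it go out of scope.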

  • Comparing all values within a List against each other

    - by Kave
    I am a bit stuck here and can't think further. public struct CandidateDetail { public int CellX { get; set; } public int CellY { get; set; } public int CellId { get; set; } } var dic = new Dictionary<int, List<CandidateDetail>>(); How can I compare each CandidateDetail item against other CandidateDetail items within the same dictionary in the most efficient way? Example: There are three keys for the dictionary: 5, 6 and 1. Therefore we have three entries. now each of these key entries would have a List associated with. In this case let say each of these three numbers has exactly two CandidateDetails items within the list associated to each key. This means in other words we have two 5, two 6 and two 1 in different or in the same cells. I would like to know: if[5].1stItem.CellId == [6].1stItem.CellId = we got a hit. That means we have a 5 and a 6 within the same Cell if[5].2ndItem.CellId == [6].2ndItem.CellId = perfect. We found out that the other 5 and 6 are together within a different cell. if[1].1stItem.CellId == ... Now I need to check the 1 also against the other 5 and 6 to see if the one exists within the previous same two cells or not. Could a Linq expression help perhaps? I am quite stuck here... I don't know...Maybe I am taking the wrong approach. I am trying to solve the "Hidden pair" of the game Sudoku. :) http://www.sudokusolver.eu/ExplainSolveMethodD.aspx Many Thanks, Kave
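
One way to avoid comparing every item against every other item is to invert the dictionary first: map each cell to the digits that can go there, then look at pairs of digits that share cells. The sketch below is in Python and simplifies the data to `digit -> list of cell ids` (an assumption; the question's `CandidateDetail` also carries `CellX` and `CellY`). The same grouping could be expressed in C# with LINQ's `GroupBy`.

```python
from collections import defaultdict
from itertools import combinations


def shared_cells(candidates):
    """Map each pair of digits to the set of cells that contain both."""
    digits_in_cell = defaultdict(set)
    for digit, cells in candidates.items():
        for cell in cells:
            digits_in_cell[cell].add(digit)
    pairs = defaultdict(set)
    for cell, digits in digits_in_cell.items():
        for a, b in combinations(sorted(digits), 2):
            pairs[(a, b)].add(cell)
    return pairs


def hidden_pairs(candidates):
    """Pairs of digits that both occur in exactly the same two cells."""
    return {
        pair: cells
        for pair, cells in shared_cells(candidates).items()
        if len(cells) == 2 and all(set(candidates[d]) == cells for d in pair)
    }


# Digits 5 and 6 share cells 10 and 14; digit 1 only overlaps in cell 10.
candidates = {5: [10, 14], 6: [10, 14], 1: [10, 20]}
found = hidden_pairs(candidates)
```

The inversion is a single pass over all candidate entries, and the pairwise step only touches digits that actually share a cell.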

  • Getting a lightweight installation of java eclipse.

    - by liam
    Having dealt with yet another stupid eclipse problem, I want to try to get the lightest, most minimal eclipse installation as possible. To be clear, I use eclipse for two things: - Editing Java - Debugging Java Everything else I do through emacs/zsh (editing jsp/xml/js, file management, svn check-in, etc). I have not found any aspect of working in eclipse to do these tasks to be efficient or even reliable, so I do not want plug-ins that relate to it. From the eclipse.org site, this is the lightest install of eclipse that they have, and I don't want any of those things (bugzilla, mylyn, cvs, xml_ui), and have actually had problems with each of them even though I do not use them. So what is the minimal build I can get that will: 1) Ignore svn metadata 2) Includes the full-featured editor (intellisense and type-finding) 3) Includes the full-featured debugger (standard eclipse/jdk) Does not have any extra plug-ins, platforms, or "integrations" with other platforms, specifically, I don't want to deal with plug-ins relating to: Maven, JSP Validation, Javascript editing or validation, CVS or SVN, Mylyn, Spring or Hibernate "natures", app servers like a bundled tomcat/glassfish/etc, J2EE tools, or anything of the like. I do primarily spring/hibernate/web-mvc apps, and have never dealt with an eclipse plug-in that handles any of it gracefully, I can work effectively with my own toolset, but eclipse extensions do nothing but get in the way. I have worked with plain eclipse up to Ganymede, MyEclipse (up to 7.5), and the latest version of Spring-SourceTools, and find that they are all saddled with buggy useless plug-ins (though the combination is always different). Switching to netbeans/intellij is not an option, and my teammates work with svn-controlled .class/.project files, so it pretty much has to be eclipse. Does anyone have any good advice on how I can save a few grey hairs?

  • Reading text files line by line, with exact offset/position reporting

    - by Benjamin Podszun
    Hi. My simple requirement: reading a huge (a million lines) text file (for this example, assume it's a CSV of some sort) and keeping a reference to the beginning of each line for faster lookup in the future (read a line, starting at X). I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately that doesn't work as I intended: Given a file containing the following Foo Bar Baz Bla Fasel and this very simple code using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) { string line; long pos = sr.BaseStream.Position; while ((line = sr.ReadLine()) != null) { Console.Write("{0:d3} ", pos); Console.WriteLine(line); pos = sr.BaseStream.Position; } } the output is: 000 Foo 025 Bar 025 Baz 025 Bla 025 Fasel I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is necessary. For me this is bad. The question, finally: is there any way to get the (byte, char) offset while reading a file line by line without using a basic Stream and messing with \r, \n, \r\n and string encoding etc. manually? Not a big deal, really, I just don't like to build things that might exist already.
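
The underlying idea, independent of .NET, is to count bytes yourself instead of asking a buffered reader where it is. A small Python sketch of that idea follows; the function names are illustrative, and it assumes lines end in `\n` or `\r\n` (a binary stream in Python does not split on a lone `\r`).

```python
import io


def index_lines(stream):
    """Return the byte offset at which each line of a binary stream starts."""
    offsets = []
    position = 0
    for raw in stream:           # each chunk keeps its line terminator
        offsets.append(position)
        position += len(raw)     # bytes, so encoding does not matter
    return offsets


def read_line_at(stream, offset, encoding="utf-8"):
    """Seek to a recorded offset and decode exactly one line."""
    stream.seek(offset)
    return stream.readline().decode(encoding).rstrip("\r\n")


data = io.BytesIO(b"Foo\r\nBar\r\nBaz\r\nBla\r\nFasel\r\n")
offsets = index_lines(data)
fourth = read_line_at(data, offsets[3])
```

Working on raw bytes and decoding only on lookup keeps the offsets valid for multi-byte encodings, where character counts and byte counts differ.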

  • Filtering on a left join in SQLalchemy

    - by Adam Ernst
    Using SQLAlchemy, I want to perform a left outer join and filter out rows that DO have a match in the joined table. I'm sending push notifications, so I have a Notification table. I also have an ExpiredDeviceId table to store device_ids that are no longer valid. (I don't want to just delete the affected notifications, as the user might later re-install the app, at which point the notifications should resume according to Apple's docs.) CREATE TABLE Notification (device_id TEXT, time DATETIME); CREATE TABLE ExpiredDeviceId (device_id TEXT PRIMARY KEY, expiration_time DATETIME); Note: there may be multiple Notifications per device_id. There is no "Device" table for each device. So when doing SELECT FROM Notification I should filter accordingly. I can do it in SQL: SELECT * FROM Notification LEFT OUTER JOIN ExpiredDeviceId ON Notification.device_id = ExpiredDeviceId.device_id WHERE expiration_time IS NULL But how can I do it in SQLAlchemy? sess.query( Notification, ExpiredDeviceId ).outerjoin( (ExpiredDeviceId, Notification.device_id == ExpiredDeviceId.device_id) ).filter( ??? ) Alternatively, I could do this with a device_id NOT IN (SELECT device_id FROM ExpiredDeviceId) clause, but that seems way less efficient.
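
The anti-join itself can be checked with nothing but the standard library. The sketch below uses SQLite through `sqlite3` (an assumption made only so the example runs anywhere; the sample rows are invented) and filters on the joined table's key being NULL. In SQLAlchemy the corresponding filter is usually written as a comparison of the joined column with `None`, for example `ExpiredDeviceId.device_id == None`, which SQLAlchemy renders as `IS NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Notification (device_id TEXT, time DATETIME);
    CREATE TABLE ExpiredDeviceId (device_id TEXT PRIMARY KEY,
                                  expiration_time DATETIME);
    INSERT INTO Notification VALUES
        ('dev-a', '2010-05-01 10:00'),
        ('dev-a', '2010-05-02 10:00'),
        ('dev-b', '2010-05-01 11:00');
    INSERT INTO ExpiredDeviceId VALUES ('dev-b', '2010-05-03 00:00');
""")

# Keep only notifications whose device has no row in ExpiredDeviceId.
ACTIVE_SQL = """
    SELECT n.device_id, n.time
    FROM Notification AS n
    LEFT OUTER JOIN ExpiredDeviceId AS e ON n.device_id = e.device_id
    WHERE e.device_id IS NULL
    ORDER BY n.time
"""
active = conn.execute(ACTIVE_SQL).fetchall()
```

Testing the joined primary key rather than `expiration_time` avoids a false match if an expiration time were ever stored as NULL.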

  • Migrating a Large amount of data from old publishing site to new site

    - by tommizzle
    Hi, I am currently in the process of creating a new news/publishing site on the Movable Type platform. There are around 20 or so sites with 20,000+ rows of data to be moved/aggregated to ~8 sites (we have a number of location specific sites and are going to aggregate the content from these into 1 single site for each niche). We have discussed how to do this and came to the conclusion that it would probably be better to hire somebody to do it (I could probably do it, but i'm limited on time and am sure that a specialist would be more efficient). So my questions to you guys are: 1) What kind of skill set should we look for in an applicant? 2) There will be a large amount of input from our side... is getting somebody to work remotely out of the question? 3) How long would a task like this traditionally take (I know this question is very subjective, but an estimation would be awesome)? 4) Do you have any recommendations for firms who would be able to take on a large task like this? Thanks in advance, Tom

  • How to update application files using patching?

    - by Marek
    I am not interested in any auto update solution, such as ClickOnce or the MS Updater Block. For anyone feeling the urge to ask why not: I am already using these and there is nothing wrong with them, I would just like to learn about any efficient alternatives. I would like to publish patches = small differences that will modify existing files of the deployment with the smallest possible delta. Not only code needs to be patched, but also resource files. Patching the running code can be accomplished by maintaining two separate synchronized copies of the deployment (no on the fly changes to the running executable are required). The application itself can be xcopy deployed (to avoid MSI auto-correcting the modified files or breaking ClickOnce signatures). I would like to learn how to handle different versions of patches (e.g. there is a patch issued that fixes one error and later another patch that fixes another error (in the same file) - users may have any combination of these and there comes a third patch - in text files, this may be easy to implement, but how about executable files? (native Win32 code vs. .NET, any difference?) If the first problem is too hard to solve or unsolvable for executables, I would like to at least learn if there is a solution that implements simple patching with serial revisions - in order to install revision 5, user must have all previous revisions installed to ensure validity of the deployment. Are there any existing solutions to accomplish this? NOTE: There are a few questions on SO that may seem like duplicates, but none with a good answer. This question is about the Windows platform, preferably .NET.

  • Read from one large file and write to many (tens, hundreds, or thousands) files in Java?

    - by Rudiger
    I have a large-ish file (4-5 GB compressed) of small messages that I wish to parse into approximately 6,000 files by message type. Messages are small; anywhere from 5 to 50 bytes depending on the type. Each message starts with a fixed-size type field (a 6-byte key). If I read a message of type '000001', I want to append its payload to 000001.dat, etc. The input file contains a mixture of messages; I want N homogeneous output files, where each output file contains only the messages of a given type. What's an efficient and fast way of writing these messages to so many individual files? I'd like to use as much memory and processing power as needed to get it done as fast as possible. I can write compressed or uncompressed files to the disk. I'm thinking of using a hashmap with a message type key and an outputstream value, but I'm sure there's a better way to do it. Thanks!
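
One alternative to holding thousands of output streams open is to buffer payloads per type in memory and append them to disk in large bursts. The sketch below shows that idea in Python (the approach carries over to Java with a map of byte buffers). It assumes the input has already been split into individual messages, since the question does not say how message boundaries are found; `demultiplex` and `flush_bytes` are illustrative names.

```python
import os
import tempfile
from collections import defaultdict

KEY_SIZE = 6  # the fixed-size type field from the question


def demultiplex(messages, out_dir, flush_bytes=1 << 20):
    """Append each message payload to <type>.dat, buffering writes in memory."""
    buffers = defaultdict(bytearray)
    buffered = 0

    def flush():
        nonlocal buffered
        for key, data in buffers.items():
            # Open, append, close: only one file handle is open at a time.
            with open(os.path.join(out_dir, key + ".dat"), "ab") as out:
                out.write(data)
        buffers.clear()
        buffered = 0

    for message in messages:
        key = message[:KEY_SIZE].decode("ascii")
        payload = message[KEY_SIZE:]
        buffers[key] += payload
        buffered += len(payload)
        if buffered >= flush_bytes:
            flush()
    flush()  # write whatever is left


with tempfile.TemporaryDirectory() as out_dir:
    demultiplex([b"000001AAA", b"000002BB", b"000001CC"], out_dir, flush_bytes=4)
    with open(os.path.join(out_dir, "000001.dat"), "rb") as handle:
        first_type = handle.read()
    written = sorted(os.listdir(out_dir))
```

A larger `flush_bytes` trades memory for fewer, bigger writes; the tiny value in the demo is only there to exercise a mid-stream flush.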

  • The best way to implement drawing features like Keynote

    - by Shamseddine
    Hi all, I'm trying to make a little iPad tool for drawing simple geometrical objects (rect, rounded rect, ellipse, star, ...). My goal is to make something very close to Keynote's drawing feature, i.e. let the user add a rect (for instance), resize it and move it. I also want the user to be able to select many objects and move them together. I've thought of at least 3 different ways to do that: (1) Extend UIView for each object type (a class for Rect, another for Ellipse, ...) with a custom drawing method, then add this view as a subview of the global view. (2) Extend CALayer for each object type (a class for Rect, another for Ellipse, ...) with a custom drawing method, then add this layer as a sublayer of the global view's layer. (3) Extend NSObject for each object type (a class for Rect, another for Ellipse, ...) with just a drawing method that takes a CGContext and a Rect as arguments and draws the shape directly into it. Those methods would be called by the drawing method of the global view. I'm aware that the first two ways come with functions to detect touches on each object, to easily add shadows, ... but I'm afraid that they are a little too heavy. That's why I thought of the last way, which seems to be straightforward. Which way will be the most efficient? Or is there another way I haven't thought of? Any help will be appreciated ;-) Thanks.

  • Best way to detect similar email addresses?

    - by Chris
    I have a list of ~20,000 email addresses, some of which I know to be fraudulent attempts to get around a "1 per e-mail" limit. ([email protected], [email protected], [email protected], etc...). I want to find similar email addresses for evaluation. Currently I'm using a levenshtein algorithm to check each e-mail against the others in the list and report any with an edit distance of less than 2. However, this is painstakingly slow. Is there a more efficient approach? The test code I'm using now is: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.IO; using System.Threading; namespace LevenshteinAnalyzer { class Program { const string INPUT_FILE = @"C:\Input.txt"; const string OUTPUT_FILE = @"C:\Output.txt"; static void Main(string[] args) { var inputWords = File.ReadAllLines(INPUT_FILE); var outputWords = new SortedSet<string>(); for (var i = 0; i < inputWords.Length; i++) { if (i % 100 == 0) Console.WriteLine("Processing record #" + i); var word1 = inputWords[i].ToLower(); for (var n = i + 1; n < inputWords.Length; n++) { if (i == n) continue; var word2 = inputWords[n].ToLower(); if (word1 == word2) continue; if (outputWords.Contains(word1)) continue; if (outputWords.Contains(word2)) continue; var distance = LevenshteinAlgorithm.Compute(word1, word2); if (distance <= 2) { outputWords.Add(word1); outputWords.Add(word2); } } } File.WriteAllLines(OUTPUT_FILE, outputWords.ToArray()); Console.WriteLine("Found {0} words", outputWords.Count); } } }
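
Two cheap prunings often help before reaching for a different algorithm: skip pairs whose lengths already differ by more than the threshold, and abandon the distance computation as soon as it cannot come in under the threshold. The Python sketch below illustrates both (the same logic ports to C#). The function names and the sample addresses are invented for the example, and the result is still quadratic in the worst case.

```python
def within_distance(a, b, limit):
    """True if the Levenshtein distance between a and b is at most `limit`."""
    if abs(len(a) - len(b)) > limit:
        return False                      # length gap alone exceeds the limit
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                      # deletion
                               current[j - 1] + 1,                   # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        if min(current) > limit:
            return False                  # no later row can get back under it
        previous = current
    return previous[-1] <= limit


def similar_pairs(words, limit=2):
    """All pairs of distinct lower-cased words within `limit` edits."""
    ordered = sorted({w.lower() for w in words}, key=lambda w: (len(w), w))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if len(second) - len(first) > limit:
                break                     # sorted by length: the rest are longer
            if within_distance(first, second, limit):
                pairs.append((first, second))
    return pairs


addresses = ["alice1@example.com", "alice2@example.com",
             "Alice1@example.com", "bob@example.org"]
suspicious = similar_pairs(addresses)
```

Lower-casing and de-duplicating once up front also removes the repeated `ToLower()` calls and the equal-string checks from the inner loop.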

  • How to preserve order of temp table rows when inner joined with another table?

    - by Triynko
    Does an SQL Server "join" preserve any kind of row order consistently (i.e. that of the left table or that of the right table)? Pseudocode: create table #p (personid bigint); foreach (id in personid_list) insert into #p (personid) values (id) select id from users inner join #p on users.personid = #p.personid Suppose I have a list of IDs that correspond to person entries. Each of those IDs may correspond to zero or more user accounts (since each person can have multiple accounts). To quickly select columns from the users table, I populate a temp table with person ids, then inner join it with the users table. I'm looking for an efficient way to ensure that the order of the results in the join matches the order of the ids as they were inserted into the temp table, so that the user list that's returned is in the same order as the person list as it was entered. I've considered the following alternatives: (1) using "#p inner join users", in case the left table's order is preserved; (2) using "#p left join users where id is not null", in case a left join preserves order and the inner join doesn't; (3) using "create table (rownum int, personid bigint)", inserting an incrementing row number as the temp table is populated, so the results can be ordered by rownum in the join; (4) using an SQL Server equivalent of the "order by order of [tablename]" clause available in DB2. I'm currently using option 3, and it works... but I hate the idea of using an order by clause for something that's already ordered. I just don't know if the temp table preserves the order in which the rows were inserted or how the join operates and what order the results come out in.
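
SQL result sets have no guaranteed order unless the query has an ORDER BY, so the explicit row-number column is the dependable choice. The sketch below demonstrates that pattern with SQLite through Python's `sqlite3` (an assumption made so the example is runnable; the question is about SQL Server, and the table contents here are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, personid INTEGER, name TEXT);
    INSERT INTO users VALUES
        (1, 10, 'ann'), (2, 30, 'cat'), (3, 20, 'bob'), (4, 30, 'cat2');
    CREATE TEMP TABLE p (rownum INTEGER PRIMARY KEY, personid INTEGER);
""")

person_ids = [30, 10, 20]                      # the order the caller cares about
conn.executemany("INSERT INTO p (rownum, personid) VALUES (?, ?)",
                 list(enumerate(person_ids)))

rows = conn.execute("""
    SELECT u.name
    FROM p
    JOIN users AS u ON u.personid = p.personid
    ORDER BY p.rownum, u.userid                -- explicit, so it is guaranteed
""").fetchall()
names = [row[0] for row in rows]
```

The secondary sort key makes the order of multiple accounts for the same person deterministic as well.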

  • How to eliminate duplicate nodes based on values of multiple attributes?

    - by JayRaj
    Hello All, How can I eliminate duplicate nodes based on values of multiple (more than 1) attributes? Also, the attribute names are passed as parameters to the stylesheet. Now I am aware of the Muenchian method of grouping that uses a <xsl:key> element. But I came to know that XSLT 1.0 does not allow parameters/variables in <xsl:key>. Is there another method to achieve duplicate node removal? It is fine if it is not as efficient as the Muenchian method. Update from previous question: XML: <data id = "root"> <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/> <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/> <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/> <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/> <record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/> <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/> <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/> <record id="8" operator1='abc' operator2='yyy' operator3='zzz'/> <record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/> <record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/> </data>

    Read the article

  • Coding the Python way

    - by Aaron Moodie
    I've just spent the last half semester at Uni learning Python. I've really enjoyed it, and was hoping for a few tips on how to write more 'pythonic' code. This is the __init__ method from a recent assignment I did. At the time I wrote it, I was trying to work out how I could re-write this using lambdas, or in a neater, more efficient way, but ran out of time.

        def __init__(self, dir):
            def _read_files(_, dir, files):
                for file in files:
                    if file == "classes.txt":
                        class_list = readtable(dir+"/"+file)
                        for item in class_list:
                            Enrol.class_info_dict[item[0]] = item[1:]
                            if item[1] in Enrol.classes_dict:
                                Enrol.classes_dict[item[1]].append(item[0])
                            else:
                                Enrol.classes_dict[item[1]] = [item[0]]
                    elif file == "subjects.txt":
                        subject_list = readtable(dir+"/"+file)
                        for item in subject_list:
                            Enrol.subjects_dict[item[0]] = item[1]
                    elif file == "venues.txt":
                        venue_list = readtable(dir+"/"+file)
                        for item in venue_list:
                            Enrol.venues_dict[item[0]] = item[1:]
                    elif file.endswith('.roll'):
                        roll_list = readlines(dir+"/"+file)
                        file = os.path.splitext(file)[0]
                        Enrol.class_roll_dict[file] = roll_list
                        for item in roll_list:
                            if item in Enrol.enrolled_dict:
                                Enrol.enrolled_dict[item].append(file)
                            else:
                                Enrol.enrolled_dict[item] = [file]
            try:
                os.path.walk(dir, _read_files, None)
            except:
                print "There was a problem reading the directory"

    As you can see, it's a little bulky. If anyone has the time or inclination, I'd really appreciate a few tips on some Python best practices. Thanks.
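
One small tidy-up that applies to the repeated "append if the key exists, else create a list" pattern above is collections.defaultdict. The sketch below shows it for the classes.txt branch only. The function name index_classes, the local dictionary names and the sample rows are assumptions for illustration; the real code stores its dictionaries on Enrol and reads the rows with readtable.

```python
from collections import defaultdict

def index_classes(class_rows):
    """Build the two lookups that the classes.txt branch fills in.

    Each row is assumed to look like [class_id, subject, ...other fields].
    """
    class_info = {}
    classes_by_subject = defaultdict(list)  # a missing key starts as an empty list
    for row in class_rows:
        class_id, subject = row[0], row[1]
        class_info[class_id] = row[1:]
        classes_by_subject[subject].append(class_id)  # no if/else needed
    return class_info, classes_by_subject

# Hypothetical rows standing in for the output of readtable("classes.txt").
rows = [["c1", "math", "mon"], ["c2", "math", "tue"], ["c3", "art", "wed"]]
info, by_subject = index_classes(rows)
```

The same change would apply to the enrolled_dict updates in the .roll branch.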

    Read the article

  • parse content away from structure in a binary file

    - by Jeff Godfrey
    Using C#, I need to read a packed binary file created using FORTRAN. The file is stored in an "Unformatted Sequential" format as described here (about half-way down the page in the "Unformatted Sequential Files" section): http://www.tacc.utexas.edu/services/userguides/intel8/fc/f_ug1/pggfmsp.htm As you can see from the URL, the file is organized into "chunks" of 130 bytes or less and includes 2 length bytes (inserted by the FORTRAN compiler) surrounding each chunk. So, I need to find an efficient way to parse the actual file payload away from the compiler-inserted formatting. Once I've extracted the actual payload from the file, I'll then need to parse it up into its varying data types. That'll be the next exercise. My first thoughts are to slurp up the entire file into a byte array using File.ReadAllBytes. Then, just iterate through the bytes, skipping the formatting and transferring the actual data to a second byte array. In the end, that second byte array should contain the actual file contents minus all the formatting, which I'd then need to go back through to get what I need. As I'm fairly new to C#, I thought there might be a better, more accepted way of tackling this. Also, in case it's helpful, these files could be fairly large (say 30MB), though most will be much smaller...
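
The single-pass idea described above can be sketched as follows. This is in Python rather than C#, and it rests on an assumed layout: every chunk is framed by one leading and one trailing length byte that both hold the payload size. Any file-level header or trailer bytes the compiler may add are not handled, and the function name strip_record_framing is made up for this example.

```python
def strip_record_framing(raw):
    """Return the payload bytes with the per-chunk length bytes removed.

    Assumed layout, repeated until the end of the data:
        [length byte][payload of that many bytes][same length byte]
    """
    payload = bytearray()
    pos = 0
    while pos < len(raw):
        length = raw[pos]
        end = pos + 1 + length  # index of the trailing length byte
        if end >= len(raw) or raw[end] != length:
            raise ValueError("bad chunk framing at offset %d" % pos)
        payload += raw[pos + 1:end]
        pos = end + 1  # jump to the next chunk's leading length byte
    return bytes(payload)

# Two framed chunks: "abc" (length 3) and "de" (length 2).
framed = bytes([3]) + b"abc" + bytes([3]) + bytes([2]) + b"de" + bytes([2])
clean = strip_record_framing(framed)
```

For a file of around 30MB, reading everything into memory and copying once like this is normally acceptable; a streaming reader would only matter for much larger inputs.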

    Read the article

  • Speeding up inner-joins and subqueries while restricting row size and table membership

    - by hiffy
    I'm developing an RSS feed reader that uses a Bayesian filter to filter out boring blog posts. The Stream table is meant to act as a FIFO buffer from which the webapp will consume 'entries'. I use it to store the temporary relationship between entries, users and Bayesian filter classifications. After a user marks an entry as read, it will be added to the metadata table (so that a user isn't presented with material they have already read), and deleted from the stream table. Every three minutes, a background process will repopulate the Stream table with new entries (i.e. whenever the daemon adds new entries after it checks the RSS feeds for updates).

    Problem: The query I came up with is hella slow. More importantly, the Stream table only needs to hold one hundred unread entries at a time; it'll reduce duplication, make processing faster and give me some flexibility with how I display the entries.

    The query (takes about 9 seconds on 3600 items with no indexes):

        insert into stream(entry_id, user_id)
        select entries.id, subscriptions_users.user_id
        from entries
        inner join subscriptions_users
            on subscriptions_users.subscription_id = entries.subscription_id
        where subscriptions_users.user_id = 1
          and entries.id not in (select entry_id from metadata where metadata.user_id = 1)
          and entries.id not in (select entry_id from stream where user_id = 1);

    The query explained: insert into stream all of the entries from a user's subscription list (subscriptions_users) that the user has not read (i.e. do not exist in metadata) and which do not already exist in the stream.

    Attempted solution: adding limit 100 to the end speeds up the query considerably, but upon repeated executions it will keep on adding a different set of 100 entries that do not already exist in the table (with each successive query taking longer and longer). This is close but not quite what I wanted to do. Does anyone have any advice (nosql?) or know a more efficient way of composing the query?
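
One possible way to keep the stream at one hundred rows is to count what is already there and only insert enough to fill the remaining room. The sketch below does that with Python's built-in sqlite3 module, which is an assumption since the question does not name its database. The table and column names come from the query above; the function name top_up_stream, the NOT EXISTS rewrite and the ordering by entry id are choices made for this example.

```python
import sqlite3

STREAM_CAP = 100  # "one hundred unread entries at a time", per the question

def top_up_stream(conn, user_id, cap=STREAM_CAP):
    """Insert unread entries until the user's stream holds `cap` rows; return its size."""
    (current,) = conn.execute(
        "SELECT COUNT(*) FROM stream WHERE user_id = ?", (user_id,)
    ).fetchone()
    room = max(cap - current, 0)
    if room:
        conn.execute(
            """
            INSERT INTO stream (entry_id, user_id)
            SELECT e.id, su.user_id
            FROM entries AS e
            INNER JOIN subscriptions_users AS su
                ON su.subscription_id = e.subscription_id
            WHERE su.user_id = ?
              AND NOT EXISTS (SELECT 1 FROM metadata AS m
                              WHERE m.entry_id = e.id AND m.user_id = su.user_id)
              AND NOT EXISTS (SELECT 1 FROM stream AS s
                              WHERE s.entry_id = e.id AND s.user_id = su.user_id)
            ORDER BY e.id
            LIMIT ?
            """,
            (user_id, room),
        )
    (size,) = conn.execute(
        "SELECT COUNT(*) FROM stream WHERE user_id = ?", (user_id,)
    ).fetchone()
    return size

# Small in-memory demo: 250 entries, the first 10 already read by user 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, subscription_id INTEGER);
    CREATE TABLE subscriptions_users (subscription_id INTEGER, user_id INTEGER);
    CREATE TABLE metadata (entry_id INTEGER, user_id INTEGER);
    CREATE TABLE stream (entry_id INTEGER, user_id INTEGER);
""")
conn.executemany("INSERT INTO entries VALUES (?, 1)", [(i,) for i in range(1, 251)])
conn.execute("INSERT INTO subscriptions_users VALUES (1, 1)")
conn.executemany("INSERT INTO metadata VALUES (?, 1)", [(i,) for i in range(1, 11)])
first_size = top_up_stream(conn, 1)
second_size = top_up_stream(conn, 1)  # stream is already full, so nothing is added
```

Indexes on (user_id, entry_id) for metadata and stream would probably matter more for speed than the exact query shape, since the original timing was taken with no indexes at all.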

    Read the article
