Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 25/269 | < Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >

  • Designing for an algorithm that reports progress

    - by Stefano Borini
    I have an iterative algorithm and I want to print the progress. However, I may also want it not to print any information, or to print it in a different way, or do other logic. In an object oriented language, I would perform the following solutions: Solution 1: virtual method have the algorithm class MyAlgoClass which implements the algo. The class also implements a virtual reportIteration(iterInfo) method which is empty and can be reimplemented. Subclass the MyAlgoClass and override reportIteration so that it does what it needs to do. This solution allows you to carry additional information (for example, the file unit) in the reimplemented class. I don't like this method because it clumps together two functionalities that may be unrelated, but in GUI apps it may be ok. Solution 2: observer pattern the algorithm class has a register(Observer) method, keeps a list of the registered observers and takes care of calling notify() on each of them. Observer::notify() needs a way to get the information from the Subject, so it either has two parameters, one with the Subject and the other with the data the Subject may pass, or just the Subject and the Observer is now in charge of querying it to fetch the relevant information. Solution 3: callbacks I tend to see the callback method as a lightweight observer. Instead of passing an object, you pass a callback, which may be a plain function, but also an instance method in those languages that allow it (for example, in python you can because passing an instance method will remain bound to the instance). C++ however does not allow it, because if you pass a pointer to an instance method, this will not be defined. Please correct me on this regard, my C++ is quite old. The problem with callbacks is that generally you have to pass them together with the data you want the callback to be invoked with. Callbacks don't store state, so you have to pass both the callback and the state to the Subject in order to find it at callback execution, together with any additional data the Subject may provide about the event is reporting. Question My question is relative to the fact that I need to implement the opening problem in a language that is not object oriented, namely Fortran 95, and I am fighting with my usual reasoning which is based on python assumptions and style. I think that in Fortran the concept is similar to C, with the additional trouble that in C you can store a function pointer, while in Fortran 95 you can only pass it around. Do you have any comments, suggestions, tips, and quirks on this regard (in C, C++, Fortran and python, but also in any other language, so to have a comparison of language features that can be exploited on this regard) on how to design for an algorithm that must report progress to some external entity, using state from both the algorithm and the external entity ?

    Read the article

  • Is there any super fast algorithm for finding LINES on picture?

    - by Ole Jak
    So I have Image like this I need some super fast algorithm for finding all straight lines on it. I want to give to algorithm parameters like min length and max line distortion. I want to get relative to picture pixel coords start and end points of lines. So on this picture to find all lines between dalles and thouse 2 black lines on top. So I need algorithm for super fast finding straight lines of different colors on picture. Is there any such algorithm? (super duper fast=)

    Read the article

  • Algorithm for nice graph labels for time/date axis?

    - by Aaron
    Hello, I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familar with Paul Heckbert's Nice Numbers algorithm (http://tinyurl.com/5gmk2c). I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks. For example: Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00... Looking at a week: 1/1, 1/2, 1/3... Looking at a month: 1/09, 2/09, 3/09... The nice label ticks don't need to correspond to the first visible point, but close to it. Is anybody familar with such an algorithm? Thanks

    Read the article

  • Distributed computing for a company? Is there such a 'free' thing?

    - by Jakub
    I am new to the whole distributed computing / cloud thing. But I had an idea at work for our multimedia stuff like movie encoding / cpu intensive things tasks (which sometimes take a few hours). Is there a 'free' (linux?) way to go about using a Windows machine, and offsetting those cpu cycles for that task to say 10 servers that are generally idle (cpu wise)? I'm just curious if there is a way to do this or am I just grasping at straws here. My thought is that a 'cloud' setup would achieve this, however like I stated initially, I am a total newbie when it comes to it. This is just an idea, looking for some thoughts? Anyone achieve this?

    Read the article

  • Algorithm for a lucky game [on hold]

    - by Ronnie
    Assume we have the following Keno(lottery type) game: From 80 numbers(from 1 to 80), 20 are being drawn. The players choose 1 or 2 or 3..... or 12 numbers to play(12 categories). If they choose for example 4 then they win if they predict correctly a certain amount of numbers(2,3 or 4) from the 4 they have played and lose if the predict only 1 or 0 numbers. They win X times their money accordingly to some predefined factor depending on how many numbers they predict from each category. The same with the other categories. And e.g 11 out of 11 gives 250000 times your money and 12 out of 12 gives 1000000 your money. So the company would want to avoid winnings so high. Every draw by the company is being made every 5 minutes and in each draw around 120000 (let's say) different predictions(Keno tickets) are being played. Let's assume 12000 are being played in category 10 and 12000 in category 11 and also 12000 in category 12. I'm wondering if there is an algorithm to allow the company that provides the game in the 5 minutes between the drawings, to find a 20 number set, in order to avoid any "12 out of 12" and "11 out of 11" and "11 out of 12" and "10 out of 11" and "10 out of 10" winning ticket. That means is there any algorithm, where in a time of less than 1 minute approximately(in todays hardware), to be able to find a 20 number set so that none of the 12000 12 and 11 and 10 number sets that the players played(in categories 10,11 and 12) contains any winning of "12 out of 12" and "11 out of 11" and "11 out of 12" and "10 out of 11" and "10 out of 10"? Or even better the generalization of the problem: What is the best algorithm(from a perspective of minimal time), to be able to find a Y number set from numbers 1 to Z(e.g Y=20, Z=80) so that none of the X sets of K-numbers that are being played(in category K) contains more than K-m numbers from the Y-set? (Note that for Y=K and m=1 there is a practical algorithm.)

    Read the article

  • Sorting Algorithm : output

    - by Aaditya
    I faced this problem on a website and I quite can't understand the output, please help me understand it :- Bogosort, is a dumb algorithm which shuffles the sequence randomly until it is sorted. But here we have tweaked it a little, so that if after the last shuffle several first elements end up in the right places we will fix them and don't shuffle those elements furthermore. We will do the same for the last elements if they are in the right places. For example, if the initial sequence is (3, 5, 1, 6, 4, 2) and after one shuffle we get (1, 2, 5, 4, 3, 6) we will keep 1, 2 and 6 and proceed with sorting (5, 4, 3) using the same algorithm. Calculate the expected amount of shuffles for the improved algorithm to sort the sequence of the first n natural numbers given that no elements are in the right places initially. Input: 2 6 10 Output: 2 1826/189 877318/35343 For each test case output the expected amount of shuffles needed for the improved algorithm to sort the sequence of first n natural numbers in the form of irreducible fractions. I just can't understand the output.

    Read the article

  • How to use TCP/IP Nagle algorithm at Apple Push Notification

    - by Mahbubur R Aaman
    From Apple's Developer Library The binary interface employs a plain TCP socket for binary content that is streaming in nature. For optimum performance, you should batch multiple notifications in a single transmission over the interface, either explicitly or using a TCP/IP Nagle algorithm. How to use TCP/IP Nagle algorithm in case Apple's Push Notification? How to batch multiple notification in a single transmission over the interface? Additional # In Apple's Push Notification Urban Airship is a familiar name to send large amount of push notification within several minutes. Does they use TCP/IP Nagle algorithm?

    Read the article

  • How can I locate empty space next to polygon regions?

    - by Stephen
    Let's say I have the following area in a top-down map: The circle is the player, the black square is an obstacle, and the grey polygons with red borders are walk-able areas that will be used as a navigation mesh for enemies. Obstacles and grey polygons are always convex. The grey regions were defined using an algorithm when the world was generated at runtime. Notice the little white column. I need to figure out where any empty space like this is, if at all, after the algorithm builds the grey regions, so that I can fill the space with another region. Basically what I'm hoping for is an algorithm that can detect empty space next to a polygon.

    Read the article

  • moore's law and quadratic algorithm

    - by damon
    I was going thru a video (from coursera - by sedgewick) in which he argues that you cannot sustain Moore's law using a quadratic algorithm.He elaborates like this In year 197* you build a computer of power X ,and need to count N objects.This takes M days According to Moore's law,you have a computer of power 2X after 1.5 years.But now you have 2N objects to count. If you use a quadratic algorithm, In year 197*+1.5 ,it takes (4M)/2 = 2M days 4M because the algorithm is quadratic,and division by 2 because of doubling computer power. I find this hard to understand.I tried to work thru this as below To count N objects using comp=X , it takes M days. -> N/X = M After 1.5 yrs ,you need to count 2N objects using comp=2X -> 2N/(2X) -> N/X -> M days where do I go wrong? can someone please help me understand?

    Read the article

  • Tangent basis calculation problem

    - by Kirill Daybov
    I have the problem with seams with calculating a tangent basis in my application. I'm using a seems to be right algorithm, but it gives wrong result on the seams. What am I doing wrong? Is there a problem with an algorithm, or with the model? The designer says that our models with our normal maps are rendered correctly in Xoliul Shader Plugin in 3Ds Max, so there should be a way to calculate correct tangent basis programmatically. Here's an example of the problem I'm talking about. Steps, I've already taken: - Tried different algorithm (from Gamasutra, I can't post the link because I don't have enough reputation yet). I got wrong, much worse, results; - Tried to average basis vectors for vertexes are used in multiple faces; - Tried to average basis vectors for vertexes that have same world coordinates (this would be obviously wrong solution, but I've tried it anyway).

    Read the article

  • Most efficient AABB - Ray intersection algorithm for input/output distance calculation

    - by Tobbey
    Thanks to the following thread : most efficient AABB vs Ray collision algorithms I have seen very fast algorithm for ray/AABB intersection point computation. Unfortunately, most of the recent algorithm are accelerated by omitting the "output" intersection point of the box. In my application, I would interested in getting both the the distance from source ray to input: t0 and source ray to output of bounding box: t1. I have seen for instance Eisemann designed a very fast version regarding plucker, smits, ... , but it does not compare the case when both input/output distance should be computed see: http://www.cg.cs.tu-bs.de/publications/Eisemann07FRA/ Does someone know where I can find more information on algorithm performances for the specific input/output problem ? Thank you in advance

    Read the article

  • algorithm to print the digits in the correct order

    - by Aga Waw
    I've been trying to write an algorithm that will print separately the digits from an integer. I have to write it in Pseudocode. I know how to write an algorithm that reverse the digits. digi(n): while n != 0: x = n % 10 n = n // 10 print (x) But I don't know how to write an algorithm to print the digits in the correct order. f.eg. the input is integer 123467 and the output is: 1 2 3 4 6 7 The numbers will be input from the user, and we cannot convert them to a string. I just need help gettin started on writing algorithms. Thanks

    Read the article

  • How does one unit test an algorithm

    - by Asa Baylus
    I was recently working on a JS slideshow which rotates images using a weighted average algorithm. Thankfully, timgilbert has written a weighted list script which implements the exact algorithm I needed. However in his documentation he's noted under todos: "unit tests!". I'd like to know is how one goes about unit testing an algorithm. In the case of a weighted average how would you create a proof that the averages are accurate when there is the element of randomness? Code samples of similar would be very helpful to my understanding.

    Read the article

  • "Bad apple" algorithm, or process crashes shared sandbox

    - by Roger Lipscombe
    I'm looking for an algorithm to handle the following problem, which I'm (for now) calling the "bad apple" algorithm. The problem I've got a N processes running in M sandboxes, where N M. It's impractical to give each process its own sandbox. At least one of those processes is badly-behaved, and is bringing down the entire sandbox, thus killing all of the other processes. If it was a single badly-behaved process, then I could use a simple bisection to put half of the processes in one sandbox, and half in another sandbox, until I found the miscreant. This could probably be extended by partitioning the set into more than two pieces until the badly-behaved process was found. For example, partitioning into 8 sets allows me to eliminate 7/8 of the search space at each step, and so on. The question If more than one process is badly-behaved -- including the possibility that they're all badly-behaved -- does this naive algorithm "work"? Is it guaranteed to work within some sensible bounds?

    Read the article

  • Nearest color algorithm using Hex Triplet

    - by Lijo
    Following page list colors with names http://en.wikipedia.org/wiki/List_of_colors. For example #5D8AA8 Hex Triplet is "Air Force Blue". This information will be stored in a databse table (tbl_Color (HexTriplet,ColorName)) in my system Suppose I created a color with #5D8AA7 Hex Triplet. I need to get the nearest color available in the tbl_Color table. The expected anaser is "#5D8AA8 - Air Force Blue". This is because #5D8AA8 is the nearest color for #5D8AA7. Do we have any algorithm for finding the nearest color? How to write it using C# / Java? REFERENCE http://stackoverflow.com/questions/5440051/algorithm-for-parsing-hex-into-color-family http://stackoverflow.com/questions/6130621/algorithm-for-finding-the-color-between-two-others-in-the-colorspace-of-painte Suggested Formula: Suggested by @user281377. Choose the color where the sum of those squared differences is minimal (Square(Red(source)-Red(target))) + (Square(Green(source)-Green(target))) +(Square(Blue(source)-Blue(target)))

    Read the article

  • How to find 2D grid cells swept by a moving circle?

    - by Nevermind
    I'm making a game based on a 2D grid, with some cells passable and some not. Dynamic objects can move continuously, independent of the grid, but need to collide with impassable cells. I wrote an algorithm to trace a ray against the grid, that gives me all cells that ray intersects. However, actual object are not point-sized; I'm currently representing them as circles. But I can't figure out an effective algorithm to trace a moving circle. Here's a picture of what I need: The numbers show in what order the circle collides with grid cells. Does anybody know the algorithm to find these collisions? Preferably in C#. Update The circle can be bigger than a single grid cell.

    Read the article

  • Healthcare and Distributed Data Don't Mix

    - by [email protected]
    How many times have you heard the story?  Hard disk goes missing, USB thumb drive goes missing, laptop goes missing...Not a week goes by that we don't hear about our data going missing...  Healthcare data is a big one, but we hear about credit card data, pricing info, corporate intellectual property...  When I have spoken at Security and IT conferences part of my message is "Why do you give your users data to lose in the first place?"  I don't suggest they can't have access to it...in fact I work for the company that provides the premiere data security and desktop solutions that DO provide access.  Access isn't the issue.  'Keeping the data' is the issue.We are all human - we all make mistakes... I fault no one for having their car stolen or that they dropped a USB thumb drive. (well, except the thieves - I can certainly find some fault there)  Where I find fault is in policy (or lack thereof sometimes) that allows users to carry around private, and important, data with them.  Mr. Director of IT - It is your fault, not theirs.  Ms. CSO - Look in the mirror.It isn't like one can't find a network to access the data from.  You are on a network right now.  How many Wireless ones (wifi, mifi, cellular...) are there around you, right now?  Allowing employees to remove data from the confines of (wait for it... ) THE DATA CENTER is just plain indefensible when it isn't required.  The argument that the laptop had a password and the hard disk was encrypted is ridiculous.  An encrypted drive tells thieves that before they sell the stolen unit for $75, they should crack the encryption and ascertain what the REAL value of the laptop is... credit card info, Identity info, pricing lists, banking transactions... a veritable treasure trove of info people give away on an 'encrypted disk'.What started this latest rant on lack of data control was an article in Government Health IT that was forwarded to me by Denny Olson, an Oracle Principal Sales Consultant in Minnesota.  The full article is here, but the point was that a couple laptops went missing in a couple different cases, and.. well... no one knows where the data is, and yes - they were loaded with patient info.  What were you thinking?Obviously you can't steal data form a Sun Ray appliance... since it has no data, nor any storage to keep the data on, and Secure Global Desktop allows access from Macs, Linux and Windows client devices...  but in all cases, there is no keeping the data unless you explicitly allow for it in your policy.   Since you can get at the data securely from any network, why would you want to take personal responsibility for it?  Both Sun Rays and Secure Global Desktop are widely used in Healthcare... but clearly not widely enough.We need to do a better job of getting the message out -  Healthcare (or insert your business type here) and distributed data don't mix. Then add Hot Desking and 'follow me printing' and you have something that Clinicians (and CSOs) love.Thanks for putting up my blood pressure, Denny.

    Read the article

  • Jini : single server with multiple clients

    - by user200340
    Hi all, I have a question about how to make multiple clients can access a single file located on server side and keep the file consistent. I have a simple PhoneBook server-client Jini program running at the moment, and server only provides some getter functions to clients, such as getName(String number), getNumber(String name) from a PhoneBook class(serializable), phonebook data are stored in a text file (phonebook.txt) at the moment. I have tried to implement some functions allowing to write a new records into the phonebook.txt file. If the writing record (name) is existing, an integer number will be added into the writing record. for example the existing phonebook.txt is .... John 01-01010101 .... if the writing record is "John 01-12345678",then "John_1 01-12345678" will be writen into phonebook.txt However, if i start with two clients A and B (on the same machine using localhost), and A tries to write "John 01-11111111", B tries to write "John 01-22222222". The early record will be overwritten later record. So, there must be something i did complete wrong. My client and server code are just like Jini HelloWorld example. My server side code is . 1. LookupDiscovery with parameter new String[]{""}; 2. DiscoveryListener for LookupDiscovery 3. registrations are saved into a HashTable 4. for every discovered lookup service, i use registrar to register the ServiceItem, ServiceItem contains a null attributeSets, a null serviceId, and a service. The client code has: 1. LookupDiscovery with parameter new String[]{""}; 2. DiscoveryListener for LookupDiscovery 3. a ServiceTemplate with null attributeSets, a null serviceId and a type, the type is the interface class. 4. for each found ServiceRegistrar, if it can find the looking for ServiceTemplate, the returned Object is cast into the type of the interface class. I have tried to google more details, and i found JavaSpace could be the one i missed. But i am still not sure about it (i only start Jini for a very short time). So any help would be greatly appreciated.

    Read the article

  • Odd company release cycle: Go Distributed Source Control?

    - by MrLane
    sorry about this long post, but I think it is worth it! I have just started with a small .NET shop that operates quite a bit differently to other places that I have worked. Unlike any of my previous positions, the software written here is targetted at multiple customers and not every customer gets the latest release of the software at the same time. As such, there is no "current production version." When a customer does get an update, they also get all of the features added to he software since their last update, which could be a long time ago. The software is highly configurable and features can be turned on and off: so called "feature toggles." Release cycles are very tight here, in fact they are not on a shedule: when a feature is complete the software is deployed to the relevant customer. The team only last year moved from Visual Source Safe to Team Foundation Server. The problem is they still use TFS as if it were VSS and enforce Checkout locks on a single code branch. Whenever a bug fix gets put out into the field (even for a single customer) they simply build whatever is in TFS, test the bug was fixed and deploy to the customer! (Myself coming from a pharma and medical devices software background this is unbeliveable!). The result is that half baked dev code gets put into production without being even tested. Bugs are always slipping into release builds, but often a customer who just got a build will not see these bugs if they don't use the feature the bug is in. The director knows this is a problem as the company is starting to grow all of a sudden with some big clients coming on board and more smaller ones. I have been asked to look at source control options in order to eliminate deploying of buggy or unfinished code but to not sacrifice the somewhat asyncronous nature of the teams releases. I have used VSS, TFS, SVN and Bazaar in my career, but TFS is where most of my experience has been. Previously most teams I have worked with use a two or three branch solution of Dev-Test-Prod, where for a month developers work directly in Dev and then changes are merged to Test then Prod, or promoted "when its done" rather than on a fixed cycle. Automated builds were used, using either Cruise Control or Team Build. In my previous job Bazaar was used sitting on top of SVN: devs worked in their own small feature branches then pushed their changes to SVN (which was tied into TeamCity). This was nice in that it was easy to isolate changes and share them with other peoples branches. With both of these models there was a central dev and prod (and sometimes test) branch through which code was pushed (and labels were used to mark builds in prod from which releases were made...and these were made into branches for bug fixes to releases and merged back to dev). This doesn't really suit the way of working here, however: there is no order to when various features will be released, they get pushed when they are complete. With this requirement the "continuous integration" approach as I see it breaks down. To get a new feature out with continuous integration it has to be pushed via dev-test-prod and that will capture any unfinished work in dev. I am thinking that to overcome this we should go down a heavily feature branched model with NO dev-test-prod branches, rather the source should exist as a series of feature branches which when development work is complete are locked, tested, fixed, locked, tested and then released. Other feature branches can grab changes from other branches when they need/want, so eventually all changes get absorbed into everyone elses. This fits very much down a pure Bazaar model from what I experienced at my last job. As flexible as this sounds it just seems odd to not have a dev trunk or prod branch somewhere, and I am worried about branches forking never to re-integrate, or small late changes made that never get pulled across to other branches and developers complaining about merge disasters... What are peoples thoughts on this? A second final question: I am somewhat confused about the exact definition of distributed source control: some people seem to suggest it is about just not having a central repository like TFS or SVN, some say it is about being disconnected (SVN is 90% disconnected and TFS has a perfectly functional offline mode) and others say it is about Feature Branching and ease of merging between branches with no parent-child relationship (TFS also has baseless merging!). Perhaps this is a second question!

    Read the article

  • Any Open Source Pregel like framework for distributed processing of large Graphs?

    - by Akshay Bhat
    Google has described a novel framework for distributed processing on Massive Graphs. http://portal.acm.org/citation.cfm?id=1582716.1582723 I wanted to know if similar to Hadoop (Map-Reduce) are there any open source implementations of this framework? I am actually in process of writing a Pseudo distributed one using python and multiprocessing module and thus wanted to know if someone else has also tried implementing it. Since public information about this framework is extremely scarce. (A link above and a blog post at Google Research)

    Read the article

  • Bracketing algorithm when root finding. Single root in "quadratic" function

    - by Ander Biguri
    I am trying to implement a root finding algorithm. I am using the hybrid Newton-Raphson algorithm found in numerical recipes that works pretty nicely. But I have a problem in bracketing the root. While implementing the root finding algorithm I realised that in several cases my functions have 1 real root and all the other imaginary (several of them, usually 6 or 9). The only root I am interested is in the real one so the problem is not there. The thing is that the function approaches the root like a cubic function, touching with the point the y=0 axis... Newton-Rapson method needs some brackets of different sign and all the bracketing methods I found don't work for this specific case. What can I do? It is pretty important to find that root in my program... EDIT: more problems: sometimes due to reaaaaaally small numerical errors, say a variation of 1e-6 in some value the "cubic" function does NOT have that real root, it is just imaginary with a neglectable imaginary part... (checked with matlab) EDIT 2: Much more information about the problem. Ok, I need root finding algorithm. Info I have: The root I need to find is between [0-1] , if there are more roots outside that part I am not interested in them. The root is real, there may be imaginary roots, but I don't want them. Probably all the rest of the roots will be imaginary The root may be double in that point, but I think that actually doesn't mater in numerical analysis problems I need to use the root finding algorithm several times during the overall calculations, but the function will always be a polynomial In one of the particular cases of the root finding, my polynomial will be similar to a quadratic function that touches Y=0 with the point. Example of a real case: The coefficient may not be 100% precise and that really slight imprecision may make the function not to touch the Y=0 axis. I cannot solve for this specific case because in other cases it may be that the polynomial is pretty normal and doesn't make any "strange" thing. The method I am actually using is NewtonRaphson hybrid, where if the derivative is really small it makes a bisection instead of NewRaph (found in numerical recipes). Matlab's answer to the function on the image: roots: 0.853553390593276 + 0.353553390593278i 0.853553390593276 - 0.353553390593278i 0.146446609406726 + 0.353553390593273i 0.146446609406726 - 0.353553390593273i 0.499999999999996 + 0.000000040142134i 0.499999999999996 - 0.000000040142134i The function is a real example I prepared where I know that the answer I want is 0.5 Note: I still haven't check completely some of the answers I you people have give me (Thank you!), I am just trying to give al the information I already have to complete the question.

    Read the article

  • Python: (sampling with replacement): efficient algorithm to extract the set of UNIQUE N-tuples from a set

    - by Homunculus Reticulli
    I have a set of items, from which I want to select DISSIMILAR tuples (more on the definition of dissimilar touples later). The set could contain potentially several thousand items, although typically, it would contain only a few hundreds. I am trying to write a generic algorithm that will allow me to select N items to form an N-tuple, from the original set. The new set of selected N-tuples should be DISSIMILAR. A N-tuple A is said to be DISSIMILAR to another N-tuple B if and only if: Every pair (2-tuple) that occurs in A DOES NOT appear in B Note: For this algorithm, A 2-tuple (pair) is considered SIMILAR/IDENTICAL if it contains the same elements, i.e. (x,y) is considered the same as (y,x). This is a (possible variation on the) classic Urn Problem. A trivial (pseudocode) implementation of this algorithm would be something along the lines of def fetch_unique_tuples(original_set, tuple_size): while True: # randomly select [tuple_size] items from the set to create first set # create a key or hash from the N elements and store in a set # store selected N-tuple in a container if end_condition_met: break I don't think this is the most efficient way of doing this - and though I am no algorithm theorist, I suspect that the time for this algorithm to run is NOT O(n) - in fact, its probably more likely to be O(n!). I am wondering if there is a more efficient way of implementing such an algo, and preferably, reducing the time to O(n). Actually, as Mark Byers pointed out there is a second variable m, which is the size of the number of elements being selected. This (i.e. m) will typically be between 2 and 5. Regarding examples, here would be a typical (albeit shortened) example: original_list = ['CAGG', 'CTTC', 'ACCT', 'TGCA', 'CCTG', 'CAAA', 'TGCC', 'ACTT', 'TAAT', 'CTTG', 'CGGC', 'GGCC', 'TCCT', 'ATCC', 'ACAG', 'TGAA', 'TTTG', 'ACAA', 'TGTC', 'TGGA', 'CTGC', 'GCTC', 'AGGA', 'TGCT', 'GCGC', 'GCGG', 'AAAG', 'GCTG', 'GCCG', 'ACCA', 'CTCC', 'CACG', 'CATA', 'GGGA', 'CGAG', 'CCCC', 'GGTG', 'AAGT', 'CCAC', 'AACA', 'AATA', 'CGAC', 'GGAA', 'TACC', 'AGTT', 'GTGG', 'CGCA', 'GGGG', 'GAGA', 'AGCC', 'ACCG', 'CCAT', 'AGAC', 'GGGT', 'CAGC', 'GATG', 'TTCG'] Select 3-tuples from the original list should produce a list (or set) similar to: [('CAGG', 'CTTC', 'ACCT') ('CAGG', 'TGCA', 'CCTG') ('CAGG', 'CAAA', 'TGCC') ('CAGG', 'ACTT', 'ACCT') ('CAGG', 'CTTG', 'CGGC') .... ('CTTC', 'TGCA', 'CAAA') ] [[Edit]] Actually, in constructing the example output, I have realized that the earlier definition I gave for UNIQUENESS was incorrect. I have updated my definition and have introduced a new metric of DISSIMILARITY instead, as a result of this finding.

    Read the article

  • Python: (sampling with replacement): efficient algorithm to extract the set of DISSIMILAR N-tuples from a set

    - by Homunculus Reticulli
    I have a set of items, from which I want to select DISSIMILAR tuples (more on the definition of dissimilar touples later). The set could contain potentially several thousand items, although typically, it would contain only a few hundreds. I am trying to write a generic algorithm that will allow me to select N items to form an N-tuple, from the original set. The new set of selected N-tuples should be DISSIMILAR. A N-tuple A is said to be DISSIMILAR to another N-tuple B if and only if: Every pair (2-tuple) that occurs in A DOES NOT appear in B Note: For this algorithm, A 2-tuple (pair) is considered SIMILAR/IDENTICAL if it contains the same elements, i.e. (x,y) is considered the same as (y,x). This is a (possible variation on the) classic Urn Problem. A trivial (pseudocode) implementation of this algorithm would be something along the lines of def fetch_unique_tuples(original_set, tuple_size): while True: # randomly select [tuple_size] items from the set to create first set # create a key or hash from the N elements and store in a set # store selected N-tuple in a container if end_condition_met: break I don't think this is the most efficient way of doing this - and though I am no algorithm theorist, I suspect that the time for this algorithm to run is NOT O(n) - in fact, its probably more likely to be O(n!). I am wondering if there is a more efficient way of implementing such an algo, and preferably, reducing the time to O(n). Actually, as Mark Byers pointed out there is a second variable m, which is the size of the number of elements being selected. This (i.e. m) will typically be between 2 and 5. Regarding examples, here would be a typical (albeit shortened) example: original_list = ['CAGG', 'CTTC', 'ACCT', 'TGCA', 'CCTG', 'CAAA', 'TGCC', 'ACTT', 'TAAT', 'CTTG', 'CGGC', 'GGCC', 'TCCT', 'ATCC', 'ACAG', 'TGAA', 'TTTG', 'ACAA', 'TGTC', 'TGGA', 'CTGC', 'GCTC', 'AGGA', 'TGCT', 'GCGC', 'GCGG', 'AAAG', 'GCTG', 'GCCG', 'ACCA', 'CTCC', 'CACG', 'CATA', 'GGGA', 'CGAG', 'CCCC', 'GGTG', 'AAGT', 'CCAC', 'AACA', 'AATA', 'CGAC', 'GGAA', 'TACC', 'AGTT', 'GTGG', 'CGCA', 'GGGG', 'GAGA', 'AGCC', 'ACCG', 'CCAT', 'AGAC', 'GGGT', 'CAGC', 'GATG', 'TTCG'] # Select 3-tuples from the original list should produce a list (or set) similar to: [('CAGG', 'CTTC', 'ACCT') ('CAGG', 'TGCA', 'CCTG') ('CAGG', 'CAAA', 'TGCC') ('CAGG', 'ACTT', 'ACCT') ('CAGG', 'CTTG', 'CGGC') .... ('CTTC', 'TGCA', 'CAAA') ] [[Edit]] Actually, in constructing the example output, I have realized that the earlier definition I gave for UNIQUENESS was incorrect. I have updated my definition and have introduced a new metric of DISSIMILARITY instead, as a result of this finding.

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >