tuples - Page 9 - Developer IT

Breaking 1NF to model subset constraints. Does this sound sane?

- by Chris Travers

My first question here. Appologize if it is in the wrong forum but this seems pretty conceptual. I am looking at doing something that goes against conventional wisdom and want to get some feedback as to whether this is totally insane or will result in problems, so critique away! I am on PostgreSQL 9.1 but may be moving to 9.2 for this part of this project. To re-iterate: Does it seem sane to break 1NF in this way? I am not looking for debugging code so much as where people see problems that this might lead. The Problem In double entry accounting, financial transactions are journal entries with an arbitrary number of lines. Each line has either a left value (debit) or a right value (credit) which can be modelled as a single value with negatives as debits and positives as credits or vice versa. The sum of all debits and credits must equal zero (so if we go with a single amount field, sum(amount) must equal zero for each financial journal entry). SQL-based databases, pretty much required for this sort of work, have no way to express this sort of constraint natively and so any approach to enforcing it in the database seems rather complex. The Write Model The journal entries are append only. There is a possibility we will add a delete model but it will be subject to a different set of restrictions and so is not applicable here. If and when we allow deletes, we will probably do them using a simple ON DELETE CASCADE designation on the foreign key, and require that deletes go through a dedicated stored procedure which can enforce the other constraints. So inserts and selects have to be accommodated but updates and deletes do not for this task. My Proposed Solution My proposed solution is to break first normal form and model constraints on arrays of tuples, with a trigger that breaks the rows out into another table. CREATE TABLE journal_line ( entry_id bigserial primary key, account_id int not null references account(id), journal_entry_id bigint not null, -- adding references later amount numeric not null ); I would then add "table methods" to extract debits and credits for reporting purposes: CREATE OR REPLACE FUNCTION debits(journal_line) RETURNS numeric LANGUAGE sql IMMUTABLE AS $$ SELECT CASE WHEN $1.amount < 0 THEN $1.amount * -1 ELSE NULL END; $$; CREATE OR REPLACE FUNCTION credits(journal_line) RETURNS numeric LANGUAGE sql IMMUTABLE AS $$ SELECT CASE WHEN $1.amount > 0 THEN $1.amount ELSE NULL END; $$; Then the journal entry table (simplified for this example): CREATE TABLE journal_entry ( entry_id bigserial primary key, -- no natural keys :-( journal_id int not null references journal(id), date_posted date not null, reference text not null, description text not null, journal_lines journal_line[] not null ); Then a table method and and check constraints: CREATE OR REPLACE FUNCTION running_total(journal_entry) returns numeric language sql immutable as $$ SELECT sum(amount) FROM unnest($1.journal_lines); $$; ALTER TABLE journal_entry ADD CONSTRAINT CHECK (((journal_entry.running_total) = 0)); ALTER TABLE journal_line ADD FOREIGN KEY journal_entry_id REFERENCES journal_entry(entry_id); And finally we'd have a breakout trigger: CREATE OR REPLACE FUNCTION je_breakout() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN IF TG_OP = 'INSERT' THEN INSERT INTO journal_line (journal_entry_id, account_id, amount) SELECT NEW.id, account_id, amount FROM unnest(NEW.journal_lines); RETURN NEW; ELSE RAISE EXCEPTION 'Operation Not Allowed'; END IF; END; $$; And finally CREATE TRIGGER AFTER INSERT OR UPDATE OR DELETE ON journal_entry FOR EACH ROW EXECUTE_PROCEDURE je_breaout(); Of course the example above is simplified. There will be a status table that will track approval status allowing for separation of duties, etc. However the goal here is to prevent unbalanced transactions. Any feedback? Does this sound entirely insane? Standard Solutions? In getting to this point I have to say I have looked at four different current ERP solutions to this problems: Represent every line item as a debit and a credit against different accounts. Use of foreign keys against the line item table to enforce an eventual running total of 0 Use of constraint triggers in PostgreSQL Forcing all validation here solely through the app logic. My concerns are that #1 is pretty limiting and very hard to audit internally. It's not programmer transparent and so it strikes me as being difficult to work with in the future. The second strikes me as being very complex and required a series of contraints and foreign keys against self to make work, and therefore it strikes me as complex, hard to sort out at least in my mind, and thus hard to work with. The fourth could be done as we force all access through stored procedures anyway and this is the most common solution (have the app total things up and throw an error otherwise). However, I think proof that a constraint is followed is superior to test cases, and so the question becomes whether this in fact generates insert anomilies rather than solving them. If this is a solved problem it isn't the case that everyone agrees on the solution....

Read the article

Python: How to read huge text file into memory

- by asmaier

I'm using Python 2.6 on a Mac Mini with 1GB RAM. I want to read in a huge text file $ ls -l links.csv; file links.csv; tail links.csv -rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv links.csv: ASCII text, with CRLF line terminators 4757187,59883 4757187,99822 4757187,66546 4757187,638452 4757187,4627959 4757187,312826 4757187,6143 4757187,6141 4757187,3081726 4757187,58197 So each line in the file consists of a tuple of two comma separated integer values. I want to read in the whole file and sort it according to the second column. I know, that I could do the sorting without reading the whole file into memory. But I thought for a file of 500MB I should still be able to do it in memory since I have 1GB available. However when I try to read in the file, Python seems to allocate a lot more memory than is needed by the file on disk. So even with 1GB of RAM I'm not able to read in the 500MB file into memory. My Python code for reading the file and printing some information about the memory consumption is: #!/usr/bin/python # -*- coding: utf-8 -*- import sys infile=open("links.csv", "r") edges=[] count=0 #count the total number of lines in the file for line in infile: count=count+1 total=count print "Total number of lines: ",total infile.seek(0) count=0 for line in infile: edge=tuple(map(int,line.strip().split(","))) edges.append(edge) count=count+1 # for every million lines print memory consumption if count%1000000==0: print "Position: ", edge print "Read ",float(count)/float(total)*100,"%." mem=sys.getsizeof(edges) for edge in edges: mem=mem+sys.getsizeof(edge) for node in edge: mem=mem+sys.getsizeof(node) print "Memory (Bytes): ", mem The output I got was: Total number of lines: 30609720 Position: (9745, 2994) Read 3.26693612356 %. Memory (Bytes): 64348736 Position: (38857, 103574) Read 6.53387224712 %. Memory (Bytes): 128816320 Position: (83609, 63498) Read 9.80080837067 %. Memory (Bytes): 192553000 Position: (139692, 1078610) Read 13.0677444942 %. Memory (Bytes): 257873392 Position: (205067, 153705) Read 16.3346806178 %. Memory (Bytes): 320107588 Position: (283371, 253064) Read 19.6016167413 %. Memory (Bytes): 385448716 Position: (354601, 377328) Read 22.8685528649 %. Memory (Bytes): 448629828 Position: (441109, 3024112) Read 26.1354889885 %. Memory (Bytes): 512208580 Already after reading only 25% of the 500MB file, Python consumes 500MB. So it seem that storing the content of the file as a list of tuples of ints is not very memory efficient. Is there a better way to do it, so that I can read in my 500MB file into my 1GB of memory?

Read the article

Constraint Satisfaction Problem

- by Carl Smotricz

I'm struggling my way through Artificial Intelligence: A Modern Approach in order to alleviate my natural stupidity. In trying to solve some of the exercises, I've come up against the "Who Owns the Zebra" problem, Exercise 5.13 in Chapter 5. This has been a topic here on SO but the responses mostly addressed the question "how would you solve this if you had a free choice of problem solving software available?" I accept that Prolog is a very appropriate programming language for this kind of problem, and there are some fine packages available, e.g. in Python as shown by the top-ranked answer and also standalone. Alas, none of this is helping me "tough it out" in a way as outlined by the book. The book appears to suggest building a set of dual or perhaps global constraints, and then implementing some of the algorithms mentioned to find a solution. I'm having a lot of trouble coming up with a set of constraints suitable for modelling the problem. I'm studying this on my own so I don't have access to a professor or TA to get me over the hump - this is where I'm asking for your help. I see little similarity to the examples in the chapter. I was eager to build dual constraints and started out by creating (the logical equivalent of) 25 variables: nationality1, nationality2, nationality3, ... nationality5, pet1, pet2, pet3, ... pet5, drink1 ... drink5 and so on, where the number was indicative of the house's position. This is fine for building the unary constraints, e.g. The Norwegian lives in the first house: nationality1 = { :norway }. But most of the constraints are a combination of two such variables through a common house number, e.g. The Swede has a dog: nationality[n] = { :sweden } AND pet[n] = { :dog } where n can range from 1 to 5, obviously. Or stated another way: nationality1 = { :sweden } AND pet1 = { :dog } XOR nationality2 = { :sweden } AND pet2 = { :dog } XOR nationality3 = { :sweden } AND pet3 = { :dog } XOR nationality4 = { :sweden } AND pet4 = { :dog } XOR nationality5 = { :sweden } AND pet5 = { :dog } ...which has a decidedly different feel to it than the "list of tuples" advocated by the book: ( X1, X2, X3 = { val1, val2, val3 }, { val4, val5, val6 }, ... ) I'm not looking for a solution per se; I'm looking for a start on how to model this problem in a way that's compatible with the book's approach. Any help appreciated.

Read the article

def constrainedMatchPair(firstMatch,secondMatch,length):

- by smart

matches of a key string in a target string, where one of the elements of the key string is replaced by a different element. For example, if we want to match ATGC against ATGACATGCACAAGTATGCAT, we know there is an exact match starting at 5 and a second one starting at 15. However, there is another match starting at 0, in which the element A is substituted for C in the key, that is we match ATGC against the target. Similarly, the key ATTA matches this target starting at 0, if we allow a substitution of G for the second T in the key string. consider the following steps. First, break the key string into two parts (where one of the parts could be an empty string). Let's call them key1 and key2. For each part, use your function from Problem 2 to find the starting points of possible matches, that is, invoke starts1 = subStringMatchExact(target,key1) and starts2 = subStringMatchExact(target,key2) The result of these two invocations should be two tuples, each indicating the starting points of matches of the two parts (key1 and key2) of the key string in the target. For example, if we consider the key ATGC, we could consider matching A and GC against a target, like ATGACATGCA (in which case we would get as locations of matches for A the tuple (0, 3, 5, 9) and as locations of matches for GC the tuple (7,). Of course, we would want to search over all possible choices of substrings with a missing element: the empty string and TGC; A and GC; AT and C; and ATG and the empty string. Note that we can use your solution for Problem 2 to find these values. Once we have the locations of starting points for matches of the two substrings, we need to decide which combinations of a match from the first substring and a match of the second substring are correct. There is an easy test for this. Suppose that the index for the starting point of the match of the first substring is n (which would be an element of starts1), and that the length of the first substring is m. Then if k is an element of starts2, denoting the index of the starting point of a match of the second substring, there is a valid match with one substitution starting at n, if n+m+1 = k, since this means that the second substring match starts one element beyond the end of the first substring. finally the question is Write a function, called constrainedMatchPair which takes three arguments: a tuple representing starting points for the first substring, a tuple representing starting points for the second substring, and the length of the first substring. The function should return a tuple of all members (call it n) of the first tuple for which there is an element in the second tuple (call it k) such that n+m+1 = k, where m is the length of the first substring.

Read the article

Best way to have common class shared by both C++ and Ruby?

- by shuttle87

I am currently working on a project where a team of us are designing a game, all of us are proficient in ruby and some (but not all) of us are proficient in c++. Initially we made the backend in ruby but we ported it to c++ for more speed. The c++ port of the backend has exactly the same features and algorithms as the original ruby code. However we still have a bunch of code in ruby that does useful things but we want it to now get the data from the c++ classes. Our first thought was that we could save some of the data structures in something like XML or redis and call that, but some of the developers don't like that idea. We don't need anything particularly complex data structures to be passed between the different parts of the code, just tuples, strings and ints. Is there any way of integrating the ruby code so that it can call the c++ stuff natively? Will we need to embed code? Will we have to make a ruby extension? If so are there any good resources/tutorials you could suggest? For example say we have this code in the c++ backend: class The_game{ private: bool printinfo; //print the player diagnostic info at the beginning if true int numplayers; std::vector<Player*> players; string current_action; int action_is_on; // the index of the player in the players array that the action is now on //more code here public: Table(std::vector<Player *> in_players, std::vector<Statistics *> player_stats ,const int in_numplayers); ~Table(); void play_game(); History actions_history; }; class History{ private: int action_sequence_number; std::vector<Action*> hand_actions; public: void print_history(); void add_action(Action* the_action_to_be_added); int get_action_sequence_number(){ return action_sequence_number;} bool history_actions_are_equal(); int last_action_size(int street,int number_of_actions_ago); History(); ~History(); }; Is there any way to natively call something in the actions_history via The_game object in ruby? (The objects in the original ruby code all had the same names and functionality) By this I mean: class MyRubyClass def method1(arg1) puts arg1 self.f() # ... but still available puts cpp_method.the_current_game.actions_history.get_action_sequence_number() end # Constructor: def initialize(arg) puts "In constructor with arg #{arg}" #get the c++ object here and call it cpp_method end end Is this possible? Any advice or suggestions are appreciated.

Read the article

Python - 2 Questions: Editing a variable in a function and changing the order of if else statements

- by Eric

First of all, I should explain what I'm trying to do first. I'm creating a dungeon crawler-like game, and I'm trying to program the movement of computer characters/monsters in the map. The map is basically a Cartesian coordinate grid. The locations of characters are represented by tuples of the x and y values, (x,y). The game works by turns, and in a turn a character can only move up, down, left or right 1 space. I'm creating a very simple movement system where the character will simply make decisions to move on a turn by turn basis. Essentially a 'forgetful' movement system. A basic flow chart of what I'm intending to do: Find direction towards destination Make a priority list of movements to be done using the direction eg.('r','u','d','l') means it would try to move right first, then up, then down, then left. Try each of the possibilities following the priority order. If the first movement fails (blocked by obstacle etc.), then it would successively try the movements until the first one that is successful, then it would stop. At step 3, the way I'm trying to do it is like this: def move(direction,location): try: -snip- # Tries to move, raises the exception Movementerror if cannot move in the direction return 1 # Indicates movement successful except Movementerror: return 0 # Indicates movement unsuccessful (thus character has not moved yet) prioritylist = ('r','u','d','l') if move('r',location): pass elif move('u',location): pass elif move('d',location): pass elif move('l',location): pass else: pass In the if/else block, the program would try the first movement on the priority on the priority list. At the move function, the character would try to move. If the character is not blocked and does move, it returns 1, leading to the pass where it would stop. If the character is blocked, it returns 0, then it tries the next movement. However, this results in 2 problems: How do I edit a variable passed into a function inside the function itself, while returning if the edit is successful? I have been told that you can't edit a variable inside a function as it won't really change the value of the variable, it just makes the variable inside the function refer to something else while the original variable remain unchanged. So, the solution is to return the value and then assign the variable to the returned value. However, I want it to return another value indicating if this edit is successful, so I want to edit this variable inside the function itself. How do I do so? How do I change the order of the if/else statements to follow the order of the priority list? It needs to be able to change during runtime as the priority list can change resulting in a different order of movement to try.

Read the article

CodePlex Daily Summary for Monday, September 17, 2012

CodePlex Daily Summary for Monday, September 17, 2012Popular ReleasesMetodología General Ajustada - MGA: 03.01.06: Cambios Parmenio: Actualizaciones al formato 2 de programación, se corrigió las vigencias y los estados de los botones. Cambios John: Integración de código con cambios enviados por Parmenio Bonilla. Generación de instaladores. Soporte técnico por correo electrónico y telefónico.Visual Studio Icon Patcher: Version 1.5.1: This fixes a bug in the 1.5 release where it would crash when no language packs were installed for VS2010.WPF Animated GIF: WPF Animated GIF 1.2: Improvements Added support for using the repeat count from GIF metadata (Netscape application block) if the RepeatBehavior is not explicitly specified. If the repeat count can't be found in the metadata, the behavior will default to Forever Note: the default value for the RepeatBehavior property has been changed to 0x (the default value for this type) instead of Forever. This might be a breaking change in some cases. If you need the animation to run forever regardless of the repeat count spe...Layered Architecture Solution Guidance (LASG): LASG 1.0.0.6 for Visual Studio 2012: PRE-REQUISITES Open GAX SQL Server 2008 R2 Management Objects Microsoft Enterprise Library 5.0 (for the generated code) Windows Azure SDK (for layered cloud applications) Silverlight 5 SDK (for Silverlight applications) THE RELEASE This is release only works on Visual Studio 2012. For the Visual Studio 2010 version, please visit here. To read more about the features in this version, please visit here. Take note that LASG is not meant to generate an entire application but just th...DeForm: DeForm v1.1: New javascript client app New effects: brightness, hue, saturationsheetengine - Isometric HTML5 JavaScript Display Engine: sheetengine v1.1.0: This release of sheetengine introduces major drawing optimizations. A background canvas is created with the full drawn scenery onto which only the changed parts are redrawn. For example a moving object will cause only its bounding box to be redrawn instead of the full scene. This background canvas is copied to the main canvas in each iteration. For this reason the size of the bounding box of every object needs to be defined and also the width and height of the background canvas. The example...VFPX: Desktop Alerts 1.0.2: This update for the Desktop Alerts contains changes to behavior for setting custom sounds for alerts. I have removed ALERTWAV.TXT from the project, and also removed DA_DEFAULTSOUND from the VFPALERT.H file. The AlertManager class and Alert class both have a "default" cSound of ADDBS(JUSTPATH(_VFP.ServerName))+"alert.wav" --- so, as long as you distribute a sound file with the file name "alert.wav" along with the EXE, that file will be used. You can set your own sound file globally by setti...MCEBuddy 2.x: MCEBuddy 2.2.15: Changelog for 2.2.15 (32bit and 64bit) 1. Added support for %originalfilepath% to get the source file full path. Used for custom commands only. 2. Added support for better parsing of Media Portal XML files to extract ShowName and Episode Name and download additional details from TVDB (like Season No, Episode No etc). 3. Added support for TVDB seriesID in metadata 4. Added support for eMail non blocking UI testRazor-sharp your skills: CSharp 4.0 Examples: Dynamic word Covariant and contravariant generic type parameters Optional Parameters and Named Arguments Tuples Task Parallel LibraryDECnet 2.0 Router: Second Alpha Release: This second alpha release fixes some bugs and limitations. It has been tested in two DECnet areas and seems to be stable enough for more extensive testing. ThisCrashReporter.NET : Exception reporting library for C# and VB.NET: CrashReporter.NET 1.2: *Added html mail format which shows hierarchical exception report for better understanding.DotNetNuke Search Engine Sitemaps Provider: Version 02.00.00: New release of the Search Engine Sitemap Providers New version - not backwards compatible with 1.x versions New sandboxing to prevent exceptions in module providers interfering with main provider Now installable using the Host->Extensions page New sitemaps available for Active Forums and Ventrian Property Agent Now derived from DotNetNuke Provider base for better framework integration DotNetNuke minimum compatibility raised to DNN 5.2, .NET to 3.5Annoying Manager: 1.0.0.0: Annoying Manager is in beta stage no longer! The main improvement in this release is the task report feature, where users can check their tasks.PDF Viewer Web part: PDF Viewer Web Part: PDF Viewer Web PartIIS Express Manager: IIS Express Manager v 0.5B: Several added features, including adding site and right click menu for sites; which allows you to start/stop site, view it directly in browser etc.Chris on SharePoint Solutions: View Grid Banding - v1.0: Initial release of the View Creation and Management Page Column Selector Banding solution.Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.67: Fix issue #18629 - incorrectly handling null characters in string literals and not throwing an error when outside string literals. update for Issue #18600 - forgot to make the ///#DEBUG= directive also set a known-global for the given debug namespace. removed the kill-switch for disregarding preprocessor define-comments (///#IF and the like) and created a separate CodeSettings.IgnorePreprocessorDefines property for those who really need to turn that off. Some people had been setting -kil...Lakana - WPF Framework: Lakana V2: Lakana V2 contains : - Lakana WPF Forms (with sample project) - Lakana WPF Navigation (with sample project)Microsoft SQL Server Product Samples: Database: OData QueryFeed workflow activity: The OData QueryFeed sample activity shows how to create a workflow activity that consumes an OData resource, and renders entity properties in a Microsoft Excel 2010 worksheet or Microsoft Word 2010 document. Using the sample QueryFeed activity, you can consume any OData resource. The sample activity uses LINQ to project OData metadata into activity designer expression items. By setting activity expressions, a fully qualified OData query string is constructed consisting of Resource, Filter, Or...Arduino for Visual Studio: Arduino 1.x for Visual Studio 2012, 2010 and 2008: Register for the forum for more news and updates Version 1209.15 is beta and resolves a number of issues in Visual Studio 2012 and minor debugger fixes for all vs versions. After you have tested a working installation, if you would like to beta the debug tool then email beta at visualmicro.com. Version 1208.19 (click the downloads tab) is considered stable for visual studio 2010 and 2008. Key Features of 1209.10 Support for Visual Studio 2012 (.NET 4.5) Debug tools beta team can re-e...New ProjectsApertium.NET: This is a Windows 8 library to use Apertium easelyAsyncFtp: AsyncFtp is a library, which enables support for async ftp transactions in .NET Framework.Computer Club System: Computer Club System - designed to manage client machines in a computer club.Dynamic Time Warp for Time Series Analysis: This is a conversion to C# of Stan Salvador, Philip Chan Fast DTW algorithm originally implemented in Java. G.Controls: win8 ????????htmlhelp: ??????????????HtmlMaker: ?html?????if、for、foreach??????????，????c#???，????????html??。 ??????，?????????~~JsonSerializerLite: JsonSerializerLite is a C#.NET library that aims to be a compliant, easy-to-use and lightweight JSON serializer/deserializer. Launchbar: Access all your favorite applications at lightning speed.LP 2012: Calculates football stats and predicts the winners. This is a closed project at the moment. We are not asking for any help.Malibu Project: to be definedPersonal Family Record System: In April, 2012, person family information management project was begun. This is a project for K14T students of Van Lang University. PPCalc: ProPoints/WeightWatchers points calculator for Windows 8.SA Plugins: The code available in this project is open source, the know-how is not, sorry.SapientS School Management System: This is a software to manage a Advanced level classes of a school in a efficient manner.Sonar: Sonar is .NET ORM written in C# 4.0VB.net: Project này là c?a nhóm Tùng và Phu?c cùng làm v? nh?ng ?ng d?ng qu?n lýWebser Web Browser: One of the worlds most basic browsers ever designed. Its nice on the eyes and can get you surfing the web in less than five minutes.

Read the article

Point in polygon OR point on polygon using LINQ

- by wageoghe

As noted in an earlier question, How to Zip enumerable with itself, I am working on some math algorithms based on lists of points. I am currently working on point in polygon. I have the code for how to do that and have found several good references here on SO, such as this link Hit test. So, I can figure out whether or not a point is in a polygon. As part of determining that, I want to determine if the point is actually on the polygon. This I can also do. If I can do all of that, what is my question you might ask? Can I do it efficiently using LINQ? I can already do something like the following (assuming a Pairwise extension method as described in my earlier question as well as in links to which my question/answers links, and assuming a Position type that has X and Y members). I have not tested much, so the lambda might not be 100% correct. Also, it does not take very small differences into account. public static PointInPolygonLocation PointInPolygon(IEnumerable<Position> pts, Position pt) { int numIntersections = pts.Pairwise( (p1, p2) => { if (p1.Y != p2.Y) { if ((p1.Y >= pt.Y && p2.Y < pt.Y) || (p1.Y < pt.Y && p2.Y >= pt.Y)) { if (p1.X < p1.X && p2.X < pt.X) { return 1; } if (p1.X < pt.X || p2.X < pt.X) { if (((pt.Y - p1.Y) * ((p1.X - p2.X) / (p1.Y - p2.Y)) * p1.X) < pt.X) { return 1; } } } } return 0; }).Sum(); if (numIntersections % 2 == 0) { return PointInPolygonLocation.Outside; } else { return PointInPolygonLocation.Inside; } } This function, PointInPolygon, takes the input Position, pt, iterates over the input sequence of position values, and uses the Jordan Curve method to determine how many times a ray extended from pt to the left intersects the polygon. The lambda expression will yield, into the "zipped" list, 1 for every segment that is crossed, and 0 for the rest. The sum of these values determines if pt is inside or outside of the polygon (odd == inside, even == outside). So far, so good. Now, for any consecutive pairs of position values in the sequence (i.e. in any execution of the lambda), we can also determine if pt is ON the segment p1, p2. If that is the case, we can stop the calculation because we have our answer. Ultimately, my question is this: Can I perform this calculation (maybe using Aggregate?) such that we will only iterate over the sequence no more than 1 time AND can we stop the iteration if we encounter a segment that pt is ON? In other words, if pt is ON the very first segment, there is no need to examine the rest of the segments because we have the answer. It might very well be that this operation (particularly the requirement/desire to possibly stop the iteration early) does not really lend itself well to the LINQ approach. It just occurred to me that maybe the lambda expression could yield a tuple, the intersection value (1 or 0 or maybe true or false) and the "on" value (true or false). Maybe then I could use TakeWhile(anontype.PointOnPolygon == false). If I Sum the tuples and if ON == 1, then the point is ON the polygon. Otherwise, the oddness or evenness of the sum of the other part of the tuple tells if the point is inside or outside.

Read the article

Scala: Recursively building all pathes in a graph?

- by DarqMoth

Trying to build all existing paths for an udirected graph defined as a map of edges using the following algorithm: Start: with a given vertice A Find an edge (X.A, X.B) or (X.B, X.A), add this edge to path Find all edges Ys fpr which either (Y.C, Y.B) or (Y.B, Y.C) is true For each Ys: A=B, goto Start Providing edges are defined as the following map, where keys are tuples consisting of two vertices: val edges = Map( ("n1", "n2") -> "n1n2", ("n1", "n3") -> "n1n3", ("n3", "n4") -> "n3n4", ("n5", "n1") -> "n5n1", ("n5", "n4") -> "n5n4") As an output I need to get a list of ALL pathes where each path is a list of adjecent edges like this: val allPaths = List( List(("n1", "n2") -> "n1n2"), List(("n1", "n3") -> "n1n3", ("n3", "n4") -> "n3n4"), List(("n5", "n1") -> "n5n1"), List(("n5", "n4") -> "n5n4"), List(("n2", "n1") -> "n1n2", ("n1", "n3") -> "n1n3", ("n3", "n4") -> "n3n4", ("n5", "n4") -> "n5n4")) //... //... more pathes to go } Note: Edge XY = (x,y) - "xy" and YX = (y,x) - "yx" exist as one instance only, either as XY or YX So far I have managed to implement code that duplicates edges in the path, which is wrong and I can not find the error: object Graph2 { type Vertice = String type Edge = ((String, String), String) type Path = List[((String, String), String)] val edges = Map( //(("v1", "v2") , "v1v2"), (("v1", "v3") , "v1v3"), (("v3", "v4") , "v3v4") //(("v5", "v1") , "v5v1"), //(("v5", "v4") , "v5v4") ) def main(args: Array[String]): Unit = { val processedVerticies: Map[Vertice, Vertice] = Map() val processedEdges: Map[(Vertice, Vertice), (Vertice, Vertice)] = Map() val path: Path = List() println(buildPath(path, "v1", processedVerticies, processedEdges)) } /** * Builds path from connected by edges vertices starting from given vertice * Input: map of edges * Output: list of connected edges like: List(("n1", "n2") -> "n1n2"), List(("n1", "n3") -> "n1n3", ("n3", "n4") -> "n3n4"), List(("n5", "n1") -> "n5n1"), List(("n5", "n4") -> "n5n4"), List(("n2", "n1") -> "n1n2", ("n1", "n3") -> "n1n3", ("n3", "n4") -> "n3n4", ("n5", "n4") -> "n5n4")) */ def buildPath(path: Path, vertice: Vertice, processedVerticies: Map[Vertice, Vertice], processedEdges: Map[(Vertice, Vertice), (Vertice, Vertice)]): List[Path] = { println("V: " + vertice + " VM: " + processedVerticies + " EM: " + processedEdges) if (!processedVerticies.contains(vertice)) { val edges = children(vertice) println("Edges: " + edges) val x = edges.map(edge => { if (!processedEdges.contains(edge._1)) { addToPath(vertice, processedVerticies.++(Map(vertice -> vertice)), processedEdges, path, edge) } else { println("ALready have edge: "+edge+" Return path:"+path) path } }) val y = x.toList y } else { List(path) } } def addToPath( vertice: Vertice, processedVerticies: Map[Vertice, Vertice], processedEdges: Map[(Vertice, Vertice), (Vertice, Vertice)], path: Path, edge: Edge): Path = { val newPath: Path = path ::: List(edge) val key = edge._1 val nextVertice = neighbor(vertice, key) val x = buildPath (newPath, nextVertice, processedVerticies, processedEdges ++ (Map((vertice, nextVertice) -> (vertice, nextVertice))) ).flatten // need define buidPath type x } def children(vertice: Vertice) = { edges.filter(p => (p._1)._1 == vertice || (p._1)._2 == vertice) } def containsPair(x: (Vertice, Vertice), m: Map[(Vertice, Vertice), (Vertice, Vertice)]): Boolean = { m.contains((x._1, x._2)) || m.contains((x._2, x._1)) } def neighbor(vertice: String, key: (String, String)): String = key match { case (`vertice`, x) => x case (x, `vertice`) => x } } Running this results in: List(List(((v1,v3),v1v3), ((v1,v3),v1v3), ((v3,v4),v3v4))) Why is that?

Read the article

Data Warehouse ETL slow - change primary key in dimension?

- by Jubbles

I have a working MySQL data warehouse that is organized as a star schema and I am using Talend Open Studio for Data Integration 5.1 to create the ETL process. I would like this process to run once per day. I have estimated that one of the dimension tables (dimUser) will have approximately 2 million records and 23 columns. I created a small test ETL process in Talend that worked, but given the amount of data that may need to be updated daily, the current performance will not cut it. It takes the ETL process four minutes to UPDATE or INSERT 100 records to dimUser. If I assumed a linear relationship between the count of records and the amount of time to UPDATE or INSERT, then there is no way the ETL can finish in 3-4 hours (my hope), let alone one day. Since I'm unfamiliar with Java, I wrote the ETL as a Python script and ran into the same problem. Although, I did discover that if I did only INSERT, the process went much faster. I am pretty sure that the bottleneck is caused by the UPDATE statements. The primary key in dimUser is an auto-increment integer. My friend suggested that I scrap this primary key and replace it with a multi-field primary key (in my case, 2-3 fields). Before I rip the test data out of my warehouse and change the schema, can anyone provide suggestions or guidelines related to the design of the data warehouse the ETL process how realistic it is to have an ETL process INSERT or UPDATE a few million records each day will my friend's suggestion significantly help If you need any further information, just let me know and I'll post it. UPDATE - additional information: mysql> describe dimUser; Field Type Null Key Default Extra user_key int(10) unsigned NO PRI NULL auto_increment id_A int(10) unsigned NO NULL id_B int(10) unsigned NO NULL field_4 tinyint(4) unsigned NO 0 field_5 varchar(50) YES NULL city varchar(50) YES NULL state varchar(2) YES NULL country varchar(50) YES NULL zip_code varchar(10) NO 99999 field_10 tinyint(1) NO 0 field_11 tinyint(1) NO 0 field_12 tinyint(1) NO 0 field_13 tinyint(1) NO 1 field_14 tinyint(1) NO 0 field_15 tinyint(1) NO 0 field_16 tinyint(1) NO 0 field_17 tinyint(1) NO 1 field_18 tinyint(1) NO 0 field_19 tinyint(1) NO 0 field_20 tinyint(1) NO 0 create_date datetime NO 2012-01-01 00:00:00 last_update datetime NO 2012-01-01 00:00:00 run_id int(10) unsigned NO 999 I used a surrogate key because I had read that it was good practice. Since, from a business perspective, I want to keep aware of potential fraudulent activity (say for 200 days a user is associated with state X and then the next day they are associated with state Y - they could have moved or their account could have been compromised), so that is why geographic data is kept. The field id_B may have a few distinct values of id_A associated with it, but I am interested in knowing distinct (id_A, id_B) tuples. In the context of this information, my friend suggested that something like (id_A, id_B, zip_code) be the primary key. For the large majority of daily ETL processes (80%), I only expect the following fields to be updated for existing records: field_10 - field_14, last_update, and run_id (this field is a foreign key to my etlLog table and is used for ETL auditing purposes).

Read the article

Flow-Design Cheat Sheet – Part I, Notation

- by Ralf Westphal

You want to avoid the pitfalls of object oriented design? Then this is the right place to start. Use Flow-Oriented Analysis (FOA) and –Design (FOD or just FD for Flow-Design) to understand a problem domain and design a software solution. Flow-Orientation as described here is related to Flow-Based Programming, Event-Based Programming, Business Process Modelling, and even Event-Driven Architectures. But even though “thinking in flows” is not new, I found it helpful to deviate from those precursors for several reasons. Some aim at too big systems for the average programmer, some are concerned with only asynchronous processing, some are even not very much concerned with programming at all. What I was looking for was a design method to help in software projects of any size, be they large or tiny, involing synchronous or asynchronous processing, being local or distributed, running on the web or on the desktop or on a smartphone. That´s why I took ideas from all of the above sources and some additional and came up with Event-Based Components which later got repositioned and renamed to Flow-Design. In the meantime this has generated some discussion (in the German developer community) and several teams have started to work with Flow-Design. Also I´ve conducted quite some trainings using Flow-Orientation for design. The results are very promising. Developers find it much easier to design software using Flow-Orientation than OOAD-based object orientation. Since Flow-Orientation is moving fast and is not covered completely by a single source like a book, demand has increased for at least an overview of the current state of its notation. This page is trying to answer this demand by briefly introducing/describing every notational element as well as their translation into C# source code. Take this as a cheat sheet to put next to your whiteboard when designing software. However, please do not expect any explanation as to the reasons behind Flow-Design elements. Details on why Flow-Design at all and why in this specific way you´ll find in the literature covering the topic. Here´s a resource page on Flow-Design/Event-Based Components, if you´re able to read German. Notation Connected Functional Units The basic element of any FOD are functional units (FU): Think of FUs as some kind of software code block processing data. For the moment forget about classes, methods, “components”, assemblies or whatever. See a FU as an abstract piece of code. Software then consists of just collaborating FUs. I´m using circles/ellipses to draw FUs. But if you like, use rectangles. Whatever suites your whiteboard needs best. The purpose of FUs is to process input and produce output. FUs are transformational. However, FUs are not called and do not call other FUs. There is no dependency between FUs. Data just flows into a FU (input) and out of it (output). From where and where to is of no concern to a FU. This way FUs can be concatenated in arbitrary ways: Each FU can accept input from many sources and produce output for many sinks: Flows Connected FUs form a flow with a start and an end. Data is entering a flow at a source, and it´s leaving it through a sink. Think of sources and sinks as special FUs which conntect wires to the environment of a network of FUs. Wiring Details Data is flowing into/out of FUs through wires. This is to allude to electrical engineering which since long has been working with composable parts. Wires are attached to FUs usings pins. They are the entry/exit points for the data flowing along the wires. Input-/output pins currently need not be drawn explicitly. This is to keep designing on a whiteboard simple and quick. Data flowing is of some type, so wires have a type attached to them. And pins have names. If there is only one input pin and output pin on a FU, though, you don´t need to mention them. The default is Process for a single input pin, and Result for a single output pin. But you´re free to give even single pins different names. There is a shortcut in use to address a certain pin on a destination FU: The type of the wire is put in parantheses for two reasons. 1. This way a “no-type” wire can be easily denoted, 2. this is a natural way to describe tuples of data. To describe how much data is flowing, a star can be put next to the wire type: Nesting – Boards and Parts If more than 5 to 10 FUs need to be put in a flow a FD starts to become hard to understand. To keep diagrams clutter free they can be nested. You can turn any FU into a flow: This leads to Flow-Designs with different levels of abstraction. A in the above illustration is a high level functional unit, A.1 and A.2 are lower level functional units. One of the purposes of Flow-Design is to be able to describe systems on different levels of abstraction and thus make it easier to understand them. Humans use abstraction/decomposition to get a grip on complexity. Flow-Design strives to support this and make levels of abstraction first class citizens for programming. You can read the above illustration like this: Functional units A.1 and A.2 detail what A is supposed to do. The whole of A´s responsibility is decomposed into smaller responsibilities A.1 and A.2. FU A thus does not do anything itself anymore! All A is responsible for is actually accomplished by the collaboration between A.1 and A.2. Since A now is not doing anything anymore except containing A.1 and A.2 functional units are devided into two categories: boards and parts. Boards are just containing other functional units; their sole responsibility is to wire them up. A is a board. Boards thus depend on the functional units nested within them. This dependency is not of a functional nature, though. Boards are not dependent on services provided by nested functional units. They are just concerned with their interface to be able to plug them together. Parts are the workhorses of flows. They contain the real domain logic. They actually transform input into output. However, they do not depend on other functional units. Please note the usage of source and sink in boards. They correspond to input-pins and output-pins of the board. Implicit Dependencies Nesting functional units leads to a dependency tree. Boards depend on nested functional units, they are the inner nodes of the tree. Parts are independent, they are the leafs: Even though dependencies are the bane of software development, Flow-Design does not usually draw these dependencies. They are implicitly created by visually nesting functional units. And they are harmless. Boards are so simple in their functionality, they are little affected by changes in functional units they are depending on. But functional units are implicitly dependent on more than nested functional units. They are also dependent on the data types of the wires attached to them: This is also natural and thus does not need to be made explicit. And it pertains mainly to parts being dependent. Since boards don´t do anything with regard to a problem domain, they don´t care much about data types. Their infrastructural purpose just needs types of input/output-pins to match. Explicit Dependencies You could say, Flow-Orientation is about tackling complexity at its root cause: that´s dependencies. “Natural” dependencies are depicted naturally, i.e. implicitly. And whereever possible dependencies are not even created. Functional units don´t know their collaborators within a flow. This is core to Flow-Orientation. That makes for high composability of functional units. A part is as independent of other functional units as a motor is from the rest of the car. And a board is as dependend on nested functional units as a motor is on a spark plug or a crank shaft. With Flow-Design software development moves closer to how hardware is constructed. Implicit dependencies are not enough, though. Sometimes explicit dependencies make designs easier – as counterintuitive this might sound. So FD notation needs a ways to denote explicit dependencies: Data flows along wires. But data does not flow along dependency relations. Instead dependency relations represent service calls. Functional unit C is depending on/calling services on functional unit S. If you want to be more specific, name the services next to the dependency relation: Although you should try to stay clear of explicit dependencies, they are fundamentally ok. See them as a way to add another dimension to a flow. Usually the functionality of the independent FU (“Customer repository” above) is orthogonal to the domain of the flow it is referenced by. If you like emphasize this by using different shapes for dependent and independent FUs like above. Such dependencies can be used to link in resources like databases or shared in-memory state. FUs can not only produce output but also can have side effects. A common pattern for using such explizit dependencies is to hook a GUI into a flow as the source and/or the sink of data: Which can be shortened to: Treat FUs others depend on as boards (with a special non-FD API the dependent part is connected to), but do not embed them in a flow in the diagram they are depended upon. Attributes of Functional Units Creation and usage of functional units can be modified with attributes. So far the following have shown to be helpful: Singleton: FUs are by default multitons. FUs in the same of different flows with the same name refer to the same functionality, but to different instances. Think of functional units as objects that get instanciated anew whereever they appear in a design. Sometimes though it´s helpful to reuse the same instance of a functional unit; this is always due to valuable state it holds. Signify this by annotating the FU with a “(S)”. Multiton: FUs on which others depend are singletons by default. This is, because they usually are introduced where shared state comes into play. If you want to change them to be a singletons mark them with a “(M)”. Configurable: Some parts need to be configured before the can do they work in a flow. Annotate them with a “(C)” to have them initialized before any data items to be processed by them arrive. Do not assume any order in which FUs are configured. How such configuration is happening is an implementation detail. Entry point: In each design there needs to be a single part where “it all starts”. That´s the entry point for all processing. It´s like Program.Main() in C# programs. Mark the entry point part with an “(E)”. Quite often this will be the GUI part. How the entry point is started is an implementation detail. Just consider it the first FU to start do its job. Patterns / Standard Parts If more than a single wire is attached to an output-pin that´s called a split (or fork). The same data is flowing on all of the wires. Remember: Flow-Designs are synchronous by default. So a split does not mean data is processed in parallel afterwards. Processing still happens synchronously and thus one branch after another. Do not assume any specific order of the processing on the different branches after the split. It is common to do a split and let only parts of the original data flow on through the branches. This effectively means a map is needed after a split. This map can be implicit or explicit. Although FUs can have multiple input-pins it is preferrable in most cases to combine input data from different branches using an explicit join: The default output of a join is a tuple of its input values. The default behavior of a join is to output a value whenever a new input is received. However, to produce its first output a join needs an input for all its input-pins. Other join behaviors can be: reset all inputs after an output only produce output if data arrives on certain input-pins

Read the article

google link for the RGBA library?

- by Navruk

I want google link for the RGBA library <script type='text/javascript' src='jquery.color-RGBa-patch.js'></script> This file contains /* * jQuery Color Animations */ (function(jQuery){ // We override the animation for all of these color styles jQuery.each(['backgroundColor', 'borderBottomColor', 'borderLeftColor', 'borderRightColor', 'borderTopColor', 'color', 'outlineColor'], function(i,attr){ jQuery.fx.step[attr] = function(fx){ if ( fx.colorFunction == undefined || fx.state == 0 ) { fx.start = getColor( fx.elem, attr ); fx.end = getRGB( fx.end ); if ( fx.start == undefined ) { fx.start = [ 255,255,255,0 ]; } else { if ( fx.start[3] == undefined ) // if alpha channel is not spotted fx.start[3] = 1; // assume it is fully opaque if ( fx.start[3] == 0 ) // if alpha is present and fully transparent fx.start[0] = fx.start[1] = fx.start[2] = 255; // assume starting with white } if ( fx.end[3] == undefined ) // if alpha channel is not spotted fx.end[3] = 1; // assume it is fully opaque fx.colorFunction = ( fx.start[3] == 1 && fx.end[3] == 1 ? calcRGB : calcRGBa ); } fx.elem.style[attr] = fx.colorFunction(); } }); var calcRGB = function() { return 'rgb(' + Math.max(Math.min( parseInt((this.pos * (this.end[0] - this.start[0])) + this.start[0]), 255), 0) + ',' + Math.max(Math.min( parseInt((this.pos * (this.end[1] - this.start[1])) + this.start[1]), 255), 0) + ',' + Math.max(Math.min( parseInt((this.pos * (this.end[2] - this.start[2])) + this.start[2]), 255), 0) + ')'; }; var calcRGBa = function() { return 'rgba(' + Math.max(Math.min( parseInt((this.pos * (this.end[0] - this.start[0])) + this.start[0]), 255), 0) + ',' + Math.max(Math.min( parseInt((this.pos * (this.end[1] - this.start[1])) + this.start[1]), 255), 0) + ',' + Math.max(Math.min( parseInt((this.pos * (this.end[2] - this.start[2])) + this.start[2]), 255), 0) + ',' + Math.max(Math.min( parseFloat((this.pos * (this.end[3] - this.start[3])) + this.start[3]), 1), 0) + ')'; }; // Color Conversion functions from highlightFade // By Blair Mitchelmore // http://jquery.offput.ca/highlightFade/ // Parse strings looking for color tuples [255,255,255] function getRGB(color) { var result; // Check if we're already dealing with an array of colors if ( color && color.constructor == Array && color.length >= 3 ) return color; // Look for rgb(num,num,num) if (result = /rgba?$\s*([0-9]{1,3})\s*,\s*([0-9]{1,3})\s*,\s*([0-9]{1,3})\s*,?\s*((?:[0-9](?:\.[0-9]+)?)?)\s*$/.exec(color)) return [ parseInt(result[1]), parseInt(result[2]), parseInt(result[3]), parseFloat(result[4]||1) ]; // Look for rgb(num%,num%,num%) if (result = /rgba?$\s*([0-9]+(?:\.[0-9]+)?)\%\s*,\s*([0-9]+(?:\.[0-9]+)?)\%\s*,\s*([0-9]+(?:\.[0-9]+)?)\%\s*,?\s*((?:[0-9](?:\.[0-9]+)?)?)\s*$/.exec(color)) return [parseFloat(result[1])*2.55, parseFloat(result[2])*2.55, parseFloat(result[3])*2.55, parseFloat(result[4]||1)]; // Look for #a0b1c2 if (result = /#([a-fA-F0-9]{2})([a-fA-F0-9]{2})([a-fA-F0-9]{2})/.exec(color)) return [parseInt(result[1],16), parseInt(result[2],16), parseInt(result[3],16)]; // Look for #fff if (result = /#([a-fA-F0-9])([a-fA-F0-9])([a-fA-F0-9])/.exec(color)) return [parseInt(result[1]+result[1],16), parseInt(result[2]+result[2],16), parseInt(result[3]+result[3],16)]; // Otherwise, we're most likely dealing with a named color var colorName = jQuery.trim(color).toLowerCase(); if ( colors[colorName] != undefined ) return colors[colorName]; return [ 255, 255, 255, 0 ]; } function getColor(elem, attr) { var color; do { color = jQuery.curCSS(elem, attr); // Keep going until we find an element that has color, or we hit the body if ( color != '' && color != 'transparent' || jQuery.nodeName(elem, "body") ) break; attr = "backgroundColor"; } while ( elem = elem.parentNode ); return getRGB(color); }; // Some named colors to work with // From Interface by Stefan Petre // http://interface.eyecon.ro/ var colors = { aqua:[0,255,255], azure:[240,255,255], beige:[245,245,220], black:[0,0,0], blue:[0,0,255], brown:[165,42,42], cyan:[0,255,255], darkblue:[0,0,139], darkcyan:[0,139,139], darkgrey:[169,169,169], darkgreen:[0,100,0], darkkhaki:[189,183,107], darkmagenta:[139,0,139], darkolivegreen:[85,107,47], darkorange:[255,140,0], darkorchid:[153,50,204], darkred:[139,0,0], darksalmon:[233,150,122], darkviolet:[148,0,211], fuchsia:[255,0,255], gold:[255,215,0], green:[0,128,0], indigo:[75,0,130], khaki:[240,230,140], lightblue:[173,216,230], lightcyan:[224,255,255], lightgreen:[144,238,144], lightgrey:[211,211,211], lightpink:[255,182,193], lightyellow:[255,255,224], lime:[0,255,0], magenta:[255,0,255], maroon:[128,0,0], navy:[0,0,128], olive:[128,128,0], orange:[255,165,0], pink:[255,192,203], purple:[128,0,128], violet:[128,0,128], red:[255,0,0], silver:[192,192,192], white:[255,255,255], yellow:[255,255,0] }; })(jQuery);

Read the article

Python - pickling fails for numpy.void objects

- by I82Much

>>> idmapfile = open("idmap", mode="w") >>> pickle.dump(idMap, idmapfile) >>> idmapfile.close() >>> idmapfile = open("idmap") >>> unpickled = pickle.load(idmapfile) >>> unpickled == idMap False idMap[1] {1537: (552, 1, 1537, 17.793827056884766, 3), 1540: (4220, 1, 1540, 19.31205940246582, 3), 1544: (592, 1, 1544, 18.129131317138672, 3), 1675: (529, 1, 1675, 18.347782135009766, 3), 1550: (4048, 1, 1550, 19.31205940246582, 3), 1424: (1528, 1, 1424, 19.744396209716797, 3), 1681: (1265, 1, 1681, 19.596025466918945, 3), 1560: (3457, 1, 1560, 20.530569076538086, 3), 1690: (477, 1, 1690, 17.395542144775391, 3), 1691: (554, 1, 1691, 13.446117401123047, 3), 1436: (3010, 1, 1436, 19.596025466918945, 3), 1434: (3183, 1, 1434, 19.744396209716797, 3), 1441: (3570, 1, 1441, 20.589576721191406, 3), 1435: (476, 1, 1435, 19.640911102294922, 3), 1444: (527, 1, 1444, 17.98480224609375, 3), 1478: (1897, 1, 1478, 19.596025466918945, 3), 1575: (614, 1, 1575, 19.371648788452148, 3), 1586: (2189, 1, 1586, 19.31205940246582, 3), 1716: (3470, 1, 1716, 19.158674240112305, 3), 1590: (2278, 1, 1590, 19.596025466918945, 3), 1463: (991, 1, 1463, 19.31205940246582, 3), 1594: (1890, 1, 1594, 19.596025466918945, 3), 1467: (1087, 1, 1467, 19.31205940246582, 3), 1596: (3759, 1, 1596, 19.744396209716797, 3), 1602: (3011, 1, 1602, 20.530569076538086, 3), 1547: (490, 1, 1547, 17.994071960449219, 3), 1605: (658, 1, 1605, 19.31205940246582, 3), 1606: (1794, 1, 1606, 16.964881896972656, 3), 1719: (1826, 1, 1719, 19.596025466918945, 3), 1617: (583, 1, 1617, 11.894925117492676, 3), 1492: (3441, 1, 1492, 20.500667572021484, 3), 1622: (3215, 1, 1622, 19.31205940246582, 3), 1628: (2761, 1, 1628, 19.744396209716797, 3), 1502: (1563, 1, 1502, 19.596025466918945, 3), 1632: (1108, 1, 1632, 15.457141876220703, 3), 1468: (3779, 1, 1468, 19.596025466918945, 3), 1642: (3970, 1, 1642, 19.744396209716797, 3), 1518: (612, 1, 1518, 18.570245742797852, 3), 1647: (854, 1, 1647, 16.964881896972656, 3), 1650: (2099, 1, 1650, 20.439058303833008, 3), 1651: (540, 1, 1651, 18.552841186523438, 3), 1653: (613, 1, 1653, 19.237197875976563, 3), 1532: (537, 1, 1532, 18.885730743408203, 3)} >>> unpickled[1] {1537: (64880, 1638, 56700, -1.0808743559293829e+18, 152), 1540: (64904, 1638, 0, 0.0, 0), 1544: (54472, 1490, 0, 0.0, 0), 1675: (6464, 1509, 0, 0.0, 0), 1550: (43592, 1510, 0, 0.0, 0), 1424: (43616, 1510, 0, 0.0, 0), 1681: (0, 0, 0, 0.0, 0), 1560: (400, 152, 400, 2.1299736657737219e-43, 0), 1690: (408, 152, 408, 2.7201111331839077e+26, 34), 1435: (424, 152, 61512, 1.0122952080313192e-39, 0), 1436: (400, 152, 400, 20.250289916992188, 3), 1434: (424, 152, 62080, 1.0122952080313192e-39, 0), 1441: (400, 152, 400, 12.250144958496094, 3), 1691: (424, 152, 42608, 15.813941955566406, 3), 1444: (400, 152, 400, 19.625289916992187, 3), 1606: (424, 152, 42432, 5.2947192852601414e-22, 41), 1575: (400, 152, 400, 6.2537390010262572e-36, 0), 1586: (424, 152, 42488, 1.0122601755697111e-39, 0), 1716: (400, 152, 400, 6.2537390010262572e-36, 0), 1590: (424, 152, 64144, 1.0126357235581501e-39, 0), 1463: (400, 152, 400, 6.2537390010262572e-36, 0), 1594: (424, 152, 32672, 17.002994537353516, 3), 1467: (400, 152, 400, 19.750289916992187, 3), 1596: (424, 152, 7176, 1.0124003054161436e-39, 0), 1602: (400, 152, 400, 18.500289916992188, 3), 1547: (424, 152, 7000, 1.0124003054161436e-39, 0), 1605: (400, 152, 400, 20.500289916992188, 3), 1478: (424, 152, 42256, -6.0222748507426518e+30, 222), 1719: (400, 152, 400, 6.2537390010262572e-36, 0), 1617: (424, 152, 16472, 1.0124283313854301e-39, 0), 1492: (400, 152, 400, 6.2537390010262572e-36, 0), 1622: (424, 152, 35304, 1.0123190301052127e-39, 0), 1628: (400, 152, 400, 6.2537390010262572e-36, 0), 1502: (424, 152, 63152, 19.627988815307617, 3), 1632: (400, 152, 400, 19.375289916992188, 3), 1468: (424, 152, 38088, 1.0124213248931084e-39, 0), 1642: (400, 152, 400, 6.2537390010262572e-36, 0), 1518: (424, 152, 63896, 1.0127436235399031e-39, 0), 1647: (400, 152, 400, 6.2537390010262572e-36, 0), 1650: (424, 152, 53424, 16.752857208251953, 3), 1651: (400, 152, 400, 19.250289916992188, 3), 1653: (424, 152, 50624, 1.0126497365427934e-39, 0), 1532: (400, 152, 400, 6.2537390010262572e-36, 0)} The keys come out fine, the values are screwed up. I tried same thing loading file in binary mode; didn't fix the problem. Any idea what I'm doing wrong? Edit: Here's the code with binary. Note that the values are different in the unpickled object. >>> idmapfile = open("idmap", mode="wb") >>> pickle.dump(idMap, idmapfile) >>> idmapfile.close() >>> idmapfile = open("idmap", mode="rb") >>> unpickled = pickle.load(idmapfile) >>> unpickled==idMap False >>> unpickled[1] {1537: (12176, 2281, 56700, -1.0808743559293829e+18, 152), 1540: (0, 0, 15934, 2.7457842047810522e+26, 108), 1544: (400, 152, 400, 4.9518498821046956e+27, 53), 1675: (408, 152, 408, 2.7201111331839077e+26, 34), 1550: (456, 152, 456, -1.1349175514578289e+18, 152), 1424: (432, 152, 432, 4.5939047815653343e-40, 11), 1681: (408, 152, 408, 2.1299736657737219e-43, 0), 1560: (376, 152, 376, 2.1299736657737219e-43, 0), 1690: (376, 152, 376, 2.1299736657737219e-43, 0), 1435: (376, 152, 376, 2.1299736657737219e-43, 0), 1436: (376, 152, 376, 2.1299736657737219e-43, 0), 1434: (376, 152, 376, 2.1299736657737219e-43, 0), 1441: (376, 152, 376, 2.1299736657737219e-43, 0), 1691: (376, 152, 376, 2.1299736657737219e-43, 0), 1444: (376, 152, 376, 2.1299736657737219e-43, 0), 1606: (25784, 2281, 376, -3.2883343074537754e+26, 34), 1575: (24240, 2281, 376, 2.1299736657737219e-43, 0), 1586: (24240, 2281, 376, 2.1299736657737219e-43, 0), 1716: (24240, 2281, 376, -3.0093091599657311e-35, 26), 1590: (24240, 2281, 376, 2.1299736657737219e-43, 0), 1463: (24240, 2281, 376, 2.1299736657737219e-43, 0), 1594: (24240, 2281, 376, -4123208450048.0, 196), 1467: (25784, 2281, 376, 2.1299736657737219e-43, 0), 1596: (25784, 2281, 376, 2.1299736657737219e-43, 0), 1602: (25784, 2281, 376, -5.9963281433905448e+26, 76), 1547: (25784, 2281, 376, -218106240.0, 139), 1605: (25784, 2281, 376, -3.7138649803377281e+27, 56), 1478: (376, 152, 376, 2.1299736657737219e-43, 0), 1719: (25784, 2281, 376, 2.1299736657737219e-43, 0), 1617: (25784, 2281, 376, -1.4411779941597184e+17, 237), 1492: (25784, 2281, 376, 2.8596493694487798e-30, 80), 1622: (25784, 2281, 376, 184686084096.0, 93), 1628: (1336, 152, 1336, 3.1691839245470052e+29, 179), 1502: (1272, 152, 1272, -5.2042207205116645e-17, 99), 1632: (1208, 152, 1208, 2.1299736657737219e-43, 0), 1468: (1144, 152, 1144, 2.1299736657737219e-43, 0), 1642: (1080, 152, 1080, 2.1299736657737219e-43, 0), 1518: (1016, 152, 1016, 4.0240902787680023e+35, 145), 1647: (952, 152, 952, -985172619034624.0, 237), 1650: (888, 152, 888, 12094787289088.0, 66), 1651: (824, 152, 824, 2.1299736657737219e-43, 0), 1653: (760, 152, 760, 0.00018310768064111471, 238), 1532: (696, 152, 696, 8.8978061885676389e+26, 125)} OK I've isolated the problem, but don't know why it's so. First, apparently what I'm pickling are not tuples (though they look like it), but instead numpy.void types. Here is a series to illustrate the problem. first = run0.detections[0] >>> first (1, 19, 1578, 82.637763977050781, 1) >>> type(first) <type 'numpy.void'> >>> firstTuple = tuple(first) >>> theFile = open("pickleTest", "w") >>> pickle.dump(first, theFile) >>> theTupleFile = open("pickleTupleTest", "w") >>> pickle.dump(firstTuple, theTupleFile) >>> theFile.close() >>> theTupleFile.close() >>> first (1, 19, 1578, 82.637763977050781, 1) >>> firstTuple (1, 19, 1578, 82.637764, 1) >>> theFile = open("pickleTest", "r") >>> theTupleFile = open("pickleTupleTest", "r") >>> unpickledTuple = pickle.load(theTupleFile) >>> unpickledVoid = pickle.load(theFile) >>> type(unpickledVoid) <type 'numpy.void'> >>> type(unpickledTuple) <type 'tuple'> >>> unpickledTuple (1, 19, 1578, 82.637764, 1) >>> unpickledTuple == firstTuple True >>> unpickledVoid == first False >>> unpickledVoid (7936, 1705, 56700, -1.0808743559293829e+18, 152) >>> first (1, 19, 1578, 82.637763977050781, 1)

Read the article

Mulitple full joins in Postgres is slow

- by blast83

I have a program to use the IMDB database and am having very slow performance on my query. It appears that it doesn't use my where condition until after it materializes everything. I looked around for hints to use but nothing seems to work. Here is my query: SELECT * FROM name as n1 FULL JOIN aka_name ON n1.id = aka_name.person_id FULL JOIN cast_info as t2 ON n1.id = t2.person_id FULL JOIN person_info as t3 ON n1.id = t3.person_id FULL JOIN char_name as t4 ON t2.person_role_id = t4.id FULL JOIN role_type as t5 ON t2.role_id = t5.id FULL JOIN title as t6 ON t2.movie_id = t6.id FULL JOIN aka_title as t7 ON t6.id = t7.movie_id FULL JOIN complete_cast as t8 ON t6.id = t8.movie_id FULL JOIN kind_type as t9 ON t6.kind_id = t9.id FULL JOIN movie_companies as t10 ON t6.id = t10.movie_id FULL JOIN movie_info as t11 ON t6.id = t11.movie_id FULL JOIN movie_info_idx as t19 ON t6.id = t19.movie_id FULL JOIN movie_keyword as t12 ON t6.id = t12.movie_id FULL JOIN movie_link as t13 ON t6.id = t13.linked_movie_id FULL JOIN link_type as t14 ON t13.link_type_id = t14.id FULL JOIN keyword as t15 ON t12.keyword_id = t15.id FULL JOIN company_name as t16 ON t10.company_id = t16.id FULL JOIN company_type as t17 ON t10.company_type_id = t17.id FULL JOIN comp_cast_type as t18 ON t8.status_id = t18.id WHERE n1.id = 2003 Very table is related to each other on the join via foreign-key constraints and have indexes for all the mentioned columns. The query plan details: "Hash Left Join (cost=5838187.01..13756845.07 rows=15579622 width=835) (actual time=146879.213..146891.861 rows=20 loops=1)" " Hash Cond: (t8.status_id = t18.id)" " -> Hash Left Join (cost=5838185.92..13542624.18 rows=15579622 width=822) (actual time=146879.199..146891.833 rows=20 loops=1)" " Hash Cond: (t10.company_type_id = t17.id)" " -> Hash Left Join (cost=5838184.83..13328403.29 rows=15579622 width=797) (actual time=146879.165..146891.781 rows=20 loops=1)" " Hash Cond: (t10.company_id = t16.id)" " -> Hash Left Join (cost=5828372.95..10061752.03 rows=15579622 width=755) (actual time=146426.483..146429.756 rows=20 loops=1)" " Hash Cond: (t12.keyword_id = t15.id)" " -> Hash Left Join (cost=5825164.23..6914088.45 rows=15579622 width=731) (actual time=146372.411..146372.529 rows=20 loops=1)" " Hash Cond: (t13.link_type_id = t14.id)" " -> Merge Left Join (cost=5825162.82..6699867.24 rows=15579622 width=715) (actual time=146372.366..146372.472 rows=20 loops=1)" " Merge Cond: (t6.id = t13.linked_movie_id)" " -> Merge Left Join (cost=5684009.29..6378956.77 rows=15579622 width=699) (actual time=144019.620..144019.711 rows=20 loops=1)" " Merge Cond: (t6.id = t12.movie_id)" " -> Merge Left Join (cost=5182403.90..5622400.75 rows=8502523 width=687) (actual time=136849.731..136849.809 rows=20 loops=1)" " Merge Cond: (t6.id = t19.movie_id)" " -> Merge Left Join (cost=4974472.00..5315778.48 rows=8502523 width=637) (actual time=134972.032..134972.099 rows=20 loops=1)" " Merge Cond: (t6.id = t11.movie_id)" " -> Merge Left Join (cost=1830064.81..2033131.89 rows=1341632 width=561) (actual time=63784.035..63784.062 rows=2 loops=1)" " Merge Cond: (t6.id = t10.movie_id)" " -> Nested Loop Left Join (cost=1417360.29..1594294.02 rows=1044480 width=521) (actual time=59279.246..59279.264 rows=1 loops=1)" " Join Filter: (t6.kind_id = t9.id)" " -> Merge Left Join (cost=1417359.22..1429787.34 rows=1044480 width=507) (actual time=59279.222..59279.224 rows=1 loops=1)" " Merge Cond: (t6.id = t8.movie_id)" " -> Merge Left Join (cost=1405731.84..1414378.65 rows=1044480 width=491) (actual time=59121.773..59121.775 rows=1 loops=1)" " Merge Cond: (t6.id = t7.movie_id)" " -> Sort (cost=1346206.04..1348817.24 rows=1044480 width=416) (actual time=58095.230..58095.231 rows=1 loops=1)" " Sort Key: t6.id" " Sort Method: quicksort Memory: 17kB" " -> Hash Left Join (cost=172406.29..456387.53 rows=1044480 width=416) (actual time=57969.371..58095.208 rows=1 loops=1)" " Hash Cond: (t2.movie_id = t6.id)" " -> Hash Left Join (cost=104700.38..256885.82 rows=1044480 width=358) (actual time=49981.493..50006.303 rows=1 loops=1)" " Hash Cond: (t2.role_id = t5.id)" " -> Hash Left Join (cost=104699.11..242522.95 rows=1044480 width=343) (actual time=49981.441..50006.250 rows=1 loops=1)" " Hash Cond: (t2.person_role_id = t4.id)" " -> Hash Left Join (cost=464.96..12283.95 rows=1044480 width=269) (actual time=0.071..0.087 rows=1 loops=1)" " Hash Cond: (n1.id = t3.person_id)" " -> Nested Loop Left Join (cost=0.00..49.39 rows=7680 width=160) (actual time=0.051..0.066 rows=1 loops=1)" " -> Nested Loop Left Join (cost=0.00..17.04 rows=3 width=119) (actual time=0.038..0.041 rows=1 loops=1)" " -> Index Scan using name_pkey on name n1 (cost=0.00..8.68 rows=1 width=39) (actual time=0.022..0.024 rows=1 loops=1)" " Index Cond: (id = 2003)" " -> Index Scan using aka_name_idx_person on aka_name (cost=0.00..8.34 rows=1 width=80) (actual time=0.010..0.010 rows=0 loops=1)" " Index Cond: ((aka_name.person_id = 2003) AND (n1.id = aka_name.person_id))" " -> Index Scan using cast_info_idx_pid on cast_info t2 (cost=0.00..10.77 rows=1 width=41) (actual time=0.011..0.020 rows=1 loops=1)" " Index Cond: ((t2.person_id = 2003) AND (n1.id = t2.person_id))" " -> Hash (cost=463.26..463.26 rows=136 width=109) (actual time=0.010..0.010 rows=0 loops=1)" " -> Index Scan using person_info_idx_pid on person_info t3 (cost=0.00..463.26 rows=136 width=109) (actual time=0.009..0.009 rows=0 loops=1)" " Index Cond: (person_id = 2003)" " -> Hash (cost=42697.62..42697.62 rows=2442362 width=74) (actual time=49305.872..49305.872 rows=2442362 loops=1)" " -> Seq Scan on char_name t4 (cost=0.00..42697.62 rows=2442362 width=74) (actual time=14.066..22775.087 rows=2442362 loops=1)" " -> Hash (cost=1.12..1.12 rows=12 width=15) (actual time=0.024..0.024 rows=12 loops=1)" " -> Seq Scan on role_type t5 (cost=0.00..1.12 rows=12 width=15) (actual time=0.012..0.014 rows=12 loops=1)" " -> Hash (cost=31134.07..31134.07 rows=1573507 width=58) (actual time=7841.225..7841.225 rows=1573507 loops=1)" " -> Seq Scan on title t6 (cost=0.00..31134.07 rows=1573507 width=58) (actual time=21.507..2799.443 rows=1573507 loops=1)" " -> Materialize (cost=59525.80..63203.88 rows=294246 width=75) (actual time=812.376..984.958 rows=192075 loops=1)" " -> Sort (cost=59525.80..60261.42 rows=294246 width=75) (actual time=812.363..922.452 rows=192075 loops=1)" " Sort Key: t7.movie_id" " Sort Method: external merge Disk: 24880kB" " -> Seq Scan on aka_title t7 (cost=0.00..6646.46 rows=294246 width=75) (actual time=24.652..164.822 rows=294246 loops=1)" " -> Materialize (cost=11627.38..12884.43 rows=100564 width=16) (actual time=123.819..149.086 rows=41907 loops=1)" " -> Sort (cost=11627.38..11878.79 rows=100564 width=16) (actual time=123.807..138.530 rows=41907 loops=1)" " Sort Key: t8.movie_id" " Sort Method: external merge Disk: 3136kB" " -> Seq Scan on complete_cast t8 (cost=0.00..1549.64 rows=100564 width=16) (actual time=0.013..10.744 rows=100564 loops=1)" " -> Materialize (cost=1.08..1.15 rows=7 width=14) (actual time=0.016..0.029 rows=7 loops=1)" " -> Seq Scan on kind_type t9 (cost=0.00..1.07 rows=7 width=14) (actual time=0.011..0.013 rows=7 loops=1)" " -> Materialize (cost=412704.52..437969.09 rows=2021166 width=40) (actual time=3420.356..4278.545 rows=1028995 loops=1)" " -> Sort (cost=412704.52..417757.43 rows=2021166 width=40) (actual time=3420.349..3953.483 rows=1028995 loops=1)" " Sort Key: t10.movie_id" " Sort Method: external merge Disk: 90960kB" " -> Seq Scan on movie_companies t10 (cost=0.00..35214.66 rows=2021166 width=40) (actual time=13.271..566.893 rows=2021166 loops=1)" " -> Materialize (cost=3144407.19..3269057.42 rows=9972019 width=76) (actual time=65485.672..70083.219 rows=5039009 loops=1)" " -> Sort (cost=3144407.19..3169337.23 rows=9972019 width=76) (actual time=65485.667..68385.550 rows=5038999 loops=1)" " Sort Key: t11.movie_id" " Sort Method: external merge Disk: 735512kB" " -> Seq Scan on movie_info t11 (cost=0.00..212815.19 rows=9972019 width=76) (actual time=15.750..15715.608 rows=9972019 loops=1)" " -> Materialize (cost=207925.01..219867.92 rows=955433 width=50) (actual time=1483.989..1785.636 rows=429401 loops=1)" " -> Sort (cost=207925.01..210313.59 rows=955433 width=50) (actual time=1483.983..1654.165 rows=429401 loops=1)" " Sort Key: t19.movie_id" " Sort Method: external merge Disk: 31720kB" " -> Seq Scan on movie_info_idx t19 (cost=0.00..15047.33 rows=955433 width=50) (actual time=7.284..221.597 rows=955433 loops=1)" " -> Materialize (cost=501605.39..537645.64 rows=2883220 width=12) (actual time=5823.040..6868.242 rows=1597396 loops=1)" " -> Sort (cost=501605.39..508813.44 rows=2883220 width=12) (actual time=5823.026..6477.517 rows=1597396 loops=1)" " Sort Key: t12.movie_id" " Sort Method: external merge Disk: 78888kB" " -> Seq Scan on movie_keyword t12 (cost=0.00..44417.20 rows=2883220 width=12) (actual time=11.672..839.498 rows=2883220 loops=1)" " -> Materialize (cost=141143.93..152995.81 rows=948150 width=16) (actual time=1916.356..2253.004 rows=478358 loops=1)" " -> Sort (cost=141143.93..143514.31 rows=948150 width=16) (actual time=1916.344..2125.698 rows=478358 loops=1)" " Sort Key: t13.linked_movie_id" " Sort Method: external merge Disk: 29632kB" " -> Seq Scan on movie_link t13 (cost=0.00..14607.50 rows=948150 width=16) (actual time=27.610..297.962 rows=948150 loops=1)" " -> Hash (cost=1.18..1.18 rows=18 width=16) (actual time=0.020..0.020 rows=18 loops=1)" " -> Seq Scan on link_type t14 (cost=0.00..1.18 rows=18 width=16) (actual time=0.010..0.012 rows=18 loops=1)" " -> Hash (cost=1537.10..1537.10 rows=91010 width=24) (actual time=54.055..54.055 rows=91010 loops=1)" " -> Seq Scan on keyword t15 (cost=0.00..1537.10 rows=91010 width=24) (actual time=0.006..14.703 rows=91010 loops=1)" " -> Hash (cost=4585.61..4585.61 rows=245461 width=42) (actual time=445.269..445.269 rows=245461 loops=1)" " -> Seq Scan on company_name t16 (cost=0.00..4585.61 rows=245461 width=42) (actual time=12.037..309.961 rows=245461 loops=1)" " -> Hash (cost=1.04..1.04 rows=4 width=25) (actual time=0.013..0.013 rows=4 loops=1)" " -> Seq Scan on company_type t17 (cost=0.00..1.04 rows=4 width=25) (actual time=0.009..0.010 rows=4 loops=1)" " -> Hash (cost=1.04..1.04 rows=4 width=13) (actual time=0.006..0.006 rows=4 loops=1)" " -> Seq Scan on comp_cast_type t18 (cost=0.00..1.04 rows=4 width=13) (actual time=0.002..0.003 rows=4 loops=1)" "Total runtime: 147055.016 ms" Is there anyway to force the name.id = 2003 before it tries to join all the tables together? As you can see, the end result is 4 tuples but it seems like it should be a fast join by using the available index after it limited it down with the name clause, although very complex.

Search Results

Search found 214 results on 9 pages for 'tuples'.

Page 9/9 | < Previous Page | 5 6 7 8 9

- by Chris Travers

- by asmaier

- by Carl Smotricz

- by smart

- by shuttle87

- by Eric

- by wageoghe

- by DarqMoth

- by Jubbles

- by Ralf Westphal

- by Navruk

- by I82Much

- by blast83

< Previous Page | 5 6 7 8 9