Search Results

Search found 3047 results on 122 pages for 'subset sum'.

  • SQL INSTR() using CSV. Need exact match rather than part

    - by Alastair Pitts
    This is a follow-up issue relating to the answer for http://stackoverflow.com/questions/2445029/sql-placeholder-in-where-in-issue-inserted-strings-fail Quick background: we have a SQL query that uses a placeholder value to accept a string, which represents a unique tag/id. Usually this is only a single tag, but we needed the ability to use a CSV string for multiple tags, returning a combined result. In the answer we received from the vendor, they suggested the use of the INSTR function, along the lines of: select * from pitotal where tag IN (SELECT tag from pipoint WHERE INSTR(?, tag) <> 0) and time between 'y' and 't' This works perfectly well 99% of the time; the issue is when one tag is also a subset of another tag in the CSV string. E.g. the placeholder value is 'northdom,southdom,eastdom,westdom' and possible tags include north or northdom. What happens, as north is a subset of northdom, is that both tags are returned instead of just northdom, which is actually what we want. I'm not strong on SQL, so I couldn't work out how to make it an exact match or split the CSV string, so help would be appreciated. Is there a way to split the CSV string or make it look for an exact match?
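
    A minimal Python sketch of the exact-match behaviour being asked for, as opposed to INSTR's substring behaviour (illustrative only; the tag values come from the question, and the check is shown application-side rather than in SQL):

        # Exact membership against the split CSV vs. substring search against the raw CSV.
        placeholder = "northdom,southdom,eastdom,westdom"
        tags = ["north", "northdom"]

        exact = {t.strip() for t in placeholder.split(",")}
        matched_exact = [t for t in tags if t in exact]            # ['northdom']
        matched_substring = [t for t in tags if t in placeholder]  # ['north', 'northdom']
        print(matched_exact, matched_substring)

    In SQL, one common way to get the same exact-match behaviour is to compare delimited values (wrapping both the placeholder string and the tag in commas) so that only whole tags can match, whereas the vendor's plain INSTR call matches any substring.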

    Read the article

  • What is the correct way to implement a massive hierarchical, geographical search for news?

    - by Philip Brocoum
    The company I work for is in the business of sending press releases. We want to make it possible for interested parties to search for press releases based on a number of criteria, the most important being location. For example, someone might search for all news sent to New York City, Massachusetts, or ZIP code 89134, sent from a governmental institution, under the topic of "traffic". Or whatever. The problem is, we've sent, literally, hundreds of thousands of press releases. Searching is slow and complex. For example, a press release sent to Queens, NY should show up in the search I mentioned above even though it wasn't specifically sent to New York City, because Queens is a subset of New York City. We may also want to implement "and" and "or" and negation and text search to the query to create complex searches. These searches also have to be fast enough to function as dynamic RSS feeds. I really don't know anything about search theory, or how it's properly done. The way we are getting by right now is using a data mart to store the locations the releases were sent to in a single table. However, because of the subset thing mentioned above, the data mart is gigantic with millions of rows. And we haven't even implemented cities yet, and there are about 50,000 cities in the United States, which will exponentially increase the size of the data mart by so much I'm afraid it just won't work anymore. Anyway, I realize this is not a simple question and there won't be a "do this" answer. However, I'm hoping one of you can point me in the right direction where I can learn about how massive searches are done? Because I really know nothing about it. And such a search engine is turning out to be incredibly difficult to make. Thanks! I know there must be a way because if Google can search the entire internet we must be able to search our own database :-)
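
    One common pattern for this kind of containment search is to expand each item's location to the full chain of enclosing regions at index time, so a release sent to Queens is also indexed under New York City (and so on up the hierarchy) and a city-level query finds it directly. A rough Python sketch of the idea (illustrative only; the hierarchy and field names here are hypothetical):

        # Hypothetical child -> parent region map.
        parent = {"Queens": "New York City", "New York City": "New York State"}

        def ancestors(region):
            """Yield a region and every region that encloses it."""
            while region is not None:
                yield region
                region = parent.get(region)

        # Index each press release under all of its enclosing regions.
        releases = [{"id": 1, "location": "Queens", "topic": "traffic"}]
        index = {}
        for release in releases:
            for region in ancestors(release["location"]):
                index.setdefault(region, set()).add(release["id"])

        print(index["New York City"])   # {1} -- the Queens release matches a city-level search

    Full-text search engines (Lucene/Solr and similar) lean on the same denormalise-at-index-time idea, which is usually what makes this kind of query fast enough to back dynamic RSS feeds.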

    Read the article

  • I cannot grok MVC, what it is, and what it is not?

    - by Hao
    I cannot grok what MVC is. What mindset or programming model should I acquire so MVC stuff can instantly "lightbulb" in my head? If not instantly, what simple programs/projects should I try first so I can apply the neat things MVC brings to programming? OOP is intuitive and easier: objects are all around us, and the benefits of code reuse using the OOP paradigm instantly click with anyone. You can probably talk to anybody about OOP for a few minutes, walk through some examples, and they would get it. While OOP somehow raises the intuitiveness of programming, MVC seems to do the opposite. I'm getting negative thoughts that some future employers (or even clients) would look down on me for not using MVC technology. I probably get the skinnable aspect of MVC, but when I try to apply it to my own project, I don't know where to start. And some programmers even have diverging views on how to accomplish MVC properly. Take this, for instance, from Jeff's post about MVC: The view is simply how you lay the data out, how it is displayed. If you want a subset of some data, for example, my opinion is that is a responsibility of the model. So maybe some programmers use MVC, but they somehow inadvertently use the View or the Controller to extract a subset of data. Why can't we have a definitive definition of what MVC is and how to accomplish it properly? Also, when I search for MVC .NET programs, most of what I find applies to web programs, not desktop apps, which intrigues me further. My guess is that MVC is most advantageous to web apps; there's not much of a problem with an intermixed view (HTML) and controller (program code) in desktop apps.
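
    A tiny, hypothetical Python sketch of the separation Jeff's quote describes, with the "subset of some data" living in the model rather than in the view or controller (purely illustrative, not any particular framework's MVC):

        # Model: owns the data and the subsetting/filtering logic.
        class TaskModel:
            def __init__(self, tasks):
                self._tasks = tasks

            def open_tasks(self):               # the "subset of some data" lives here
                return [t for t in self._tasks if not t["done"]]

        # View: only decides how the data is laid out.
        def text_view(tasks):
            return "\n".join("- " + t["title"] for t in tasks)

        # Controller: routes a request to the model and hands the result to a view.
        def show_open_tasks(model):
            return text_view(model.open_tasks())

        model = TaskModel([{"title": "write report", "done": False},
                           {"title": "ship build", "done": True}])
        print(show_open_tasks(model))           # prints only the open task

    Swapping text_view for an HTML view (the "skinnable" part) would not touch the model or the controller, which is the point of the split.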

    Read the article

  • Execute code on assembly load

    - by Dmitriy Matveev
    I'm working on a wrapper for a huge unmanaged library. Almost every one of its functions can call an error handler deep inside. The default error handler writes the error to the console and calls the abort() function. This behavior is undesirable for a managed library, so I want to replace the default error handler with my own, which will just throw an exception and let the program continue normal execution after the exception is handled. The error handler must be changed before any of the wrapped functions is called. The wrapper library is written in managed C++ with static linkage to the wrapped library, so nothing like "a type with hundreds of dll imports" is present. I also can't find a single type which is used by everything inside the wrapper library, so I can't solve the problem by defining a static constructor in one single type which will execute the code I need. I currently see two ways of solving the problem: 1) Define some static method like Library.Initialize() which must be called once by the client before his code uses any part of the wrapper library. 2) Find the most minimal subset of types which is used by every top-level function (I think the size of this subset will be something like 25-50 types) and add static constructors calling Library.Initialize (which would be internal in that scenario) to every one of these types. I've read this and this question, but they didn't help me. Is there a proper way of solving this problem? Maybe some nice hacks are available?

    Read the article

  • Is there a standard lexer/parser tool for Python?

    - by Salim Fadhley
    A volunteer job requires us to convert a large number of LaTeX documents into ePub format. It's a series of open-source fiction books which has so far been produced only on paper via a print-on-demand service. We'd like to be able to offer the books to users of book-reader devices (such as the Kindle) which require the ePub format for best results. Fortunately, ePub is a very simple format; however, there's no trivial way for LaTeX to produce the XHTML output required. We experimented with alternative LaTeX compilers (e.g. plastex) but in the end we figured it would probably be a lot easier to simply write our own compiler which understands a tiny subset of the LaTeX language and compiles directly to XHTML / ePub. Previously I used a tool on Windows called GOLD. This allowed me to go directly from BNF grammars to a stub parser. It also allowed me to implement the parser in any language I liked (I'd choose Python). This product has to work on Linux, so I'm wondering if there's an equivalent toolchain that works as well under Ubuntu / Eclipse / Python. The idea is that we will take the grammar of TeX and just implement a teeny subset of it, but we do not want to spend a huge amount of time worrying about grammar and parsing. A parser generator would obviously save us a great deal of time. Sal. UPDATE 1: Bonus marks for a solution with excellent documentation or tutorials.
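
    For scale, here is a dependency-free Python sketch of the kind of tiny LaTeX tokenizer such a compiler would start from (illustrative only; the token set is hypothetical). A parser generator such as PLY or pyparsing, both of which run fine on Ubuntu, lets you derive the lexer and parser from a grammar instead of writing them by hand:

        import re

        # Hypothetical token definitions for a tiny LaTeX subset.
        TOKEN_SPEC = [
            ("COMMAND", r"\\[A-Za-z]+"),    # \chapter, \emph, ...
            ("LBRACE",  r"\{"),
            ("RBRACE",  r"\}"),
            ("TEXT",    r"[^\\{}]+"),       # plain prose
        ]
        TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

        def tokenize(source):
            """Yield (kind, value) pairs for a tiny LaTeX subset."""
            for match in TOKEN_RE.finditer(source):
                yield match.lastgroup, match.group()

        print(list(tokenize(r"\emph{print on demand} fiction")))

    A recursive-descent parser over this token stream, or a grammar fed to one of the tools above, is then enough to emit XHTML for the small command set the books actually use.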

    Read the article

  • Fixing color in scatter plots in matplotlib

    - by ajhall
    Hi guys, I'm going to have to come back and add some examples if you need them, which you might. But here's the skinny: I'm plotting scatter plots of lab data for my research. I need to be able to visually compare the scatter plots from one plot to the next, so I want to fix the color range on the scatter plots and add a colorbar to each plot (which will be the same in each figure). Essentially, I'm fixing all aspects of the axes and colorspace etc. so that the plots are directly comparable by eye. For the life of me, I can't seem to get my scatter() command to properly set the color limits in the colorspace (default)... i.e., I figure out my total data's min and max, then apply them to vmin and vmax for the subset of data, and the color still does not come out properly in both plots. This must come up here and there; I can't be the only one who wants to compare various subsets of data amongst plots... so, how do you fix the colors so that each data point keeps its color between plots and doesn't get remapped to a different color due to the change in max/min of the subset versus the whole set? I greatly appreciate all your thoughts!!! A mountain-dew and fiery-hot cheetos to all! -Allen
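
    A minimal matplotlib sketch of the fix being described: compute vmin/vmax once from the full data set and pass the same limits (and colormap) to every scatter() call, so each subset keeps the same color mapping (illustrative only; the data here are made up):

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        x, y, c = rng.random(100), rng.random(100), rng.random(100) * 10

        vmin, vmax = c.min(), c.max()          # global color limits, fixed once
        subset = c < 5                          # an arbitrary subset plotted separately

        fig, axes = plt.subplots(1, 2, figsize=(8, 3))
        for ax, mask in zip(axes, [np.ones_like(subset, bool), subset]):
            sc = ax.scatter(x[mask], y[mask], c=c[mask],
                            cmap="viridis", vmin=vmin, vmax=vmax)
            fig.colorbar(sc, ax=ax)             # same scale on every colorbar
        plt.show()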

    Read the article

  • Issue with XSLT Processing on PHP

    - by monksy
    I'm getting a few errors from XSLTProcessor: XSLTProcessor::transformToDoc() [<a href='function.XSLTProcessor-transformToDoc'>function.XSLTProcessor-transformToDoc</a>]: Invalid or inclomplete context XSLTProcessor::transformToDoc() [<a href='function.XSLTProcessor-transformToDoc'>function.XSLTProcessor-transformToDoc</a>]: XSLTProcessor::transformToDoc() [<a href='function.XSLTProcessor-transformToDoc'>function.XSLTProcessor-transformToDoc</a>]: xsltValueOf: text copy failed in These come from parsing this XSLT line: <xsl:apply-templates select="page/sections/section" mode="subset"/> The template is: <xsl:template match="page/sections/section" mode="subset"> <a href="#{shorttitle}"> <xsl:value-of select="title"/> </a> <xsl:if test="position() != last()"> | </xsl:if> </xsl:template> The XML that the template is processing is: <shorttitle>About</shorttitle> <title>#~ About</title> The PHP XSLT code is: $xslt = new XSLTProcessor(); $XSL = new DOMDocument(); $XSL->load( $xsltFile, LIBXML_NOCDATA); $xslt->importStylesheet( $XSL ); print $xslt->transformToXML( $XML ); My suspicion is that the errors are due to the content. I'm not getting these errors with Firefox's XSLT rendering, nor am I getting an invalid XML document on the backend. I'm not getting errors on the load; it's just the transformToXML call. Does anyone have a clue how to solve this? This is with PHP 5.

    Read the article

  • Cannot use await in Portable Class Library for Win 8 and Win Phone 8

    - by Harry Len
    I'm attempting to create a Portable Class Library in Visual Studio 2012 to be used for a Windows 8 Store app and a Windows Phone 8 app. I'm getting the following error: 'await' requires that the type 'Windows.Foundation.IAsyncOperation' have a suitable GetAwaiter method. Are you missing a using directive for 'System'? At this line of code: StorageFolder guidesInstallFolder = await Package.Current.InstalledLocation.GetFolderAsync(guidesFolder); My Portable Class Library is targeted at .NET Framework 4.5, Windows Phone 8 and .NET for Windows Store apps. I don't get this error for this line of code in a pure Windows Phone 8 project, and I don't get it in a Windows Store app either so I don't understand why it won't work in my PCL. The GetAwaiter is an extension method in the class WindowsRuntimeSystemExtensions which is in System.Runtime.WindowsRuntime.dll. Using the Object Browser I can see this dll is available in the .NET for Windows Store apps component set and in the Windows Phone 8 component set but not in the .NET Portable Subset. I just don't understand why it wouldn't be in the Portable Subset if it's available in both my targeted platforms.

    Read the article

  • Hierarchy of meaning

    - by asldkncvas
    I am looking for a method to build a hierarchy of words. Background: I am an "amateur" natural language processing enthusiast, and right now one of the problems that I am interested in is determining the hierarchy of word semantics from a group of words. For example, if I have a set in which one word is a "super" representation of the others, i.e. [cat, dog, monkey, animal, bird, ...], I am interested in any technique which would allow me to extract the word 'animal', which is the most meaningful and accurate representation of the other words in this set. Note: they are NOT the same in meaning. cat != dog != monkey != animal BUT cat is a subset of animal and dog is a subset of animal. I know by now a lot of you will be telling me to use WordNet. Well, I will try to, but I am actually working in a very domain-specific area to which WordNet doesn't apply because: 1) Most of the words are not found in WordNet 2) All the words are in another language; translation is possible but to limited effect. Another example would be: [noise reduction, focal length, flash, functionality, ...], where functionality includes everything in this set. I have also tried crawling Wikipedia pages and applying some techniques with tf-idf etc., but Wikipedia pages don't really do much either. Can someone possibly enlighten me as to what direction my research should go in? (I could use anything)

    Read the article

  • Altering an embedded truetype font so it will be useable by Windows GDI

    - by Ritsaert Hornstra
    I am trying to render PDF content to a GDI device context (a 24-bit bitmap, to be exact). Parsing the PDF stream into PDF objects and rendering the PDF commands from the content dictionary works well, including font rendering. Embedded fonts are decompressed from their FontFile streams and "loaded" using AddFontMemResourceEx. Now some embedded fonts remove some TrueType tables that are needed by GDI, like the NAME table. Because of this, I tried to modify the font by parsing the TrueType subset font into its tables; tables with missing data are modified, and missing tables are regenerated with as correct information as possible. I use the Microsoft Font Validator tool to see how "correct" the generated font is. I still get a few errors, for example that for the maxp table the max values are usually too large (it is a subset), or "The xAvgCharWidth field does not equal the calculated value" for the OS/2 table, but this does not stop other embedded fonts from being usable. The fonts embedded using PDFCreator are the ones that are problematic. Questions: - How can I determine what I need to change in the font file in order for GDI to be able to use it? - Are there any other font validation tools that might give me insight into what is still wrong with the font file? If needed, I can make an original font file and an altered font file available for download somewhere.

    Read the article

  • Performing calculations by subsets of data in R

    - by Vivi
    I want to perform calculations for each company number in the column PERMNO of my data frame, a summary of which can be seen here:

        > summary(companydataRETS)
             PERMNO           RET
         Min.   :10000   Min.   :-0.971698
         1st Qu.:32716   1st Qu.:-0.011905
         Median :61735   Median : 0.000000
         Mean   :56788   Mean   : 0.000799
         3rd Qu.:80280   3rd Qu.: 0.010989
         Max.   :93436   Max.   :19.000000

    My solution so far was to create a variable with all possible company numbers

        compns <- companydataRETS[!duplicated(companydataRETS[,"PERMNO"]),"PERMNO"]

    and then use a foreach loop with parallel computing which calls my function get.rho(), which in turn performs the desired calculations:

        rhos <- foreach (i=1:length(compns), .combine=rbind) %dopar%
            get.rho(subset(companydataRETS[,"RET"], companydataRETS$PERMNO == compns[i]))

    I tested it for a subset of my data and it all works. The problem is that I have 72 million observations, and even after leaving the computer working overnight, it still didn't finish. I am new to R, so I imagine my code structure can be improved upon and there is a better (quicker, less computationally intensive) way to perform this same task (perhaps using apply or with, neither of which I understand). Any suggestions?
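
    The usual cure for this pattern is to split the data by PERMNO once and then apply the calculation to each group, instead of re-subsetting the 72 million rows once per company inside the loop. A rough sketch of that split-apply idea in plain Python, purely for illustration (get_rho is a stand-in for the real calculation; grouped operations in R, e.g. data.table's by=, do the same thing natively):

        from collections import defaultdict

        def get_rho(returns):
            # Stand-in for the real per-company calculation.
            return sum(returns) / len(returns)

        # rows is an iterable of (permno, ret) pairs read from the data.
        rows = [(10000, 0.01), (10000, -0.02), (32716, 0.005)]

        groups = defaultdict(list)
        for permno, ret in rows:            # one pass over the data
            groups[permno].append(ret)

        rhos = {permno: get_rho(rets) for permno, rets in groups.items()}
        print(rhos)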

    Read the article

  • Tools to document/visualize call graph?

    - by Dave Griffiths
    Hi, having recently joined a project with a vast amount of code to get to grips with, I would like to start documenting and visualizing some of the flows through the call graph to give me a better understanding of how everything fits together. This is what I would like to see in my ideal tool:
      - every node is a function/method
      - nodes are connected if one function can call another
      - optional square box in between detailing conditions under which the call is made (or a label icon you can hover over like a tooltip)
      - also an icon on the edge describing parameters
      - hover over a node and a description is displayed
      - optional icons for a node to display pseudo code
      - scenario/domain view - display a subset of the complete diagram for a particular use-case
      - slide view mode - for each frame, the currently executing function is highlighted
      - plenty of options over what to display to reduce on-screen clutter
    The interactive use of such a tool is key; I'm not looking for a Graphviz-type solution because there would be too much clutter. The ability to form a view of a subset of the entire graph would be very handy (maybe with the unimportant clutter greyed out). I don't need automatic generation from source code - I'm happy to enter it manually. Almost like a mind-map. Does that make sense? If you are not aware of such a tool, do you also think it would be useful? (Just in case I decide to go and scratch that itch one day!)

    Read the article

  • how to remove subsets form given text file

    - by user324887
    I have a problem like this. I have a text file of transactions, one per line:

        10 20 30 40 70
        20 30 70
        30 40
        10 20
        29 70 80 90
        20 30 40
        40 45 65
        10 20 80
        45 65 20

    I want to remove all subset transactions from this file. The output file should look as follows:

        10 20 30 40 70
        29 70 80 90
        20 30 40
        40 45 65
        10 20 80

    where records like

        20 30 70
        30 40
        10 20
        45 65 20

    are removed because they are subsets of other records. I am using a set for this, but I am not able to create one set per line. Can anybody tell me how to do this? Please help me. Here I am sending you my code:

        #include <cstdio>
        #include <iostream>
        #include <sstream>
        #include <set>
        #include <string>
        using namespace std;

        set<string> s1;

        int main()
        {
            FILE *fp = fopen("abc.txt", "r");
            if (fp != NULL)
            {
                char line[128]; /* or other suitable maximum line size */
                while (fgets(line, sizeof line, fp) != NULL) /* read a line */
                {
                    istringstream iss(line);
                    do
                    {
                        string sub;
                        iss >> sub;
                        s1.insert(sub);
                    } while (iss);
                    for (set<string>::const_iterator p = s1.begin(); p != s1.end(); ++p)
                        cout << *p << endl;
                }
            }
        }
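
    A short Python sketch of the subset-removal step itself (illustrative only): treat each line as a set of numbers and keep it only if it is not contained in any other line's set.

        def remove_subsets(lines):
            """Keep a transaction only if it is not a subset of another transaction."""
            sets = [set(line.split()) for line in lines]
            kept = []
            for i, s in enumerate(sets):
                is_subset = any(i != j and s <= t for j, t in enumerate(sets))
                if not is_subset:
                    kept.append(lines[i])
            return kept

        print(remove_subsets(["10 20 30 40 70", "20 30 70", "29 70 80 90"]))
        # -> ['10 20 30 40 70', '29 70 80 90']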

    Read the article

  • Partitioning data set in r based on multiple classes of observations

    - by Danny
    I'm trying to partition a data set that I have in R, 2/3 for training and 1/3 for testing. I have one classification variable, and seven numerical variables. Each observation is classified as either A, B, C, or D. For simplicity's sake, let's say that the classification variable, cl, is A for the first 100 observations, B for observations 101 to 200, C till 300, and D till 400. I'm trying to get a partition that has 2/3 of the observations for each of A, B, C, and D (as opposed to simply getting 2/3 of the observations for the entire data set since it will likely not have equal amounts of each classification). When I try to sample from a subset of the data, such as sample(subset(data, cl=='A')), the columns are reordered instead of the rows. To summarize, my goal is to have 67 random observations from each of A, B, C, and D as my training data, and store the remaining 33 observations for each of A, B, C, and D as testing data. I have found a very similar question to mine, but it did not factor in multiple variables. I feel silly asking this question because it seems so simple, but I'm stumped. Also, this is my first question on this site, so I apologize in advance for any faux pas on my part.
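
    The underlying idea is a stratified split: sample within each class separately, then combine. A plain-Python sketch of that logic, purely for illustration (in R the usual fix is to sample row indices per class, e.g. sample(which(data$cl == 'A')), since sampling the data frame itself permutes its columns, as observed above):

        import random

        def stratified_split(rows, train_frac=2/3, seed=1):
            """Split rows into train/test, keeping train_frac of each class."""
            random.seed(seed)
            by_class = {}
            for row in rows:                       # row = (cl, x1, ..., x7)
                by_class.setdefault(row[0], []).append(row)
            train, test = [], []
            for cl, members in by_class.items():
                random.shuffle(members)
                cut = round(len(members) * train_frac)
                train.extend(members[:cut])
                test.extend(members[cut:])
            return train, test

        rows = [("A", 0.1), ("A", 0.2), ("A", 0.3), ("B", 1.0), ("B", 1.1), ("B", 1.2)]
        train, test = stratified_split(rows)
        print(len(train), len(test))   # 4 2 -- two thirds of each class end up in train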

    Read the article

  • What's the most efficient way to load data from a file to a collection on-demand?

    - by Dan
    I'm working on a Java project that will allow users to parse multiple files with potentially thousands of lines. The information parsed will be stored in different objects, which will then be added to a collection. Since the GUI won't require loading ALL these objects at once and keeping them in memory, I'm looking for an efficient way to load/unload data from files, so that data is only loaded into the collection when a user requests it. I'm just evaluating options right now. I've also thought about the case where, after loading a subset of the data into the collection and presenting it in the GUI, I need the best way to reload previously observed data. Re-run the parser / populate the collection / populate the GUI? Or find a way to keep the collection in memory, or serialize/deserialize the collection itself? I know that loading/unloading subsets of data can get tricky if some sort of data filtering is performed. Let's say that I filter on ID, so my new subset will contain data from two previously analyzed subsets. This would be no problem if I kept a master copy of the whole data in memory. I've read that google-collections are good and efficient when handling big amounts of data, and offer methods that simplify lots of things, so this might offer an alternative that allows me to keep the collection in memory. This is just general talk; the question of what collection to use is a separate and complex thing. Do you know what the general recommendation is for this type of task? I'd like to hear what you've done with similar scenarios. I can provide more specifics if needed.
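
    One common way to load records on demand without keeping everything in memory is to scan each file once, remember the byte offset of every record, and re-read only the requested subset later. A rough Python sketch of that offset-index idea (illustrative only; the real project is in Java, where the equivalent seek-and-read can be done with RandomAccessFile):

        def build_index(path):
            """Map record number -> byte offset of its line in the file."""
            offsets = {}
            with open(path, "rb") as f:
                pos, i = 0, 0
                for line in f:
                    offsets[i] = pos
                    pos += len(line)
                    i += 1
            return offsets

        def load_records(path, offsets, wanted):
            """Parse only the requested records, on demand."""
            records = []
            with open(path, "rb") as f:
                for i in wanted:
                    f.seek(offsets[i])
                    records.append(f.readline().decode().rstrip("\n"))
            return records

        # offsets = build_index("data.txt")
        # print(load_records("data.txt", offsets, [0, 5000, 99999]))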

    Read the article

  • Are Thread.stop and friends ever safe in Java?

    - by Stephen C
    The stop(), suspend(), and resume() methods in java.lang.Thread are deprecated because they are unsafe. The Sun-recommended workaround is to use Thread.interrupt(), but that approach doesn't work in all cases. For example, if you call a library method that doesn't explicitly or implicitly check the interrupted flag, you have no choice but to wait for the call to finish. So, I'm wondering if it is possible to characterize situations where it is (provably) safe to call stop() on a Thread. For example, would it be safe to stop() a thread that did nothing but call find(...) or match(...) on a java.util.regex.Matcher? (If there are any Sun engineers reading this ... a definitive answer would be really appreciated.) EDIT: Answers that simply restate the mantra that you should not call stop() because it is deprecated, unsafe, whatever, are missing the point of this question. I know that it is genuinely unsafe in the majority of cases, and that if there is a viable alternative you should always use that instead. This question is about the subset of cases where it is safe. Specifically, what is that subset?

    Read the article

  • Point covering problem

    - by Sean
    I recently had this problem on a test: given a set of points m (all on the x-axis) and a set n of lines with endpoints [l, r] (again on the x-axis), find the minimum subset of n such that all points are covered by a line. Prove that your solution always finds the minimum subset. The algorithm I wrote for it was something to the effect of the following (say lines are stored as arrays with the left endpoint in position 0 and the right in position 1):

        algorithm coverPoints(set[] m, set[][] n):
            chosenLines = []
            while m is not empty:
                minX = min(m)
                bestLine = n[0]
                for i = 1 to length of n:
                    if n[i][0] <= minX and n[i][1] > bestLine[1] then
                        bestLine = n[i]
                add bestLine to chosenLines
                for i = 0 to length of m:
                    if m[i] <= bestLine[1] then
                        delete m[i] from m
            return chosenLines

    I'm just not sure if this always finds the minimum solution. It's a simple greedy algorithm, so my gut tells me it won't, but one of my friends who is much better than me at this says that for this problem a greedy algorithm like this always finds the minimal solution. For proving that mine always finds the minimal solution, I did a very hand-wavy proof by contradiction where I made an assumption that probably isn't true at all. I forget exactly what I did. If this isn't a minimal solution, is there a way to do it in less than something like O(n!) time? Thanks
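
    For reference, a compact Python sketch of the standard greedy for this problem (illustrative, not the test's official answer): repeatedly take the leftmost uncovered point and, among the lines whose left endpoint is at or before it, pick the one extending furthest right. With both inputs sorted this runs in roughly O(m log m + n log n) time, nowhere near O(n!).

        def cover_points(points, lines):
            """Greedy cover: for the leftmost uncovered point, take the line
            covering it that reaches furthest right."""
            points = sorted(points)
            lines = sorted(lines)                      # by left endpoint
            chosen, j, i = [], 0, 0
            best = None                                # furthest-reaching usable line
            while i < len(points):
                p = points[i]
                while j < len(lines) and lines[j][0] <= p:
                    if best is None or lines[j][1] > best[1]:
                        best = lines[j]
                    j += 1
                if best is None or best[1] < p:
                    raise ValueError("point %r cannot be covered" % (p,))
                chosen.append(best)
                while i < len(points) and points[i] <= best[1]:
                    i += 1                             # skip everything this line covers
                best = None
            return chosen

        print(cover_points([1, 4, 9], [(0, 2), (3, 10), (1, 5)]))
        # -> [(1, 5), (3, 10)]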

    Read the article

  • Using list() to extract a data.table inside of a function

    - by Nathan VanHoudnos
    I must admit that the data.table J syntax confuses me. I am attempting to use list() to extract a subset of a data.table as a data.table object as described in Section 1.4 of the data.table FAQ, but I can't get this behavior to work inside of a function. An example: require(data.table) ## Setup some test data set.seed(1) test.data <- data.table( X = rnorm(10), Y = rnorm(10), Z = rnorm(10) ) setkey(test.data, X) ## Notice that I can subset the data table easily with literal names test.data[, list(X,Y)] ## X Y ## 1: -0.8356286 -0.62124058 ## 2: -0.8204684 -0.04493361 ## 3: -0.6264538 1.51178117 ## 4: -0.3053884 0.59390132 ## 5: 0.1836433 0.38984324 ## 6: 0.3295078 1.12493092 ## 7: 0.4874291 -0.01619026 ## 8: 0.5757814 0.82122120 ## 9: 0.7383247 0.94383621 ## 10: 1.5952808 -2.21469989 I can even write a function that will return a column of the data.table as a vector when passed the name of a column as a character vector: get.a.vector <- function( my.dt, my.column ) { ## Step 1: Convert my.column to an expression column.exp <- parse(text=my.column) ## Step 2: Return the vector return( my.dt[, eval(column.exp)] ) } get.a.vector( test.data, 'X') ## [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884 0.1836433 0.3295078 ## [7] 0.4874291 0.5757814 0.7383247 1.5952808 But I cannot pull a similar trick for list(). The inline comments are the output from the interactive browser() session. get.a.dt <- function( my.dt, my.column ) { ## Step 1: Convert my.column to an expression column.exp <- parse(text=my.column) ## Step 2: Enter the browser to play around browser() ## Step 3: Verity that a literal X works: my.dt[, list(X)] ## << not shown >> ## Step 4: Attempt to evaluate the parsed experssion my.dt[, list( eval(column.exp)] ## Error in `rownames<-`(`*tmp*`, value = paste(format(rn, right = TRUE), (from data.table.example.R@1032mCJ#7) : ## length of 'dimnames' [1] not equal to array extent return( my.dt[, list(eval(column.exp))] ) } get.a.dt( test.data, "X" ) What am I missing? Update: Due to some confusion as to why I would want to do this I wanted to clarify. My use case is when I need to access a data.table column when when I generate the name. Something like this: set.seed(2) test.data[, X.1 := rnorm(10)] which.column <- 'X' new.column <- paste(which.column, '.1', sep="") get.a.dt( test.data, new.column ) Hopefully that helps.

    Read the article

  • How do I do a table join on two fields in my second table?

    - by Cannonade
    I have two tables: Messages - Amongst other things, has a to_id and a from_id field. People - Has a corresponding person_id I am trying to figure out how to do the following in a single linq query: Give me all messages that have been sent to and from person x (idself). I had a couple of cracks at this. Not quite right MsgPeople = (from p in db.people join m in db.messages on p.person_id equals m.from_id where (m.from_id == idself || m.to_id == idself) orderby p.name descending select p).Distinct(); This almost works, except I think it misses one case: "people who have never received a message, just sent one to me" How this works in my head So what I really need is something like: join m in db.messages on (p.people_id equals m.from_id or p.people_id equals m.to_id) Gets me a subset of the people I am after It seems you can't do that. I have tried a few other options, like doing two joins: MsgPeople = (from p in db.people join m in AllMessages on p.person_id equals m.from_id join m2 in AllMessages on p.person_id equals m2.to_id where (m2.from_id == idself || m.to_id == idself) orderby p.name descending select p).Distinct(); but this gives me a subset of the results I need, I guess something to do with the order the joins are resolved. My understanding of LINQ (and perhaps even database theory) is embarrassingly superficial and I look forward to having some light shed on my problem.

    Read the article

  • NullPointerException, Collections not storing data?

    - by Elliott
    Hi there, I posted this question earlier but not with the code in its entirety. The coe below also calls to other classes Background and Hydro which I have included at the bottom. I have a Nullpointerexception at the line indicate by asterisks. Which would suggest to me that the Collections are not storing data properly. Although when I check their size they seem correct. Thanks in advance. PS: If anyone would like to give me advice on how best to format my code to make it readable, it would be appreciated. Elliott package exam0607; import java.io.BufferedReader; import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.net.URL; import java.util.Collection; import java.util.Scanner; import java.util.Vector; import exam0607.Hydro; import exam0607.Background;// this may not be necessary???? FIND OUT public class HydroAnalysis { public static void main(String[] args) { Collection hydroList = null; Collection backList = null; try{hydroList = readHydro("http://www.hep.ucl.ac.uk/undergrad/3459/exam_data/2006-07/final/hd_data.dat");} catch (IOException e){ e.getMessage();} try{backList = readBackground("http://www.hep.ucl.ac.uk/undergrad/3459/exam_data/2006-07/final/hd_bgd.dat"); //System.out.println(backList.size()); } catch (IOException e){ e.getMessage();} for(int i =0; i <=14; i++ ){ String nameroot = "HJK"; String middle = Integer.toString(i); String hydroName = nameroot + middle + "X"; System.out.println(hydroName); ALGO_1(hydroName, backList, hydroList); } } public static Collection readHydro(String url) throws IOException { URL u = new URL(url); InputStream is = u.openStream(); InputStreamReader isr = new InputStreamReader(is); BufferedReader b = new BufferedReader(isr); String line =""; Collection data = new Vector(); while((line = b.readLine())!= null){ Scanner s = new Scanner(line); String name = s.next(); System.out.println(name); double starttime = Double.parseDouble(s.next()); System.out.println(+starttime); double increment = Double.parseDouble(s.next()); System.out.println(+increment); double p = 0; double nterms = 0; while(s.hasNextDouble()){ p = Double.parseDouble(s.next()); System.out.println(+p); nterms++; System.out.println(+nterms); } Hydro SAMP = new Hydro(name, starttime, increment, p); data.add(SAMP); } return data; } public static Collection readBackground(String url) throws IOException { URL u = new URL(url); InputStream is = u.openStream(); InputStreamReader isr = new InputStreamReader(is); BufferedReader b = new BufferedReader(isr); String line =""; Vector data = new Vector(); while((line = b.readLine())!= null){ Scanner s = new Scanner(line); String name = s.next(); //System.out.println(name); double starttime = Double.parseDouble(s.next()); //System.out.println(starttime); double increment = Double.parseDouble(s.next()); //System.out.println(increment); double sum = 0; double p = 0; double nterms = 0; while((s.hasNextDouble())){ p = Double.parseDouble(s.next()); //System.out.println(p); nterms++; sum += p; } double pbmean = sum/nterms; Background SAMP = new Background(name, starttime, increment, pbmean); //System.out.println(SAMP); data.add(SAMP); } return data; } public static void ALGO_1(String hydroName, Collection backgs, Collection hydros){ //double aMin = Double.POSITIVE_INFINITY; //double sum = 0; double intensity = 0; double numberPN_SIG = 0; double POSITIVE_PN_SIG =0; //int numberOfRays = 0; for(Hydro hd: hydros){ System.out.println(hd.H_NAME); for(Background back : backgs){ System.out.println(back.H_NAME); 
if(back.H_NAME.equals(hydroName)){//ERROR HERE double PN_SIG = Math.max(0.0, hd.PN - back.PBMEAN); numberPN_SIG ++; if(PN_SIG 0){ intensity += PN_SIG; POSITIVE_PN_SIG ++; } } } double positive_fraction = POSITIVE_PN_SIG/numberPN_SIG; if(positive_fraction < 0.5){ System.out.println( hydroName + "is faulty" ); } else{System.out.println(hydroName + "is not faulty");} System.out.println(hydroName + "has instensity" + intensity); } } } THE BACKGROUND CLASS package exam0607; public class Background { String H_NAME; double T_START; double DT; double PBMEAN; public Background(String name, double starttime, double increment, double pbmean) { name = H_NAME; starttime = T_START; increment = DT; pbmean = PBMEAN; }} AND THE HYDRO CLASS public class Hydro { String H_NAME; double T_START; double DT; double PN; public double n; public Hydro(String name, double starttime, double increment, double p) { name = H_NAME; starttime = T_START; increment = DT; p = PN; } }

    Read the article

  • Can this Query be corrected or different table structure needed? (database dumps provided)

    - by sandeepan
    This is a bit lengthy but I have provided sufficient details and kept things very clear. Please see if you can help. (I will surely accept answer if it solves my problem) I am sure a person experienced with this can surely help or suggest me to decide the tables structure. About the system:- There are tutors who create classes A tags based search approach is being followed Tag relations are created/edited when new tutors registers/edits profile data and when tutors create classes (this makes tutors and classes searcheable).For simplicity, let us consider only tutor name and class name are the fields which are matched against search keywords. In this example, I am considering - tutor "Sandeepan Nath" has created a class called "first class" tutor "Bob Cratchit" has created a class called "new class" Desired search results- AND logic to be appied on the search keywords and match against class and tutor data(class name + tutor name), in other words, All those classes be shown such that all the search terms are present in the class name or its tutor name. Example to be clear - Searching "first class" returns class with id_wc = 1. Working Searching "Sandeepan class" should also return class with id_wc = 1. Not working in System 2. Problem with profile editing and searching To tell in one sentence, I am facing a conflict between the ease of profile edition (edition of tag relations when tutor profiles are edited) and the ease of search logic. In the beginning, we had one table structure and search was easy but tag edition logic was very clumsy and unmaintainable(Check System 1 in the section below) . So we created separate tag relations tables to make profile edition simpler but search has become difficult. Please dump the tables so that you can run the search query I have given below and see the results. System 1 (previous system - search easy - profile edition difficult):- Only one table called All_Tag_Relations table had the all the tag relations. The tags table below is common to both systems 1 and 2. CREATE TABLE IF NOT EXISTS `all_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `all_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`, `id_wc`) VALUES (1, 1, 1, NULL), (2, 2, 1, NULL), (3, 1, 1, 1), (4, 2, 1, 1), (5, 3, 1, 1), (6, 4, 1, 1), (7, 6, 2, NULL), (8, 7, 2, NULL), (9, 6, 2, 2), (10, 7, 2, 2), (11, 5, 2, 2), (12, 4, 2, 2); CREATE TABLE IF NOT EXISTS `tags` ( `id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT, `tag` varchar(255) DEFAULT NULL, PRIMARY KEY (`id_tag`), UNIQUE KEY `tag` (`tag`), KEY `id_tag` (`id_tag`), KEY `tag_2` (`tag`), KEY `tag_3` (`tag`), KEY `tag_4` (`tag`), FULLTEXT KEY `tag_5` (`tag`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ; INSERT INTO `tags` (`id_tag`, `tag`) VALUES (1, 'Sandeepan'), (2, 'Nath'), (3, 'first'), (4, 'class'), (5, 'new'), (6, 'Bob'), (7, 'Cratchit'); Please note that for every class, the tag rels of its tutor have to be duplicated. Example, for class with id_wc=1, the tag rel records with id_tag_rel = 3 and 4 are actually extras if you compare with the tag rel records with id_tag_rel = 1 and 2. 
System 2 (present system - profile edition easy, search difficult) Two separate tables Tutors_Tag_Relations and Webclasses_Tag_Relations have the corresponding tag relations data (Please dump into a separate database)- CREATE TABLE IF NOT EXISTS `tutors_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `tutors_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`) VALUES (1, 1, 1), (2, 2, 1), (3, 6, 2), (4, 7, 2); CREATE TABLE IF NOT EXISTS `webclasses_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `webclasses_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `webclasses_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`, `id_wc`) VALUES (1, 3, 1, 1), (2, 4, 1, 1), (3, 5, 2, 2), (4, 4, 2, 2); CREATE TABLE IF NOT EXISTS `tags` ( `id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT, `tag` varchar(255) DEFAULT NULL, PRIMARY KEY (`id_tag`), UNIQUE KEY `tag` (`tag`), KEY `id_tag` (`id_tag`), KEY `tag_2` (`tag`), KEY `tag_3` (`tag`), KEY `tag_4` (`tag`), FULLTEXT KEY `tag_5` (`tag`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ; INSERT INTO `tags` (`id_tag`, `tag`) VALUES (1, 'Sandeepan'), (2, 'Nath'), (3, 'first'), (4, 'class'), (5, 'new'), (6, 'Bob'), (7, 'Cratchit'); CREATE TABLE IF NOT EXISTS `all_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; insert into All_Tag_Relations select NULL,id_tag,id_tutor,NULL from Tutors_Tag_Relations; insert into All_Tag_Relations select NULL,id_tag,id_tutor,id_wc from Webclasses_Tag_Relations; Here you can see how easily tutor first name can be edited only in one place. But search has become really difficult, so on being advised to use a Temporary table, I am creating one at every search request, then dumping all the necessary data and then searching from it, I am creating this All_Tag_Relations table at search run time. Here I am just dumping all the data from the two tables Tutors_Tag_Relations and Webclasses_Tag_Relations. But, I am still not able to get classes if I search with tutor name This is the query which searches "first class". Running them on both the systems shows correct results (returns the class with id_wc = 1). 
SELECT wtagrels.id_wc,SUM(DISTINCT( wtagrels.id_tag =3)) AS key_1_total_matches, SUM(DISTINCT( wtagrels.id_tag =4)) AS key_2_total_matches FROM all_tag_relations AS wtagrels WHERE ( wtagrels.id_tag =3 OR wtagrels.id_tag =4 ) GROUP BY wtagrels.id_wc HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 LIMIT 0, 20 But, searching for "Sandeepan class" works only with the 1st system Here is the query which searches "Sandeepan class" SELECT wtagrels.id_wc,SUM(DISTINCT( wtagrels.id_tag =1)) AS key_1_total_matches, SUM(DISTINCT( wtagrels.id_tag =4)) AS key_2_total_matches FROM all_tag_relations AS wtagrels WHERE ( wtagrels.id_tag =1 OR wtagrels.id_tag =4 ) GROUP BY wtagrels.id_wc HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 LIMIT 0, 20 Can anybody alter this query and somehow do a proper join or something to get correct results. That solves my problem in a nice way. As you can figure out, the reason why it does not work in system 2 is that in system 1, for every class, one additional tag relation linking class and tutor name is present. e.g. for class first class, (records with id_tag_rel 3 and 4) which returns the class on searching with tutor name. So, you see the trade-off between the search and profile edition difficulty with the two systems. How do I overcome both. I have to reach a conclusion soon. So far my reasoning is it is definitely not good from a code maintainability point of view to follow the single tag rel table structure of system one, because in a real system while editing a field like "tutor qualifications", there can be as many records in tag rels table as there are words in qualification of a tutor (one word in a field = one tag relation). Now suppose a tutor has 100 classes. When he edits his qualification, all the tag rel rows corresponding to him are deleted and then as many copies are to be created (as per the new qualification data) as there are classes. This becomes particularly difficult if later more searcheable fields are added. The code cannot be robust. Is the best solution to follow system 2 (edition has to be in one table - no extra work for each and every class) and somehow re-create the all_tag_relations table like system 1 (from the tables tutor_tag_relations and webclasses_tag_relations), creating the extra tutor tag rels for each and every class by a tutor (which is currently missing in system 2's temporary all_tag_relations table). That would be a time consuming logic script. I doubt that table can be recreated without resorting to PHP sript (mysql alone cannot do that). But the problem is that running all this at search time will make search definitely slow. So, how do such systems work? How are such situations handled? I thought about we can run a cron which initiates that PHP script, say every 1 minute and replaces the existing all_tag_relations table as per new tag rels from tutor_tag_relations and webclasses_tag_relations (replaces means creates a new table, deletes the original and renames the new one as all_tag_relations, otherwise search won't work during that period- or is there any better way to that?). Anyway, the result would be that any changes by tutors will reflect in search in the next 1 minute and not immediately. An alternateve would be to initate that PHP script every time a tutor edits his profile. But here again, since many users may edit their profiles concurrently, will the creation of so many tables be a burden and can mysql make the server slow? 
Any help would be appreciated and working solution will be accepted as answer. Thanks, Sandeepan

    Read the article

  • How to synchronize cuda threads when they are in the same loop and we need to synchronize them to ex

    - by Vickey
    Hi all, I have written a code and Now I want to implement this on cuda GPU but I'm new to synchronization so please help me with this, It's little urgent to me. Below I'm presenting the code and I want to that LOOP1 to be executed by all threads (heance I want to this portion to take advantage of cuda and the remaining portion (the portion other from the LOOP1) is to be executed by only a single thread. do{ point_set = master_Q[(*num_mas) - 1].q; List* temp = point_set; List* pa = point_set; if(master_Q[num_mas[0] - 1].max) max_level = (int) (ceilf(il2 * log(master_Q[num_mas[0] - 1].max))); *num_mas = (*num_mas) - 1; while(point_set){ List* insert_ele = temp; while(temp){ insert_ele = temp; if((insert_ele->dist[insert_ele->dist_index-1] <= pow(2, max_level-1)) || (top_level == max_level)){ if(point_set == temp){ point_set = temp->next; pa = temp->next; } else{ pa->next = temp->next; } temp = NULL; List* new_point_set = point_set; float maximum_dist = 0; if(parent->p_index != insert_ele->point_index){ List* tmp = new_point_set; float *b = &(data[(insert_ele->point_index)*point_len]); **LOOP 1:** while(tmp){ float *c = &(data[(tmp->point_index)*point_len]); float sum = 0.; for(int j = 0; j < point_len; j+=2){ float d1 = b[j] - c[j]; float d2 = b[j+1] - c[j+1]; d1 *= d1; d2 *= d2; sum = sum + d1 + d2; } tmp->dist[tmp->dist_index] = sqrt(sum); if(maximum_dist < tmp->dist[tmp->dist_index]) maximum_dist = tmp->dist[tmp->dist_index]; tmp->dist_index = tmp->dist_index+1; tmp = tmp->next; } max_distance = maximum_dist; } while(new_point_set || insert_ele){ List* far, *par, *tmp, *tmp_new; far = NULL; tmp = new_point_set; tmp_new = NULL; float level_dist = pow(2, max_level-1); float maxdist = 0, maxp = 0; while(tmp){ if(tmp->dist[(tmp->dist_index)-1] > level_dist){ if(maxdist < tmp->dist[tmp->dist_index-1]) maxdist = tmp->dist[tmp->dist_index-1]; if(tmp == new_point_set){ new_point_set = tmp->next; par = tmp->next; } else{ par->next = tmp->next; } if(far == NULL){ far = tmp; tmp_new = far; } else{ tmp_new->next = tmp; tmp_new = tmp; } if(parent->p_index != insert_ele->point_index) tmp->dist_index = tmp->dist_index - 1; tmp = tmp->next; tmp_new->next = NULL; } else{ par = tmp; if(maxp < tmp->dist[(tmp->dist_index)-1]) maxp = tmp->dist[(tmp->dist_index)-1]; tmp = tmp->next; } } if(0 == maxp){ tmp = new_point_set; aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level--; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; while(tmp){ aloc_mem[*tree_index].p_index = tmp->point_index; aloc_mem[(*tree_index)].no_child = 0; aloc_mem[(*tree_index)].level = master_Q[(*cur_count_Q)-1].level; parent->children_index[parent->no_child] = *tree_index; parent->no_child = parent->no_child + 1; (*tree_index)++; tmp = tmp->next; } cur_count_Q[0] = cur_count_Q[0]-1; new_point_set = NULL; } master_Q[*num_mas].q = far; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].max = maxdist; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; if(0 != maxp){ aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; if(maxp){ int new_level = ((int) (ceilf(il2 * log(maxp)))) +1; if (new_level < (max_level-1)) max_level = new_level; 
else max_level--; } else max_level--; } if( 0 == maxp ) insert_ele = NULL; } } else{ if(NULL == temp->next){ master_Q[*num_mas].q = point_set; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; } pa = temp; temp = temp->next; } } if((*num_mas) > 1){ List *temp2 = master_Q[(*num_mas)-1].q; while(temp2){ List* temp3 = master_Q[(*num_mas)-2].q; master_Q[(*num_mas)-2].q = temp2; if((master_Q[(*num_mas)-1].parent)->p_index != (master_Q[(*num_mas)-2].parent)->p_index){ temp2->dist_index = temp2->dist_index - 1; } temp2 = temp2->next; master_Q[(*num_mas)-2].q->next = temp3; } num_mas[0] = num_mas[0]-1; } point_set = master_Q[(*num_mas)-1].q; temp = point_set; pa = point_set; parent = master_Q[(*num_mas)-1].parent; max_level = master_Q[(*num_mas)-1].level; if(master_Q[(*num_mas)-1].max) if( max_level > ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1) max_level = ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1; num_mas[0] = num_mas[0]-1; } }while(*num_mas > 0);

    Read the article

  • How Should I Generate Trade Statistics For CouchDB/Rails3 Application?

    - by James
    My Problem: I am trying to developing a web application for currency traders. The application allows traders to enter or upload information about their trades and I want to calculate a wide variety of statistics based on what the user entered. Now, normally I would use a relational database for this, but I have two requirements that don't fit well with a relational database so I am attempting to use couchdb. Those two problems are: 1) Primarily, I have a companion desktop application that users will be able to work with and replicate to the site using couchdb's awesome replication feature and 2) I would like to allow users to be able to define their own custom things to track about trades and generate results based off of what they enter. The schema less nature of couch seems perfect here, but it may end up being harder than it sounds. (I already know couch requires you to define views in advance and such so I was just planning on sticking all the custom attributes in an array and then emitting the array in the view and further processing from there.) What I Am Doing: Right now I am just emitting each trade in couch keyed by each user's system and querying with the key of the system to get an array of trades per system. Simple. I am not using a reduce function currently to calculate any stats because I couldn't figure out how to get everything I need without getting a reduce overflow error. Here is an example of rows that are getting emitted from couch: {"total_rows":134,"offset":0,"rows":[ {"id":"5b1dcd47221e160d8721feee4ccc64be", "key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"], null, "doc":{ "_id":"5b1dcd47221e160d8721feee4ccc64be", "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b", "couchrest-type":"Trade", "system":"80e40ba2fa43589d57ec3f1d19db41e6", "pair":"EUR/USD", "direction":"Buy", "entry":12600, "exit":12700, "stop_loss":12500, "profit_target":12700, "status":"Closed", "slug":"101332132375", "custom_tracking": [{"name":"signal", "value":"Pin Bar"}] "updated_at":"2010/05/14 04:32:37 +0000", "created_at":"2010/05/14 04:32:37 +0000", "result":100}} ]} In my rails 3 controller I am basically just populating an array of trades such as the one above and then extracting out the relevant data into smaller arrays that I can compute my statistics on. Here is my show action for the page that I want to display the stats and all the trades: def show @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now ]) @trades.each do |trade| if trade.result > 0 @winning_trades << trade.result elsif trade.result < 0 @losing_trades << trade.result else @breakeven_trades << trade.result end if trade.direction == "Buy" @long_trades << trade.result else @short_trades << trade.result end if trade["custom_tracking"] != nil @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]} end end end I am omitting some other stuff that is going on, but that is the gist of what I am doing. 
Then I am calculating stuff in the view layer to produce some results: <% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %> <% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %> <ul> <li>Total Trades: <%= @trades.count %></li> <li>Winners: <%= @winning_trades.size %></li> <li>Biggest Winner (Pips): <%= @winning_trades.max %></li> <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li> <li>Losers: <%= @losing_trades.size %></li> <li>Biggest Loser (Pips): <%= @losing_trades.min %></li> <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li> <li>Breakeven Trades: <%= @breakeven_trades.size %></li> <li>Long Trades: <%= @long_trades.size %></li> <li>Winning Long Trades: <%= winning_long_trades.size %></li> <li>Short Trades: <%= @short_trades.size %></li> <li>Winning Short Trades: <%= winning_short_trades.size %></li> <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li> <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li> </ul> This produces the following results, which aside from a few things is exactly what I want: Total Trades: 134 Winners: 70 Biggest Winner (Pips): 1488 Average Win(Pips): 440 Losers: 58 Biggest Loser (Pips): -516 Average Loss(Pips): -225 Breakeven Trades: 6 Long Trades: 125 Winning Long Trades: 67 Short Trades: 9 Winning Short Trades: 3 Total Pips: 17819 Win Rate (%): 52.23880597014925 What I Am Wondering- Finally The Actual Questions: I am starting to get really skeptical of how well this method will work when a user has 5,000 trades instead of just 134 like in this example. I anticipate most users will only have somewhere under 200 per year, but some users may have a couple thousand trades per year. Probably no more than 5,000 per year. It seems to work ok now, but the page load times are already getting a tad high for my tastes. (About 800ms to generate the page according to rails logs with about a 250ms of that spent in the view layer.) I will end up caching this page I am sure, but I still need the regenerate the page each time a trade is updated and I can't afford to have this be too slow. Sooo..... Is doing something similar here possible with a straight couchdb reduce function? I am assuming handing this off to couch would possibly help with larger data sets. I couldn't figure out how, but I suppose that doesn't mean it isn't possible. If possible, any hints will be helpful. Could I use a list function if a reduce was not available due to reduce constraints? Are couchdb list functions suitable for this type of calculations? Anyone have any idea of whether or not list functions perform well? Any hints what one would look like for the type of calculations I am trying to achieve? I thought about other options such as running the calculations at the time each trade was saved or nightly if I had to and saving the results to a statistics doc that I could then query so that all the processing was done ahead of time. I would like this to be the last resort because then I can't really filter out trades by time periods dynamically like I would really like to. (I want to have a slider that a user can slide to only show trades from that time period using the startkey and endkey in couchdb if I can.) If I should continue running the calculations inside the rails app at the time of the page view, what can I do to improve my current implementation. I am new to rails, couch and programming in general. I am sure that I could be doing something better here. 
Do I need to create an array for each stat, or is there a better way to do that? I would really like some advice on how to tackle this problem. I want to keep the page generation time minimal since I anticipate these being some of the highest-trafficked pages. My gut is that I will need to either offload the statistics calculation to couch or run the stats in advance of when they are called, but I am not sure. Lastly: as I mentioned above, one of the primary reasons for using couch is to allow users to define their own things to track per trade. Getting the data into couch is no problem, but how would I take the custom_tracking array and find how many winning trades there are for each named tracking attribute? If anyone can give me any hints on whether this is possible, that would be great. Thanks a bunch. Would really appreciate any help. Willing to fork out some $$$ if someone wants to take on the problem for me. (Don't know if that is allowed on Stack Overflow or not.)

    Read the article

  • C#/.NET Little Wonders: The Useful But Overlooked Sets

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#. Today we will be looking at two set implementations in the System.Collections.Generic namespace: HashSet<T> and SortedSet<T>. Even though most people think of sets as mathematical constructs, they are actually very useful classes that can help make your application more performant if used appropriately.

    A Background From Math

    In mathematical terms, a set is an unordered collection of unique items. In other words, the set {2,3,5} is identical to the set {3,5,2}. The set {2, 2, 4, 1} would be invalid because it contains a duplicate item (2). You can also perform set arithmetic on sets, such as:

    Intersections: The intersection of two sets is the collection of elements common to both. Example: the intersection of {1,2,5} and {2,4,9} is the set {2}.
    Unions: The union of two sets is the collection of unique items present in either or both sets. Example: the union of {1,2,5} and {2,4,9} is {1,2,4,5,9}.
    Differences: The difference of two sets is the removal from the first set of all items common to both sets. Example: the difference of {1,2,5} and {2,4,9} is {1,5}.
    Supersets: One set is a superset of a second set if it contains all elements that are in the second set. Example: the set {1,2,5} is a superset of {1,5}.
    Subsets: One set is a subset of a second set if all of its elements are contained in the second set. Example: the set {1,5} is a subset of {1,2,5}.

    If We're Not Doing Math, Why Do We Care?

    Now, you may be thinking: why bother with the set classes in C# if you have no need for mathematical set manipulation? The answer is simple: they are extremely efficient ways to determine membership in a collection.

    For example, let's say you are designing an order system that tracks the price of a particular equity and triggers an order once it reaches a certain point. Since there are tens of thousands of equities on the markets, you don't want to track market data for every ticker; that would waste time and processing power on symbols you don't have orders for. Thus, we want to subscribe to the stock symbol for an equity order only if it is a symbol we are not already subscribed to.

    Every time a new order comes in, we will check the list of subscriptions to see if the new order's stock symbol is in that list. If it is, great, we already have that market data feed! If not, then and only then should we subscribe to the feed for that symbol.

    So far so good: we have a collection of symbols, and we want to see if a symbol is present in that collection and, if not, add it. This really is the essence of set processing, but for the sake of comparison, let's say you use a list instead:

        // class that handles our order processing service
        public sealed class OrderProcessor
        {
            // contains list of all symbols we are currently subscribed to
            private readonly List<string> _subscriptions = new List<string>();

            ...
        }

    Now, whenever you are adding a new order, it would look something like:

        public PlaceOrderResponse PlaceOrder(Order newOrder)
        {
            // do some validation, of course...

            // check to see if already subscribed, if not add a subscription
            if (!_subscriptions.Contains(newOrder.Symbol))
            {
                // add the symbol to the list
                _subscriptions.Add(newOrder.Symbol);

                // do whatever magic is needed to start a subscription for the symbol
            }

            // place the order logic!
        }

    What's wrong with this?
    In short: performance! Finding an item inside a List<T> is a linear – O(n) – operation, which is not a very performant way to determine whether an item exists in a collection. (I used to teach algorithms and data structures in my spare time at a local university, and when you began talking about big-O notation you could immediately see eyes glossing over, as if it were pure, useless theory that would never apply in the real world; but I did and still do believe it is something worth understanding well to make the best choices in computer science.)

    Let's think about this: a linear operation means that as the number of items increases, the time it takes to perform the operation tends to increase in a linear fashion. Put crudely, if you double the collection size, you might expect the operation to take something on the order of twice as long. Linear operations tend to be bad for performance because they mean that to perform some operation on a collection, you must potentially "visit" every item in the collection. Consider finding an item in a List<T>: to see if the list has an item, you must potentially check every item in the list before you find it or determine it's not there.

    Now, we could of course sort our list and then perform a binary search on it, but sorting is typically linear-logarithmic complexity – O(n * log n) – and could involve temporary storage, so performing a sort after each add would probably add more time. As an alternative, we could use a SortedList<TKey, TValue>, which sorts the list on every Add(), but this has a similar level of complexity to move the items and also requires a key and a value, and in our case the key is the value.

    This is why sets tend to be the best choice for this type of processing: they don't rely on separate keys and values for ordering – so they save space – and they typically don't care about ordering – so they tend to be extremely performant. The .NET BCL (Base Class Library) has had the HashSet<T> since .NET 3.5, but at that time it did not implement the ISet<T> interface. As of .NET 4.0, HashSet<T> implements ISet<T>, and a new set, the SortedSet<T>, was added that gives you a set with ordering.

    HashSet<T> – For Unordered Storage of Sets

    When used right, HashSet<T> is a beautiful collection; you can think of it as a simplified Dictionary<T,T> – that is, a Dictionary where the TKey and TValue refer to the same object. This is really an oversimplification, but logically it makes sense. I've actually seen people code a Dictionary<T,T> where they store the same thing in the key and the value, and that's just inefficient because of the extra storage to hold both the key and the value.

    As its name implies, the HashSet<T> uses a hashing algorithm to find the items in the set, which means it does take up some additional space, but it has lightning-fast lookups! Compare the times below between HashSet<T> and List<T>:

        Operation     HashSet<T>   List<T>
        Add()         O(1)         O(1) at end, O(n) in middle
        Remove()      O(1)         O(n)
        Contains()    O(1)         O(n)

    Now, these times are amortized and represent the typical case. In the very worst case, the operations could be linear if they involve a resizing of the collection – but this is true for both the List and the HashSet, so that's less of an issue when comparing the two.

    The key thing to note is that in the general case, HashSet is constant time for adds, removes, and contains!
    This means that no matter how large the collection is, it takes roughly the same amount of time to find an item or determine that it's not in the collection. Compare this to the List, where almost any add or remove must rearrange potentially all the elements! And to find an item in the list (if unsorted) you must search every item in the List.

    So as you can see, if you want to create an unordered collection with very fast lookup and manipulation, the HashSet is a great collection. And since HashSet<T> implements ICollection<T> and IEnumerable<T>, it supports nearly all the same basic operations as the List<T> and can use the System.Linq extension methods as well.

    All we have to do to switch from a List<T> to a HashSet<T> is change our declaration. Since List and HashSet support many of the same members, chances are we won't need to change much else.

        public sealed class OrderProcessor
        {
            private readonly HashSet<string> _subscriptions = new HashSet<string>();

            // ...

            public PlaceOrderResponse PlaceOrder(Order newOrder)
            {
                // do some validation, of course...

                // check to see if already subscribed, if not add a subscription
                if (!_subscriptions.Contains(newOrder.Symbol))
                {
                    // add the symbol to the set
                    _subscriptions.Add(newOrder.Symbol);

                    // do whatever magic is needed to start a subscription for the symbol
                }

                // place the order logic!
            }

            // ...
        }

    Notice that we didn't change any code other than the declaration for _subscriptions to be a HashSet<T>. Thus, we can pick up the performance improvements in this case with minimal code changes.

    SortedSet<T> – Ordered Storage of Sets

    Just like HashSet<T> is logically similar to Dictionary<T,T>, the SortedSet<T> is logically similar to the SortedDictionary<T,T>. The SortedSet can be used when you want to do set operations on a collection, but you also want to maintain that collection in sorted order. Now, this is not necessarily mathematically relevant, but if your collection's needs do include ordering, this is the set to use.

    The SortedSet appears to be implemented internally as a binary search tree (possibly a red-black tree). Since binary trees are dynamic, non-contiguous structures (unlike List and SortedList), inserts and deletes do not involve shifting elements, only relinking nodes. There is some overhead in keeping the nodes in order, but it is much smaller than in a contiguous-storage collection like a List<T>. Let's compare the three:

        Operation     HashSet<T>   SortedSet<T>   List<T>
        Add()         O(1)         O(log n)       O(1) at end, O(n) in middle
        Remove()      O(1)         O(log n)       O(n)
        Contains()    O(1)         O(log n)       O(n)

    The MSDN documentation seems to indicate that operations on SortedSet are O(1), but this is inconsistent with its implementation and appears to be a documentation error. There's actually a separate MSDN document (here) on SortedSet that indicates that it is, in fact, logarithmic in complexity. In layman's terms: logarithmic means you can double the collection size and typically add only a single extra "visit" to an item in the collection. Contrast that with List<T>'s linear operations, where doubling the size of the collection doubles the "visits" to items in the collection. This is very good performance! It's still not as performant as HashSet<T>, where it always visits just one item (amortized), but for the addition of sorting this is a good thing.
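    To make the ordering point concrete, here is a minimal sketch (not from the original article) of what the subscription store looks like if we swap in a SortedSet<T>; the ticker symbols and comparer choice are illustrative assumptions only:

        using System;
        using System.Collections.Generic;

        public static class SortedSetDemo
        {
            public static void Main()
            {
                // SortedSet<T> keeps its items in sorted order as they are added
                // (sample symbols below are made up for illustration)
                var subscriptions = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);

                // Add() returns false (and changes nothing) if the item is already present
                subscriptions.Add("MSFT");
                subscriptions.Add("ibm");
                subscriptions.Add("AAPL");
                bool added = subscriptions.Add("msft");   // false: already subscribed

                // membership test is O(log n) instead of List<T>'s O(n)
                bool hasIbm = subscriptions.Contains("IBM");   // true

                // enumeration comes back in comparer order: AAPL, ibm, MSFT
                foreach (var symbol in subscriptions)
                {
                    Console.WriteLine(symbol);
                }
            }
        }

    The only real cost relative to HashSet<T> is the logarithmic insert and lookup, which the visit-count table below puts in perspective.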
    Consider the following table; this is just illustrative data of the relative complexities, but it's enough to get the point:

        Collection Size   O(1) Visits   O(log n) Visits   O(n) Visits
        1                 1             1                 1
        10                1             4                 10
        100               1             7                 100
        1000              1             10                1000

    Notice that the logarithmic – O(log n) – visit count goes up very slowly compared to the linear – O(n) – visit count. This is because, since the collection is sorted, it can do one check in the middle, determine which half the data is in, and discard the other half (binary search). So, if you need your set to be sorted, you can use the SortedSet<T> just like the HashSet<T> and gain sorting for a small performance hit; it's still faster than a List<T>.

    Unique Set Operations

    Now, if you do want to perform more set-like operations, both implementations of ISet<T> support the following, which hark back to the mathematical set operations described before (a short illustrative sketch follows at the end of this excerpt):

        IntersectWith() – Performs the set intersection of two sets. Modifies the current set so that it only contains elements also in the second set.
        UnionWith() – Performs a set union of two sets. Modifies the current set so that it contains all elements present in either the current set or the second set.
        ExceptWith() – Performs a set difference of two sets. Modifies the current set so that all elements present in the second set are removed from it.
        IsSupersetOf() – Checks whether the current set is a superset of the second set.
        IsSubsetOf() – Checks whether the current set is a subset of the second set.

    For more information on the set operations themselves, see the MSDN description of ISet<T> (here).

    What Sets Don't Do

    Don't get me wrong, sets are not silver bullets. You don't really want to use a set when you want separate key-to-value lookups; that's what the IDictionary implementations are best for.

    Also, sets don't preserve temporal add-order. That is, if you are always adding items to the end of a list, the list is ordered by when items were added to it. This is something the sets don't do naturally (though you could use a SortedSet with an IComparer over a DateTime, that's overkill), while a List<T> can.

    Also, List<T> allows indexing, which is a blazingly fast way to iterate through the items in the collection. Iterating over all the items in a List<T> is generally much, much faster than iterating over a set.

    Summary

    Sets are an excellent tool for maintaining a lookup table where the item is both the key and the value. In addition, if you need the mathematical set operations, the C# sets support those as well. The HashSet<T> is the set of choice if you want the fastest possible lookups but don't care about order. In contrast, the SortedSet<T> will give you a sorted collection at a slight reduction in performance.

    Technorati Tags: C#,.Net,Little Wonders,BlackRabbitCoder,ISet,HashSet,SortedSet
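    As promised above, here is a minimal illustrative sketch (not part of the original article) of the ISet<T> operations listed in the Unique Set Operations section; the ticker symbols are made up:

        using System;
        using System.Collections.Generic;

        public static class SetOperationsDemo
        {
            public static void Main()
            {
                var ours = new HashSet<string> { "MSFT", "AAPL", "GOOG" };
                var theirs = new HashSet<string> { "AAPL", "GOOG", "IBM" };

                // IsSupersetOf / IsSubsetOf are non-destructive queries
                Console.WriteLine(ours.IsSupersetOf(new[] { "MSFT", "GOOG" }));   // True
                Console.WriteLine(ours.IsSubsetOf(theirs));                       // False

                // The *With methods modify the set they are called on,
                // so copy first if you need to keep the original intact.
                var union = new HashSet<string>(ours);
                union.UnionWith(theirs);              // contains MSFT, AAPL, GOOG, IBM

                var intersection = new HashSet<string>(ours);
                intersection.IntersectWith(theirs);   // contains AAPL, GOOG

                var difference = new HashSet<string>(ours);
                difference.ExceptWith(theirs);        // contains MSFT

                Console.WriteLine(string.Join(", ", union));
            }
        }

    Note that the *With methods mutate in place rather than returning a new set, which is why the sketch copies before combining; LINQ's Union, Intersect, and Except extension methods are the non-mutating alternatives if that fits your code better.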

    Read the article
