Search Results

Search found 2210 results on 89 pages for 'sum'.

  • Performing an SVD on tweets: memory problem

    - by plotti
    I have generated a huge CSV file as the output of my POS tagging and stemming. It looks like this:

        word1, word2, word3, ..., word14400
        person1 1 2 0 1
        person2 0 0 1 0
        ...
        person650

    It contains the word counts for each person, which gives me a characteristic vector for each person. I want to run an SVD on this beast, but the matrix seems to be too big to be held in memory for the operation. My question is: should I reduce the number of columns by removing words with a column sum of, for example, 1, i.e. words that have been used only once? Do I bias the data too much with this approach?

    I tried the RapidMiner approach of loading the CSV into the database and then reading it back sequentially in batches for processing, as RapidMiner proposes. But MySQL can't store that many columns in a table. If I transpose the data and then re-transpose it on import, that also takes ages.

    So, in general, I am asking for advice on how to perform an SVD on such a corpus.
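
    A minimal sketch of the column-pruning step considered above, assuming a standard comma-separated file whose first row is `person,word1,word2,...` and whose data rows look like `person1,1,2,0,...` (the excerpt mixes commas and spaces, so the exact layout is an assumption). The function name `prune_rare_columns` and the `min_total` threshold are illustrative. It makes two streaming passes, so only one row plus the per-column totals are in memory at any time; it does not settle whether dropping once-used words biases the result. For the decomposition itself, storing the counts as a sparse matrix and asking a truncated solver such as SciPy's `scipy.sparse.linalg.svds` for only the top k singular vectors avoids computing a full dense SVD.

```python
import csv


def prune_rare_columns(src_path, dst_path, min_total=2):
    """Copy src to dst, keeping only word columns whose total count >= min_total.

    Two streaming passes: only the current row and the column totals are held
    in memory. Returns the number of word columns kept.
    """
    # Pass 1: total count per word column (column 0 is the person label).
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        totals = [0] * (len(header) - 1)
        for row in reader:
            for k, value in enumerate(row[1:]):
                totals[k] += int(value)

    keep = [k for k, total in enumerate(totals) if total >= min_total]

    # Pass 2: write the label column plus the surviving word columns.
    with open(src_path, newline="") as f, open(dst_path, "w", newline="") as out:
        reader = csv.reader(f)
        writer = csv.writer(out)
        next(reader)  # skip the original header
        writer.writerow([header[0]] + [header[1 + k] for k in keep])
        for row in reader:
            writer.writerow([row[0]] + [row[1 + k] for k in keep])
    return len(keep)
```

    With the default `min_total=2`, every word used only once across all people is dropped, which is exactly the pruning rule the question asks about.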

  • Can C++ do something like an ML case expression?

    - by Nathan Andrew Mullenax
    So, I've run into this sort of thing a few times in C++, where I'd really like to write something like:

        case (a,b,c,d) of
            (true, true, _, _ ) => expr
          | (false, true, _, false) => expr
          | ...

    But in C++, I invariably end up with something like this:

        bool c11 = color1.count(e.first)>0;
        bool c21 = color2.count(e.first)>0;
        bool c12 = color1.count(e.second)>0;
        bool c22 = color2.count(e.second)>0;

        // no vertex in this edge is colored
        // requeue
        if( !(c11||c21||c12||c22) ) {
            edges.push(e);
        }
        // endpoints already same color
        // failure condition
        else if( (c11&&c12)||(c21&&c22) ) {
            results.push_back("NOT BICOLORABLE.");
            return true;
        }
        // nothing to do: nodes are already
        // colored and different from one another
        else if( (c11&&c22)||(c21&&c12) ) {
        }
        // first is c1, second is not set
        else if( c11 && !(c12||c22) ) {
            color2.insert( e.second );
        }
        // first is c2, second is not set
        else if( c21 && !(c12||c22) ) {
            color1.insert( e.second );
        }
        // first is not set, second is c1
        else if( !(c11||c21) && c12 ) {
            color2.insert( e.first );
        }
        // first is not set, second is c2
        else if( !(c11||c21) && c22 ) {
            color1.insert( e.first );
        }
        else {
            std::cout << "Something went wrong.\n";
        }

    I'm wondering if there's any way to clean up all of those ifs and elses, as this seems especially error-prone. It would be even better if it were possible to get the compiler to complain, like SML does, when a case expression (or a switch statement in C++) isn't exhaustive. I realize this question is a bit vague. Maybe, in sum: how would one represent an exhaustive truth table with an arbitrary number of variables in C++ succinctly? Thanks in advance.
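
    A language-neutral sketch of the table-driven idea, written in Python only to keep these examples in one language: collapse each endpoint's two flags into one state (uncolored, color 1, color 2), list one entry per combination like the arms of an ML case, and check at load time that every combination is covered. The action strings are illustrative labels for the branches of the C++ code above. One way to carry this over to C++ is an enum for the endpoint state plus a switch; GCC and Clang's `-Wswitch` (enabled by `-Wall`) warns when a switch over an enum without a `default` label misses an enumerator, which is the closest built-in analogue to SML's exhaustiveness check.

```python
from itertools import product

# Each endpoint is uncolored (None), in color set 1, or in color set 2.
STATES = (None, 1, 2)


def endpoint_state(in_c1, in_c2):
    """Collapse one vertex's two membership flags into a single state."""
    if in_c1 and in_c2:
        raise ValueError("vertex is in both color sets")
    if in_c1:
        return 1
    if in_c2:
        return 2
    return None


# One entry per (first, second) pair, like the arms of an ML case expression.
ACTIONS = {
    (None, None): "requeue",
    (1, 1): "not bicolorable",
    (2, 2): "not bicolorable",
    (1, 2): "nothing to do",
    (2, 1): "nothing to do",
    (1, None): "put second in color2",
    (2, None): "put second in color1",
    (None, 1): "put first in color2",
    (None, 2): "put first in color1",
}

# Exhaustiveness check, run at import time: every combination needs an arm.
_missing = set(product(STATES, repeat=2)) - set(ACTIONS)
assert not _missing, f"non-exhaustive case table, missing {_missing}"


def classify(c11, c21, c12, c22):
    """c11/c21: first endpoint in color1/color2; c12/c22: same for the second."""
    return ACTIONS[(endpoint_state(c11, c21), endpoint_state(c12, c22))]
```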

  • Typical Hadoop setup for remote job submission

    - by Artii
    I am still a bit new to Hadoop and am currently setting up a small test cluster on Amazon AWS. My question is about how to structure the cluster so that jobs can be submitted from remote machines. Currently I have 5 machines: 4 form the Hadoop cluster itself (NameNodes, YARN, etc.), and one is used as a manager machine (Cloudera Manager). I will describe my thinking about the setup, and it would be great if anyone could chime in on the points I am not clear about.

    I was thinking about the best setup for a small cluster, and decided to expose only one manager machine and probably submit all jobs through it. The other machines will see each other, but will not be accessible from the outside world. I have a conceptual idea of how to do this, but I am not sure how to go about it properly; if anyone could point me in the right direction, that would be great.

    Another big point is that I want to be able to submit jobs to the cluster, through the exposed machine, from a client machine (possibly Windows). I am not very clear on this part of the setup either. Do I need to have Hadoop installed on the client in order to use the normal hadoop commands, and to write/submit jobs from, say, Eclipse or something similar?

    So, to sum up, my questions are:
      1) Is this an OK setup for a small test cluster?
      2) How can I use one exposed machine to submit/route jobs to the cluster, without having any of the Hadoop nodes on it?
      3) How do I set up a client machine to submit jobs to a remote cluster, and is there an example of how to do it on Windows?
      4) Are there any reasons not to use Windows as a client machine in this setup?

    Thanks, I would greatly appreciate any advice or help on this.

  • How to speed up a Python nested loop?

    - by erich
    I'm performing a nested loop in Python, included below. It serves as a basic way of searching through existing financial time series for periods that match certain characteristics. In this case there are two separate, equally sized arrays representing the 'close' (i.e. the price of an asset) and the 'volume' (i.e. the amount of the asset that was exchanged over the period). For each period in time, I would like to look forward at all future intervals with lengths between 1 and INTERVAL_LENGTH and see whether any of those intervals have characteristics that match my search (in this case, the ratio of the close values is greater than 1.0001 and less than 1.5, and the summed volume is greater than 100).

    My understanding is that one of the major reasons for the speedup when using NumPy is that the interpreter doesn't need to type-check the operands each time it evaluates something, as long as you're operating on the array as a whole (e.g. numpy_array * 2), but the code below obviously isn't taking advantage of that. Is there a way to replace the inner loop with some kind of window function that could give a speedup, or any other way of using NumPy/SciPy to speed this up substantially in native Python? Alternatively, is there a better way to do this in general (e.g. would it be much faster to write this loop in C++ and use weave)?

        import numpy as np

        ARRAY_LENGTH = 500000
        INTERVAL_LENGTH = 15
        close = np.array( xrange(ARRAY_LENGTH) )
        volume = np.array( xrange(ARRAY_LENGTH) )
        close, volume = close.astype('float64'), volume.astype('float64')

        results = []
        for i in xrange(len(close) - INTERVAL_LENGTH):
            for j in xrange(i+1, i+INTERVAL_LENGTH):
                ret = close[j] / close[i]
                vol = sum( volume[i+1:j+1] )
                if ret > 1.0001 and ret < 1.5 and vol > 100:
                    results.append( [i, j, ret, vol] )
        print results
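
    A sketch of one vectorized rewrite, assuming NumPy (which the question already uses); the function and parameter names are illustrative. Two changes do the work: a running total from `np.cumsum` turns each window's volume into one subtraction instead of a fresh `sum()` over a slice, and the loop runs over the 14 offsets `j - i` instead of over 500000 start indices, so each comparison is applied to a whole array at once. It keeps the original bounds, so offsets run from 1 to INTERVAL_LENGTH - 1, exactly as `xrange(i+1, i+INTERVAL_LENGTH)` does.

```python
import numpy as np


def find_intervals(close, volume, interval_length=15,
                   lo=1.0001, hi=1.5, min_volume=100.0):
    """Vectorized version of the nested loop: iterate over offsets, not starts."""
    close = np.asarray(close, dtype=np.float64)
    volume = np.asarray(volume, dtype=np.float64)
    # cum[k] == volume[0] + ... + volume[k-1], hence
    # sum(volume[i+1:j+1]) == cum[j+1] - cum[i+1]
    cum = np.concatenate(([0.0], np.cumsum(volume)))
    starts = np.arange(len(close) - interval_length)  # same i values as the outer loop
    hits = []
    # Offset k = j - i runs over 1 .. interval_length-1, like the original inner loop.
    for k in range(1, interval_length):
        ends = starts + k
        ret = close[ends] / close[starts]
        vol = cum[ends + 1] - cum[starts + 1]
        mask = (ret > lo) & (ret < hi) & (vol > min_volume)
        hits.extend(zip(starts[mask].tolist(), ends[mask].tolist(),
                        ret[mask].tolist(), vol[mask].tolist()))
    hits.sort()  # restore the original (i, j) ordering
    return hits
```

    The Python-level loop now runs 14 times regardless of the array length; all per-element work happens inside NumPy.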

  • How to track auto-generated IDs in a select-insert statement

    - by k rey
    I have two tables, detail and head. The detail table is written first; the head table is written later. The head is a summary of the detail table. I would like to keep a reference from each detail row to its head row. I have a solution, but it is not elegant and requires duplicating the joins and filters that were used during summation. I am looking for a better solution.

    Below is an example of what I currently have. In this example I have simplified the table structure; in the real world, the summation is very complex.

        -- Preparation
        create table #detail (
            detail_id int identity(1,1)
          , code char(4)
          , amount money
          , head_id int null
        );

        create table #head (
            head_id int identity(1,1)
          , code char(4)
          , subtotal money
        );

        insert into #detail ( code, amount ) values ( 'A', 5 );
        insert into #detail ( code, amount ) values ( 'A', 5 );
        insert into #detail ( code, amount ) values ( 'B', 2 );
        insert into #detail ( code, amount ) values ( 'B', 2 );

        -- I would like to somehow simplify the following two queries
        insert into #head ( code, subtotal )
        select code, sum(amount)
        from #detail
        group by code

        update #detail
        set head_id = h.head_id
        from #detail d
        inner join #head h on d.code = h.code

        -- This is the desired end result
        select * from #detail

    Desired end result of the detail table:

        detail_id  code  amount  head_id
        1          A     5.00    1
        2          A     5.00    1
        3          B     2.00    2
        4          B     2.00    2
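
    A small sketch of the row-at-a-time pattern, using Python's built-in `sqlite3` as a stand-in for SQL Server (table names without `#`, and `cursor.lastrowid` playing the role of the identity value). The summarising query runs once, and each summary row carries its grouping key straight into the update, so the summation logic is not repeated in a second join. The function name `build_heads` is illustrative. On SQL Server itself, a set-based alternative that is often suggested is `MERGE ... OUTPUT`, because the OUTPUT clause of a MERGE can reference source columns, which INSERT ... OUTPUT cannot.

```python
import sqlite3


def build_heads(conn):
    """Insert one head row per code and point the matching detail rows at it."""
    cur = conn.cursor()
    # Compute the summary once, only for detail rows not yet attached to a head.
    summary = cur.execute(
        "select code, sum(amount) from detail "
        "where head_id is null group by code order by code"
    ).fetchall()
    for code, subtotal in summary:
        cur.execute("insert into head (code, subtotal) values (?, ?)", (code, subtotal))
        head_id = cur.lastrowid  # id SQLite generated for the row just inserted
        cur.execute(
            "update detail set head_id = ? where code = ? and head_id is null",
            (head_id, code),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table detail (detail_id integer primary key, code text, amount real, head_id integer);
create table head (head_id integer primary key, code text, subtotal real);
insert into detail (code, amount) values ('A', 5), ('A', 5), ('B', 2), ('B', 2);
""")
build_heads(conn)
```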

  • Is there a reason why SSIS significantly slows down after a few minutes?

    - by Mark
    I'm running a fairly substantial SSIS package against SQL Server 2008, and I'm getting the same results both in my dev environment (Win7 x64 + SQL Server x64 Developer) and in the production environment (Server 2008 x64 + SQL Server Standard x64). The symptom is that initial data loading screams along at 50K-500K records per second, but after a few minutes the speed drops off dramatically and eventually crawls along embarrassingly slowly. The database is in the Simple recovery model, the target tables are empty, and all of the prerequisites for minimally logged bulk inserts are being met. The data flow is a simple load from a RAW input file to a schema-matched table (i.e. no complex transforms of data, no sorting, no lookups, no SCDs, etc.).

    The problem has the following characteristics:
      - It persists no matter what the target table is.
      - RAM usage is lowish (45%); there's plenty of spare RAM for SSIS buffers or SQL Server to use.
      - Perfmon shows that buffers are not spooling, disk response times are normal, and disk availability is high.
      - CPU usage is low (hovering around 25%, shared between sqlserver.exe and DtsDebugHost.exe).
      - Disk activity is primarily on TempDB.mdf, but I/O is very low (< 600 Kb/s).
      - Both the OLE DB Destination and the SQL Server Destination exhibit this problem.

    To sum it up: I'd expect disk, CPU or RAM to be exhausted before the package slows down, but instead it's as if the SSIS package is taking an afternoon nap. SQL Server remains responsive to other queries, and I can't find any performance counters or logged events that betray the cause of the problem. I'll gratefully reward any reasonable answers or suggestions.

  • Communication with different social networks, strategy pattern?

    - by bclaessens
    Hi. For the last few days I've been thinking about how to solve the following programming problem and find the ideal, flexible program structure. (Note: I'm using Flash as my platform, but that shouldn't matter, since I'm just looking for the ideal design pattern.)

    Our Flash website has multiple situations in which it has to communicate with different social networks (Facebook, Netlog and Skyrock). The communication strategy doesn't have to change during one "run": it should be picked once, at launch time, for that session. The real problem is the way communication works between each social network and our website. Some networks force us to ask for a token, others force us to use a web service, and yet another forces us to set up its communication through JavaScript. The problem becomes more complicated when our website has to run inside each network's canvas, which results in even more (different) ways of communicating.

    To sum up, our website has to work in the following cases:
      - standalone on the campaign website URL (the user chooses their favourite network):
          - communicate with Netlog, OR
          - communicate with Facebook, OR
          - communicate with Skyrock
      - run in a Netlog canvas and log in automatically (the website checks for Netlog parameters)
      - run in a Facebook canvas and log in automatically (the website checks for Facebook parameters)
      - run in a Skyrock canvas and log in automatically (the website checks for Skyrock parameters)

    As you can see, our website needs 6 different ways to communicate with a social network. To be honest, the only significant difference between all the communication strategies is the way they connect to their individual network (as described above). Posting an image, making a comment, and so on is the same whether the site runs standalone or at the canvas URL. WARNING: posting an image or posting a comment DOES differ from network to network.

    Should I use the strategy pattern and make 6 different communication strategies, or is there a better way? An example would be great but isn't required ;) Thanks in advance.
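
    One possible shape, sketched in Python rather than ActionScript: since posting varies only by network and connecting varies by network and mode, the six cases can collapse into three network strategies plus a canvas/standalone flag chosen once at launch. Every class, method and parameter name here is hypothetical, including the `canvas_param` keys used to detect that the site runs inside a canvas; the string results stand in for the real token, web-service or JavaScript calls.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Network(ABC):
    """One strategy per network: both connecting and posting vary by network."""
    name = ""
    canvas_param = ""  # hypothetical launch parameter present only inside this network's canvas

    @abstractmethod
    def connect_standalone(self) -> str: ...

    @abstractmethod
    def connect_in_canvas(self, params: dict) -> str: ...

    @abstractmethod
    def post_image(self, url: str) -> str: ...


class Facebook(Network):
    name, canvas_param = "facebook", "fb_user"

    def connect_standalone(self):  # real code: token, web service or JavaScript, as required
        return "facebook: standalone login"

    def connect_in_canvas(self, params):
        return f"facebook: canvas login as {params[self.canvas_param]}"

    def post_image(self, url):
        return f"facebook: posted {url}"


class Netlog(Network):
    name, canvas_param = "netlog", "netlog_user"

    def connect_standalone(self):
        return "netlog: standalone login"

    def connect_in_canvas(self, params):
        return f"netlog: canvas login as {params[self.canvas_param]}"

    def post_image(self, url):
        return f"netlog: posted {url}"


class Skyrock(Network):
    name, canvas_param = "skyrock", "skyrock_user"

    def connect_standalone(self):
        return "skyrock: standalone login"

    def connect_in_canvas(self, params):
        return f"skyrock: canvas login as {params[self.canvas_param]}"

    def post_image(self, url):
        return f"skyrock: posted {url}"


NETWORKS = (Facebook(), Netlog(), Skyrock())


class Session:
    """Picked once at launch; the rest of the site only talks to this object."""

    def __init__(self, network: Network, canvas_params: Optional[dict] = None):
        self.network = network
        self.canvas_params = canvas_params

    def login(self) -> str:
        if self.canvas_params is not None:
            return self.network.connect_in_canvas(self.canvas_params)
        return self.network.connect_standalone()

    def post_image(self, url: str) -> str:
        # Posting depends only on the network, not on standalone vs canvas.
        return self.network.post_image(url)


def start_session(launch_params: dict, user_choice: Optional[str] = None) -> Session:
    for net in NETWORKS:
        if net.canvas_param in launch_params:  # running inside that network's canvas
            return Session(net, launch_params)
    for net in NETWORKS:  # standalone campaign site: use the network the user picked
        if net.name == user_choice:
            return Session(net)
    raise ValueError(f"unknown network: {user_choice!r}")
```

    Adding a fourth network means one new `Network` subclass; the canvas/standalone decision stays in one place.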

  • Convert an int to a list of individual digits faster?

    - by user478514
    All, I want to define a converter between an int and its list of digits, in both directions, e.g. 987654321 <=> [9, 8, 7, 6, 5, 4, 3, 2, 1]. If the number has fewer than 9 digits, e.g. 10, the list should be zero-padded to [0, 0, 0, 0, 0, 0, 0, 1, 0]; if it has more than 9 digits, e.g. 9987654321, the list should be [9, 9, 8, 7, 6, 5, 4, 3, 2, 1].

        >>> i
        987654321
        >>> l
        [9, 8, 7, 6, 5, 4, 3, 2, 1]
        >>> z = [0]*(len(unit) - len(str(l)))
        >>> z.extend(l)
        >>> l = z
        >>> unit
        [100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1]
        >>> sum([x*y for x,y in zip(l, unit)])
        987654321
        >>> int("".join([str(x) for x in l]))
        987654321
        >>> l1 = [int(x) for x in str(i)]
        >>> z = [0]*(len(unit) - len(str(l1)))
        >>> z.extend(l1)
        >>> l1 = z
        >>> l1
        [9, 8, 7, 6, 5, 4, 3, 2, 1]
        >>> a = [i//x for x in unit]
        >>> b = [a[x] - a[x-1]*10 for x in range(9)]
        >>> if len(b) == len(a): b[0] = a[0] # fix the a[-1] issue
        >>> b
        [9, 8, 7, 6, 5, 4, 3, 2, 1]

    I tested the solutions above, but they may not be as fast or as simple as I want, and they may have a length-related bug. Can anyone share a better solution for this kind of conversion? Thanks!
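
    A short sketch of both directions; the function names are illustrative. `str.zfill(width)` pads with leading zeros up to the width and never truncates, which gives the requested behaviour for both short and long numbers; the `divmod` variant does the same without string conversion. Which one is faster depends on the interpreter, so it is worth measuring with `timeit`. In the session above, `len(str(l))` measures the printed list rather than the number of digits (for example, `len(str([1, 0]))` is 6), which is one likely source of the suspected length bug.

```python
def to_digits(n, width=9):
    """987654321 -> [9, 8, ..., 1]; pads with zeros up to width, never truncates."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return [int(ch) for ch in str(n).zfill(width)]


def to_digits_arith(n, width=9):
    """Same result using divmod instead of string conversion."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while n:
        n, d = divmod(n, 10)
        digits.append(d)  # least significant digit first
    digits.extend([0] * (width - len(digits)))  # no-op when already wide enough
    return digits[::-1]


def from_digits(digits):
    """[9, 8, ..., 1] -> 987654321 (leading zeros are harmless)."""
    n = 0
    for d in digits:
        n = n * 10 + d
    return n
```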

  • How to use a PHP constant whose name is pulled from a database

    - by Ronedog
    Can you read the name of a PHP constant from a database, store it in a PHP variable, and use that to display the value of the constant in a menu? For example, here's what I'm trying to accomplish. In SQL:

        select menu_name AS php_CONSTANT where menu_id=1

    The value returned would be L_HOME, which is the name of a constant defined in a PHP config page. The config page looks like this:

        define('L_HOME','Home');

    and it gets loaded before the database call. In PHP, $db_returned_constant would hold the value L_HOME that came from the DB call; I would then place it into a string such as

        $string = '<ul><li>' . $db_returned_constant . '</li></ul>'

    and thus return a string that looks like

        $string = '<ul><li><a href="#" onclick="path_from_db">Home</a></li></ul>'

    To sum up what I'm trying to do:
      1) Load a config file based on the language preference.
      2) Query the DB to return the menu name, which is the name of a constant in the config file loaded in step 1, and also retrieve the menu_link, which is used in the "onclick" event.
      3) Use a PHP variable to hold the name of the constant.
      4) Place the variable into a string that gets echoed out to create the menu, displaying the value of the constant.

    I hope this makes enough sense... Is it even possible to use a constant like this? Thanks.

  • Basic question in XSL regarding preceding text

    - by Rachel
    I am new to XSL and I have a basic question about the context in which preceding text is evaluated. My template matches text nodes. I am iterating over an XML file, and within my for-each loop I am trying to get the text that precedes the current text node. Unfortunately, preceding::text() does not work when I use it within the for-each loop. I want to use it within the loop; how can I do it?

        <xsl:template match="text()">
          <xsl:variable name="this" as="text()" select="."/>
          <xsl:for-each select="$input[@id = generate-id(current())]">
            <xsl:variable name="preText" as="xsd:integer"
                select="sum(preceding::text()[. >> //*[@id=@name]]/string-length(.))"/>
            ...
            ...
          </xsl:for-each>
        </xsl:template>

  • Very simple Python functions spend a long time in the function itself and not in subfunctions

    - by John Salvatier
    I have spent many hours trying to figure out what is going on here. The function grad_logp in the code below is called many times in my program. Profiling with cProfile and visualizing the results with RunSnakeRun reveals that grad_logp spends about 0.00004 s "locally" per call (i.e. not in any of the functions it calls), and the function n spends about 0.00006 s locally per call. Together these two times make up about 30% of the program time that I care about. It doesn't seem like this is function-call overhead, since other Python functions spend far less time "locally", and merging grad_logp and n does not make my program faster; yet the operations these two functions perform seem rather trivial. Does anyone have any suggestions on what might be happening? Have I done something obviously inefficient? Am I misunderstanding how cProfile works?

        def grad_logp(self, variable, calculation_set ):
            p = params(self.p,self.parents)
            return self.n(variable, self.p)

        def n (self, variable, p ):
            gradient = self.gg(variable, p)
            return np.reshape(gradient, np.shape(variable.value))

        def gg(self, variable, p):
            if variable is self:
                gradient = self._grad_logps['x']( x = self.value, **p)
            else:
                gradient = __builtin__.sum([self._pgradient(variable, parameter, value, p) for parameter, value in self.parents.iteritems()])
            return gradient
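
    A small, self-contained way to see what cProfile counts as a function's own time: `tottime` excludes time spent in the functions it calls, while `cumtime` includes it (RunSnakeRun's "local" figure corresponds roughly to `tottime`). Work done directly in a frame, such as loops, attribute lookups or building keyword-argument dicts, is charged to that frame. Note also that cProfile adds some overhead to every call it records, which can inflate the apparent cost of very small, frequently called functions. The functions `inner`, `outer` and `profile_report` are illustrative only.

```python
import cProfile
import io
import pstats


def inner(n):
    return sum(range(n))


def outer(n):
    # The loop below runs in outer's own frame, so it counts toward outer's
    # tottime; the time spent inside inner() is charged to inner instead.
    total = 0
    for k in range(n):
        total += k
    return total + inner(n)


def profile_report(func, *args):
    """Run func under cProfile and return (result, text report sorted by tottime)."""
    prof = cProfile.Profile()
    result = prof.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(prof, stream=out).sort_stats("tottime").print_stats(10)
    return result, out.getvalue()


result, report = profile_report(outer, 1000)
print(report)
```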

  • Modify passed, nested dict/list

    - by Gerenuk
    I was thinking of writing a function to normalize some data. A simple approach is:

        import operator

        def normalize(l, aggregate=sum, norm_by=operator.truediv):
            aggregated=aggregate(l)
            for i in range(len(l)):
                l[i]=norm_by(l[i], aggregated)

        l=[1,2,3,4]
        normalize(l)
        l -> [0.1, 0.2, 0.3, 0.4]

    However, this doesn't work for nested lists and dicts, where I want to normalize over an inner index. I mean, I'd like to get:

        l=[[1,100],[2,100],[3,100],[4,100]]
        normalize(l, ?? )
        l -> [[0.1,100],[0.2,100],[0.3,100],[0.4,100]]

    Any ideas how I could implement such a normalize function? Maybe it would be crazy cool to write

        normalize(l[...][0])

    Is it possible to make this work? Or any other ideas? Also, not only lists but also dicts could be nested. Hmm...

    EDIT: I just found out that NumPy offers such a syntax (for its arrays, though). Does anyone know how I would implement the ellipsis trick myself?
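
    One simple alternative to the `l[...][0]` syntax, sketched under the assumption that an extra `inner` argument is acceptable: `normalize(l, 0)` plays the role of `l[...][0]`, and the outer container may be a list or a dict. The `inner` parameter and the `_keys` helper are illustrative names, not an existing API.

```python
import operator


def _keys(container):
    """Indices of a list, or keys of a dict, so both can be walked the same way."""
    return list(container.keys()) if isinstance(container, dict) else range(len(container))


def normalize(data, inner=None, aggregate=sum, norm_by=operator.truediv):
    """Normalize in place and return data.

    inner=None normalizes data's own elements; otherwise inner is the index or
    key applied to every element, so normalize(l, 0) acts like l[...][0].
    """
    keys = _keys(data)
    if inner is None:
        total = aggregate(data[k] for k in keys)
        for k in keys:
            data[k] = norm_by(data[k], total)
    else:
        total = aggregate(data[k][inner] for k in keys)
        for k in keys:
            data[k][inner] = norm_by(data[k][inner], total)
    return data
```

    Usage: `normalize([[1, 100], [2, 100], [3, 100], [4, 100]], 0)` gives `[[0.1, 100], [0.2, 100], [0.3, 100], [0.4, 100]]`, and `normalize({"a": {"w": 1}, "b": {"w": 3}}, "w")` normalizes the `"w"` entries.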

  • Java number exceeds Long.MAX_VALUE: how to detect it?

    - by jurchiks
    I'm having problems detecting whether the sum or product of two numbers exceeds the maximum value of a long. Example code:

        long a = 2 * Long.MAX_VALUE;
        System.out.println("long.max * smth > long.max... or is it? a=" + a);

    This gives me -2, while I would expect it to throw a NumberFormatException... Is there a simple way of making this work? I have some code that does multiplications in nested if blocks, or additions in a loop, and I would hate to add more ifs to each if or inside the loop.

    Edit: it seems that this answer to another question is the most appropriate for what I need: http://stackoverflow.com/a/9057367/540394 I don't want to do boxing/unboxing, as it adds unnecessary overhead, and this approach is very short, which is a huge plus for me. I'll just write two short functions to do these checks and return the min or max long.

    Edit 2: here's the function for limiting a long to its min/max value, according to the answer I linked to above:

        /**
         * @param a : one of the two numbers added/multiplied
         * @param b : the other of the two numbers
         * @param c : the result of the addition/multiplication
         * @return the minimum or maximum value of a long integer if addition/multiplication of a and b is less than Long.MIN_VALUE or more than Long.MAX_VALUE
         */
        public static long limitLong(long a, long b, long c) {
            return (((a > 0) && (b > 0) && (c <= 0))
                ? Long.MAX_VALUE
                : (((a < 0) && (b < 0) && (c >= 0)) ? Long.MIN_VALUE : c));
        }

    Tell me if you think this is wrong.
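
    A quick way to test the sign-based check, sketched in Python: Python integers never overflow, so `wrap64` emulates Java's two's-complement `long` wraparound, and `limit_long` is a direct transcription of `limitLong` above. The sketch shows that the check catches overflowing additions but misses some overflowing multiplications, where the wrapped result keeps a plausible sign. For comparison, Java 8's `Math.addExact` and `Math.multiplyExact` throw `ArithmeticException` on overflow, which gives an exact check without sign tricks.

```python
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def wrap64(x):
    """Emulate Java's silent wraparound of a 64-bit signed long."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= 1 << 63 else x


def limit_long(a, b, c):
    """Transcription of limitLong: decide saturation from the signs of a, b and c."""
    if a > 0 and b > 0 and c <= 0:
        return LONG_MAX
    if a < 0 and b < 0 and c >= 0:
        return LONG_MIN
    return c


def saturating(exact):
    """What the helper is meant to return: the exact result clamped to the long range."""
    return max(LONG_MIN, min(LONG_MAX, exact))


# Addition overflow is caught:
assert limit_long(LONG_MAX, 1, wrap64(LONG_MAX + 1)) == LONG_MAX
# Multiplication overflow that wraps to a positive number slips through:
a = b = 2**32 + 1
c = wrap64(a * b)  # the true product is 2**64 + 2**33 + 1
print(c, limit_long(a, b, c), saturating(a * b))
```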

  • How to optimize a PostgreSQL server for a "write once, read many"-type infrastructure?

    - by mhu
    Greetings,

    I am working on a piece of software that logs entries (and related tagging) in a PostgreSQL database for storage and retrieval. We never update any data once it has been inserted; we might remove it when the entry gets too old, but this is done at most once a day. Stored entries can be retrieved by users.

    New entries can be inserted rather fast and regularly, so the database will commonly hold several million elements. The tables used are pretty simple: one table for ids, raw content and insertion date, and one table storing tags and their values associated with an id. User searches mostly concern tag values, so SELECTs usually consist of JOIN queries on ids across the two tables.

    To sum it up:
      - 2 tables
      - lots of INSERTs
      - no UPDATEs
      - some DELETEs, once a day at most
      - some user-generated SELECTs with JOINs
      - a huge data set

    What would an optimal server configuration (software and hardware; I assume, for example, that RAID10 could help) be for my PostgreSQL server, given these requirements? By optimal, I mean one that lets SELECT queries complete in a reasonably short time. I can provide more information about the current setup (tables, indexes, ...) if needed.

  • 2D Histogram in R: Converting from Count to Frequency within a Column

    - by Jac
    I would appreciate help with generating a 2D histogram of frequencies, where frequencies are calculated within a column. My main issue is converting from counts to column-based frequencies. Here's my starting code:

        # expected packages
        library(ggplot2)
        library(plyr)

        # generate example data corresponding to expected data input
        x_data = sample(101:200,10000, replace = TRUE)
        y_data = sample(1:100,10000, replace = TRUE)
        my_set = data.frame(x_data,y_data)

        # define x and y interval cut points
        x_seq = seq(100,200,10)
        y_seq = seq(0,100,10)

        # label samples as belonging within x and y intervals
        my_set$x_interval = cut(my_set$x_data,x_seq)
        my_set$y_interval = cut(my_set$y_data,y_seq)

        # determine count for each x,y block
        xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr

        # convert from count to frequency based on formula: freq = count/sum(count in given x interval)
        ################ TRYING TO FIGURE OUT #################

        # plot results
        fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count
        fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency

    I would appreciate any help with calculating the frequency within a column. Thanks! jac

    EDIT: I think the solution will require the following steps:
      1) Calculate and store the overall count for each x-interval factor level.
      2) Divide each individual bin count by its corresponding x-interval count to obtain the frequency.
    I'm not sure how to carry this out, though.
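
    The two steps in the EDIT can be sketched language-neutrally (in Python here, to keep all examples in one language), assuming the counts are available as a mapping from (x_interval, y_interval) to count; the function name is illustrative. Step 1 accumulates a total per x interval, and step 2 divides each bin by its column total, so every column sums to 1. In the plyr code above, the same step is commonly written as `ddply(xy_df, "x_interval", transform, freq = nrow / sum(nrow))`.

```python
from collections import defaultdict


def column_frequencies(counts):
    """counts maps (x_interval, y_interval) -> count.

    Returns the same keys mapped to count / (total count in that x_interval),
    i.e. freq = count / sum(count in given x interval).
    """
    col_totals = defaultdict(int)
    for (x, _y), n in counts.items():  # step 1: total per x interval
        col_totals[x] += n
    # step 2: divide each bin by its column total
    return {(x, y): n / col_totals[x] for (x, y), n in counts.items()}
```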

  • How to reload a tableView, i.e. call the viewDidLoad method, if a condition is met

    - by Kquane Ingram
    The problem is this: I need a way to erase all the data a user has entered into my arrays if a condition is met. I'm new to Objective-C and iOS programming, but I believed the solution might be to call the viewDidLoad method, which would effectively refresh the application with the arrays reset to their default values. If there is any other logical way of doing this, I would appreciate the help. In short, I need to reset the arrays to how they were when the application first launched, before the user selected anything. This is the part where I need it to refresh:

        if ([gradeRecieved objectAtIndex:i]==nil) {
            break; // if this condition is met the program must begin anew.

    Edit: I need to call the - (void)viewDidLoad method again. Here is more of the code:

        -(IBAction)button:(id)sender{
            int i = 0;
            int sum = 0;
            int gradeEarned;
            int creditHours = 3;
            for ( i=0;i<8 ; i++) {
                if ([[points objectAtIndex:i] tag]==GradeA.intValue) {
                    [gradeRecieved replaceObjectAtIndex:i withObject:GradeA];
                }
                if ([[points objectAtIndex:i]tag]==GradeB.intValue) {
                    [gradeRecieved replaceObjectAtIndex:i withObject:GradeB];
                }
                if ([[points objectAtIndex:i]tag]==GradeC.intValue){
                    [gradeRecieved replaceObjectAtIndex:i withObject:GradeC];
                }
                if ([gradeRecieved objectAtIndex:i]==nil) {
                    break; // if this condition is met the program must restart.
                }
            }
            while ( i<[gradeRecieved count]) {
                if ([gradeRecieved objectAtIndex:i] == GradeA ) {
                    [finArray replaceObjectAtIndex:i withObject:GradeA];
                    i++;
                    continue;
                }
                if ([gradeRecieved objectAtIndex:i] == GradeB ) {
                    [gradeRecieved replaceObjectAtIndex:i withObject:GradeB];
                    i++;
                    continue;
                }
                if ([gradeRecieved objectAtIndex:i] == GradeC ) {
                    [gradeRecieved replaceObjectAtIndex:i withObject:GradeC];
                    i++;
                    continue;
                }
            }

  • Javascript Recursion

    - by rpophessagr
    I have an AJAX call and would like to call it again once I finish parsing the result and animating it into the page, and that's where I'm getting stuck. I was able to call the function again, but it doesn't seem to take the animation delays into account, i.e. the console keeps outputting values at a wild pace. I thought setInterval might help, with the interval being the sum of my delays, but I can't get that to work...

        function loadEm(){
            var result=new Array();
            $.getJSON("jsonCall.php",function(results){
                $.each(results, function(i, res){
                    rand = (Math.floor(Math.random()*11)*1000)+2000;
                    fullRand += rand;
                    console.log(fullRand);
                    $("tr:first").delay(rand).queue(function(next) {
                        doStuff(res);
                        next();
                    });
                });
                var int=self.setInterval("loadEm()",fullRand);
            });
        }
        });

  • How to make a cycle over cycles in Java?

    - by Roman
    I would like to loop over the following elements:

        [1,2,11,12,21,22,111,112,121,122,....,222222]

    or, for example,

        [1,2,3,11,12,13,21,22,23,31,32,33,111,112,113,... 333333333]

    How can I do it in Java? In my particular case I use 4 digits (1,2,3,4), and the length of the last number can be from 1 to 10.

    I managed to do it in Python and PHP. In the first case I used a list of lists. I started from [[1],[2]], then for every element of the list I appended 1 and 2, so I got [[1,1],[1,2],[2,1],[2,2]], and so on:

        nchips = sum(chips)
        traj = [[]]
        last = [[]]
        while len(last[0]) < nchips:
            newlast = []
            for tr in last:
                for d in [1,2,3,4]:
                    newlast.append(tr + [d])
            last = newlast
            traj += last

    When I did it in PHP I used numbers in base 3, but that was a tricky and inelegant solution:

        for ($i=-1; $i<=$n; $i+=1) {
            if ($i>-1) {
                $n5 = base_convert($i,10,5);
                $n5_str = strval($n5);
                $tr = array();
                $found = 0;
                for ($j=0; $j<strlen($n5_str); $j+=1) {
                    $k = $n5_str[$j];
                    if ($k==0) {
                        $found = 1;
                        break;
                    }
                    array_push($tr,$k);
                }
                if ($found==1)
                    continue;
            } else {
                $tr = array();
            }
        }

    Can it be done easily in Java?
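
    A sketch of the enumeration in Python, assuming single-digit values: `itertools.product` with `repeat=length` produces exactly the requested order for each length. The second function (names illustrative) does the same with an explicit index array that counts like an odometer, which maps almost line for line onto Java: an `int[]` of indices, an outer loop over lengths, and an inner carry loop.

```python
from itertools import product


def sequences(digits, max_len):
    """Yield 1, 2, 11, 12, 21, 22, 111, ... for digits=(1, 2), lengths 1..max_len."""
    for length in range(1, max_len + 1):
        for combo in product(digits, repeat=length):
            yield int("".join(map(str, combo)))


def sequences_odometer(digits, max_len):
    """Same order without itertools: an index array that counts like an odometer."""
    base = len(digits)
    for length in range(1, max_len + 1):
        idx = [0] * length
        while True:
            value = 0
            for i in idx:
                value = value * 10 + digits[i]
            yield value
            pos = length - 1
            while pos >= 0 and idx[pos] == base - 1:  # roll over maxed positions
                idx[pos] = 0
                pos -= 1
            if pos < 0:  # every position rolled over: this length is done
                break
            idx[pos] += 1
```

    When porting to Java, note that 10-digit values such as 4444444444 exceed `Integer.MAX_VALUE` (2147483647), so the value should be accumulated in a `long`.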

  • Matlab fft function

    - by CTZStef
    The code below is from the MATLAB 2011a help for the fft function. I think there is a problem here: why do they multiply t(1:50) by Fs and then say it's time in milliseconds? It happens to be true in this very particular case, but change the value of Fs to, say, 2000, and it won't work anymore, obviously because of this factor of 2. Right? Quite misleading, isn't it? What am I missing?

        Fs = 1000;                    % Sampling frequency
        T = 1/Fs;                     % Sample time
        L = 1000;                     % Length of signal
        t = (0:L-1)*T;                % Time vector
        % Sum of a 50 Hz sinusoid and a 120 Hz sinusoid
        x = 0.7*sin(2*pi*50*t) + sin(2*pi*120*t);
        y = x + 2*randn(size(t));     % Sinusoids plus noise
        plot(Fs*t(1:50),y(1:50))
        title('Signal Corrupted with Zero-Mean Random Noise')
        xlabel('time (milliseconds)')

    It is clearer with this:

        fs = 2000;                    % Sampling frequency
        T = 1 / fs;                   % Sample time
        L = 1000;                     % Length of signal
        t2 = (0:L-1)*T;               % Time vector
        f = 50;                       % signal frequency
        s2 = sin(2*pi*f*t2);
        figure, plot(fs*t2(1:50),s2(1:50));   % NOT good
        figure, plot(t2(1:50),s2(1:50));      % good
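
    The arithmetic behind the factor, as a small Python check: with t = k/Fs, the product Fs*t is just the sample index k, which coincides with milliseconds only when Fs = 1000, whereas 1000*t is milliseconds for any Fs. The function name `time_axes` is illustrative.

```python
def time_axes(fs, n):
    """Return the two candidate x-axes for the first n samples."""
    t = [k / fs for k in range(n)]           # seconds, like t = (0:L-1)*T with T = 1/fs
    fs_times_t = [fs * tk for tk in t]       # what plot(Fs*t(1:50), ...) uses: sample index k
    milliseconds = [1000 * tk for tk in t]   # actual time in milliseconds
    return fs_times_t, milliseconds


samples, ms = time_axes(2000, 50)
# At fs = 2000 the 11th sample is sample index 10 but only 5 ms into the signal.
print(samples[10], ms[10])
```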

  • Can knowing C actually hurt the code you write in higher level languages?

    - by Jurily
    The question seems settled, beaten to death even. Smart people have said smart things on the subject: to be a really good programmer, you need to know C. Or do you?

    I was enlightened twice this week. The first time made me realize that my assumptions go no further than the knowledge behind them, and given the complexity of the software running on my machine, that's almost non-existent. But what really drove it home was this Slashdot comment:

        The end result is that I notice the many naive ways in which traditional C "bare metal" programmers assume that higher level languages are implemented. They make bad "optimization" decisions in projects they influence, because they have no idea how a compiler works or how different a good runtime system may be from the naive macro-assembler model they understand.

    Then it hit me: C is just one more abstraction, like all the others. Even the CPU itself is only an abstraction! I've just never seen it break, because I don't have the tools to measure it.

    I'm confused. Has my mind been mutilated beyond recovery, as Dijkstra said about BASIC? Am I living in a constant state of premature optimization? Is there hope for me, now that I realize I know nothing about anything? Is there anything to know, even? And why is it so fascinating that everything I've written in the last five years might have been fundamentally wrong?

    To sum it up: is there any value in knowing more than the API docs tell me?

    EDIT: Made CW. Of course, this also means you must now post examples of the interpreter/runtime optimizing better than we do :)

  • How to check that a file save is complete using Python?

    - by indrajithk
    I am trying to automate a download process, and I want to know whether a particular file has finished saving. The scenario is:
      1) Open a site address using Chrome or Firefox (any browser).
      2) Save the page to disk using Ctrl+S (I work on Windows).

    If the page is very big, it takes a few seconds to save. I want to parse the HTML once the save is complete. Since I have no control over the browser's save functionality, I don't know whether the save has completed.

    One idea I had is to compute the MD5 sum of the file in a while loop, compare it with the previously computed one, and keep looping until the previous and current MD5 sums match. I guess this doesn't work, as the browser seems to first save the file to a temporary file and then copy the content to the specified file (or just rename it).

    Any ideas? I use Python for the automation, so any idea that can be implemented in Python is welcome. Thanks, Indrajith
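
    A heuristic sketch, not a guarantee: rather than hashing, poll the target path with `os.stat` and treat the save as finished once the file exists and its size and modification time have stayed unchanged for several consecutive polls. Because it only watches the final path, a browser that writes to a temporary file and renames it at the end simply shows up as "missing, then present and stable". The function name and the default interval, number of checks and timeout are assumptions to tune.

```python
import os
import time


def wait_until_stable(path, interval=0.5, stable_checks=3, timeout=60.0):
    """Return True once path exists and its size and mtime are unchanged for
    stable_checks consecutive polls; return False if timeout expires first."""
    deadline = time.monotonic() + timeout
    last = None
    streak = 0
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            signature = None  # not created (or not renamed into place) yet
        if signature is not None and signature == last:
            streak += 1
            if streak >= stable_checks:
                return True
        else:
            streak = 0
        last = signature
        time.sleep(interval)
    return False
```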

  • What is the optimum way to select the most dissimilar individuals from a population?

    - by Aaron D
    I have tried to use k-means clustering to select the most diverse markers in my population. For example, if we want to select 100 lines, I cluster the whole population into 100 clusters and then select the marker closest to the centroid of each cluster. The problem with my solution is that it takes too much time (my function probably needs optimization), especially when the number of markers exceeds 100000. I would really appreciate it if anyone could show me a new way to select markers that maximize diversity in my population, or help me optimize my function so that it runs faster. Thank you.

    # example:
    library(BLR)
    data(wheat)
    dim(X)
    mdf <- mostdiff(t(X), 100, 1, nstart = 1000)

    Here is the mostdiff function that I used:

    mostdiff <- function(markers, nClust, nMrkPerClust, nstart = 1000) {
      transposedMarkers <- as.array(markers)
      mrkClust <- kmeans(transposedMarkers, nClust, nstart = nstart)
      save(mrkClust, file = "markerCluster.Rdata")
      # within clusters, pick the markers that are closest to the cluster centroid
      # turn the vector of which markers belong to which clusters into a list nClust long
      # each element of the list is a vector of the markers in that cluster
      clustersToList <- function(nClust, clusters) {
        vecOfCluster <- function(whichClust, clusters) {
          return(which(whichClust == clusters))
        }
        return(apply(as.array(1:nClust), 1, vecOfCluster, clusters))
      }
      pickCloseToCenter <- function(vecOfCluster, whichClust, transposedMarkers, centers, pickHowMany) {
        clustSize <- length(vecOfCluster)
        # if there are fewer than three markers, the center is equally distant from all so don't bother
        if (clustSize < 3) return(vecOfCluster[1:min(pickHowMany, clustSize)])
        # figure out the distance (squared) between each marker in the cluster and the cluster center
        distToCenter <- function(marker, center) {
          diff <- center - marker
          return(sum(diff * diff))
        }
        dists <- apply(transposedMarkers[vecOfCluster, ], 1, distToCenter, center = centers[whichClust, ])
        return(vecOfCluster[order(dists)[1:min(pickHowMany, clustSize)]])
      }
    }
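    The two steps described above (cluster with k-means, then keep the member closest to each centroid) can be sketched as follows. This is a minimal Python illustration, not the asker's R code; `kmeans`, `pick_closest_to_centroids`, `n_iter` and `seed` are hypothetical names, and it uses a plain Lloyd iteration with a single seeded start. On speed: in R's `kmeans`, `nstart = 1000` reruns the whole clustering from 1000 random starting configurations and keeps the best, so the call above does roughly a thousand single runs' worth of work, which is likely a large share of the runtime.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with one seeded random start (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def pick_closest_to_centroids(points, labels, centroids, per_cluster=1):
    """For each cluster, return the indices of its `per_cluster` members nearest the centroid."""
    picks = []
    for c, center in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: sq_dist(points[i], center))
        picks.append(members[:per_cluster])
    return picks
```

    With two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and `k = 2`, the picks are indices 0 and 3, one per group. In R the same selection step is usually faster when the distances are computed with vectorized matrix operations rather than `apply` over rows.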

  • Update on: How to model random non-overlapping spheres of non-uniform size in a cube using Matlab?

    - by user3838079
    I am trying to use MATLAB to generate random locations for non-overlapping spheres of non-uniform size in a cube. The while loop in the code below never seems to end, and I don't know what I am missing. I ran the code with n = 10 spheres and dims = [ 10 10 10 ].

    function [ c r ] = randomSphere( dims )
    % creating one sphere at random inside [0..dims(1)]x[0..dims(2)]x...
    % radius and center coordinates are sampled from a uniform distribution
    % over the relevant domain.
    % output: c - center of sphere (vector cx, cy,... )
    %         r - radius of sphere (scalar)
    r = rand(1); % you might want to scale this w.r.t dims or other consideration
    c = r + rand( size(dims) )./( dims - 2*r ); % make sure sphere does not exceed boundaries

    function ovlp = nonOverlapping( centers, rads )
    % check if several spheres with centers and rads overlap or not
    ovlp = false;
    if numel( rads ) == 1
        return; % nothing to check for a single sphere
    end
    dst = sqrt( sum( bsxfun( @minus, permute( centers, [1 3 2] ),...
                                     permute( centers, [3 1 2] ) ).^2, 3) );
    ovlp = dst >= bsxfun( @plus, rads, rads.' ); %' all distances must be smaller than r1+r2
    ovlp = any( ovlp(:) ); % all must not overlap

    function [centers rads] = sampleSpheres( dims, n )
    % dims is assumed to be a row vector of size 1-by-ndim
    % preallocate
    ndim = numel(dims);
    centers = zeros( n, ndim );
    rads = zeros( n, 1 );
    ii = 1;
    while ii <= n
        [centers(ii,:), rads(ii) ] = randomSphere( dims );
        if nonOverlapping( centers(1:ii,:), rads(1:ii) )
            ii = ii + 1; % accept and move on
        end
    end
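    Reading the MATLAB code above, three issues look like plausible causes of the hang (not verified by running it in MATLAB). First, `nonOverlapping` returns `false` when only one sphere has been placed, so the first sphere is never accepted and `ii` never leaves 1. Second, `any( dst >= r1 + r2 )` is true as soon as any single pair is separated, whereas the intended test is that every off-diagonal pair satisfies `dst >= r1 + r2`. Third, `randomSphere` divides (`./`) where scaling (`.*`) appears to be intended, which squeezes all centers near one corner. The sketch below is a Python rejection sampler (not MATLAB) that applies those three fixes; `max_radius`, `max_attempts` and `seed` are hypothetical parameters added for illustration.

```python
import math
import random


def random_sphere(dims, rng, max_radius=1.0):
    """One random sphere fully inside [0, d] on every axis (hypothetical helper)."""
    r = rng.uniform(0.0, max_radius)
    # Scale (not divide) so the center lies in [r, d - r] on each axis.
    c = [r + rng.random() * (d - 2 * r) for d in dims]
    return c, r


def fits(center, radius, centers, radii):
    """True if the new sphere overlaps none of the accepted ones.

    all() over an empty list is True, so the first sphere is always accepted.
    """
    return all(math.dist(center, c) >= radius + r for c, r in zip(centers, radii))


def sample_spheres(dims, n, max_radius=1.0, max_attempts=100_000, seed=None):
    """Rejection-sample n non-overlapping spheres inside the box `dims`."""
    if 2 * max_radius > min(dims):
        raise ValueError("max_radius too large for the box")
    rng = random.Random(seed)
    centers, radii = [], []
    attempts = 0
    while len(centers) < n:
        if attempts >= max_attempts:  # guard against an endless loop
            raise RuntimeError(f"placed only {len(centers)} of {n} spheres")
        attempts += 1
        c, r = random_sphere(dims, rng, max_radius)
        if fits(c, r, centers, radii):
            centers.append(c)
            radii.append(r)
    return centers, radii
```

    For example, `sample_spheres([10, 10, 10], 10, seed=1)` places ten spheres of radius at most 1 in the 10 x 10 x 10 cube. The attempt cap turns an impossible request (too many or too large spheres) into an error instead of a silent hang.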

  • How to sort data in a table data structure in Java?

    - by rgksugan
    I need to sort data based on the third column of a table data structure. I tried an approach based on the answers to the following question, but my sorting does not work. Please help me with this. Here is my code:

    Object[] data = new Object[y];
    rst.beforeFirst();
    while (rst.next()) {
        int p_id = Integer.parseInt(rst.getString(1));
        String sw2 = "select sum(quantity) from tbl_order_detail where product_id=" + p_id;
        rst1 = stmt1.executeQuery(sw2);
        rst1.next();
        String sw3 = "select max(order_date) from tbl_order where tbl_order.`Order_ID` in (select tbl_order_detail.`Order_ID` from tbl_order_detail where product_id=" + p_id + ")";
        rst2 = stmt2.executeQuery(sw3);
        rst2.next();
        data[i] = new Object[]{new String(rst.getString(2)), new String(rst.getString(3)), new Integer(rst1.getString(1)), new String(rst2.getString(1))};
        i++;
    }
    ColumnComparator cc = new ColumnComparator(2);
    Arrays.sort(data, cc);
    if (i == 0) {
        table.addCell("");
        table.addCell("");
        table.addCell("");
        table.addCell("");
    } else {
        for (int j = 0; j < y; j++) {
            Object[] theRow = (Object[]) data[j];
            table.addCell((String) theRow[0]);
            table.addCell((String) theRow[1]);
            table.addCell((String) theRow[2]);
            table.addCell((String) theRow[3]);
        }
    }
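    `ColumnComparator` is not shown, so its comparison logic cannot be checked here, but two things in the visible code are worth ruling out. The third column holds an `Integer`, so `(String) theRow[2]` throws a `ClassCastException` at runtime (`String.valueOf(theRow[2])` does not). And if `y` can be larger than the number of rows actually read, the unused slots of `data` stay `null`, which a comparator will typically fail on. The sketch below shows the same idea in Python rather than Java (`sort_rows_by_column` and `render` are hypothetical names): skip empty slots, sort by the raw numeric value in column index 2, and convert cells to text only when rendering.

```python
def sort_rows_by_column(rows, col, reverse=False):
    """Return the non-empty rows sorted by the value in column `col`."""
    # Skip unused slots (an over-allocated Java array would hold nulls here).
    filled = [row for row in rows if row is not None]
    # The key is the raw numeric cell, so 9 < 10 < 100 (as strings, "10" < "100" < "9").
    return sorted(filled, key=lambda row: row[col], reverse=reverse)


def render(rows):
    """Convert every cell to text only at output time (like String.valueOf in Java)."""
    return [[str(cell) for cell in row] for row in rows]


# Hypothetical sample rows: name, code, total quantity, last order date.
rows = [
    ["a", "x", 10, "d1"],
    ["b", "y", 9, "d2"],
    None,  # an unused slot
    ["c", "z", 100, "d3"],
]
print([row[0] for row in sort_rows_by_column(rows, 2)])  # ['b', 'a', 'c']
```

    In Java the equivalent is a `Comparator` that casts the cell to `Integer` (or uses `Comparator.comparing`) and sorts only the first `i` entries, for example with `Arrays.sort(data, 0, i, cc)`.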

  • suggestions on syntax to express mathematical formula concisely

    - by aaa
    Hello. I am developing a functional domain-specific embedded language within C++ to translate formulas into working code as concisely and accurately as possible. I have posted a prototype in the comments; it is about two hundred lines long. Right now my language looks something like this (or rather, is going to look like this):

    // implies two nested loops j=0:N, i=0,j
    (range(i) < j < N)[T(i,j) = (T(i,j) - T(j,i))/e(i+j)];

    // implies summation over above expression
    sum(range(i) < j < N)[(T(i,j) - T(j,i))/e(i+j)];

    I am looking for possible syntax improvements or extensions, or just different ideas about expressing mathematical formulas as clearly and precisely as possible (in any language, not just C++). Can you give me syntax examples, in your language of choice, that you consider useful for this? In particular, if you have ideas about how to translate the code segments above, I would be happy to hear them. Thank you.

    Just to clarify and give the actual formula: my short-term goal is to express the following expression concisely, where the values in <> are already computed as a 4-dimensional array.
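    One way to compare syntax ideas is against what a general-purpose language already offers. As an illustration in Python (not C++), and assuming `range(i) < j < N` means the strict ranges 0 <= i < j < N with j as the outer loop, the two expressions above map onto a pair generator plus an ordinary loop and a generator-expression sum. `triangular_pairs`, `update_upper` and `summed` are hypothetical names.

```python
def triangular_pairs(n):
    """Yield (i, j) with 0 <= i < j < n: j = 0..n-1 outer, i = 0..j-1 inner (assumed strict)."""
    for j in range(n):
        for i in range(j):
            yield i, j


def update_upper(T, e):
    """In place: T[i][j] = (T[i][j] - T[j][i]) / e[i + j] for every i < j < N.

    Only the upper triangle is written and only the lower triangle is read
    through T[j][i], so the loop order does not affect the result.
    """
    for i, j in triangular_pairs(len(T)):
        T[i][j] = (T[i][j] - T[j][i]) / e[i + j]


def summed(T, e):
    """Sum of (T[i][j] - T[j][i]) / e[i + j] over the same i < j < N range."""
    return sum((T[i][j] - T[j][i]) / e[i + j] for i, j in triangular_pairs(len(T)))


T = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
e = [1, 2, 4, 8]  # must have at least 2N - 2 entries so that e[i + j] exists
print(summed(T, e))  # -2.25
```

    Separating the index domain (`triangular_pairs`) from the body is close to what the `(range(i) < j < N)[...]` notation expresses, and the same domain object can be reused by both the update and the summation.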