Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 9/87 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >

  • My server is slower than the average user's computer, should I still offload Access queries to SQL Server? [closed]

    - by andrewb
    Possible Duplicate: How do you do Load Testing and Capacity Planning for Databases I have a database set up with MS Access 2007 front ends and an SQL Server 2005 back end. At the moment, all the queries are saved in the front end as I've only recently moved to an SQL Server backend. I'm wondering how much of those queries I should save as stored procedures/views on SQL Server. About the system The number of concurrent users is only a handful, though it could be as high as 25 at one time (very unlikely). The average computer has an Intel i3-2120 CPU running at 3.3 GHz, which gets a PassMark score of 3,987, whilst the server has an Intel Xeon E5335 running at 2.0 GHz, which gets a PassMark score of 2,637. Always an awkward situation when an i3 outperforms a Xeon... though the i3 is from Q1 2011 and the Xeon is Q2 2009. There is potential for a server upgrade in the future, though it wouldn't come easy. I'm inclined to move the queries to the back end, as they are beginning to take noticeable time and I figure that is a better way of doing things. I like the idea of throwing everything at the server, then pushing for a server upgrade. It makes more sense in my mind to be upgrading one server rather than 30 PCs. Or am I being overzealous? Why my question isn't a duplicate It seems that my question has been misinterpreted and labelled a duplicate of quite a different question, one about testing and capacity planning. I'll try explain how my question is very different from the linked question. The crux of my question is something like "Even though my server is technically slower, is it better to have it doing more of the queries?" There's two ways that people could have answered this: I agree the server is going to be slower, but the extra benefits of such and such (like the less Access the better) means you should move most to the server anyway. (OR no it doesn't outweigh the benefit, keep them in Access) Actually the server will be faster because of such and such. I'm hoping that people out there could provide some answers like this, and the question in the dupe link doesn't really provide either of these answers. Ok sure, I suppose I could do extensive performance testing to compare Access queries running on a local machine to SQL Server queries running on the server, but that sounds like a very hard task (particularly performance testing of access) compared to someone giving some quick general guidance, and again, my question is looking for a lot more than immediate performance benefit.

    Read the article

  • The code works but when using printf it gives me a weird answer. Help please [closed]

    - by user71458
    //Programmer-William Chen //Seventh Period Computer Science II //Problem Statement - First get the elapsed times and the program will find the //split times for the user to see. // //Algorithm- First the programmer makes the prototype and calls them in the //main function. The programmer then asks the user to input lap time data. //Secondly, you convert the splits into seconds and subtract them so you can //find the splits. Then the average is all the lap time's in seconds. Finally, //the programmer printf all the results for the user to see. #include <iostream> #include <stdlib.h> #include <math.h> #include <conio.h> #include <stdio.h> using namespace std; void thisgetsElapsedTimes( int &m1, int &m2, int &m3, int &m4, int &m5, int &s1, int &s2, int &s3, int &s4, int &s5); //this is prototype void thisconvertstoseconds ( int &m1, int &m2, int &m3, int &m4, int &m5, int &s1, int &s2, int &s3, int &s4, int &s5, int &split1, int &split2, int &split3, int &split4, int &split5);//this too void thisfindsSplits(int &m1, int &m2, int &m3, int &m4, int &m5, int &split1, int &split2, int &split3, int &split4, int &split5, int &split6, int &split7, int &split8, int &split9, int &split10);// this is part of prototype void thisisthesecondconversation (int &split1M, int &split2M, int &split3M, int &split4M, int &split5M, int &split1S,int &split2S, int &split3S, int &split4S, int &split5S, int &split1, int &split2, int &split3, int &split4, int &split5);//this gets a value void thisfindstheaverage(double &average, int &split1, int &split2, int &split3, int &split4, int &split5);//and this void thisprintsstuff( int &split1M, int &split2M, int &split3M, int &split4M, int &split5M, int &split1S, int &split2S, int &split3S, int &split4S, int &split5S, double &average); //this prints int main(int argc, char *argv[]) { int m1, m2, m3, m4, m5, s1, s2, s3, s4, s5, split1, split2, split3, split4, split5, split1M, split2M, split3M, split4M, split5M, split1S, split2S, split3S, split4S, split5S; int split6, split7, split8, split9, split10; double average; char thistakescolon; thisgetsElapsedTimes ( m1, m2, m3, m4, m5, s1, s2, s3, s4, s5); thisconvertstoseconds ( m1, m2, m3, m4, m5, s1, s2, s3, s4, s5, split1, split2, split3, split4, split5); thisfindsSplits ( m1, m2, m3, m4, m5, split1, split2, split3, split4, split5, split6, split7, split8, split9, split10); thisisthesecondconversation ( split1M, split2M, split3M, split4M, split5M, split1S, split2S, split3S, split4S, split5S, split1, split2, split3, split4, split5); thisfindstheaverage ( average, split1, split2, split3, split4, split5); thisprintsstuff ( split1M, split2M, split3M, split4M, split5M, split1S, split2S, split3S, split4S, split5S, average); // these are calling statements and they call from the main function to the other functions. system("PAUSE"); return 0; } void thisgetsElapsedTimes(int &m1, int &m2, int &m3, int &m4, int &m5, int &s1, int &s2, int &s3, int &s4, int &s5) { char thistakescolon; cout << "Enter the elapsed time:" << endl; cout << " Kilometer 1 "; cin m1 thistakescolon s1; cout << " Kilometer 2 "; cin m2 thistakescolon s2; cout << " Kilometer 3 " ; cin m3 thistakescolon s3; cout << " Kilometer 4 "; cin m4 thistakescolon s4; cout << " Kilometer 5 "; cin m5 thistakescolon s5; // this gets the data required to get the results needed for the user to see // . 
} void thisconvertstoseconds (int &m1, int &m2, int &m3, int &m4, int &m5, int &s1, int &s2, int &s3, int &s4, int &s5, int &split1, int &split2, int &split3, int &split4, int &split5) { split1 = (m1 * 60) + s1;//this converts for minutes to seconds for m1 split2 = (m2 * 60) + s2;//this converts for minutes to seconds for m2 split3 = (m3 * 60) + s3;//this converts for minutes to seconds for m3 split4 = (m4 * 60) + s4;//this converts for minutes to seconds for m4 split5 = (m5 * 60) + s5;//this converts for minutes to seconds for m5 } void thisfindsSplits (int &m1, int &m2, int &m3, int &m4, int &m5,int &split1, int &split2, int &split3, int &split4, int &split5, int &split6, int &split7, int &split8, int &split9, int &split10)//this is function heading { split6 = split1; //this is split for the first lap. split7 = split2 - split1;//this is split for the second lap. split8 = split3 - split2;//this is split for the third lap. split9 = split4 - split3;//this is split for the fourth lap. split10 = split5 - split4;//this is split for the fifth lap. } void thisfindstheaverage(double &average, int &split1, int &split2, int &split3, int &split4, int &split5) { average = (split1 + split2 + split3 + split4 + split5)/5; // this finds the average from all the splits in seconds } void thisisthesecondconversation (int &split1M, int &split2M, int &split3M, int &split4M, int &split5M, int &split1S,int &split2S, int &split3S, int &split4S, int &split5S, int &split1, int &split2, int &split3, int &split4, int &split5) { split1M = split1 * 60; //this finds the split times split1S = split1M - split1 * 60; //then this finds split2M = split2 * 60; //and all of this split2S = split2M - split2 * 60; //does basically split3M = split3 * 60; //the same thing split3S = split3M - split3 * 60; //all of it split4M = split4 * 60; //it's also a split4S = split4M - split4 * 60; //function split5M = split5 * 60; //and it finds the splits split5S = split5M - split5 * 60; //for each lap. } void thisprintsstuff (int &split1M, int &split2M, int &split3M, int &split4M, int &split5M, int &split1S, int &split2S, int &split3S, int &split4S, int &split5S, double &average)// this is function heading { printf("\n kilometer 1 %d" , ":02%d",'split1M','split1S'); printf("\n kilometer 2 %d" , ":02%d",'split2M','split2S'); printf("\n kilometer 3 %d" , ":02%d",'split3M','split3S'); printf("\n kilometer 4 %d" , ":02%d",'split4M','split4S'); printf("\n kilometer 5 %d" , ":02%d",'split5M','split5S'); printf("\n your average pace is ",'average',"per kilometer \n", "William Chen\n"); // this printf so the programmer // can allow the user to see // the results from the data gathered. }
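    For what it's worth, the "weird answer" is almost certainly the printf calls: they pass character constants like 'split1M' instead of the variables themselves, and split the format string across arguments, so printf prints whatever happens to be on the stack. The cin lines are also missing the >> extraction operators, and thisisthesecondconversation multiplies by 60 where it presumably means to divide. A minimal, hypothetical sketch of just the corrected fragments (names reused from the post; not the full assignment):
      cin >> m1 >> thistakescolon >> s1;                              // read "minutes:seconds"; ">>" was missing
      split1M = split1 / 60;                                          // whole minutes in the split
      split1S = split1 - split1M * 60;                                // leftover seconds
      average = (split1 + split2 + split3 + split4 + split5) / 5.0;   // 5.0 avoids integer division
      printf("\n kilometer 1 %d:%02d", split1M, split1S);             // pass the variables, one format string
      printf("\n your average pace is %.2f seconds per kilometer\n", average);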

    Read the article

  • Essbase BSO Data Fragmentation

    - by Ann Donahue
    Essbase BSO Data Fragmentation Data fragmentation naturally occurs in Essbase Block Storage (BSO) databases where there are a lot of end user data updates, incremental data loads, many lock and send, and/or many calculations executed.  If an Essbase database starts to experience performance slow-downs, this is an indication that there may be too much fragmentation.  See Chapter 54 Improving Essbase Performance in the Essbase DBA Guide for more details on measuring and eliminating fragmentation: http://docs.oracle.com/cd/E17236_01/epm.1112/esb_dbag/daprcset.html Fragmentation is likely to occur in the following situations: Read/write databases that users are constantly updating data Databases that execute calculations around the clock Databases that frequently update and recalculate dense members Data loads that are poorly designed Databases that contain a significant number of Dynamic Calc and Store members Databases that use an isolation level of uncommitted access with commit block set to zero There are two types of data block fragmentation Free space tracking, which is measured using the Average Fragmentation Quotient statistic. Block order on disk, which is measured using the Average Cluster Ratio statistic. Average Fragmentation Quotient The Average Fragmentation Quotient ratio measures free space in a given database.  As you update and calculate data, empty spaces occur when a block can no longer fit in its original space and will either append at the end of the file or fit in another empty space that is large enough.  These empty spaces take up space in the .PAG files.  The higher the number the more empty spaces you have, therefore, the bigger the .PAG file and the longer it takes to traverse through the .PAG file to get to a particular record.  An Average Fragmentation Quotient value of 3.174765 means the database is 3% fragmented with free space. Average Cluster Ratio Average Cluster Ratio describes the order the blocks actually exist in the database. An Average Cluster Ratio number of 1 means all the blocks are ordered in the correct sequence in the order of the Outline.  As you load data and calculate data blocks, the sequence can start to be out of order.  This is because when you write to a block it may not be able to place back in the exact same spot in the database that it existed before.  The lower this number the more out of order it becomes and the more it affects performance.  An Average Cluster Ratio value of 1 means no fragmentation.  Any value lower than 1 i.e. 0.01032828 means the data blocks are getting further out of order from the outline order. Eliminating Data Block Fragmentation Both types of data block fragmentation can be removed by doing a dense restructure or export/clear/import of the data.  There are two types of dense restructure: 1. Implicit Restructures Implicit dense restructure happens when outline changes are done using EAS Outline Editor or Dimension Build. Essbase restructures create new .PAG files restructuring the data blocks in the .PAG files. When Essbase restructures the data blocks, it regenerates the index automatically so that index entries point to the new data blocks. Empty blocks are NOT removed with implicit restructures. 2. Explicit Restructures Explicit dense restructure happens when a manual initiation of the database restructure is executed. An explicit dense restructure is a full restructure which comprises of a dense restructure as outlined above plus the removal of empty blocks Empty Blocks vs. 
Fragmentation The existence of empty blocks is not considered fragmentation.  Empty blocks can be created through calc scripts or formulas.  An empty block will add to an existing database block count and will be included in the block counts of the database properties.  There are no statistics for empty blocks.  The only way to determine if empty blocks exist in an Essbase database is to record your current block count, export the entire database, clear the database then import the exported data.  If the block count decreased, the difference is the number of empty blocks that had existed in the database.
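    If you need to check these statistics or kick off an explicit dense restructure from a script rather than from EAS, MaxL can do both. The statements below are written from memory, so treat the exact grammar as an assumption and confirm it against the MaxL reference for your Essbase release:
      /* report block statistics, including Average Fragmentation Quotient and Average Clustering Ratio */
      query database Sample.Basic get dbstats data_block;
      /* explicit dense restructure: rewrites the .PAG/.IND files and removes empty blocks */
      alter database Sample.Basic force restructure;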

    Read the article

  • Great Programmer Productivity - Accounting for 10,000 fold difference?

    - by TheImpact
    "A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer." - Bill Gates Say there's a "great" software engineer and an "average" software engineer on the same team. How can you account for one engineer being 10,000 times more productive? I can't quite fathom this, given they're both taking on their share of features, bugs and investigations, and consistently deliver with quality. Would my description possibly justify them to be above "average"? "great"? In a corporation like Microsoft, what % of software engineers are "average"? What % "great"?

    Read the article

  • Beginning Java (Working with Arrays; Class Assignment)

    - by Jason
    I am to the point where I feel as if I correctly wrote the code for this homework assignment. We were given a skeleton and 2 classes that we had to import (FileIOHelper and Student). /* * Created: *** put the date here *** * * Author: *** put your name here *** * * The program will read information about students and their * scores from a file, and output the name of each student with * all his/her scores and the total score, plus the average score * of the class, and the name and total score of the students with * the highest and lowest total score. */ // import java.util.Scanner; import java.io.*; // C:\Users\Adam\info.txt public class Lab6 { public static void main(String[] args) throws IOException { // Fill in the body according to the following comments Scanner key boardFile = new Scanner(System.in); // Input file name String filename = getFileName(keyboardFile); //Open the file // Input number of students int numStudents = FileIOHelper.getNumberOfStudents(filename); Student students[] = new Student[numStudents]; // Input all student records and create Student array and // integer array for total scores int totalScore[] = new int[students.length]; for (int i = 0; i < students.length; i++){ for(int j = 1; j < 4; j++){ totalScore[i] = totalScore[i] + students[i].getScore(j); } } // Compute total scores and find students with lowest and // highest total score int maxScore = 0; int minScore = 0; for(int i = 0; i < students.length; i++){ if(totalScore[i] >= totalScore[maxScore]){ maxScore = i; } else if(totalScore[i] <= totalScore[minScore]){ minScore = i; } } // Compute average total score int allScores = 0; int average = 0; for (int i = 0; i < totalScore.length; i++){ allScores = allScores + totalScore[i]; } average = allScores / totalScore.length; // Output results outputResults(students, totalScore, maxScore, minScore, average); } // Given a Scanner in, this method prompts the user to enter // a file name, inputs it, and returns it. private static String getFileName(Scanner in) { // Fill in the body System.out.print("Enter the name of a file: "); String filename = in.next(); return filename; // Do not declare the Scanner variable in this method. // You must use the value this method receives in the // argument (in). } // Given the number of students records n to input, this // method creates an array of Student of the appropriate size, // reads n student records using the FileIOHelper, and stores // them in the array, and finally returns the Student array. private static Student[] getStudents(int n) { Student[] myStudents = new Student[n]; for(int i = 0; i <= n; i++){ myStudents[i] = FileIOHelper.getNextStudent(); } return myStudents; } // Given an array of Student records, an array with the total scores, // the indices in the arrays of the students with the highest and // lowest total scores, and the average total score for the class, // this method outputs a table of all the students appropriately // formatted, plus the total number of students, the average score // of the class, and the name and total score of the students with // the highest and lowest total score. 
private static void outputResults( Student[] students, int[] totalScores, int maxIndex, int minIndex, int average ) { // Fill in the body System.out.println("\nName \t\tScore1 \tScore2 \tScore3 \tTotal"); System.out.println("--------------------------------------------------------"); for(int i = 0; i < students.length; i++){ outputStudent(students[i], totalScores[i], average); System.out.println(); } System.out.println("--------------------------------------------------------"); outputNumberOfStudents(students.length); outputAverage(average); outputMaxStudent(students[maxIndex], totalScores[maxIndex]); outputMinStudent(students[minIndex], totalScores[minIndex]); System.out.println("--------------------------------------------------------"); } // Given a Student record, the total score for the student, // and the average total score for all the students, this method // outputs one line in the result table appropriately formatted. private static void outputStudent(Student s, int total, int avg) { System.out.print(s.getName() + "\t"); for(int i = 1; i < 4; i++){ System.out.print(s.getScore(i) + "\t"); } System.out.print(total + "\t"); if(total < avg){ System.out.print("-"); }else if(total > avg){ System.out.print("+"); }else{ System.out.print("="); } } // Given the number of students, this method outputs a message // stating what the total number of students in the class is. private static void outputNumberOfStudents(int n) { System.out.println("The total number of students in this class is: \t" + n); } // Given the average total score of all students, this method // outputs a message stating what the average total score of // the class is. private static void outputAverage(int average) { System.out.println("The average total score of the class is: \t" + average); } // Given the Student with highest total score and the student's // total score, this method outputs a message stating the name // of the student and the highest score. private static void outputMaxStudent( Student student, int score ) { System.out.println(student.getName() + " got the maximum total score of: \t" + score); } // Given the Student with lowest total score and the student's // total score, this method outputs a message stating the name // of the student and the lowest score. private static void outputMinStudent( Student student, int score ) { System.out.println(student.getName() + " got the minimum total score of: \t" + score); } } But now I get an error at the line totalScore[i] = totalScore[i] + students[i].getScore(j); Exception in thread "main" java.lang.NullPointerException at Lab6.main(Lab6.java:42)
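    The NullPointerException at line 42 is because the students array is allocated but never filled: getStudents(...) is defined but never called, so students[i] is still null when getScore(j) runs (getStudents also overruns the array with i <= n, and "Scanner key boardFile" looks like a paste artifact for keyboardFile). A hedged sketch of the fix, reusing the skeleton's own names:
      // in main(), populate the array through the helper instead of just allocating it:
      Student[] students = getStudents(numStudents);

      // in getStudents(), stop at n - 1 so the last iteration stays inside the array:
      private static Student[] getStudents(int n) {
          Student[] myStudents = new Student[n];
          for (int i = 0; i < n; i++) {            // was i <= n, which would throw on the final pass
              myStudents[i] = FileIOHelper.getNextStudent();
          }
          return myStudents;
      }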

    Read the article

  • 10 lines of code per day is the global average!? -- true?

    - by Earlz
    Ok so last year I participated in a high school curriculum contest thing at a college(I currently attend this college). I actually got 1st in it but was still a bit angry I didn't get every single one right. The most baffling of questions on there was How many lines of code does the average programmer write per day? A. 5 B. 10 C. 25 D. 30 Aside from being a subjective question which depended on language and everything else I was more baffled at what they had as the correct answer. 10. Even on my bad days at my job I touch more than 10 lines of code(either adding, modifying, or deleting) per day. And when I took this test I had only programmed as a hobby where it was common for me to write a few hundred lines for one of my new projects per day. Where are they getting this random number of ten!? Is this published somewhere? A quick googling found me nothing.

    Read the article

  • How do I sum up weighted arrays in PHP?

    - by christian studer
    How do I multiply the values of a multi-dimensional array by weights and sum up the results into a new array, in PHP or in general? The boring way looks like this: $weights = array(0.25, 0.4, 0.2, 0.15); $values = array ( array(5,10,15), array(20,25,30), array(35,40,45), array(50,55,60) ); $result = array(); for($i = 0; $i < count($values[0]); ++$i) { $result[$i] = 0; foreach($weights as $index => $thisWeight) $result[$i] += $thisWeight * $values[$index][$i]; } Is there a more elegant solution?
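    A slightly tidier version of the same idea (a sketch with the same semantics, assuming $weights and $values stay parallel arrays): initialise the result once and let foreach do the index bookkeeping.
      $result = array_fill(0, count($values[0]), 0.0);
      foreach ($weights as $row => $weight) {
          foreach ($values[$row] as $col => $value) {
              $result[$col] += $weight * $value;   // weighted column sums
          }
      }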

    Read the article

  • Does Python/Scipy have a firls( ) replacement (i.e. a weighted, least squares, FIR filter design)?

    - by delicasso
    I am porting code from Matlab to Python and am having trouble finding a replacement for the firls( ) routine. It is used for least-squares linear-phase Finite Impulse Response (FIR) filter design. I looked at scipy.signal and nothing there looked like it would do the trick. Of course I was able to replace my remez and freqz algorithms, so that's good. On one blog I found an algorithm that implemented this filter without weighting, but I need one with weights. Thanks, David
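    For anyone landing here later: newer SciPy releases do ship scipy.signal.firls, and it takes a per-band weight argument, so the Matlab call often maps across directly (the version details here are from memory, so check your SciPy docs). A short sketch:
      from scipy.signal import firls, freqz

      # 73-tap linear-phase lowpass: pass up to 0.3*Nyquist, stop from 0.4*Nyquist,
      # with stopband error weighted 10x more heavily than passband error.
      taps = firls(73, [0.0, 0.3, 0.4, 1.0], [1, 1, 0, 0], weight=[1, 10])
      w, h = freqz(taps)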

    Read the article

  • How to correctly do SQL UPDATE with weighted subselect?

    - by luminarious
    I am probably trying to accomplish too much in a single query, but I have an SQLite database with badly formatted recipes. This returns a sorted list of recipes with relevance added: SELECT *, sum(relevance) FROM ( SELECT *,1 AS relevance FROM recipes WHERE ingredients LIKE '%milk%' UNION ALL SELECT *,1 AS relevance FROM recipes WHERE ingredients LIKE '%flour%' UNION ALL SELECT *,1 AS relevance FROM recipes WHERE ingredients LIKE '%sugar%' ) results GROUP BY recipeID ORDER BY sum(relevance) DESC; But I'm now stuck with a special case where I need to write the relevance value to a field on the same row as the recipe. I figured something along these lines: UPDATE recipes SET relevance=(SELECT sum(relevance) ...) But I have not been able to get this working yet. I will keep trying, but meanwhile please let me know how you would approach this.
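    The UPDATE ... SET relevance = (SELECT sum(relevance) ...) form fails mainly because the inner query is not correlated to the row being updated. Since the relevance here is just a count of LIKE matches, one way that seems to work in SQLite is to drop the UNION entirely and let CASE do the counting per row (a sketch with the search terms hard-coded, as in the question):
      UPDATE recipes
      SET relevance = (CASE WHEN ingredients LIKE '%milk%'  THEN 1 ELSE 0 END)
                    + (CASE WHEN ingredients LIKE '%flour%' THEN 1 ELSE 0 END)
                    + (CASE WHEN ingredients LIKE '%sugar%' THEN 1 ELSE 0 END);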

    Read the article

  • How can I compute the average cost for this solution of the element uniqueness problem?

    - by Alceu Costa
    In the book Introduction to the Design & Analysis of Algorithms, the following solution is proposed to the element uniqueness problem: ALGORITHM UniqueElements(A[0 .. n-1]) // Determines whether all the elements in a given array are distinct // Input: An array A[0 .. n-1] // Output: Returns "true" if all the elements in A are distinct // and false otherwise. for i := 0 to n - 2 do for j := i + 1 to n - 1 do if A[i] = A[j] return false return true How can I compute the average cost (i.e. number of comparisons for a given n) for this algorithm? What is a reasonable assumption about the input?
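    One common way to set the average-case analysis up (a sketch under a textbook-style input model, not the only reasonable assumption): suppose the array contains a matching pair with probability p, and that when a match exists it is equally likely to be detected at any of the N = n(n-1)/2 comparisons; with probability 1-p the algorithm runs all N comparisons. Then
      C_{avg}(n) = (1-p)\,\frac{n(n-1)}{2} + p \cdot \frac{1}{N}\sum_{k=1}^{N} k
                 = (1-p)\,\frac{n(n-1)}{2} + p \cdot \frac{N+1}{2}, \qquad N = \frac{n(n-1)}{2},
    so the average stays \Theta(n^2) under either branch; the modelling work is in choosing p and the match-position distribution for your inputs.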

    Read the article

  • Which parallel sorting algorithm has the best average case performance?

    - by Craig P. Motlin
    Sorting takes O(n log n) in the serial case. If we have O(n) processors we would hope for a linear speedup. O(log n) parallel algorithms exist but they have a very high constant. They also aren't applicable on commodity hardware which doesn't have anywhere near O(n) processors. With p processors, reasonable algorithms should take O(n/p log n/p) time. In the serial case, quick sort has the best runtime complexity on average. A parallel quick sort algorithm is easy to implement (see here and here). However it doesn't perform well since the very first step is to partition the whole collection on a single core. I have found information on many parallel sort algorithms but so far I have not seen anything pointing to a clear winner. I'm looking to sort lists of 1 million to 100 million elements in a JVM language running on 8 to 32 cores.
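    One data point for the JVM specifically: JDK 8 added java.util.Arrays.parallelSort, a fork/join parallel merge sort (falling back to sequential sorting below a granularity threshold), and it is usually worth benchmarking before hand-rolling a parallel sample sort or merge sort. A minimal sketch (the timing harness is illustrative, not a proper benchmark):
      import java.util.Arrays;
      import java.util.Random;

      public class ParallelSortDemo {
          public static void main(String[] args) {
              long[] data = new Random(42).longs(10_000_000).toArray();
              long start = System.nanoTime();
              Arrays.parallelSort(data);   // splits across the common ForkJoinPool, then merges the runs
              System.out.printf("sorted %d longs in %.1f ms%n", data.length, (System.nanoTime() - start) / 1e6);
          }
      }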

    Read the article

  • Improving terminal server performance for a specific app

    - by Matt
    We have a Windows 2003 terminal server running 2X application load balancing that is hosting a client's application that is accessed by around 50 users. Each user has their own database. The database is a file-based database. The application is developed under Delphi, so I think the database may be BDE based. As you can imagine, there is probably quite a lot of disk I/O. Here are some of the perfmon readings: Logged in users (average) 20 - 25; CPU Utilization (average) 80 - 100%; Disk Queue Length (average) 1.6; % Disk Time (average) 111; Page faults/sec (average) 1400. The application takes on average about a minute to load up. As usual, the budget is tight. Are there basic Windows performance tuning tips that people can recommend to improve things before we fork out on more RAM, etc.? The server is a 2.8GHz Xeon with 3GB of RAM.

    Read the article

  • What kinds of protections against viruses does Linux provide out of the box for the average user?

    - by ChocoDeveloper
    I know others have asked this, but I have other questions related to this. In particular, I'm concerned about the damage that the virus can do the user itself (his files), not the OS in general nor other users of the same machine. This question came to my mind because of that ransomware virus that is encrypting machines all over the world, and then asking the user to send a payment in Bitcoin if he wants to recover his files. I have already received and opened the email that is supposed to contain the virus, so I guess I didn't do that bad because nothing happened. But would I have survived if I opened the attachment and it was aimed at Linux users? I guess not. One of the advantages is that files are not executable by default right after downloading them. Is that just a bad default in Windows and could be fixed with a proper configuration? As a Linux user, I thought my machine was pretty secure by default, and I was even told that I shouldn't bother installing an antivirus. But I have read some people saying that the most important (or only?) difference is that Linux is just less popular, so almost no one writes viruses for it. Is that right? What else can I do to be safe from this kind of ransomware virus? Not automatically executing random files from unknown sources seems to be more than enough, but is it? I can't think of many other things a user can do to protect his own files (not the OS, not other users), because he has full permissions on them.

    Read the article

  • What is a realistic average time difference between servers in the same LAN?

    - by monster
    Until recently, we had at work a small cluster of about 20 small Windows servers (which have now all been virtualized). They were all configured to synchronize with the local time server. It was on a 1Gb sub-network in our own DC. I never got them to be less than about 100ms away from each other, which I consider to be an incredibly big difference. Is that a normal value? What is a realistic expectation of the time difference between machines running on a 1Gb network, all connected to the same time server, and updating frequently, say every 5 minutes? I would like to know this because setting timeouts and other parameters in a distributed application requires taking that difference into consideration.

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part I

    - by dbayard
    Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls.  R is great for doing advanced computations.  Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns.  This blog post shows how Oracle R Enterprise enables R plus the Oracle Database enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes. The problem: A financial services customer of mine has a need to calculate the historical internal rate of return (IRR) for its customers’ portfolios.  This information is needed for customer statements and the online web application.  In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations.  But this home-grown application was not able to do this fast enough, plus it was a challenge for them to write and maintain the code that did the IRR calculation. IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly.  Rather, IRR is a calculation where approximation techniques need to be used.  In this blog post, we will discuss calculating the “money weighted rate of return” but in the actual customer proof of concept we used R to calculate both money weighted rate of returns and time weighted rate of returns.  You can learn more about the money weighted rate of returns here: http://www.wikinvest.com/wiki/Money-weighted_return First Steps- Calculating IRR in R We will start with calculating the IRR in standalone/desktop R.  In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale.  The first step we did was to get some sample data.  For a historical IRR calculation, you have a balances and cash flows.  In our case, the customer provided us with several accounts worth of sample data in Microsoft Excel.      The above figure shows part of the spreadsheet of sample data.  The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value). Once we had the sample spreadsheet, the next step we did was to read the Excel data into R.  This is something that R does well.  R offers multiple ways to work with spreadsheet data.  For instance, one could save the spreadsheet as a .csv file.  In our case, the customer provided a spreadsheet file containing multiple sheets where each sheet provided data for a different sample account.  To handle this easily, we took advantage of the RODBC package which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files.  We wrote ourselves a little helper function called getsheet() around the RODBC package.  Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData. Writing the IRR function At this point, it was time to write the money weighted rate of return (MWRR) function itself.  The definition of MWRR is easily found on the internet or if you are old school you can look in an investment performance text book.  
In the customer proof, we based our calculations off the ones defined in the The Handbook of Investment Performance: A User’s Guide by David Spaulding since this is the reference book used by the customer.  (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.) The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique.  For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero.  With R, we solve this by defining our NPV function: where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time. Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum.  Here is an example of using optimize: where low and high are scalars that indicate the range to search for an answer.   To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high.  We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return. Enhancing and Packaging the IRR function With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs.  To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range.  See the R help page on optimize()  for more details about the search range and its algorithm. At this point, we can now write a simplified version of our MWRR function.  (Our real-world version is  more sophisticated in that it calculates rate of returns for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation.  In our actual customer proof, we also defined time-weighted rate of return calculations.  The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)To simplify code deployment, we then created a new package of our IRR functions and sample data.  For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data.  We created the shell of the package by calling: To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory.  In those files, you need to at least provide a value for the “title” section. 
Once that is done, you can change directory to the IRR directory and type at the command-line: The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package Testing the myIRR package Here is an example of testing our IRR function once it was converted to an installable package: Calculating IRR for All the Accounts So far, we have shown how to calculate IRR for a single account.  The real-world issue is how do you calculate IRR for all of the accounts?This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html).  Given that our sample data can fit in memory, one easy approach is to use R’s “by” function.  (Other approaches to Split-Apply-Combine such as plyr can also be used.  See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/). Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set.  Recap and Next Steps At this point, you’ve seen the power of R being used to calculate IRR.  There were several good things: R could easily work with the spreadsheets of sample data we were given R’s optimize() function provided a nice way to solve for IRR- it was both fast and allowed us to avoid having to code our own iterative approximation algorithm R was a convenient language to express the customer-specific variations, business-rules, and exceptions that often occur in real-world calculations- these could be easily added to our IRR functions The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once. However, there are several challenges yet to be conquered at this point in our story: The actual data that needs to be used lives in a database, not in a spreadsheet The actual data is much, much bigger- too big to fit into the normal R memory space and too big to want to move across the network The overall process needs to run fast- much faster than a single processor The actual data needs to be kept secured- another reason to not want to move it from the database and across the network And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRR’s can be calculated as part of the data warehouse refresh processes In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.
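    The post's code listings and figures appear to have been images that did not survive; purely as a reading aid, here is a minimal standalone R sketch of the kind of npv()/optimize() pairing the text describes. The discounting convention and the column names in the by() call are assumptions on my part, not the author's actual myIRR code:
      # NPV at rate r: beginning value grown to the end date, plus each cash flow
      # grown from its own date, minus the ending market value.
      npv <- function(r, bmv, cf, t, emv, tend) {
        bmv * (1 + r)^tend + sum(cf * (1 + r)^(tend - t)) - emv
      }

      # Money-weighted rate of return: the r that drives npv() to zero, found by
      # minimising |npv| over a search interval (retry with a wider range if needed).
      SimpleMWRR <- function(bmv, cf, t, emv, tend, low = -0.99, high = 1) {
        optimize(function(r) abs(npv(r, bmv, cf, t, emv, tend)),
                 lower = low, upper = high)$minimum
      }

      # Split-Apply-Combine across accounts (hypothetical column names):
      # by(SimpleMWRRData, SimpleMWRRData$account_id,
      #    function(a) SimpleMWRR(a$BMV[1], a$FLOW, a$t, tail(a$EMV, 1), max(a$t)))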

    Read the article

  • How to select from tableA sum of grouped numbers from tableB above their sums average in Oracle?

    - by Nazgulled
    I have data like this: tableA.ID --------- 1 2 3 tableB.ID tableB.NUM -------------------- 1 10 1 15 2 18 3 12 2 15 3 13 1 12 I need to select tableA IDs where the sum of their NUMs in tableB is above the average of all tableA IDs sums. In other words: SUM ID=1 -> 10+15+12 = 37 SUM ID=2 -> 18+12+15 = 45 SUM ID=3 -> 12+13 = 25 AVG ALL IDs -> (37+45+25)/3 = 35 The SELECT must only show ID 1 and 2 because 37 > 35, 45 > 35 but 25 < 35. This is my current query which is working fine: SELECT tableA.ID FROM tableA, tableB WHERE tableA.ID = tableB.ID HAVING SUM(tableB.NUM) > ( SELECT AVG(MY_SUM) FROM ( SELECT SUM(tableB.NUM) MY_SUM FROM tableA, tableB WHERE tableA.ID = tableB.ID GROUP BY tableA.ID ) ) GROUP BY tableA.ID But I have a feeling there might be a better way without all those nested SELECTs. Perhaps 2, but 3 feels like too much. I'm probably wrong though. For instance, why can't I do something simple like this: SELECT tableA.ID FROM tableA, tableB WHERE tableA.ID = tableB.ID HAVING SUM(tableB.NUM) > AVG(SUM(tableB.NUM)) GROUP BY tableA.ID Or this: SELECT tableA.ID, SUM(tableB.NUM) MY_SUM FROM tableA, tableB WHERE tableA.ID = tableB.ID HAVING MY_SUM > AVG(MY_SUM) GROUP BY tableA.ID
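    One way to flatten this in Oracle is an analytic function over the grouped sums, which removes the innermost nesting (a sketch; it assumes every tableB.ID also exists in tableA, otherwise join back to tableA for the IDs):
      SELECT id
      FROM   (SELECT b.id,
                     SUM(b.num)              AS total,
                     AVG(SUM(b.num)) OVER () AS avg_total   -- average of the per-ID sums
              FROM   tableB b
              GROUP  BY b.id)
      WHERE  total > avg_total;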

    Read the article

  • Performance considerations for common SQL queries

    - by Jim Giercyk
    Originally posted on: http://geekswithblogs.net/NibblesAndBits/archive/2013/10/16/performance-considerations-for-common-sql-queries.aspxSQL offers many different methods to produce the same results.  There is a never-ending debate between SQL developers as to the “best way” or the “most efficient way” to render a result set.  Sometimes these disputes even come to blows….well, I am a lover, not a fighter, so I decided to collect some data that will prove which way is the best and most efficient.  For the queries below, I downloaded the test database from SQLSkills:  http://www.sqlskills.com/sql-server-resources/sql-server-demos/.  There isn’t a lot of data, but enough to prove my point: dbo.member has 10,000 records, and dbo.payment has 15,554.  Our result set contains 6,706 records. The following queries produce an identical result set; the result set contains aggregate payment information for each member who has made more than 1 payment from the dbo.payment table and the first and last name of the member from the dbo.member table.   /*************/ /* Sub Query  */ /*************/ SELECT  a.[Member Number] ,         m.lastname ,         m.firstname ,         a.[Number Of Payments] ,         a.[Average Payment] ,         a.[Total Paid] FROM    ( SELECT    member_no 'Member Number' ,                     AVG(payment_amt) 'Average Payment' ,                     SUM(payment_amt) 'Total Paid' ,                     COUNT(Payment_No) 'Number Of Payments'           FROM      dbo.payment           GROUP BY  member_no           HAVING    COUNT(Payment_No) > 1         ) a         JOIN dbo.member m ON a.[Member Number] = m.member_no         /***************/ /* Cross Apply  */ /***************/ SELECT  ca.[Member Number] ,         m.lastname ,         m.firstname ,         ca.[Number Of Payments] ,         ca.[Average Payment] ,         ca.[Total Paid] FROM    dbo.member m         CROSS APPLY ( SELECT    member_no 'Member Number' ,                                 AVG(payment_amt) 'Average Payment' ,                                 SUM(payment_amt) 'Total Paid' ,                                 COUNT(Payment_No) 'Number Of Payments'                       FROM      dbo.payment                       WHERE     member_no = m.member_no                       GROUP BY  member_no                       HAVING    COUNT(Payment_No) > 1                     ) ca /********/                    /* CTEs  */ /********/ ; WITH    Payments           AS ( SELECT   member_no 'Member Number' ,                         AVG(payment_amt) 'Average Payment' ,                         SUM(payment_amt) 'Total Paid' ,                         COUNT(Payment_No) 'Number Of Payments'                FROM     dbo.payment                GROUP BY member_no                HAVING   COUNT(Payment_No) > 1              ),         MemberInfo           AS ( SELECT   p.[Member Number] ,                         m.lastname ,                         m.firstname ,                         p.[Number Of Payments] ,                         p.[Average Payment] ,                         p.[Total Paid]                FROM     dbo.member m                         JOIN Payments p ON m.member_no = p.[Member Number]              )     SELECT  *     FROM    MemberInfo /************************/ /* SELECT with Grouping   */ /************************/ SELECT  p.member_no 'Member Number' ,         m.lastname ,         m.firstname ,         COUNT(Payment_No) 'Number Of Payments' ,         AVG(payment_amt) 'Average Payment' ,         SUM(payment_amt) 'Total Paid' 
FROM    dbo.payment p         JOIN dbo.member m ON m.member_no = p.member_no GROUP BY p.member_no ,         m.lastname ,         m.firstname HAVING  COUNT(Payment_No) > 1   We can see what is going on in SQL’s brain by looking at the execution plan.  The Execution Plan will demonstrate which steps and in what order SQL executes those steps, and what percentage of batch time each query takes.  SO….if I execute all 4 of these queries in a single batch, I will get an idea of the relative time SQL takes to execute them, and how it renders the Execution Plan.  We can settle this once and for all.  Here is what SQL did with these queries:   Not only did the queries take the same amount of time to execute, SQL generated the same Execution Plan for each of them.  Everybody is right…..I guess we can all finally go to lunch together!  But wait a second, I may not be a fighter, but I AM an instigator.     Let’s see how a table variable stacks up.  Here is the code I executed: /********************/ /*  Table Variable  */ /********************/ DECLARE @AggregateTable TABLE     (       member_no INT ,       AveragePayment MONEY ,       TotalPaid MONEY ,       NumberOfPayments MONEY     ) INSERT  @AggregateTable         SELECT  member_no 'Member Number' ,                 AVG(payment_amt) 'Average Payment' ,                 SUM(payment_amt) 'Total Paid' ,                 COUNT(Payment_No) 'Number Of Payments'         FROM    dbo.payment         GROUP BY member_no         HAVING  COUNT(Payment_No) > 1   SELECT  at.member_no 'Member Number' ,         m.lastname ,         m.firstname ,         at.NumberOfPayments 'Number Of Payments' ,         at.AveragePayment 'Average Payment' ,         at.TotalPaid 'Total Paid' FROM    @AggregateTable at         JOIN dbo.member m ON m.member_no = at.member_no In the interest of keeping things in groupings of 4, I removed the last query from the previous batch and added the table variable query.  Here’s what I got:     Since we first insert into the table variable, then we read from it, the Execution Plan renders 2 steps.  BUT, the combination of the 2 steps is only 22% of the batch.  It is actually faster than the other methods even though it is treated as 2 separate queries in the Execution Plan.  The argument I often hear against Table Variables is that SQL only estimates 1 row for the table size in the Execution Plan.  While this is true, the estimate does not come in to play until you read from the table variable.  In this case, the table variable had 6,706 rows, but it still outperformed the other queries.  People argue that table variables should only be used for hash or lookup tables.  The fact is, you have control of what you put IN to the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to sub-queries. If anyone does volume testing on this theory, I would be interested in the results.  My suspicion is that there is a breaking point where efficiency goes down the tubes immediately, and it would be interesting to see where the threshold is. Coding SQL is a matter of style.  If you’ve been around since they introduced DB2, you were probably taught a little differently than a recent computer science graduate.  If you have a company standard, I strongly recommend you follow it.    If you do not have a standard, generally speaking, there is no right or wrong answer when talking about the efficiency of these types of queries, and certainly no hard-and-fast rule.  
Volume and infrastructure will dictate a lot when it comes to performance, so your results may vary in your environment.  Download the database and try it!

    Read the article

  • How does one unit test an algorithm

    - by Asa Baylus
    I was recently working on a JS slideshow which rotates images using a weighted average algorithm. Thankfully, timgilbert has written a weighted list script which implements the exact algorithm I needed. However, in his documentation he's noted under todos: "unit tests!". What I'd like to know is how one goes about unit testing an algorithm. In the case of a weighted average, how would you prove that the averages are accurate when there is an element of randomness? Code samples of something similar would be very helpful to my understanding.
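    The usual trick is to make the randomness injectable: have the weighted pick take a rand() function (defaulting to Math.random), so one test can stub it and assert exact outcomes, and a looser test can check that observed frequencies over many draws land near the weights. A hedged JavaScript sketch (pickWeighted is a stand-in here, not timgilbert's actual API):
      // Stand-in: pick an index with probability proportional to its weight.
      function pickWeighted(weights, rand = Math.random) {
        const total = weights.reduce((a, b) => a + b, 0);
        let r = rand() * total;
        for (let i = 0; i < weights.length; i++) {
          r -= weights[i];
          if (r < 0) return i;
        }
        return weights.length - 1;
      }

      // Deterministic test: with a stubbed rand(), the outcome is fully predictable.
      console.assert(pickWeighted([1, 3], () => 0.0) === 0);
      console.assert(pickWeighted([1, 3], () => 0.9) === 1);

      // Statistical test: over many draws the frequency should approach the weight ratio (3/4).
      let hits = 0;
      for (let i = 0; i < 100000; i++) if (pickWeighted([1, 3]) === 1) hits++;
      console.assert(Math.abs(hits / 100000 - 0.75) < 0.02, "frequency far from expected 0.75");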

    Read the article

  • What container type provides better (average) performance than std::map?

    - by Truncheon
    In the following example a std::map structure is filled with 26 values from A - Z (for key) and 0 - 26 for value. The time taken (on my system) to lookup the last entry (10000000 times) is roughly 250 ms for the vector, and 125 ms for the map. (I compiled using release mode, with O3 option turned on for g++ 4.4) But if for some odd reason I wanted better performance than the std::map, what data structures and functions would I need to consider using? I apologize if the answer seems obvious to you, but I haven't had much experience in the performance critical aspects of C++ programming. UPDATE: This example is rather trivial and hides the true complexity of what I'm trying to achieve. My real world project is a simple scripting language that uses a parser, data tree, and interpreter (instead of a VM stack system). I need to use some kind of data structure (perhaps map) to store the variables names created by script programmers. These are likely to be pretty randomly named, so I need a lookup method that can quickly find a particular key within a (probably) fairly large list of names. #include <ctime> #include <map> #include <vector> #include <iostream> struct mystruct { char key; int value; mystruct(char k = 0, int v = 0) : key(k), value(v) { } }; int find(const std::vector<mystruct>& ref, char key) { for (std::vector<mystruct>::const_iterator i = ref.begin(); i != ref.end(); ++i) if (i->key == key) return i->value; return -1; } int main() { std::map<char, int> mymap; std::vector<mystruct> myvec; for (int i = 'a'; i < 'a' + 26; ++i) { mymap[i] = i - 'a'; myvec.push_back(mystruct(i, i - 'a')); } int pre = clock(); for (int i = 0; i < 10000000; ++i) { find(myvec, 'z'); } std::cout << "linear scan: milli " << clock() - pre << "\n"; pre = clock(); for (int i = 0; i < 10000000; ++i) { mymap['z']; } std::cout << "map scan: milli " << clock() - pre << "\n"; return 0; }
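    For the real use case (a variable table keyed by arbitrary identifier strings), the usual next step past std::map is a hash table: std::unordered_map in C++11, or std::tr1::unordered_map / hash_map on older toolchains such as g++ 4.4. Lookups are O(1) on average instead of O(log n), at the cost of unordered iteration and occasional rehashing. A small sketch, with the caveat that the actual win depends on your compiler, hash quality, and load factor:
      #include <iostream>
      #include <string>
      #include <unordered_map>   // <tr1/unordered_map> on pre-C++11 compilers

      int main() {
          std::unordered_map<std::string, int> vars;   // interpreter's variable table
          vars.reserve(1024);                          // avoid rehashes while names are declared
          vars["player_score"] = 42;

          std::unordered_map<std::string, int>::const_iterator it = vars.find("player_score");
          std::cout << (it != vars.end() ? it->second : -1) << "\n";
          return 0;
      }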

    Read the article

  • What are some good usability guidelines an average developer should follow?

    - by Allain Lalonde
    I'm not a usability specialist, and I really don't care to be one. I just want a small set of rules of thumb that I can follow while coding my User Interfaces so that my product has decent usability. At first I thought that this question would be easy to answer "Use your common sense", but if it's so common among us developers we wouldn't, as a group, have a reputation for our horrible interfaces. Any Suggestions?

    Read the article

  • How to solve High Load average issue in Linux systems?

    - by RoCkStUnNeRs
    The following is the different load with cpu time in different time limit . The below output has parsed from the top command. TIME LOAD US SY NICE ID WA HI SI ST 12:02:27 208.28 4.2%us 1.0%sy 0.2%ni 93.9%id 0.7%wa 0.0%hi 0.0%si 0.0%st 12:23:22 195.48 4.2%us 1.0%sy 0.2%ni 93.9%id 0.7%wa 0.0%hi 0.0%si 0.0%st 12:34:55 199.15 4.2%us 1.0%sy 0.2%ni 93.9%id 0.7%wa 0.0%hi 0.0%si 0.0%st 13:41:50 203.66 4.2%us 1.0%sy 0.2%ni 93.8%id 0.8%wa 0.0%hi 0.0%si 0.0%st 13:42:58 278.63 4.2%us 1.0%sy 0.2%ni 93.8%id 0.8%wa 0.0%hi 0.0%si 0.0%st Following is the additional Information of the system? cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz stepping : 10 cpu MHz : 1992.000 cache size : 6144 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm bogomips : 4658.69 clflush size : 64 power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz stepping : 10 cpu MHz : 1992.000 cache size : 6144 KB physical id : 0 siblings : 4 core id : 1 cpu cores : 4 apicid : 1 initial apicid : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm bogomips : 4655.00 clflush size : 64 power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz stepping : 10 cpu MHz : 1992.000 cache size : 6144 KB physical id : 0 siblings : 4 core id : 2 cpu cores : 4 apicid : 2 initial apicid : 2 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm bogomips : 4655.00 clflush size : 64 power management: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz stepping : 10 cpu MHz : 1992.000 cache size : 6144 KB physical id : 0 siblings : 4 core id : 3 cpu cores : 4 apicid : 3 initial apicid : 3 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm bogomips : 4654.99 clflush size : 64 power management: Memory: total used free shared buffers cached Mem: 2 1 1 0 0 0 Swap: 5 0 5 let me know why the system is getting abnormally this much high load?
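    Load average on Linux counts tasks that are runnable or in uninterruptible (D-state, usually disk or NFS) sleep, so a load around 200 with ~94% idle CPU almost always points at processes stuck waiting on I/O rather than at a CPU shortage. A few stock commands to confirm that (standard procps/sysstat tools, nothing exotic assumed):
      # tasks currently runnable (R) or blocked in uninterruptible sleep (D) -- D is the usual culprit here
      ps -eo state,pid,user,cmd | awk '$1 ~ /^[RD]/'
      # per-device utilisation and await times (iostat comes with the sysstat package)
      iostat -x 1 5
      # run queue (r) vs blocked (b) columns over time
      vmstat 1 5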

    Read the article

  • getting userbase vote average and individual user's vote in the same query?

    - by Andrew Heath
    Here goes: T1 [id] [desc] 1 lovely 2 ugly 3 slender T2 [id] [userid] [vote] 1 1 3 1 2 5 1 3 2 2 1 1 2 2 4 2 3 4 In one query (if possible) I'd like to return: T1.id, T1.desc, AVG(T2.vote), T2.vote (for the user viewing the page) I can get the first 3 items with: SELECT T1.id, T1.desc, AVG(T2.vote) FROM T1 LEFT JOIN T2 ON T1.id=T2.id GROUP BY T1.id and I can get the first, second, and fourth items with: SELECT T1.id, T1.desc, T2.vote FROM T1 LEFT JOIN T2 ON T1.id=T2.id WHERE T2.userid='1' GROUP BY T1.id but I'm at a loss as to how to get all four items in one query. I tried inserting a select as the fourth term: SELECT T1.id, T1.desc, AVG(T2.vote), (SELECT T2.vote FROM T2 WHERE T2.userid='1') AS userVote etc etc but I get an error that the select returns more than one row... Help? My reason for wanting to do this in one query instead of two is that I want to be able to sort the data within MySQL rather than once it's been split into a number of arrays.
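    The "returns more than one row" error is because the subquery isn't correlated to the outer T1 row. One way that avoids a second join entirely is conditional aggregation, folding the viewing user's vote into the same GROUP BY pass (a sketch; desc is a reserved word in MySQL, hence the backticks):
      SELECT T1.id,
             T1.`desc`,
             AVG(T2.vote)                                  AS avg_vote,   -- whole userbase
             MAX(CASE WHEN T2.userid = 1 THEN T2.vote END) AS user_vote   -- the viewing user (id 1)
      FROM   T1
             LEFT JOIN T2 ON T2.id = T1.id
      GROUP  BY T1.id, T1.`desc`
      ORDER  BY avg_vote DESC;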

    Read the article

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >