estimate - Page 9 - Developer IT

Remove identical files in UNIX

- by Thrawn

Hi all, I'm dealing with a large amount (30,000) files of about 10MB in size. Some of them (I estimate 2%) are actually duplicated, and I need to keep only a copy for every duplicated pair (or triplet). Would you suggest me an efficient way to do that? I'm working on unix. Thank you :-)

Read the article

rng random number

- by wano3

Hi, I have a question. How can I calculate/estimate upper bound of random number, that a random number can generate ? Thanks in advance!

Read the article

Counting FLOPS/GFLOPS in program - CUDA

- by msx

Already finished my application which multiplies CRS matrix and vector (SpMV) and the only thing to do now is to count FLOPS my application did. In my opinion it's really hard to estimate number of floating point operation in case of sparse matrix - vector multiplication, because the number of multiplies in one row is really "jumpy" or fluent. I only tried to measure time using "cudaprof" ( available in ./CUDA/bin directory) - it works fine. Any sugestions and instruction pastes appreciated !

Read the article

Error using `loess.smooth` but not `loess` or `lowess`

- by Sandy

I need to smooth some simulated data, but occasionally run into problems when the simulated ordinates to be smoothed are mostly the same value. Here is a small reproducible example of the simplest case. > x <- 0:50 > y <- rep(0,51) > loess.smooth(x,y) Error in simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE, : NA/NaN/Inf in foreign function call (arg 1) loess(y~x), lowess(x,y), and their analogue in MATLAB produce the expected results without error on this example. I am using loess.smooth here because I need the estimates evaluated at a set number of points. According to the documentation, I believe loess.smooth and loess are using the same estimation functions, but the former is an "auxiliary function" to handle the evaluation points. The error seems to come from a C function: > traceback() 3: .C(R_loess_raw, as.double(pseudovalues), as.double(x), as.double(weights), as.double(weights), as.integer(D), as.integer(N), as.double(span), as.integer(degree), as.integer(nonparametric), as.integer(order.drop.sqr), as.integer(sum.drop.sqr), as.double(span * cell), as.character(surf.stat), temp = double(N), parameter = integer(7), a = integer(max.kd), xi = double(max.kd), vert = double(2 * D), vval = double((D + 1) * max.kd), diagonal = double(N), trL = double(1), delta1 = double(1), delta2 = double(1), as.integer(0L)) 2: simpleLoess(y, x, w, span, degree, FALSE, FALSE, normalize = FALSE, "none", "interpolate", control$cell, iterations, control$trace.hat) 1: loess.smooth(x, y) loess also calls simpleLoess, but with what appears to be different arguments. Of course, if you vary enough of the y values to be nonzero, loess.smooth runs without error, but I need the program to run in even the most extreme case. Hopefully, someone can help me with one and/or all of the following: Understand why only loess.smooth, and not the other functions, produces this error and find a solution for this problem. Find a work-around using loess but still evaluating the estimate at a specified number of points that can differ from the vector x. For example, I might want to use only x <- seq(0,50,10) in the smoothing, but evaluate the estimate at x <- 0:50. As far as I know, using predict with a new data frame will not properly handle this situation, but please let me know if I am missing something there. Handle the error in a way that doesn't stop the program from moving onto the next simulated data set. Thanks in advance for any help on this problem.

Read the article

Can anyone provide ball park estimates for creating a CRUD application with .NET 4.0 and Entity Fram

- by JoeBlogs1999

I have yet to use Entity Framework or .NET 4 so there will be a learning curve, but assuming good quality C# developers (who are yet to use Entity Framework and .NET 4.0), what would your ball park estimate be for say 50 pages containing text box, dropdowns and grids. Would it be around 1-2 days per page?

Read the article

Can i openly speculate based on App rejection that the iPad has xxx MB of memory?

- by GamingHorror

If i were to calculate the iPad's amount of RAM based on just the one fact that my iPad App got rejected due to memory warnings twice, and me fixing it, would this violate the developer NDA? Obviously i know how much memory my App uses, how much the iPhone OS is likely to use and estimate the amount reserved for video memory, then i can deduct from that that the iPad has xxx MB of memory. I just wonder if i can say that number publicly without violating any NDA?

Read the article

How many lines of code is in StackOverflow? Your guess?

- by Andy

What is your guess, how many lines of code did it take to implement StackOverflow? Collecting all answers, I will publish the final estimate.

Read the article

How to know the exactly probability of a multi-class prediction using libsvm?

- by lgnlgn

Using svm_predict_probability to estimate the probabilty for a multi-class. I need to know what the order of the returned. It does not seem to be in increasing or decreasing order of the class labels, or the array of labels returned by svm_get_labels. So how do I know which class is coresponding to a probability? Many thanks

Read the article

How do I solve this conditional probabilities problem with MATLAB?

- by user198729

If P( cj | xi ) are already known, where i=1,2,...n; j=1,2,...k; How do I calculate/estimate: P( cj | xl , xm , xn ), where j=1,2,...k; l,m,n {1,2,...n} ?

Read the article

How many times do I/you/we hit refresh in the course of developing a 10 page website?

- by j-man86

For a standard sized 10 page website with a blog and cms, how many times do you estimate you hit the refresh button in your browser(s) in the course of the week / weeks you're coding it?

Read the article

Question about Cost in Oracle Explain Plan

- by Will

When Oracle is estimating the 'Cost' for certain queries, does it actually look at the amount of data (rows) in a table? For example: If I'm doing a full table scan of employees for name='Bob', does it estimate the cost by counting the amount of existing rows, or is it always a set cost?

Read the article

Port Android Application to Blackberry

- by Pria

Lets say I have an Android app that uses Google Maps and GPS.Uses custom views and timers. How much time will it require to develop it for Blackberry? What changes will be required? Can the UI be reused? I am totally new to Blackberry, though I know Java. Please help me estimate the time.

Read the article

count of paths from A[a,b] to A[c,d] without duplicating?

- by Sorush Rabiee

I write a sokoban solver for fun and practice, it uses a simple algorithm (something like BFS). now i want to estimate its running time ( O and omega). but i need to know how to calculate count of paths from a vertex to another in a network. each path from a to b is a sequence of edges with no circuit. for example this is a correct path: http://www.imgplace.com/viewimg143/4789/501k.png but this is not: http://www.imgplace.com/viewimg143/6140/202.png

Read the article

Heroku FREE DB and Dyno "power"

- by Tiago

Can anyone have an estimate of how much data can be inserted in a 5MB database? Also, would 1 Dyno handle a slashdot,hackernews, etc frontpage? Thanks.

Read the article

How does your team work together in a remote setup?

- by Carl Rosenberger

Hi, we are a distributed team working on the object database db4o. The way we work: We try to program in pairs only. We use Skype and VNC or SharedView to connect and work together. In our online Tuesday meeting every week (usually about 1 hour) we talk about the tasks done last week we create new pairs for the next week with a random generator so knowledge and friendship distribute evenly we set the priority for any new tasks or bugs that have come in each team picks the tasks it likes to do from the highest prioritized ones. From Tuesday to Wednesday we estimate tasks. We have a unit of work we call "Ideal Developer Session" (IDS), maybe 2 or 3 hours of working together as a pair. It's not perfectly well defined (because we know estimation always is inaccurate) but from our past shared experience we have a common sense of what an IDS is. If we can't estimate a task because it feels too long for a week we break it down into estimatable smaller tasks. During a short meeting on Wednesday we commit to a workload we feel is well doable in a week. We commit to complete. If a team runs out of committed tasks during the week, it can pick new ones from the prioritized queue we have in Jira. When we started working this way, some of us found that remote pair programming takes a lot of energy because you are so focussed. If you pair program for more than 5 or 6 hours per day, you get drained. On the other hand working like this has turned out to be very efficient. The knowledge about our codebase is evenly distributed and we have really learnt lots from eachother. I would be very interested to hear about the experiences from other teams working in a similar way. Things like: How often do you meet? Have you tried different sprint lengths (one week, two week, longer) ? Which tools do you use? Which issue tracker do you use? What do you do about time zone differences? How does it work for you to integrate new people into the team? How many hours do you usually work per week? How does your management interact with the way you are working? Do you get put on a waterfall with hard deadlines? What's your unit of work? What is your normal velocity? (units of work done per week) Programming work should be fun and for us it usually is great fun. I would be happy about any new ideas how to make it even more fun and/or more efficient.

Read the article

What did you develop using a microcontroller?

- by DR

I've always been fascinated by microcontrollers and I'm planning to do a few hobby projects just to satisfy my inner geek :) I'm looking for ideas and motivation, so what did you develop using a microcontroller? If possible please state the microcontroller and/or development environment and an estimate on hardware costs beyond the basic equipment (if applicable). I'm interested in both successful and failed projects and any problems you encountered.

Read the article

is it possible to get the duration of a streaming mp3 in Flash

- by dubbeat

Hi, I'm wondering if it is at all possible to get the total duration of an mp3 being streamed in flash? At the moment I'm using the following code to estimate the lenght but it is always inaccurate var loadTime:Number=_track.bytesLoaded / _track.bytesTotal; var loadPercent:uint=Math.round(100 * loadTime); estimatedLength=Math.ceil(_track.length / (loadTime));

Read the article

Returning the size of available virtual memory at run-time in C++

- by Greenhouse Gases

In C++ is there a predefined library function that will return the size of RAM currently available on a computer a program is being run on, at run-time? For instance, if an object is 4bytes, then can we divide the available virtual memory by 4 bytes to give an estimate of how many more objects could be stored by the program safely? I have used the sizeof() function to return the size of objects within my program. Thanks

Read the article

Indexing with pointer C/C++

- by Leavenotrace

Hey I'm trying to write a program to carry out newtons method and find the roots of the equation exp(-x)-(x^2)+3. It works in so far as finding the root, but I also want it to print out the root after each iteration but I can't get it to work, Could anyone point out my mistake I think its something to do with my indexing? Thanks a million :) #include <stdio.h> #include <math.h> #include <malloc.h> //Define Functions: double evalf(double x) { double answer=exp(-x)-(x*x)+3; return(answer); } double evalfprime(double x) { double answer=-exp(-x)-2*x; return(answer); } double *newton(double initialrt,double accuracy,double *data) { double root[102]; data=root; int maxit = 0; root[0] = initialrt; for (int i=1;i<102;i++) { *(data+i)=*(data+i-1)-evalf(*(data+i-1))/evalfprime(*(data+i-1)); if(fabs(*(data+i)-*(data+i-1))<accuracy) { maxit=i; break; } maxit=i; } if((maxit+1==102)&&(fabs(*(data+maxit)-*(data+maxit-1))>accuracy)) { printf("\nMax iteration reached, method terminated"); } else { printf("\nMethod successful"); printf("\nNumber of iterations: %d\nRoot Estimate: %lf\n",maxit+1,*(data+maxit)); } return(data); } int main() { double root,accuracy; double *data=(double*)malloc(sizeof(double)*102); printf("NEWTONS METHOD PROGRAMME:\nEquation: f(x)=exp(-x)-x^2+3=0\nMax No iterations=100\n\nEnter initial root estimate\n>> "); scanf("%lf",&root); _flushall(); printf("\nEnter accuracy required:\n>>"); scanf("%lf",&accuracy); *data= *newton(root,accuracy,data); printf("Iteration Root Error\n "); printf("%d %lf \n", 0,*(data)); for(int i=1;i<102;i++) { printf("%d %5.5lf %5.5lf\n", i,*(data+i),*(data+i)-*(data+i-1)); if(*(data+i*sizeof(double))-*(data+i*sizeof(double)-1)==0) { break; } } getchar(); getchar(); free(data); return(0); }

Read the article

Guidance regarding website development

- by mehrozdurrani

Dear Friends, I wana start start building a wesite yet i dun have much experience in it, i graduated 6 months back, so u can fairly estimate my calibre for makin a professional website. What i need is some bold guidance from all of u like how to start it or wat sources i can utilize to learn a proper procedure for developing a site .... Best Regards Mehroz

Read the article

Estimating the size of a tree

- by Full Decent

I'd like to estimate the number of leaves in a large tree structure for which I can't visit every node exhaustively. Is this algorithm appropriate? Does it have a name? Also, please pedant if I am using any terms improperly. sum_trials = 0 num_trials = 0 WHILE time_is_not_up bits = 0 ptr = tree.root WHILE count(ptr.children) > 0 bits += log2(count(ptr.children)) ptr = ptr.children[rand()%count(ptr.children)] sum_trials += bits num_trials++ estimated_tree_size = 2^(sum_trials/num_trials)

Read the article

Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709; -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

Read the article

Getting your bearings and defining the project objective

- by johndoucette

I wrote this two years ago and thought it was worth posting… Some may think this is a daunting task and some may even say “what a waste of time” and want to open MS Project and start typing out tasks because someone asked for an estimate and a task list. Hell, maybe you even use Excel and pump out a spreadsheet with some real scientific formula for guessing how long it will take to code a bunch of classes. However, this short exercise will provide the basis for the entire project, whether small or large and be a great friend when communicating to anyone on your team or even your client. I call this the Project Brief. If you find yourself going beyond a single page, then you must decompose the sections and summarize your findings so there is a complete and clear picture of the project you are working on in a relatively short statement. Here is a great quote from the PMBOK (Project Management Body of Knowledge) relative to what a project is; A project is a temporary endeavor undertaken to create a unique product, service or result. With this in mind, the project brief should encompass the entirety (objective) of the endeavor in its explanation and what it will take (goals) to create the product, service or result (deliverables). Normally the process of identifying the project objective is done during the first stage of a project called the Project Kickoff, but you can perform this very important step anytime to help you get a bearing. There are many more parts to helping a project stay on course, but this is usually the foundation where it can be grounded on. Through a series of 3 exercises, you should be able to come up with the objective, goals and deliverables on your project. Follow these steps, and in no time (about ½ hour), you will have the foundation of your project plan. (See examples below) Exercise 1 – Objectives Begin with the end in mind. Think about your project in business terms with a couple things to help you understand the objective; Reference the business benefit in terms of cost, speed and / or quality, Provide a higher level of what the outcome will look like (future sense) It should be non-measurable, that’s what the goals are all about The output should be a single paragraph with three sentences and take 10 minutes to write. *Typically, agreement must be reached on the objectives of the project before you would proceed to the next steps of the project. Exercise 2 – Goals A project goal is a statement that answers questions about who, what, why, where and when. A good project goal statement; Answers the five “W” questions for the project Is measurable in each of its parts Is published and agreed on by all the owners This helps the Project Manager receive confirmation on defining the project target. Using the established project objective done in the first exercise, think about the things it will take to get the job done. Think about tangible activities which are the top level tasks in a typical Work Breakdown Structure (WBS). The overall goal statement plus all the deliverables (next exercise) can be seen as the project team’s contract with the project owners. Write 3 - 5 goals in about 10 minutes. You should not write the words “Who, what, why, where and when, but merely be able to answer the questions when you read a goal. Exercise 3 – Deliverables Every project creates some type of output and these outputs are called deliverables. There are two classes of deliverables; Internal – produced for project team members to meet their goals External – produced for project owners to meet their expectations The list you enter here provides a checklist for the team’s delivery and/or is a statement of all the expectations of the project owners. Here are some typical project deliverables; Product and product documentation End product/system Requirements/feature documents Installation guides Demo/prototype System design documents User guides/help files Plans Project plan Training plan Conversion/installation/delivery plan Test plans Documentation plan Communication plan Reports and general documentation Progress reports System acceptance tests Outstanding bug list Procedures Risk and issue logs Project history Deliverables should go with each of the goals. Have 3-5 deliverables for each goal. When you are done, you will have established a great foundation for the clarity of your project. This exercise can take some time, but with practice, you should be able to whip this one out in 10 minutes as well, especially if you are intimate with an ongoing project. Samples Objective [Client] is implementing a series of MOSS sites to support external public (Internet), internal employee (Intranet) and an external secure (password protected Internet) applications. This project will focus on the public-facing web site and will provide [Client] with architectural recommendations based on the current design being done by their design partner [Partner] and the internal Content Team. In addition, it will provide [Client] with a development plan and confidence they need to deploy a world class public Internet website. Goals 1. [Consultant] will provide technical guidance and set project team expectations for the implementation of the MOSS Internet site based on provided features/functions within three weeks. 2. [Consultant] will understand phase 2 secure password-protected Internet site design and provide recommendations. Deliverables 1.1 Public Internet (unsecure) Architectural Recommendation Plan 1.2 Physical Site construction Work Breakdown Structure and plan (Time, cost and resources needed) 2.1 Two Factor authentication recommendation document Objective [Client] is currently using an application developed by [Consultant] many years ago called "XXX". This application, although functional, does not meet their new updated business requirements and contains a few defects which [Client] has developed work-around processes. [Client] would like to have a "new and improved" system to support their membership management needs by expanding membership and subscription capabilities, provide accounting integration with internal (GL) and external (VeriSign) systems, and implement hooks to the current CRM solution. This effort will take place through a series of phases, beginning with envisioning. Goals 1. Through discussions with users, [Consultant] will discover current issues/bugs which need to be resolved which must meet the current functionality requirements within three weeks. 2. [Consultant] will gather requirements from the users about what is "needed" vs. "what they have" for enhancements and provide a high level document supporting their needs. 3. [Consultant] will meet with the team members through a series of meetings and help define the overall project plan to deliver a new and improved solution. Deliverables 1.1 Prioritized list of Current application issues/bugs that need to be resolved 1.2 Provide a resolution plan on the issues/bugs identified in the current application 1.3 Risk Assessment Document 2.1 Deliver a Requirements Document showing high-level [Client] needs for the new XXX application. · New feature functionality not in the application today · Existing functionality that will remain in the new functionality 2.2 Reporting Requirements Document 3.1 A Project Plan showing the deliverables and cost for the next (second) phase of this project. 3.2 A Statement of Work for the next (second) phase of this project. 3.3 An Estimate of any work that would need to follow the second phase.

Read the article

Big Visible Charts

- by Robert May

An important part of Agile is the concept of transparency and visibility. In proper functioning teams, stakeholders can look at any team at any time in the iteration or release and see how that team is doing by simply looking at what we call Big Visible Charts. If you’ve done Scrum, you’ve seen these charts. However, interpreting these charts can often be an art form. There are several different charts that can be useful. In this newsletter, I’ll focus on the Iteration Burndown and Cumulative Flow charts. I’ve included a copy of the spreadsheet that I used to create the charts, and if you don’t have a tool that creates them for you, you can use this spreadsheet to do so. Our preferred tool for managing Scrum projects is Rally. Rally creates all of these charts for you, saving you quite a bit of time. The Iteration Burndown and Cumulative Flow Charts This is the main chart that teams use. Although less useful to stakeholders, this chart is critical to the team and provides quite a bit of information to the team about how their iteration is going. Most charts are a combination of the charts below, so you may need to combine aspects of each section to understand what is happening in your iterations. Ideal Ah, isn’t that a pretty picture? Unfortunately, it’s also very unrealistic. I’ve seen iterations that come close to ideal, but never that match perfectly. If your iteration matches perfectly, chances are, someone is playing with the numbers. Reality is just too difficult to have a burndown chart that matches this exactly. Late Planning Iteration started, but the team didn’t. You can tell this by the fact that the real number of estimated hours didn’t appear until day two. In the cumulative flow, you can also see that nothing was defined in Day one and two. You want to avoid situations like this. You’ll note that the team had to burn faster than is ideal to meet the iteration because of the late planning. This often results in long weeks and days. Testing Starved Determining whether or not testing is starved is difficult without the cumulative flow. The pattern in the burndown could be nothing more that developers not completing stories early enough or could be caused by stories being too big. With the cumulative flow, however, you see that only small bites are in progress and stories were completed early, but testing didn’t start testing until the end of the iteration, and didn’t complete testing all stories in the iteration. When this happens, question whether or not your testing resources are sufficient for your team and whether or not acceptance is adequately defined. No Testing With this one, both graphs show the same thing; the team needs testers and testing! Without testing, what was completed cannot be verified to make sure that it is acceptable to the business. If you find yourself in this situation, review your testing practices and acceptance testing process and make changes today. Late Development With this situation, both graphs tell a story. In the top graph, you can see that the hours failed to burn down as quickly as the team expected. This could be caused by the team not correctly estimating their hours or the team could have had illness or some other issue that affected them. Often, when teams are tackling something that is more unknown, they’ll run into technical barriers that cause the burn down to happen slower than expected. In the cumulative flow graph, you can see that not much was completed in the first few days. This could be because of illness or technical barriers or simply poor estimation. Testing was able to keep up with everything that was completed, however. No Tool Updating When you see graphs that look like this, you can be assured that it’s because the team is not updating the tool that generates the graphs. Review your policy for when they are to update. On the teams that I run, I require that each team member updates the tool at least once daily. You should also check to see how well the team is breaking down stories into tasks. If they’re creating few large tasks, graphs can look similar to this. As a general rule, I never allow tasks, other than Unit Testing and Uncertainty, to be greater than eight hours in duration. Scope Increase I always encourage team members to enter in however much time they think they have left on a task, even if that means increasing the total amount of time left to do. You get a much better and more realistic picture this way. Increasing time remaining could explain the burndown graph, but by looking at the cumulative flow graph, we can see that stories were added to the iteration and scope was increased. Since planning should consume all of the hours in the iteration, this is almost always a bad thing. If the scope change happened late in the iteration and the hours remaining were well below the ideal burn, then increasing scope is probably o.k., but estimation needs to get better. However, with the charts above, that’s clearly not what happened and the team was required to do extra work to make the iteration. If you find this happening, your product owner and ScrumMasters need training. The team also needs to learn to say no. Scope Decrease Scope decreases are just as bad as scope increases. Usually, graphs above show that the team did a poor job of estimating their stories and part way through had to reduce scope to change the iteration. This will happen once in a while, but if you find it’s a pattern on your team, you need to re-evaluate planning. Some teams are hopelessly optimistic. In those cases, I’ll introduce a task I call “Uncertainty.” With Uncertainty, the team estimates how many hours they might need if things don’t go well with the tasks they’ve defined. They try to estimate things that could go poorly and increase the time appropriately. Having an Uncertainty task allows them to have a low and high estimate. Uncertainty should not just be an arbitrary buffer. It must correlate to real uncertainty in the tasks that have been defined. Stories are too Big Often, we see graphs like the ones above. Note that the burndown looks fairly good, other than the chunky acceptance of stories. However, when you look at cumulative flow, you can see that at one point, everything is in progress. This is a bad thing. When you see graphs like this, you’re in one of two states. You may just have a very small team and can only handle one or two stories in your iteration. If you have more than one or two people, then the most likely problem is that your stories are far too big. To combat this, break large high hour stories into smaller pieces that can be completed independently and accepted independently. If you don’t, you’ll likely be requiring your testers to do heroic things to complete testing on the last day of the iteration and you’re much more likely to have the entire iteration fail, because of the limited amount of things that can be completed. Summary There are other charts that can be useful when doing scrum. If you don’t have any big visible charts, you really need to evaluate your process and change. These charts can provide the team a wealth of information and help you write better software. If you have any questions about charts that you’re seeing on your team, contact me with a screen capture of the charts and I’ll tell you what I’m seeing in those charts. I always want this information to be useful, so please let me know if you have other questions. Technorati Tags: Agile

Read the article

Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709; -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

Search Results

Search found 480 results on 20 pages for 'estimate'.

Page 9/20 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >

- by Thrawn

- by wano3

- by msx

- by Sandy

- by JoeBlogs1999

- by GamingHorror

- by Andy

- by lgnlgn

- by user198729

- by j-man86

- by Will

- by Pria

- by Sorush Rabiee

- by Tiago

- by Carl Rosenberger

- by DR

- by dubbeat

- by Greenhouse Gases

- by Leavenotrace

- by mehrozdurrani

- by Full Decent

- by Paul White

- by johndoucette

- by Robert May

- by Paul White

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16 | Next Page >