Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 14/87 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

Either .each do or .all isn't working how I think it should

- by user1299656

So whenever someone rates a shop, I want the Shop model to calculate its new average rating and store that in the database (instead of calculating the average every time someone looks at it). So I wrote the segment of code that follows, and it doesn't work. The loop always iterates exactly once, no matter how many shop_ratings in the database exist that have the shop's id as their shop_id. I played around with it a bit and found that every time a new rating is submitted the function is called successfully, but it only runs the loop once and sets the average to what the first rating was. I don't know if the "query" that sets the ratings variable is wrong or if it's the loop that's wrong. class Shop < ActiveRecord::Base has_many :shop_ratings attr_accessible :name, :latitude, :longitude validates_presence_of :name validates_presence_of :latitude validates_presence_of :longitude def distance_to(lat, long) return (self.longitude - long) + (self.latitude - lat) end def find_average total = 0 count = 0 ratings = ShopRating.all(:conditions => {:shop_id => id}) ratings.each do |submission| total = total + submission.rating count = count + 1 end update_attribute :average_rating, total/count end end

Read the article
OOP PHP simple question

- by Tristan

Hello, I'm new to OOP in PHP, is that to seems correct ? class whatever { Function Maths() { $this->sql->query($requete); $i = 0; while($val = mysql_fetch_array($this)) { $tab[i][average] = $val['average']; $tab[i][randomData] = $val['sum']; $i=$i+1; } return $tab; } I want to access the data contained in the array $foo = new whatever(); $foo->Maths(); for ($i, $i <= endOfTheArray; i++) { echo Maths->tab[i][average]; echo Maths->tab[i][randomData]; } Thank you ;)

Read the article
Handling a binary operation that makes sense only for part of a hierarchy.

- by usersmarvin_

I have a hierarchy, which I'll simplify greatly, of implementations of interface Value. Assume that I have two implementations, NumberValue, and StringValue. There is an average operation which only makes sense for NumberValue, with the signature NumberValue average(NumberValue numberValue){ ... } At some point after creating such variables and using them in various collections, I need to average a collection which I know is only of type NumberValue, there are three possible ways of doing this I think: Very complicated generic signatures which preserve the type info in compile time (what I'm doing now, and results in hard to maintain code) Moving the operation to the Value level, and: throwing an unsupportedOperationException for StringValue, and casting for NumberValue. Casting at the point where I know for sure that I have a NumberValue, using slightly less complicated generics to insure this. Does anybody have any better ideas, or a recommendation on oop best practices?

Read the article
Where to declare variable? C#

- by user1303781

I am trying to make an average function... 'Total' adds them, then 'Total' is divided by n, the number of entries... No matter where I put 'double Total;', I get an error message. In this example I get... Use of unassigned local variable 'Total' If I put it before the comment, both references show up as error... I'm sure it's something simple..... namespace frmAssignment3 { class StatisticalFunctions { public static class Statistics { //public static double Average(List<MachineData.MachineRecord> argMachineDataList) public static double Average(List<double> argMachineDataList) { double Total; int n; for (n = 1; n <= argMachineDataList.Count; n++) { Total = argMachineDataList[n]; } return Total / n; } public static double StDevSample(List<MachineData.MachineRecord> argMachineDataList) { return -1; } } } }

Read the article
A Taxonomy of Numerical Methods v1

- by JoshReuben

Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

Read the article
Google Webmasters tools search queries position

- by user1592845

In my website account on Google Webmasters tools, some search queries show average position 1.0. This make me understand that it should be displayed as the first result. When I search for this query I could not able to find my website's page listed as a result?! In some cases I navigate to the third or the fourth result page and I could not find it! What are factors that make my website loss its average position for a search query? and when Google webmasters tools updates their values?

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
8 Reasons Why Even Microsoft Agrees the Windows Desktop is a Nightmare

- by Chris Hoffman

Let’s be honest: The Windows desktop is a mess. Sure, it’s extremely powerful and has a huge software library, but it’s not a good experience for average people. It’s not even a good experience for geeks, although we tolerate it. Even Microsoft agrees about this. Microsoft’s Surface tablets with Windows RT don’t support any third-party desktop apps. They consider this a feature — users can’t install malware and other desktop junk, so the system will always be speedy and secure. Malware is Still Common Malware may not affect geeks, but it certainly continues to affect average people. Securing Windows, keeping it secure, and avoiding unsafe programs is a complex process. There are over 50 different file extensions that can contain harmful code to keep track of. It’s easy to have theoretical discussions about how malware could infect Mac computers, Android devices, and other systems. But Mac malware is extremely rare, and has generally been caused by problem with the terrible Java plug-in. Macs are configured to only run executables from identified developers by default, whereas Windows will run everything. Android malware is talked about a lot, but Android malware is rare in the real world and is generally confined to users who disable security protections and install pirated apps. Google has also taken action, rolling out built-in antivirus-like app checking to all Android devices, even old ones running Android 2.3, via Play Services. Whatever the reason, Windows malware is still common while malware for other systems isn’t. We all know it — anyone who does tech support for average users has dealt with infected Windows computers. Even users who can avoid malware are stuck dealing with complex and nagging antivirus programs, especially since it’s now so difficult to trust Microsoft’s antivirus products. Manufacturer-Installed Bloatware is Terrible Sit down with a new Mac, Chromebook, iPad, Android tablet, Linux laptop, or even a Surface running Windows RT and you can enjoy using your new device. The system is a clean slate for you to start exploring and installing your new software. Sit down with a new Windows PC and the system is a mess. Rather than be delighted, you’re stuck reinstalling Windows and then installing the necessary drivers or you’re forced to start uninstalling useless bloatware programs one-by-one, trying to figure out which ones are actually useful. After uninstalling the useless programs, you may end up with a system tray full of icons for ten different hardware utilities anyway. The first experience of using a new Windows PC is frustration, not delight. Yes, bloatware is still a problem on Windows 8 PCs. Manufacturers can customize the Refresh image, preventing bloatware rom easily being removed. Finding a Desktop Program is Dangerous Want to install a Windows desktop program? Well, you’ll have to head to your web browser and start searching. It’s up to you, the user, to know which programs are safe and which are dangerous. Even if you find a website for a reputable program, the advertisements on that page will often try to trick you into downloading fake installers full of adware. While it’s great to have the ability to leave the app store and get software that the platform’s owner hasn’t approved — as on Android — this is no excuse for not providing a good, secure software installation experience for typical users installing typical programs. Even Reputable Desktop Programs Try to Install Junk Even if you do find an entirely reputable program, you’ll have to keep your eyes open while installing it. It will likely try to install adware, add browse toolbars, change your default search engine, or change your web browser’s home page. Even Microsoft’s own programs do this — when you install Skype for Windows desktop, it will attempt to modify your browser settings t ouse Bing, even if you’re specially chosen another search engine and home page. With Microsoft setting such an example, it’s no surprise so many other software developers have followed suit. Geeks know how to avoid this stuff, but there’s a reason program installers continue to do this. It works and tricks many users, who end up with junk installed and settings changed. The Update Process is Confusing On iOS, Android, and Windows RT, software updates come from a single place — the app store. On Linux, software updates come from the package manager. On Mac OS X, typical users’ software updates likely come from the Mac App Store. On the Windows desktop, software updates come from… well, every program has to create its own update mechanism. Users have to keep track of all these updaters and make sure their software is up-to-date. Most programs now have their act together and automatically update by default, but users who have old versions of Flash and Adobe Reader installed are vulnerable until they realize their software isn’t automatically updating. Even if every program updates properly, the sheer mess of updaters is clunky, slow, and confusing in comparison to a centralized update process. Browser Plugins Open Security Holes It’s no surprise that other modern platforms like iOS, Android, Chrome OS, Windows RT, and Windows Phone don’t allow traditional browser plugins, or only allow Flash and build it into the system. Browser plugins provide a wealth of different ways for malicious web pages to exploit the browser and open the system to attack. Browser plugins are one of the most popular attack vectors because of how many users have out-of-date plugins and how many plugins, especially Java, seem to be designed without taking security seriously. Oracle’s Java plugin even tries to install the terrible Ask toolbar when installing security updates. That’s right — the security update process is also used to cram additional adware into users’ machines so unscrupulous companies like Oracle can make a quick buck. It’s no wonder that most Windows PCs have an out-of-date, vulnerable version of Java installed. Battery Life is Terrible Windows PCs have bad battery life compared to Macs, IOS devices, and Android tablets, all of which Windows now competes with. Even Microsoft’s own Surface Pro 2 has bad battery life. Apple’s 11-inch MacBook Air, which has very similar hardware to the Surface Pro 2, offers double its battery life when web browsing. Microsoft has been fond of blaming third-party hardware manufacturers for their poorly optimized drivers in the past, but there’s no longer any room to hide. The problem is clearly Windows. Why is this? No one really knows for sure. Perhaps Microsoft has kept on piling Windows component on top of Windows component and many older Windows components were never properly optimized. Windows Users Become Stuck on Old Windows Versions Apple’s new OS X 10.9 Mavericks upgrade is completely free to all Mac users and supports Macs going back to 2007. Apple has also announced their intention that all new releases of Mac OS X will be free. In 2007, Microsoft had just shipped Windows Vista. Macs from the Windows Vista era are being upgraded to the latest version of the Mac operating system for free, while Windows PCs from the same era are probably still using Windows Vista. There’s no easy upgrade path for these people. They’re stuck using Windows Vista and maybe even the outdated Internet Explorer 9 if they haven’t installed a third-party web browser. Microsoft’s upgrade path is for these people to pay $120 for a full copy of Windows 8.1 and go through a complicated process that’s actaully a clean install. Even users of Windows 8 devices will probably have to pay money to upgrade to Windows 9, while updates for other operating systems are completely free. If you’re a PC geek, a PC gamer, or someone who just requires specialized software that only runs on Windows, you probably use the Windows desktop and don’t want to switch. That’s fine, but it doesn’t mean the Windows desktop is actually a good experience. Much of the burden falls on average users, who have to struggle with malware, bloatware, adware bundled in installers, complex software installation processes, and out-of-date software. In return, all they get is the ability to use a web browser and some basic Office apps that they could use on almost any other platform without all the hassle. Microsoft would agree with this, touting Windows RT and their new “Windows 8-style” app platform as the solution. Why else would Microsoft, a “devices and services” company, position the Surface — a device without traditional Windows desktop programs — as their mass-market device recommended for average people? This isn’t necessarily an endorsement of Windows RT. If you’re tech support for your family members and it comes time for them to upgrade, you may want to get them off the Windows desktop and tell them to get a Mac or something else that’s simple. Better yet, if they get a Mac, you can tell them to visit the Apple Store for help instead of calling you. That’s another thing Windows PCs don’t offer — good manufacturer support. Image Credit: Blanca Stella Mejia on Flickr, Collin Andserson on Flickr, Luca Conti on Flickr

Read the article
C#/.NET – Finding an Item’s Index in IEnumerable<T>

- by James Michael Hare

Sorry for the long blogging hiatus. First it was, of course, the holidays hustle and bustle, then my brother and his wife gave birth to their son, so I’ve been away from my blogging for two weeks. Background: Finding an item’s index in List<T> is easy… Many times in our day to day programming activities, we want to find the index of an item in a collection. Now, if we have a List<T> and we’re looking for the item itself this is trivial: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3: 4: // can find the exact item using IndexOf() 5: var pos = list.IndexOf(64); This will return the position of the item if it’s found, or –1 if not. It’s easy to see how this works for primitive types where equality is well defined. For complex types, however, it will attempt to compare them using EqualityComparer<T>.Default which, in a nutshell, relies on the object’s Equals() method. So what if we want to search for a condition instead of equality? That’s also easy in a List<T> with the FindIndex() method: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3: 4: // finds index of first even number or -1 if not found. 5: var pos = list.FindIndex(i => i % 2 == 0); Problem: Finding an item’s index in IEnumerable<T> is not so easy... This is all well and good for lists, but what if we want to do the same thing for IEnumerable<T>? A collection of IEnumerable<T> has no indexing, so there’s no direct method to find an item’s index. LINQ, as powerful as it is, gives us many tools to get us this information, but not in one step. As with almost any problem involving collections, there are several ways to accomplish the same goal. And once again as with almost any problem involving collections, the choice of the solution somewhat depends on the situation. So let’s look at a few possible alternatives. I’m going to express each of these as extension methods for simplicity and consistency. Solution: The TakeWhile() and Count() combo One of the things you can do is to perform a TakeWhile() on the list as long as your find condition is not true, and then do a Count() of the items it took. The only downside to this method is that if the item is not in the list, the index will be the full Count() of items, and not –1. So if you don’t know the size of the list beforehand, this can be confusing. 1: // a collection of extra extension methods off IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item in the collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // note if item not found, result is length and not -1! 8: return list.TakeWhile(i => !finder(i)).Count(); 9: } 10: } Personally, I don’t like switching the paradigm of not found away from –1, so this is one of my least favorites. Solution: Select with index Many people don’t realize that there is an alternative form of the LINQ Select() method that will provide you an index of the item being selected: 1: list.Select( (item,index) => do something here with the item and/or index... ) This can come in handy, but must be treated with care. This is because the index provided is only as pertains to the result of previous operations (if any). For example: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3: 4: // you'd hope this would give you the indexes of the even numbers 5: // which would be 2, 3, 8, but in reality it gives you 0, 1, 2 6: list.Where(item => item % 2 == 0).Select((item,index) => index); The reason the example gives you the collection { 0, 1, 2 } is because the where clause passes over any items that are odd, and therefore only the even items are given to the select and only they are given indexes. Conversely, we can’t select the index and then test the item in a Where() clause, because then the Where() clause would be operating on the index and not the item! So, what we have to do is to select the item and index and put them together in an anonymous type. It looks ugly, but it works: 1: // extensions defined on IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // finds an item in a collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // if you don't name the anonymous properties they are the variable names 8: return list.Select((item, index) => new { item, index }) 9: .Where(p => finder(p.item)) 10: .Select(p => p.index + 1) 11: .FirstOrDefault() - 1; 12: } 13: } So let’s look at this, because i know it’s convoluted: First Select() joins the items and their indexes into an anonymous type. Where() filters that list to only the ones matching the predicate. Second Select() picks the index of the matches and adds 1 – this is to distinguish between not found and first item. FirstOrDefault() returns the first item found from the previous clauses or default (zero) if not found. Subtract one so that not found (zero) will be –1, and first item (one) will be zero. The bad thing is, this is ugly as hell and creates anonymous objects for each item tested until it finds the match. This concerns me a bit but we’ll defer judgment until compare the relative performances below. Solution: Convert ToList() and use FindIndex() This solution is easy enough. We know any IEnumerable<T> can be converted to List<T> using the LINQ extension method ToList(), so we can easily convert the collection to a list and then just use the FindIndex() method baked into List<T>. 1: // a collection of extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // find the index of an item in the collection similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: return list.ToList().FindIndex(finder); 8: } 9: } This solution is simplicity itself! It is very concise and elegant and you need not worry about anyone misinterpreting what it’s trying to do (as opposed to the more convoluted LINQ methods above). But the main thing I’m concerned about here is the performance hit to allocate the List<T> in the ToList() call, but once again we’ll explore that in a second. Solution: Roll your own FindIndex() for IEnumerable<T> Of course, you can always roll your own FindIndex() method for IEnumerable<T>. It would be a very simple for loop which scans for the item and counts as it goes. There’s many ways to do this, but one such way might look like: 1: // extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item matching a predicate in the enumeration, much like List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: int index = 0; 8: foreach (var item in list) 9: { 10: if (finder(item)) 11: { 12: return index; 13: } 14: 15: index++; 16: } 17: 18: return -1; 19: } 20: } Well, it’s not quite simplicity, and those less familiar with LINQ may prefer it since it doesn’t include all of the lambdas and behind the scenes iterators that come with deferred execution. But does having this long, blown out method really gain us much in performance? Comparison of Proposed Solutions So we’ve now seen four solutions, let’s analyze their collective performance. I took each of the four methods described above and run them over 100,000 iterations of lists of size 10, 100, 1000, and 10000 and here’s the performance results. Then I looked for targets at the begining of the list (best case), middle of the list (the average case) and not in the list (worst case as must scan all of the list). Each of the times below is the average time in milliseconds for one execution as computer over the 100,000 iterations: Searches Matching First Item (Best Case) 10 100 1000 10000 TakeWhile 0.0003 0.0003 0.0003 0.0003 Select 0.0005 0.0005 0.0005 0.0005 ToList 0.0002 0.0003 0.0013 0.0121 Manual 0.0001 0.0001 0.0001 0.0001 Searches Matching Middle Item (Average Case) 10 100 1000 10000 TakeWhile 0.0004 0.0020 0.0191 0.1889 Select 0.0008 0.0042 0.0387 0.3802 ToList 0.0002 0.0007 0.0057 0.0562 Manual 0.0002 0.0013 0.0129 0.1255 Searches Where Not Found (Worst Case) 10 100 1000 10000 TakeWhile 0.0006 0.0039 0.0381 0.3770 Select 0.0012 0.0081 0.0758 0.7583 ToList 0.0002 0.0012 0.0100 0.0996 Manual 0.0003 0.0026 0.0253 0.2514 Notice something interesting here, you’d think the “roll your own” loop would be the most efficient, but it only wins when the item is first (or very close to it) regardless of list size. In almost all other cases though and in particular the average case and worst case, the ToList()/FindIndex() combo wins for performance, even though it is creating some temporary memory to hold the List<T>. If you examine the algorithm, the reason why is most likely because once it’s in a ToList() form, internally FindIndex() scans the internal array which is much more efficient to iterate over. Thus, it takes a one time performance hit (not including any GC impact) to create the List<T> but after that the performance is much better. Summary If you’re concerned about too many throw-away objects, you can always roll your own FindIndex() method, but for sheer simplicity and overall performance, using the ToList()/FindIndex() combo performs best on nearly all list sizes in the average and worst cases. Technorati Tags: C#,.NET,Litte Wonders,BlackRabbitCoder,Software,LINQ,List

Read the article
SQL SERVER – OLEDB – Link Server – Wait Type – Day 23 of 28

- by pinaldave

When I decided to start writing about this wait type, the very first question that came to my mind was, “What does ‘OLEDB’ stand for?” A quick search on Wikipedia tells me that OLEDB means Object Linking and Embedding Database. (How many of you knew this?) Anyway, I found it very interesting that this wait type was in one of the top 10 wait types in many of the systems I have come across in my performance tuning experience. Books On-Line: ????OLEDB occurs when SQL Server calls the SQL Server Native Client OLE DB Provider. This wait type is not used for synchronization. Instead, it indicates the duration of calls to the OLE DB provider. OLEDB Explanation: This wait type primarily happens when Link Server or Remove Query has been executed. The most common case wherein this wait type is visible is during the execution of Linked Server. When SQL Server is retrieving data from the remote server, it uses OLEDB API to retrieve the data. It is possible that the remote system is not quick enough or the connection between them is not fast enough, leading SQL Server to wait for the result’s return from the remote (or external) server. This is the time OLEDB wait type occurs. Reducing OLEDB wait: Check the Link Server configuration. Checking Disk-Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) At this point in time, I am not able to think of any more ways on reducing this wait type. Do you have any opinion about this subject? Please share it here and I will share your comment with the rest of the Community, and of course, with due credit unto you. Please read all the post in the Wait Types and Queue series. Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussion of Wait Stats in this blog is generic and varies from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article
Windows and Online Banking, Like Oil and Water

Windows may be safe enough for most functions, but it's not safe enough for the average user's online banking.

Read the article
Tangent basis calculation problem

- by Kirill Daybov

I have the problem with seams with calculating a tangent basis in my application. I'm using a seems to be right algorithm, but it gives wrong result on the seams. What am I doing wrong? Is there a problem with an algorithm, or with the model? The designer says that our models with our normal maps are rendered correctly in Xoliul Shader Plugin in 3Ds Max, so there should be a way to calculate correct tangent basis programmatically. Here's an example of the problem I'm talking about. Steps, I've already taken: - Tried different algorithm (from Gamasutra, I can't post the link because I don't have enough reputation yet). I got wrong, much worse, results; - Tried to average basis vectors for vertexes are used in multiple faces; - Tried to average basis vectors for vertexes that have same world coordinates (this would be obviously wrong solution, but I've tried it anyway).

Read the article
How to evaluate SEO/prominence improvement [on hold]

- by Rober

I will work on a website SEO and before starting with it I would like to "take a snapshot" of the present status so that I will be able to compare it with the new situation in a few months and evaluate my work and the real improvement. I don't mean whether the website is well implemented or not, but how well it is seen by Google and others. What prominence it has. I am taking some variables from Google Analytics (average day visits...), from Google Webmaster Tools (Search traffic and average position...) and some other indicators, like automatic SEO audit figures (website estimated worth, real pagerank...). What would you look at before starting SEO improvement?

Read the article
Do you count a Masters in CS as a negative?

- by Pete Hodgson

In my experience interviewing developers I feel like candidates who've achieved a Masters in Comp Sci tend to be worse programmers on average that those who don't have a Masters. Is that just me, or have others noticed this phenomenon? If so, why would that be the case? UPDATE I appreciate the thoughtful comments. I think I should have been clearer in the comparison I'm making. Given two candidates who graduated from college around the same time, someone who went on to gain a Masters seems on average to be a worse programmer than someone who spent all their time in industry.

Read the article
Expected time for an CakePHP MVC form/controller and db make up

- by hephestos

I would like to know, what is an average time for building a form in MVC pattern with for example CakePHP. I build 8 functions, two of them do custom queries, return json data, split them, expand them in a model in memory and delivers to the view. Those are three queries if you consider and an array to feed view for making some combo box. Why? all these, because I have data from json and I split them in order to make row of data like a table. Like that I changed a bit the edit.ctp but not a lot. And I created a javascript outside, with three functions. One collects data the other upon a change of a combo returnes the selected values, and does also some redirection flow. All this, in average how much time should it take ?

Read the article
Faster, Simpler access to Azure Tables with Enzo Azure API

- by Herve Roggero

After developing the latest version of Enzo Cloud Backup I took the time to create an API that would simplify access to Azure Tables (the Enzo Azure API). At first, my goal was to make the code simpler compared to the Microsoft Azure SDK. But as it turns out it is also a little faster; and when using the specialized methods (the fetch strategies) it is much faster out of the box than the Microsoft SDK, unless you start creating complex parallel and resilient routines yourself. Last but not least, I decided to add a few extension methods that I think you will find attractive, such as the ability to transform a list of entities into a DataTable. So let’s review each area in more details. Simpler Code My first objective was to make the API much easier to use than the Azure SDK. I wanted to reduce the amount of code necessary to fetch entities, remove the code needed to add automatic retries and handle transient conditions, and give additional control, such as a way to cancel operations, obtain basic statistics on the calls, and control the maximum number of REST calls the API generates in an attempt to avoid throttling conditions in the first place (something you cannot do with the Azure SDK at this time). Strongly Typed Before diving into the code, the following examples rely on a strongly typed class called MyData. The way MyData is defined for the Azure SDK is similar to the Enzo Azure API, with the exception that they inherit from different classes. With the Azure SDK, classes that represent entities must inherit from TableServiceEntity, while classes with the Enzo Azure API must inherit from BaseAzureTable or implement a specific interface. // With the SDK public class MyData1 : TableServiceEntity { public string Message { get; set; } public string Level { get; set; } public string Severity { get; set; } } // With the Enzo Azure API public class MyData2 : BaseAzureTable { public string Message { get; set; } public string Level { get; set; } public string Severity { get; set; } } Simpler Code Now that the classes representing an Azure Table entity are defined, let’s review the methods that the Azure SDK would look like when fetching all the entities from an Azure Table (note the use of a few variables: the _tableName variable stores the name of the Azure Table, and the ConnectionString property returns the connection string for the Storage Account containing the table): // With the Azure SDK public List<MyData1> FetchAllEntities() { CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConnectionString); CloudTableClient tableClient = storageAccount.CreateCloudTableClient(); TableServiceContext serviceContext = tableClient.GetDataServiceContext(); CloudTableQuery<MyData1> partitionQuery = (from e in serviceContext.CreateQuery<MyData1>(_tableName) select new MyData1() { PartitionKey = e.PartitionKey, RowKey = e.RowKey, Timestamp = e.Timestamp, Message = e.Message, Level = e.Level, Severity = e.Severity }).AsTableServiceQuery<MyData1>(); return partitionQuery.ToList(); } This code gives you automatic retries because the AsTableServiceQuery does that for you. Also, note that this method is strongly-typed because it is using LINQ. Although this doesn’t look like too much code at first glance, you are actually mapping the strongly-typed object manually. So for larger entities, with dozens of properties, your code will grow. And from a maintenance standpoint, when a new property is added, you may need to change the mapping code. You will also note that the mapping being performed is optional; it is desired when you want to retrieve specific properties of the entities (not all) to reduce the network traffic. If you do not specify the properties you want, all the properties will be returned; in this example we are returning the Message, Level and Severity properties (in addition to the required PartitionKey, RowKey and Timestamp). The Enzo Azure API does the mapping automatically and also handles automatic reties when fetching entities. The equivalent code to fetch all the entities (with the same three properties) from the same Azure Table looks like this: // With the Enzo Azure API public List<MyData2> FetchAllEntities() { AzureTable at = new AzureTable(_accountName, _accountKey, _ssl, _tableName); List<MyData2> res = at.Fetch<MyData2>("", "Message,Level,Severity"); return res; } As you can see, the Enzo Azure API returns the entities already strongly typed, so there is no need to map the output. Also, the Enzo Azure API makes it easy to specify the list of properties to return, and to specify a filter as well (no filter was provided in this example; the filter is passed as the first parameter). Fetch Strategies Both approaches discussed above fetch the data sequentially. In addition to the linear/sequential fetch methods, the Enzo Azure API provides specific fetch strategies. Fetch strategies are designed to prepare a set of REST calls, executed in parallel, in a way that performs faster that if you were to fetch the data sequentially. For example, if the PartitionKey is a GUID string, you could prepare multiple calls, providing appropriate filters ([‘a’, ‘b’[, [‘b’, ‘c’[, [‘c’, ‘d[, …), and send those calls in parallel. As you can imagine, the code necessary to create these requests would be fairly large. With the Enzo Azure API, two strategies are provided out of the box: the GUID and List strategies. If you are interested in how these strategies work, see the Enzo Azure API Online Help. Here is an example code that performs parallel requests using the GUID strategy (which executes more than 2 t o3 times faster than the sequential methods discussed previously): public List<MyData2> FetchAllEntitiesGUID() { AzureTable at = new AzureTable(_accountName, _accountKey, _ssl, _tableName); List<MyData2> res = at.FetchWithGuid<MyData2>("", "Message,Level,Severity"); return res; } Faster Results With Sequential Fetch Methods Developing a faster API wasn’t a primary objective; but it appears that the performance tests performed with the Enzo Azure API deliver the data a little faster out of the box (5%-10% on average, and sometimes to up 50% faster) with the sequential fetch methods. Although the amount of data is the same regardless of the approach (and the REST calls are almost exactly identical), the object mapping approach is different. So it is likely that the slight performance increase is due to a lighter API. Using LINQ offers many advantages and tremendous flexibility; nevertheless when fetching data it seems that the Enzo Azure API delivers faster. For example, the same code previously discussed delivered the following results when fetching 3,000 entities (about 1KB each). The average elapsed time shows that the Azure SDK returned the 3000 entities in about 5.9 seconds on average, while the Enzo Azure API took 4.2 seconds on average (39% improvement). With Fetch Strategies When using the fetch strategies we are no longer comparing apples to apples; the Azure SDK is not designed to implement fetch strategies out of the box, so you would need to code the strategies yourself. Nevertheless I wanted to provide out of the box capabilities, and as a result you see a test that returned about 10,000 entities (1KB each entity), and an average execution time over 5 runs. The Azure SDK implemented a sequential fetch while the Enzo Azure API implemented the List fetch strategy. The fetch strategy was 2.3 times faster. Note that the following test hit a limit on my network bandwidth quickly (3.56Mbps), so the results of the fetch strategy is significantly below what it could be with a higher bandwidth. Additional Methods The API wouldn’t be complete without support for a few important methods other than the fetch methods discussed previously. The Enzo Azure API offers these additional capabilities: - Support for batch updates, deletes and inserts - Conversion of entities to DataRow, and List<> to a DataTable - Extension methods for Delete, Merge, Update, Insert - Support for asynchronous calls and cancellation - Support for fetch statistics (total bytes, total REST calls, retries…) For more information, visit http://www.bluesyntax.net or go directly to the Enzo Azure API page (http://www.bluesyntax.net/EnzoAzureAPI.aspx). About Herve Roggero Herve Roggero, Windows Azure MVP, is the founder of Blue Syntax Consulting, a company specialized in cloud computing products and services. Herve's experience includes software development, architecture, database administration and senior management with both global corporations and startup companies. Herve holds multiple certifications, including an MCDBA, MCSE, MCSD. He also holds a Master's degree in Business Administration from Indiana University. Herve is the co-author of "PRO SQL Azure" from Apress and runs the Azure Florida Association (on LinkedIn: http://www.linkedin.com/groups?gid=4177626). For more information on Blue Syntax Consulting, visit www.bluesyntax.net.

Read the article
Do you count a Masters in CS as a negative? [closed]

- by Pete Hodgson

In my experience interviewing developers I feel like candidates who've achieved a Masters in Comp Sci tend to be worse programmers on average that those who don't have a Masters. Is that just me, or have others noticed this phenomenon? If so, why would that be the case? UPDATE I appreciate the thoughtful comments. I think I should have been clearer in the comparison I'm making. Given two candidates who graduated from college around the same time, someone who went on to gain a Masters seems on average to be a worse programmer than someone who spent all their time in industry.

Read the article
easy visualization of usage statistics (web app)

- by sova

I have some usage queries for my web app's database, the results of which I want to display graphically. Is there an easy-to-use api that exists for this purpose? I want to show things like average query-time per user (a small user-base), average query time per day, and things like that. I think it would be cool to show these on a two-axis graph. I am displaying this data on my site, so a jQuery/javascript/html solution for rendering information into graphs would be ideal. Thank you :) P.S. I wasn't sure if I should ask this on SO, but I am looking more for which product to use, not how to program with it.

Read the article
How are Reads Distributed in a Workload

- by Bill Graziano

People have uploaded nearly one millions rows of trace data to TraceTune. That’s enough data to start to look at the results in aggregate. The first thing I want to look at is logical reads. This is the easiest metric to identify and fix. When you upload a trace, I rank each statement based on the total number of logical reads. I also calculate each statement’s percentage of the total logical reads. I do the same thing for CPU, duration and logical writes. When you view a statement you can see all the details like this: This single statement consumed 61.4% of the total logical reads on the system while we were tracing it. I also wanted to see the distribution of reads across statements. That graph looks like this: On average, the highest ranked statement consumed just under 50% of the reads on the system. When I tune a system, I’m usually starting in one of two modes: this “piece” is slow or the whole system is slow. If a given piece (screen, report, query, etc.) is slow you can usually find the specific statements behind it and tune it. You can make that individual piece faster but you may not affect the whole system. When you’re trying to speed up an entire server you need to identity those queries that are using the most disk resources in aggregate. Fixing those will make them faster and it will leave more disk throughput for the rest of the queries. Here are some of the things I’ve learned querying this data: The highest ranked query averages just under 50% of the total reads on the system. The top 3 ranked queries average 73% of the total reads on the system. The top 10 ranked queries average 91% of the total reads on the system. Remember these are averages across all the traces that have been uploaded. And I’m guessing that people mainly upload traces where there are performance problems so your mileage may vary. I also learned that slow queries aren’t the problem. Before I wrote ClearTrace I used to identify queries by filtering on high logical reads using Profiler. That picked out individual queries but those rarely ran often enough to put a large load on the system. If you look at the execution count by rank you’d see that the highest ranked queries also have the highest execution counts. The graph would look very similar to the one above but flatter. These queries don’t look that bad individually but run so often that they hog the disk capacity. The take away from all this is that you really should be tuning the top 10 queries if you want to make your system faster. Tuning individually slow queries will help those specific queries but won’t have much impact on the system as a whole.

Read the article
easy visualization of usage statistics (web app)

- by sova

I have some usage queries for my web app's database, the results of which I want to display graphically. Is there an easy-to-use api that exists for this purpose? I want to show things like average query-time per user (a small user-base), average query time per day, and things like that. I think it would be cool to show these on a two-axis graph. I am displaying this data on my site, so a jQuery/javascript/html solution for rendering information into graphs would be ideal. Thank you :) P.S. I wasn't sure if I should ask this on SO, but I am looking more for which product to use, not how to program with it.

Read the article
Was a Big Fish in a Little Pond, Am Now a Little Fish in a Big Pond. How Do I Grow? [closed]

- by Ziv

I've finished high school where I was in the top three in my class, I studied a little and there too I was pretty much Big Fish in a bigger pond than high school. Now I got into my first job in a very big company, there are some incredibly talented programmers and researchers here (mostly in departments not related to mine) and for the first time I really feel like I'm incredibly average - I do not want to be average. I read technical books all the time, I try to code on my personal time but I don't feel like that's enough. What can I do to become a leading programmer again in this big company? Is there anything specifically that can be done to make myself known here? This is a very big company so in order to advance you must be very good and shine in your field.

Read the article
Ubuntu Gnome 14.04 - 100% CPU usage alternating between cores

- by AwDeOh

I've noticed my Ubuntu Gnome 14.04 has been getting a bit sluggish lately - things like Gnome Shell overview animation are jerky where they were lightning fast, Elder Scrolls Online is stuttering and dropping to low FPS where I previously had a solid 50-60 fps. Out of interest I looked at the CPU History, and when running nothing but the system monitor, I was getting this: That was 15 minutes ago. The 100% load seemed to be alternating between the cores. PC specs: i3 2130 processor. 8gb DDR3 RAM. ASUS P8-Z77M motherboard. Samsung 128gb SSD I've been trying to reproduce the problem, and while I'm not getting the 100% any more at idle, the system monitor is showing an average load of about 20-30%, that's with just Chrome and the System Monitor open. Oddly, if I touch nothing, it'll average out to about 20% - if I start moving the mouse around and do some typing, it's closer to 40%. Is this normal? Any help appreciated, I wouldn't even know where to start here..

Read the article
Feedback from SQLBits 8

- by Peter Larsson

This years SQLBits occurred in Brighton. Although I didn’t have the opportunity to attend the full conference, I did a presentation at Saturday. Getting to Brighton was easy. Drove to Copenhagen airport at 0415, flew 0605 and arrived at Gatwick 0735. Then I took the direct train to Brighton and showed up at 0830, just one hour before presenting. This was the easy part. Getting home was much worse. Presentation ended at 1030 and I had to rush to the train station to get back to London, change to tube for Heathrow. Made it at the gate just 15 seconds before closing. That included a half mile run in the airport… Anyway, yesterday I got the feedback for my presentation. It does look good, especially since English is not my first language. This is the first graph Seems to be just halfway between conference average and best session. I can live with that. Second graph shows more detail about attendees voting. It also look acceptable. A wider spread for the 9’s, but it is an inevitable effect from how attendees percept the session. I did get a lot of 8’s and the lower grades in an descending order. The two people voting 4 and 5 didn’t say why they voted this so I don’t know how to remedy this. Third graph is about each category of votes. Again, I find this acceptable. The Session abstract and Speaker’s knowledge seems to follow attendees expectations compared to conference average. I seem to have met the attendees expectations (and some more) for the other four categories, also compared to conference average. Since this did encourage me, I believe I will present some more at future meetings. I do have a new presentation about something all developers are doing every day but they may not know it. I will also cover this new topic in the next Deep Dives II book. Stay tuned! //Peter

Read the article
Is the tap-to-click issue solved

- by AWE

I'm just an average Joe when it comes to computing (maybe less then the average Joe) but I hate tap-to-click. In system and settings there is no touchpad tab? Is it true that this has been fixed? I'm using Dell inspiron N5110 xinput list: ? Virtual core pointer id=2 [master pointer (3)] ? ? Virtual core XTEST pointer id=4 [slave pointer (2)] ? ? PS/2 Generic Mouse id=13 [slave pointer (2)] This is really strange because Dell is one of the top manufacturers in laptops and Ubuntu one of the top distros in Linux and Canonical claims that they are working closely with Dell.

Read the article
ubuntu 12.04: Why my laptop is consuming more power than 11.10?

- by sanz

I had 11.10 X86 on my Asus laptop (sandybridge, Nvidia 520M). I had 8 hours' battery life with Bumblebee and Jupiter. Average battery discharge rate was around 10w. Later I changed, not upgraded, to 12.04.1 AMD64. I installed Jupiter. But there is no "restricted drivers" available so I guess Bumblebee will not work. So I removed nvidia drivers. Now I only get 4 hours' battery life. Average battery discharge rate is around 19w. The removal of nvidia driver did not make any difference. What's the cause? Nvidia video card not disabled or 64b version of Ubuntu?

Read the article

Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 14/87 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

- by user1299656

- by Tristan

- by usersmarvin_

- by user1303781

- by JoshReuben

- by user1592845

- by jamiet

- by Chris Hoffman

- by James Michael Hare

- by pinaldave

- by Kirill Daybov

- by Rober

- by Pete Hodgson

- by hephestos

- by Herve Roggero

- by Pete Hodgson

- by sova

- by Bill Graziano

- by sova

- by Ziv

- by AwDeOh

- by Peter Larsson

- by AWE

- by sanz

< Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >