Search Results

Search found 75 results on 3 pages for 'deviation'.

Page 3/3 | < Previous Page | 1 2 3 

  • Generating a random displacement on the unit sphere

    - by becko
    Given a unit vector n, I need to generate, as fast as possible, another random unit vector m. The deviation of m from n should be on the order of a positive parameter sigma, and the distribution of m on the unit sphere should be symmetrical around n. I have no specific requirements on the representation of unit vectors, so you can use spherical angles, Cartesian coordinates, or whatever turns out to be convenient. Also, there are no precise requirements on the probability distributions used, as long as it decays when m deviates more than sigma from n. I am working with gsl and C. I have come up with a somewhat convoluted method using Cartesian coordinates. I will post it later if it is useful, but I would like to see people's ideas.

    Read the article

  • R: What are the best functions to deal with concatenating and averaging values in a data.frame?

    - by John
    I have a data.frame from this code: my_df = data.frame("read_time" = c("2010-02-15", "2010-02-15", "2010-02-16", "2010-02-16", "2010-02-16", "2010-02-17"), "OD" = c(0.1, 0.2, 0.1, 0.2, 0.4, 0.5) ) which produces this: > my_df read_time OD 1 2010-02-15 0.1 2 2010-02-15 0.2 3 2010-02-16 0.1 4 2010-02-16 0.2 5 2010-02-16 0.4 6 2010-02-17 0.5 I want to average the OD column over each distinct read_time (notice some are replicated others are not) and I also would like to calculate the standard deviation, producing a table like this: > my_df read_time OD stdev 1 2010-02-15 0.15 0.05 5 2010-02-16 0.3 0.1 6 2010-02-17 0.5 0 Which are the best functions to deal with concatenating such values in a data.frame?

    Read the article

  • .Net Sql Client Provider

    - by sameer
    Have come across a situation where in, if a stored procedure is executed in Query Analyser its execution time is less than a second. But when same Stored Procedure is executed using .NET Sql Client Provide. it is taking 61 seconds. Therefore inorder to troubleshoot this issue we went to SQL Profiler we find the request come to SQL Server less then a second but execution completed after 60 seconds. Can anybody suggest why we have such a deviation. Query is a simple as give below SELECT distinct p1.productID, p1.description FROM Details V INNER JOIN Product P ON V.ProductID=P.ProductID INNER JOIN product p1 on p1.productID=p.parentID WHERE V.MarketID='1159' AND V.FinancialYear='1213' AND V.LEPeriodID= '75' AND p1.parentID=100024 AND p1.statusID = 1 ORDER BY description

    Read the article

  • Count unique visitors by group of visited places

    - by Mathieu
    I'm facing the problem of counting the unique visitors of groups of places. Here is the situation: I have visitors that can visit places. For example, that can be internet users visiting web pages, or customers going to restaurants. A visitor can visit as much places as he wishes, and a place can be visited by several visitors. A visitor can come to the same place several times. The places belong to groups. A group can obviously contain several places, and places can belong to several groups. Given that, for each visitor, we can have a list of visited places, how can I have the number of unique visitors per group of places? Example: I have visitors A, B, C and D; and I have places x, y and z. I have these visiting lists: [ A -> [x,x,y,x], B -> [], C -> [z,z], D -> [y,x,x,z] ] Having these number of unique visitors per place is quite easy: [ x -> 2, // A and D visited x y -> 2, // A and D visited y z -> 2 // C and D visited z ] But if I have these groups: [ G1 -> [x,y,z], G2 -> [x,z], G3 -> [x,y] ] How can I have this information? [ G1 -> 3, // A, C and D visited x or y or z G2 -> 3, // A, C and D visited x or z G3 -> 2 // A and D visited x or y ] Additional notes : There are so many places that it is not possible to store information about every possible group; It's not a problem if approximation are made. I don't need 100% precision. Having a fast algorithm that tells me that there were 12345 visits in a group instead of 12543 is better than a slow algorithm telling the exact number. Let's say there can be ~5% deviation. Is there an algorithm or class of algorithms that addresses this type of problem?

    Read the article

  • Challenge 19 – An Explanation of a Query

    - by Dave Ballantyne
    I have received a number of requests for an explanation of my winning query of TSQL Challenge 19. This involved traversing a hierarchy of employees and rolling a count of orders from subordinates up to superiors. The first concept I shall address is the hierarchyId , which is constructed within the CTE called cteTree.   cteTree is a recursive cte that will expand the parent-child hierarchy of the personnel in the table @emp.  One useful feature with a recursive cte is that data can be ‘passed’ from the parent to the child data.  The hierarchyId column is similar to the hierarchyId data type that was introduced in SQL Server 2008 and represents the position of the person within the organisation. Let us start with a simplistic example Albert manages Bob and Eddie.  Bob manages Carl and Dave. The hierarchyId will represent each person’s position in this relationship in a single field.  In this simple example we could append the userID together into a varchar field as detailed below. This will enable us to select a branch of the tree by filtering using Where hierarchyId  ‘1,2%’ to select Bob and all his subordinates.  Naturally, this is not comprehensive enough to provide a full solution, but as opposed to concatenating the Id’s together into a varchar datatyped column, we can apply the same theory to a varbinary.  By CASTing the ID’s into a datatype of varbinary(4) ,4 is used as 4 bytes of data are used to store an integer and building a hierarchyId  from those.  For example: The important point to bear in mind for later in the query is that the binary data generated is 'byte order comparable'. ie We can ORDER a dataset with it and the resulting data, will be in the order required. Now, would probably be a good time to download the example file and, after the cte ‘cteTree’, uncomment the line ‘select * from cteTree’.  Mark this and all prior code and execute.  This will show you how this theory directly relates to the actual challenge data.  The only deviation from the above, is that instead of using the ID of an employee, I have used the row_number() ranking function to order each level by LastName,Firstname.  This enables me to order by the HierarchyId in the final result set so that the result set is in the required order. Your output should be something like the below.  Notice also the ‘Level’ Column that contains the depth that the employee is within the tree.  I would encourage you to ‘play’ with the query, change the order in the row_number() or the length of the cast in the hierarchyId to see how that effects the outcome.  The next cte, ‘cteTreeWithOrderCount’, is a join between cteTree and the @ord table, and COUNT’s the number of orders per employee.  A LEFT JOIN is employed here to account for the occasion where an employee has made no sales.   Executing a ‘Select * from cteTreeWithOrderCount’ will return the result set as below.  The order here is unimportant as this is only a staging point of the data and only the final result set in a cte chain needs an Order by clause, unless TOP is utilised. cteExplode joins the above result set to the tally table (Nums) for Level Occurances.  So, if level is 2 then 2 rows are required.  This is done to expand the dataset, to create a new column (PathInc), which is the (n+1) integers contained within the heirarchyid.  For example, with the data for Robert King as given above, the below 3 rows will be returned. From this you can see that the pathinc column now contains the values for Andrew Fuller and Steven Buchanan who are Robert King’s superiors within the tree.    Finally cteSumUp, sums the orders for each person and their subordinates using the PathInc generated above, and the final select does the final simple mathematics and filters to restrict the result set to only the ‘original’ row per employee.

    Read the article

  • How does the Trash Can work, and where can I find official documentation, reference, or specification for it?

    - by MestreLion
    When trying to manage trash can from mounted NTFS volumes, I ended up reading FreeDesktop.org's reference on it. Poking around and doing some tests, I realized Ubuntu/Gnome does not follow the specs 100%. Here's why: For non-/ partitions, it always uses <driveroot>/.Trash-<uid>, It never used <driveroot>/.Trash/<uid>, even when i created it in advance. While this works, it's annoying: if I have 15 users, I end up with 15 /.Trash-xxx folders in my drive, while the other approach would still give a single folder (with 15 sub-folders). That "pollution" in my drives is very unpleasant. And specs say "If an $topdir/.Trash directory is absent, an $topdir/.Trash-$uid directory is to be used". Well, it IS present, so why does it never use it? root trash does not work, at least not out of the box. Open nautilus as root and click on trash; it gives an error. Try to delete any file, it says "it can't move to trash". Ok, I know this can be fixed by creating /root/.local/share. But specs says "A “home trash” directory SHOULD be automatically created for any new user. If this directory is needed for a trashing operation but does not exist, the implementation SHOULD automatically create it, without any warnings or delays.". Why the error then? Bug? Why must I change /etc/fstab entries for mounted volumes, adding options like uid and guid, if the volumes are already mounted as RW for everyone? These are just some examples of deviation from the standard. So, the question is: "If Ubuntu does not adhere 100% to the spec, HOW exactly does the trash work? WHERE can i find a technical reference for Ubuntu's implementation of the trash?" By the way: if Ubuntu does happen to follow specs, please tell me what I am doing wrong, especially regarding the /.Trash-<uid> vs /.Trash/<uid> issue. Thanks! EDIT: Some more info: If a given fs has no support for the sticky bit (VFAT, NTFS), it probably doesn't have for permissions either (at least VFAT surely doesn't). So what prevents one user from purging / restoring other users' ./Trash-xxx ? If one can read/write his own Trash, one can do the same for the whole drive, including other's trashes, correct? Or does Gnome have some kind of "extra" protection on ./Trash-xxx folders on VFAT/NTFS fs? If Linux can "emulate" file permissions on NTFS mounting by editing /fstab uid and gid options, can it also "emulate" the sticky bit? I would really prefer to use /.Trash/xxx format... For the root issue: for the / partition, I can use trash as root, and it goes to /root/.local/shate/Trash. But if I click on Nautilus "Trash" (as root), I get an error. Don't you? So files are correctly trashed, but I can't access it. All I can do is manually "purge" them (by deleting files on /root/.local/shate/Trash), but restoring would be very tricky (opening info files and manually moving, etc.). For non-/ partitions (or at least for VFAT/NTFS), I can not even use trash as root: it does not create a ./Trash-0 folder, it simply says "Cannot trash, want to permanently delete?" Why? About fstab: i use it for a permanent mount for my NTFS partitions. I have several, and if not "pre-mounted" they really clutter the desktop and/or Nautilus. I'd rather have it pre-mounted, integrated in my fs, in mounts like /data , /windows/xp , /windows/vista , and so on, and leave /media and its "mount/unmount" flexibility just for truly removable drives. So, if Ubuntu/Gnome truly follows the spec, is there any way to fix the root issues and to "emulate" the sticky bit for (at least) my fstab'ed NTFS fixed partitions?

    Read the article

  • How does Trash Can works? Where can i find official specification / documentation / reference about it?

    - by MestreLion
    When trying to manage trash can from mounted NTFS volumes, I ended up reading FreeDesktop.org's reference on it. Poking around and doing some tests, I realized Ubuntu/Gnome does not follow the specs 100%. Here's why: For non-/ partitions, it always use <driveroot>/.Trash-<uid>, It never used <driveroot>/.Trash/<uid>, even when i created it in advance. While this works, its annoying: if i have 15 users, i end up with 15 /.Trash-xxx folders in my drive, while the other approach would still give a single folder (with 15 sub-folders). That "pollution" in my drives is very unpleasant. And specs say "If an $topdir/.Trash directory is absent, an $topdir/.Trash-$uid directory is to be used". Well, it IS present, so why it never uses it? root trash does not work, at least not out of the box. Open nautilus as root and click on trash, it gives error. Try to delete any file, it says "it cant move to trash". Ok, i know this can be fixed by creating /root/.local/share. But specs says "A “home trash” directory SHOULD be automatically created for any new user. If this directory is needed for a trashing operation but does not exist, the implementation SHOULD automatically create it, without any warnings or delays.". Why error then? Bug? Why do i must change /etc/fstab entries for mounted volumes, adding options like uid and guid, if the volumes are already mounted as RW for everyone? These are just some examples of deviation from standard. So, the question is: "If Ubuntu does not adhere 100% to the spec, HOW exactly does the trash work? WHERE can i find technical reference about Ubuntu's implementation of the trash?" By the way: if Ubuntu does happen to follow specs, please tell me what am i doing wrong, specially regarding the /.Trash-<uid> vs /.Trash/<uid> issue. Thanks! EDIT: Some more info: If a given fs has no support for sticky bit (VFAT, NTFS), it probably dont have for permitions either (at least VFAT surely doesnt). So what prevents one user for purging / restoring other users ./Trash-xxx ? If one can read/write his own Trash, he can also do the same for the whole drive, including other's trashes, isnt it? Or does Gnome has any "extra" protection on ./Trash-xxx folders on VFAT/NTFS fs? If Linux can "emulate" file permitions on NTFS mounting by editing /fstab uid and gid options, can it also "emulate" the sticky bit? I would really want to use /.Trash/xxx format... For the root issue: for the / partition, i can trash as root, and it goes to /root/.local/shate/Trash. But if i click on Nautilus "Trash" (as root), i get an error. Dont you? So files are correctly trashed, but i cant access it. All i can do is manually "purge" them (by deleting files on /root/.local/shate/Trash), but restoring would be very tricky (opening info files and manually moving, etc) For non-/ partitions (or at least for VFAT/NTFS), I can not even trash as root: it does not create a ./Trash-0 folder, it simply says "Cannot trash, want to permantly delete?" Why? About fstab: i use it for a permanent mount for my NTFS partitions. I have several, and if not "pre-mounted" they really cluttter desktop and/or Nautilus. Id rather have it pre mounted, integrated in my fs, in mounts like /data , /windows/xp , /windows/vista , and so on, and leave /media and its "mount/unmount" flexibility just for truly removable drives Si, if Ubuntu/Gnome truly follow the spec, is there any way to fix the root issues and to "emulate" the sticky bit for (at least) my fstab'ed NTFS fixed partitions?

    Read the article

  • Temperature anomaly calculation of time series data

    - by neel
    I have a time series like following: Data <- structure(list(Year = c(1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L), Month = c(8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L), Day = c(30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 28L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L), Hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), temperature = c(72.5, 64, 62.5, 64, 64, 53, 52, 52, 45.5, 49, 50, 50, 59, 63.5, 69.5, 61, 61, NaN, NaN, 39.5, 37, 45.5, 45, 39, 43.5, 52, 53, 56, 64, 66, 66.5, 73.5, 81, 85, 89.5, 87.5, 88.5, 83, 84.5, 74, 60.5, 59, 53, 60.5, 62.5, 64.5, 63, 62, 65.5)), .Names = c("Year", "Month", "Day", "Hour", "temperature"), class = "data.frame", row.names = c(NA, -49L)) and I have to calculate standardized anomaly. The steps to calculate the anomalies are following: Monthly premature departures from the long-term (1991-2007) average are obtained. Then standardized by dividing by the standard deviation of monthly temperature. The standardized monthly anomalies are then weighted by multiplying by the fraction of the average temperature for the given month. These weighted anomalies are then summed over 3 month time period. Can you please help me?

    Read the article

  • How to generate random numbers of lognormal distribution within specific range in Matlab

    - by Harpreet
    My grain sizes are defined as D=[1.19,1.00,0.84,0.71,0.59,0.50,0.42]. The problem is described below in steps. Grain sizes should follow lognormal distribution. The mean of the grain sizes is fixed as 0.84 and the standard deviation should be as low as possible but not zero. 90% of the grains (by weight %) fall in the size range of 1.19 to 0.59, and the rest 10% fall in size range of 0.50 to 0.42. Now I want to find the probabilities (weight percentage) of the grains falling in each grain size. It is allowable to split this grain size distribution into further small sizes but it must always be in the range of 1.19 and 0.42, i.e. 'D' can be continuous but 0.42 < D < 1.19. I need it fast. I tried on my own but I am not able to get the correct result. I am getting negative probabilities (weight percentages). Thanks to anyone who helps. I didn't incorporate the point 3 as I came to know about that condition later. Here are simple steps I tried: %% D=[1.19,1.00,0.84,0.71,0.59,0.50,0.42]; s=0.30; % std dev of the lognormal distribution m=0.84; % mean of the lognormal distribution mu=log(m^2/sqrt(s^2+m^2)); % mean of the associated normal dist. sigma=sqrt(log((s^2/m^2)+1)); % std dev of the associated normal dist. [r,c]=size(D); for i=1:c D(i)=mu+(sigma.*randn(1)); w(i)=(log(D(i))-mu)/sigma; % the probability or the wt. percentage of the grain sizes end grain_size=exp(D); %%

    Read the article

  • [Ruby] Modifying object inside a loop doesn't change object outside of the loop?

    - by Jergason
    I am having problems with modifying objects inside blocks and not getting the expected values outside the blocks. This chunk of code is supposed to transform a bunch of points in 3d space, calculate a score (the rmsd or root mean squared deviation), and store both the score and the set of points that produced that score if it is lower than the current lowest score. At the end, I want to print out the best bunch of points. first = get_transformed_points(ARGV[0]) second = get_transformed_points(ARGV[1]) best_rmsd = first.rmsd(second) best_points = second #transform the points around x, y, and z and get the rmsd. If the new points # have a smaller rmsd, store them. ROTATION = 30 #rotate by ROTATION degrees num_rotations = 360/ROTATION radians = ROTATION * (Math::PI/180) num_rotations.times do |i| second = second * x_rotate num_rotations.times do |j| second = second * y_rotate num_rotations.times do |k| second = second * z_rotate rmsd = first.rmsd(second) if rmsd < best_rmsd then best_points = second best_rmsd = rmsd end end end end File.open("#{ARGV[1]}.out", "w") {|f| f.write(best_points.to_s)} I can print out the points that are getting stored inside the block, and they are getting transformed and stored correctly. However, when I write out the points to a file at the end, they are the same as the initial set of points. Somehow the best_points = second chunk doesn't seem to be doing anything outside of the block. It seems like there are some scoping rules that I don't understand here. I had thought that since I declared and defined best_points above, outside of the blocks, that it would be updated inside the blocks. However, it seems that when the blocks end, it somehow reverts back to the original value. Any ideas how to fix this? Is this a problem with blocks specifically?

    Read the article

  • Fastest reliable way for Clojure (Java) and Ruby apps to communicate

    - by jkndrkn
    Hi There, We have cloud-hosted (RackSpace cloud) Ruby and Java apps that will interact as follows: Ruby app sends a request to Java app. Request consists of map structure containing strings, integers, other maps, and lists (analogous to JSON). Java app analyzes data and sends reply to Ruby App. We are interested in evaluating both messaging formats (JSON, Buffer Protocols, Thrift, etc.) as well as message transmission channels/techniques (sockets, message queues, RPC, REST, SOAP, etc.) Our criteria: Short round-trip time. Low round-trip-time standard deviation. (We understand that garbage collection pauses and network usage spikes can affect this value). High availability. Scalability (we may want to have multiple instances of Ruby and Java app exchanging point-to-point messages in the future). Ease of debugging and profiling. Good documentation and community support. Bonus points for Clojure support. What combination of message format and transmission method would you recommend? Why? I've gathered here some materials we have already collected for review: Comparison of various java serialization options Comparison of Thrift and Protocol Buffers (old) Comparison of various data interchange formats Comparison of Thrift and Protocol Buffers Fallacies of Protocol Buffers RPC features Discussion of RPC in the context of AMQP (Message-Queueing) Comparison of RPC and message-passing in distributed systems (pdf) Criticism of RPC from perspective of message-passing fan Overview of Avro from Ruby programmer perspective

    Read the article

  • Simple aggregating query very slow in PostgreSql, any way to improve?

    - by Ash
    HI I have a table which holds files and their types such as CREATE TABLE files ( id SERIAL PRIMARY KEY, name VARCHAR(255), filetype VARCHAR(255), ... ); and another table for holding file properties such as CREATE TABLE properties ( id SERIAL PRIMARY KEY, file_id INTEGER CONSTRAINT fk_files REFERENCES files(id), size INTEGER, ... // other property fields ); The file_id field has an index. The file table has around 800k lines, and the properties table around 200k (not all files necessarily have/need a properties). I want to do aggregating queries, for example find the average size and standard deviation for all file types. But it's very slow - around 70 seconds for the latter query. I understand it needs a sequential scan, but still it seems too much. Here's the query SELECT f.filetype, avg(size), stddev(size) FROM files as f, properties as pr WHERE f.id = pr.file_id GROUP BY f.filetype; and the explain HashAggregate (cost=140292.20..140293.94 rows=116 width=13) (actual time=74013.621..74013.954 rows=110 loops=1) -> Hash Join (cost=6780.19..138945.47 rows=179564 width=13) (actual time=1520.104..73156.531 rows=179499 loops=1) Hash Cond: (f.id = pr.file_id) -> Seq Scan on files f (cost=0.00..108365.41 rows=1140941 width=9) (actual time=0.998..62569.628 rows=805270 loops=1) -> Hash (cost=3658.64..3658.64 rows=179564 width=12) (actual time=1131.053..1131.053 rows=179499 loops=1) -> Seq Scan on properties pr (cost=0.00..3658.64 rows=179564 width=12) (actual time=0.753..557.171 rows=179574 loops=1) Total runtime: 74014.520 ms Any ideas why it is so slow/how to make it faster?

    Read the article

  • Summary statistics in visual basic

    - by ben
    Below I am trying to write a script the goal of which is to calculate some summary statistics for a few different columns of numbers. I have gotten some help on it up to the "Need help below" mark. But beyond that I am flabergasted as to how to calculate the simple stats (sum, mean, standard deviation, coefficient of variation). I know VB has scripts for these stats, which I have included in my code, but I guess I need to do some extra declaring or something. Advice much appreciated. Thanks. Sub TOAinput() Const n As Integer = 648 Dim stratum(n), hybrid(n), acres(n), hhsz(n), offinc(n) Dim s1 As Integer Dim s2 As Integer Dim i As Integer For i = 1 To n stratum(i) = Worksheets("hhid level").Cells(i + 1, 2).Value Next i s1 = 0 s2 = 0 For i = 1 To n If stratum(i) = 1 Then s1 = s1 + 1 Else: s2 = s2 + 1 End If Next i Dim acres1(), hhsz1(), offinc1(), acres2(), hhsz2(), offinc2() ReDim acres1(s1), hhsz1(s1), offinc1(s1), acres2(s2), hhsz2(s2), offinc2(s2) 'data infiles: acres, hh size, off-farm income, For i = 1 To n acres(i) = Worksheets("hhid level").Cells(i + 1, 4).Value hhsz(i) = Worksheets("hhid level").Cells(i + 1, 5).Value offinc(i) = Worksheets("hhid level").Cells(i + 1, 6).Value Next i s1 = 0 s2 = 0 For i = 1 To n If stratum(i) = 1 Then s1 = s1 + 1 acres1(s1) = acres(i) hhsz1(s1) = hhsz(i) offinc1(s1) = offinc(i) Else: s2 = s2 + 1 acres2(s2) = acres(i) hhsz2(s2) = hhsz(i) offinc2(s2) = offinc(i) End If Next i '**************************** 'Need help below '**************************** Dim sumac1, sumac2, mhhsz1, mhhsz2, cvhhsz1, cvhhsz2 sumac1 = Sum(acres1) sumac2 = Sum(acres2) mhhsz1 = Average(hhsz1) mhhsz2 = Average(hhsz2) cvhhsz1 = StDev(hhsz1) / Average(hhsz1) cvhhsz2 = StDev(hhsz2) / Average(hhsz2) End Sub

    Read the article

  • R Random Data Sets within loops

    - by jugossery
    Here is what I want to do: I have a time series data frame with let us say 100 time-series of length 600 - each in one column of the data frame. I want to pick up 4 of the time-series randomly and then assign them random weights that sum up to one (ie 0.1, 0.5, 0.3, 0.1). Using those I want to compute the mean of the sum of the 4 weighted time series variables (e.g. convex combination). I want to do this let us say 100k times and store each result in the form ts1.name, ts2.name, ts3.name, ts4.name, weight1, weight2, weight3, weight4, mean so that I get a 9*100k df. I tried some things already but R is very bad with loops and I know vector oriented solutions are better because of R design. Thanks Here is what I did and I know it is horrible The df is in the form v1,v2,v2.....v100 1,5,6,.......9 2,4,6,.......10 3,5,8,.......6 2,2,8,.......2 etc e=NULL for (x in 1:100000) { s=sample(1:100,4)#pick 4 variables randomly a=sample(seq(0,1,0.01),1) b=sample(seq(0,1-a,0.01),1) c=sample(seq(0,(1-a-b),0.01),1) d=1-a-b-c e=c(a,b,c,d)#4 random weights average=mean(timeseries.df[,s]%*%t(e)) e=rbind(e,s,average)#in the end i get the 9*100k df } The procedure runs way to slow. EDIT: Thanks for the help i had,i am not used to think R and i am not very used to translate every problem into a matrix algebra equation which is what you need in R. Then the problem becomes a little bit complex if i want to calculate the standard deviation. i need the covariance matrix and i am not sure i can if/how i can pick random elements for each sample from the original timeseries.df covariance matrix then compute the sample variance (t(sampleweights)%*%sample_cov.mat%*%sampleweights) to get in the end the ts.weighted_standard_dev matrix Last question what is the best way to proceed if i want to bootstrap the original df x times and then apply the same computations to test the robustness of my datas thanks

    Read the article

  • Monitoring tools that can take high rate and high volume?

    - by Jon Watte
    We're using Cacti with RRDTool to monitor and graph about 100,000 counters spread across about 1,000 Linux-based nodes. However, our current setup generally only gives us 5-minute graphs (with some data being minute-based); we often make changes where seeing feedback in "near real time" would be of value. I'd like approximately a week of 5- or 10-second data, a year of 1-minute data, and 5 years of 10-minute data. I have SSD disks and a dual-hexa-core server to spare. I tried setting up a Graphite/carbon/whisper server, and had about 15 nodes pipe to it, but it only has "average" for the retention function when promoting to older buckets. This is almost useless -- I'd like min, max, average, standard deviation, and perhaps "total sum" and "number of samples" or perhaps "95th percentile" available. The developer claims there's a new back-end "in beta" that allows you to write your own function, but this appears to still only do 1:1 retention (when saving older data, you really want the statistics calculated into many streams from a single input. Also, "in beta" seems a little risky for this installation. If I'm wrong about this assumption, I'd be happy to be shown my error! I've heard Zabbix recommended, but it puts data into MySQL or some other SQL database. 100,000 counters on a 5 second interval means 20,000 tps, and while I have an SSD, I don't have an 8-way RAID-6 with battery backup cache, which I think I'd need for that to work out :-) Again, if that's actually something that's not a problem, I'd be happy to be shown the error of my ways. Also, can Zabbix do the single data stream - promote with statistics thing? Finally, Munin claims to have a new 2.0 coming out "in beta" right now, and it boasts custom retention plans. However, again, it's that "in beta" part -- has anyone used that for real, and at scale? How did it perform, if so? I'm almost thinking about using a graphing front-end (such as Graphite) and rolling my own retention backend with a simple layer on top of mmap() and some stats. That wouldn't be particularly hard, and would probably perform very well, letting the kernel figure out the balance between frequency of flushing to disk and process operations. Any other suggestions I should look into? Note: it has to have shown itself able to sustain the kinds of data loads I'm suggesting above; if you can point at the specific implementation you're referencing, so much the better!

    Read the article

  • Short Season, Long Models - Dealing with Seasonality

    - by Michel Adar
    Accounting for seasonality presents a challenge for the accurate prediction of events. Examples of seasonality include: ·         Boxed cosmetics sets are more popular during Christmas. They sell at other times of the year, but they rise higher than other products during the holiday season. ·         Interest in a promotion rises around the time advertising on TV airs ·         Interest in the Sports section of a newspaper rises when there is a big football match There are several ways of dealing with seasonality in predictions. Time Windows If the length of the model time windows is short enough relative to the seasonality effect, then the models will see only seasonal data, and therefore will be accurate in their predictions. For example, a model with a weekly time window may be quick enough to adapt during the holiday season. In order for time windows to be useful in dealing with seasonality it is necessary that: The time window is significantly shorter than the season changes There is enough volume of data in the short time windows to produce an accurate model An additional issue to consider is that sometimes the season may have an abrupt end, for example the day after Christmas. Input Data If available, it is possible to include the seasonality effect in the input data for the model. For example the customer record may include a list of all the promotions advertised in the area of residence. A model with these inputs will have to learn the effect of the input. It is possible to learn it specific to the promotion – and by the way learn about inter-promotion cross feeding – by leaving the list of ads as it is; or it is possible to learn the general effect by having a flag that indicates if the promotion is being advertised. For inputs to properly represent the effect in the model it is necessary that: The model sees enough events with the input present. For example, by virtue of the model lifetime (or time window) being long enough to see several “seasons” or by having enough volume for the model to learn seasonality quickly. Proportional Frequency If we create a model that ignores seasonality it is possible to use that model to predict how the specific person likelihood differs from average. If we have a divergence from average then we can transfer that divergence proportionally to the observed frequency at the time of the prediction. Definitions: Ft = trailing average frequency of the event at time “t”. The average is done over a suitable period of to achieve a statistical significant estimate. F = average frequency as seen by the model. L = likelihood predicted by the model for a specific person Lt = predicted likelihood proportionally scaled for time “t”. If the model is good at predicting deviation from average, and this holds over the interesting range of seasons, then we can estimate Lt as: Lt = L * (Ft / F) Considering that: L = (L – F) + F Substituting we get: Lt = [(L – F) + F] * (Ft / F) Which simplifies to: (i)                  Lt = (L – F) * (Ft / F)  +  Ft This latest expression can be understood as “The adjusted likelihood at time t is the average likelihood at time t plus the effect from the model, which is calculated as the difference from average time the proportion of frequencies”. The formula above assumes a linear translation of the proportion. It is possible to generalize the formula using a factor which we will call “a” as follows: (ii)                Lt = (L – F) * (Ft / F) * a  +  Ft It is also possible to use a formula that does not scale the difference, like: (iii)               Lt = (L – F) * a  +  Ft While these formulas seem reasonable, they should be taken as hypothesis to be proven with empirical data. A theoretical analysis provides the following insights: The Cumulative Gains Chart (lift) should stay the same, as at any given time the order of the likelihood for different customers is preserved If F is equal to Ft then the formula reverts to “L” If (Ft = 0) then Lt in (i) and (ii) is 0 It is possible for Lt to be above 1. If it is desired to avoid going over 1, for relatively high base frequencies it is possible to use a relative interpretation of the multiplicative factor. For example, if we say that Y is twice as likely as X, then we can interpret this sentence as: If X is 3%, then Y is 6% If X is 11%, then Y is 22% If X is 70%, then Y is 85% - in this case we interpret “twice as likely” as “half as likely to not happen” Applying this reasoning to (i) for example we would get: If (L < F) or (Ft < (1 / ((L/F) + 1)) Then  Lt = L * (Ft / F) Else Lt = 1 – (F / L) + (Ft * F / L)  

    Read the article

  • Modeling distribution of performance measurements

    - by peterchen
    How would you mathematically model the distribution of repeated real life performance measurements - "Real life" meaning you are not just looping over the code in question, but it is just a short snippet within a large application running in a typical user scenario? My experience shows that you usually have a peak around the average execution time that can be modeled adequately with a Gaussian distribution. In addition, there's a "long tail" containing outliers - often with a multiple of the average time. (The behavior is understandable considering the factors contributing to first execution penalty). My goal is to model aggregate values that reasonably reflect this, and can be calculated from aggregate values (like for the Gaussian, calculate mu and sigma from N, sum of values and sum of squares). In other terms, number of repetitions is unlimited, but memory and calculation requirements should be minimized. A normal Gaussian distribution can't model the long tail appropriately and will have the average biased strongly even by a very small percentage of outliers. I am looking for ideas, especially if this has been attempted/analysed before. I've checked various distributions models, and I think I could work out something, but my statistics is rusty and I might end up with an overblown solution. Oh, a complete shrink-wrapped solution would be fine, too ;) Other aspects / ideas: Sometimes you get "two humps" distributions, which would be acceptable in my scenario with a single mu/sigma covering both, but ideally would be identified separately. Extrapolating this, another approach would be a "floating probability density calculation" that uses only a limited buffer and adjusts automatically to the range (due to the long tail, bins may not be spaced evenly) - haven't found anything, but with some assumptions about the distribution it should be possible in principle. Why (since it was asked) - For a complex process we need to make guarantees such as "only 0.1% of runs exceed a limit of 3 seconds, and the average processing time is 2.8 seconds". The performance of an isolated piece of code can be very different from a normal run-time environment involving varying levels of disk and network access, background services, scheduled events that occur within a day, etc. This can be solved trivially by accumulating all data. However, to accumulate this data in production, the data produced needs to be limited. For analysis of isolated pieces of code, a gaussian deviation plus first run penalty is ok. That doesn't work anymore for the distributions found above. [edit] I've already got very good answers (and finally - maybe - some time to work on this). I'm starting a bounty to look for more input / ideas.

    Read the article

  • C# performance varying due to memory

    - by user1107474
    Hope this is a valid post here, its a combination of C# issues and hardware. I am benchmarking our server because we have found problems with the performance of our quant library (written in C#). I have simulated the same performance issues with some simple C# code- performing very heavy memory-usage. The code below is in a function which is spawned from a threadpool, up to a maximum of 32 threads (because our server has 4x CPUs x 8 cores each). This is all on .Net 3.5 The problem is that we are getting wildly differing performance. I run the below function 1000 times. The average time taken for the code to run could be, say, 3.5s, but the fastest will only be 1.2s and the slowest will be 7s- for the exact same function! I have graphed the memory usage against the timings and there doesnt appear to be any correlation with the GC kicking in. One thing I did notice is that when running in a single thread the timings are identical and there is no wild deviation. I have also tested CPU-bound algorithms and the timings are identical too. This has made us wonder if the memory bus just cannot cope. I was wondering could this be another .net or C# problem, or is it something related to our hardware? Would this be the same experience if I had used C++, or Java?? We are using 4x Intel x7550 with 32GB ram. Is there any way around this problem in general? Stopwatch watch = new Stopwatch(); watch.Start(); List<byte> list1 = new List<byte>(); List<byte> list2 = new List<byte>(); List<byte> list3 = new List<byte>(); int Size1 = 10000000; int Size2 = 2 * Size1; int Size3 = Size1; for (int i = 0; i < Size1; i++) { list1.Add(57); } for (int i = 0; i < Size2; i = i + 2) { list2.Add(56); } for (int i = 0; i < Size3; i++) { byte temp = list1.ElementAt(i); byte temp2 = list2.ElementAt(i); list3.Add(temp); list2[i] = temp; list1[i] = temp2; } watch.Stop(); (the code is just meant to stress out the memory) I would include the threadpool code, but we used a non-standard threadpool library. EDIT: I have reduced "size1" to 100000, which basically doesn't use much memory and I still get a lot of jitter. This suggests it's not the amount of memory being transferred, but the frequency of memory grabs?

    Read the article

  • CodePlex Daily Summary for Tuesday, October 15, 2013

    CodePlex Daily Summary for Tuesday, October 15, 2013Popular ReleasesFFXIV Crafting Simulator: Crafting Simulator 2.4.1: -Fixed the offset for the new patch (Auto Loading function)iBoxDB.EX - Fast Transactional NoSQL Database Resources: iBoxDB.net fast transactional nosql database 1.5.2: Easily process objects and documents, zero configuration. fast embeddable transactional nosql document database, includes CURD, QueryLanguage, Master-Master-Slave Replication, MVCC, etc. supports .net2, .net4, windows phone, mono, unity3d, node.js , copy and run. http://download-codeplex.sec.s-msft.com/Download?ProjectName=iboxdb&DownloadId=737783 Benchmark with MongoDB Compatibility more platforms for java versionneurogoody: slicebox: this is the slice box jsEvent-Based Components AppBuilder: AB3.AppDesigner.55: Iteration 55 (Feature): Moving of TargetEdge (simple wires only) by mouse.Sandcastle Help File Builder: SHFB v1.9.8.0 with Visual Studio Package: General InformationIMPORTANT: On some systems, the content of the ZIP file is blocked and the installer may fail to run. Before extracting it, right click on the ZIP file, select Properties, and click on the Unblock button if it is present in the lower right corner of the General tab in the properties dialog. This new release contains bug fixes and feature enhancements. There are some potential breaking changes in this release as some features of the Help File Builder have been moved into...SharpConfig: SharpConfig 1.2: Implemented comment parsing. Comments are now part of settings and setting categories. New properties: Setting: Comment PreComments SettingCategory: Comment PreCommentsC++ REST SDK (codename "Casablanca"): C++ REST SDK 1.3.0: This release fixes multiple customer reported issues as well as the following: Full support for Dev12 binaries and project files Full support for Windows XP New sample highlighting the Client and Server APIs : BlackJack Expose underlying native handle to set custom options on http_client Improvements to Listener Library Note: Dev10 binaries have been dropped as of this release, however the Dev10 project files are still available in the Source CodeAD ACL Scanner: 1.3.2: Minor bug fixed: Powershell 4.0 will report: Select—Object: Parameter cannot be processed because the parameter name p is ambiguous.Json.NET: Json.NET 5.0 Release 7: New feature - Added support for Immutable Collections New feature - Added WriteData and ReadData settings to DataExtensionAttribute New feature - Added reference and type name handling support to extension data New feature - Added default value and required support to constructor deserialization Change - Extension data is now written when serializing Fix - Added missing casts to JToken Fix - Fixed parsing large floating point numbers Fix - Fixed not parsing some ISO date ...RESX Manager: ResxManager 0.2.1: FIXED: Many critical bugs have been fixed. New Features Error logging for improved exception handling New toolbar Improvements of user interfaceFast YouTube Downloader: YouTube Downloader 2.2.0: YouTube Downloader 2.2.0VidCoder: 1.5.8 Beta: Added hardware acceleration options: Bicubic OpenCL scaling algorithm, QSV decoding/encoding and DXVA decoding. Updated HandBrake core to SVN 5834. Updated VidCoder setup icon. Fixed crash when choosing the mp4v2 container on x86 and opening on x64. Warning: the hardware acceleration features require specific hardware or file types to work correctly: QSV: Need an Intel processor that supports Quick Sync Video encoding, with a monitor hooked up to the Intel HD Graphics output and the lat...ASP.net MVC Awesome - jQuery Ajax Helpers: 3.5.2: version 3.5.2 - fix for setting single value to multivalue controls - datepicker min max date offset fix - html encoding for keys fix - enable Column.ClientFormatFunc to be a function call that will return a function version 3.5.1 - fixed html attributes rendering - fixed loading animation rendering - css improvements version 3.5 ========================== - autosize for all popups ( can be turned off by calling in js awe.autoSize = false ) - added Parent, Paremeter extensions ...Wsus Package Publisher: Release v1.3.1310.12: Allow the Update Creation Wizard to be set in full screen mode. Fix a bug which prevent WPP to Reset Remote Sus Client ID. Change the behavior of links in the Update Detail Viewer. Left-Click to open, Right-Click to copy to the Clipboard.TerrariViewer: TerrariViewer v7 [Terraria Inventory Editor]: This is a complete overhaul but has the same core style. I hope you enjoy it. This version is compatible with 1.2.0.3 Please send issues to my Twitter or https://github.com/TJChap2840WDTVHubGen - Adds Metadata, thumbnails and subtitles to WDTV Live Hubs: WDTVHubGen.v2.1.6.maint: I think this covers all of the issues. new additions: fixed the thumbnail problem for backgrounds. general clean up and error checking. need to get this put through the wringer and all feedback is welcome.BIDS Helper: BIDS Helper 1.6.4: This BIDS Helper release brings the following new features and fixes: New Features: A new Bus Matrix style report option when you run the Printer Friendly Dimension Usage report for an SSAS cube. The Biml engine is now fully in sync with the supported subset of Varigence Mist 3.4. This includes a large number of language enhancements, bugfixes, and project deployment support. Fixed Issues: Fixed Biml execution for project connections fixing a bug with Tabular Translations Editor not a...MoreTerra (Terraria World Viewer): MoreTerra 1.11.3: =========== =New Features= =========== New Markers added for Plantera's Bulb, Heart Fruits and Gold Cache. Markers now correctly display for the gems found in rock debris on the floor. =========== =Compatibility= =========== Fixed header changes found in Terraria 1.0.3.1Media Companion: Media Companion MC3.581b: Fix in place for TVDB xml issue. New* Movie - General Preferences, allow saving of ignored 'The' or 'A' to end of movie title, stored in sorttitle field. * Movie - New Way for Cropping Posters. Fixed* Movie - Rename of folders/filename. caught error message. * Movie - Fixed Bug in Save Cropped image, only saving in Pre-Frodo format if Both model selected. * Movie - Fixed Cropped image didn't take zoomed ratio into effect. * Movie - Separated Folder Renaming and File Renaming fuctions durin...SmartStore.NET - Free ASP.NET MVC Ecommerce Shopping Cart Solution: SmartStore.NET 1.2.0: HighlightsMulti-store support "Trusted Shops" plugins Highly improved SmartStore.biz Importer plugin Add custom HTML content to pages Performance optimization New FeaturesMulti-store-support: now multiple stores can be managed within a single application instance (e.g. for building different catalogs, brands, landing pages etc.) Added 3 new Trusted Shops plugins: Seal, Buyer Protection, Store Reviews Added Display as HTML Widget to CMS Topics (store owner now can add arbitrary HT...New ProjectsArtezio SharePoint 2013 Workflow Activities: SharePoint Workflow 2013 doesn’t provide activities to work with permissions, we've fixed it using HttpSend activity that makes REST API calls.Dependency.Injection: An attempt to write a really simple dependency injection framework. Does property-based and recursive dependency injection. Handles singletons. Yay!DHGMS SUO Killer: SUO Killer is a Visual Studio extension to deal with the removal of SUO file to mitigate SUO related issues in Visual Studio. This project is written in C#.dynamicsheet: dynamicsheetExcel Comparator: Excel Comparator is an add-in for Microsoft Excel that allows the user to compare a range between two sheets. FetchAIP: FetchAIP is a utility to download the various sections of the Aeronautical Information Publication (AIP) for New Zealand.Fluent Method and Type Builder: Still working on the summary.getboost: NuGet package for Boost framework.Goldstone Forum: WebForms Forum - TelerikAcademy Team ProjectGroupMe Software Development Kit: .NET Software Development Kit for http://groupme.com/ chat service.GSLMS: ----Import Excel Files Into SQL Server: Load Excel files into SQL Database without schema changes.Inaction: ?????????? jBegin: Learning ASP.net MVC from beginning, then here will be the source code for jbegin.comKDG's Statistical Quality Control Solver: This tool will include methods that can solve sample standard deviation, sample variance, median, mode, moving average, percentiles, margin of error, etc.kpi: Key Performance Indicator (KPI)????; visual studio 2010 with .NET 4.0 runtimeLECO Remote Control Client Application: Sample code and binaries are provided to demonstrate the remote control capability of a LECO Cornerstone instrument.LinkPad: My first Windows Store app intended for student to sketch up thoughts and concepts in quick diagrams.Modler.NET - Automating Graphical Data Model Co-Evolution: Modler.NET was the tool created for a Master's thesis project, which automates the co-evolution of graphical data models and the database that they represent.MyFileManager1: SummaryNever Lotto: Korean 465 Lotto Analyzer and Simulator. The real purpose of this project is to show that this kind of lotto things are just shit.NHibernate: The purpose of this project is to demo CRUD operations using NHibernate with Mono in Visual Studio 2012 using C# language. OAuth2 Authorizer: OAuth2 Authorizer helps you get the access code for a standard OAuth2 REST service that implements 3-legged authentication.Regular Expression for Excel: Regular Expression For Excel is an Excel Plugin. It provides a regular expressions EXCEL support. We can use it in the EXCEL function.Service Tester: Service Tester is an Azure Cloud based load testing application targeted at Soap Web Services which allows you to invoke your Web Service by random parameters.Simple TypeScript and C# Class Generator: Simple GUI application to generate compatible class source code for C# and TypeScript for communications between C# and TypeScript. Soccer team management: ---Spanner: No more stringly-typed web development! Build statically typed single page web applications in C#, automatically generating all HTML, JavaScript, and Knockout.

    Read the article

  • Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution

    - by Mark Hornick
    The plyr package contains a set of tools for partitioning a problem into smaller sub-problems that can be more easily processed. One function within {plyr} is ddply, which allows you to specify subsets of a data.frame and then apply a function to each subset. The result is gathered into a single data.frame. Such a capability is very convenient. The function ddply also has a parallel option that if TRUE, will apply the function in parallel, using the backend provided by foreach. This type of functionality is available through Oracle R Enterprise using the ore.groupApply function. In this blog post, we show a few examples from Sean Anderson's "A quick introduction to plyr" to illustrate the correpsonding functionality using ore.groupApply. To get started, we'll create a demo data set and load the plyr package. set.seed(1) d <- data.frame(year = rep(2000:2014, each = 3),         count = round(runif(45, 0, 20))) dim(d) library(plyr) This first example takes the data frame, partitions it by year, and calculates the coefficient of variation of the count, returning a data frame. # Example 1 res <- ddply(d, "year", function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(cv.count = cv)   }) To illustrate the equivalent functionality in Oracle R Enterprise, using embedded R execution, we use the ore.groupApply function on the same data, but pushed to the database, creating an ore.frame. The function ore.push creates a temporary table in the database, returning a proxy object, the ore.frame. D <- ore.push(d) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   sd.count <- sd(x$count)   cv <- sd.count/mean.count   data.frame(year=x$year[1], cv.count = cv)   }, FUN.VALUE=data.frame(year=1, cv.count=1)) You'll notice the similarities in the first three arguments. With ore.groupApply, we augment the function to return the specific data.frame we want. We also specify the argument FUN.VALUE, which describes the resulting data.frame. From our previous blog posts, you may recall that by default, ore.groupApply returns an ore.list containing the results of each function invocation. To get a data.frame, we specify the structure of the result. The results in both cases are the same, however the ore.groupApply result is an ore.frame. In this case the data stays in the database until it's actually required. This can result in significant memory and time savings whe data is large. R> class(res) [1] "ore.frame" attr(,"package") [1] "OREbase" R> head(res)    year cv.count 1 2000 0.3984848 2 2001 0.6062178 3 2002 0.2309401 4 2003 0.5773503 5 2004 0.3069680 6 2005 0.3431743 To make the ore.groupApply execute in parallel, you can specify the argument parallel with either TRUE, to use default database parallelism, or to a specific number, which serves as a hint to the database as to how many parallel R engines should be used. The next ddply example uses the summarise function, which creates a new data.frame. In ore.groupApply, the year column is passed in with the data. Since no automatic creation of columns takes place, we explicitly set the year column in the data.frame result to the value of the first row, since all rows received by the function have the same year. # Example 2 ddply(d, "year", summarise, mean.count = mean(count)) res <- ore.groupApply (D, D$year, function(x) {   mean.count <- mean(x$count)   data.frame(year=x$year[1], mean.count = mean.count)   }, FUN.VALUE=data.frame(year=1, mean.count=1)) R> head(res)    year mean.count 1 2000 7.666667 2 2001 13.333333 3 2002 15.000000 4 2003 3.000000 5 2004 12.333333 6 2005 14.666667 Example 3 uses the transform function with ddply, which modifies the existing data.frame. With ore.groupApply, we again construct the data.frame explicilty, which is returned as an ore.frame. # Example 3 ddply(d, "year", transform, total.count = sum(count)) res <- ore.groupApply (D, D$year, function(x) {   total.count <- sum(x$count)   data.frame(year=x$year[1], count=x$count, total.count = total.count)   }, FUN.VALUE=data.frame(year=1, count=1, total.count=1)) > head(res)    year count total.count 1 2000 5 23 2 2000 7 23 3 2000 11 23 4 2001 18 40 5 2001 4 40 6 2001 18 40 In Example 4, the mutate function with ddply enables you to define new columns that build on columns just defined. Since the construction of the data.frame using ore.groupApply is explicit, you always have complete control over when and how to use columns. # Example 4 ddply(d, "year", mutate, mu = mean(count), sigma = sd(count),       cv = sigma/mu) res <- ore.groupApply (D, D$year, function(x) {   mu <- mean(x$count)   sigma <- sd(x$count)   cv <- sigma/mu   data.frame(year=x$year[1], count=x$count, mu=mu, sigma=sigma, cv=cv)   }, FUN.VALUE=data.frame(year=1, count=1, mu=1,sigma=1,cv=1)) R> head(res)    year count mu sigma cv 1 2000 5 7.666667 3.055050 0.3984848 2 2000 7 7.666667 3.055050 0.3984848 3 2000 11 7.666667 3.055050 0.3984848 4 2001 18 13.333333 8.082904 0.6062178 5 2001 4 13.333333 8.082904 0.6062178 6 2001 18 13.333333 8.082904 0.6062178 In Example 5, ddply is used to partition data on multiple columns before constructing the result. Realizing this with ore.groupApply involves creating an index column out of the concatenation of the columns used for partitioning. This example also allows us to illustrate using the ORE transparency layer to subset the data. # Example 5 baseball.dat <- subset(baseball, year > 2000) # data from the plyr package x <- ddply(baseball.dat, c("year", "team"), summarize,            homeruns = sum(hr)) We first push the data set to the database to get an ore.frame. We then add the composite column and perform the subset, using the transparency layer. Since the results from database execution are unordered, we will explicitly sort these results and view the first 6 rows. BB.DAT <- ore.push(baseball) BB.DAT$index <- with(BB.DAT, paste(year, team, sep="+")) BB.DAT2 <- subset(BB.DAT, year > 2000) X <- ore.groupApply (BB.DAT2, BB.DAT2$index, function(x) {   data.frame(year=x$year[1], team=x$team[1], homeruns=sum(x$hr))   }, FUN.VALUE=data.frame(year=1, team="A", homeruns=1), parallel=FALSE) res <- ore.sort(X, by=c("year","team")) R> head(res)    year team homeruns 1 2001 ANA 4 2 2001 ARI 155 3 2001 ATL 63 4 2001 BAL 58 5 2001 BOS 77 6 2001 CHA 63 Our next example is derived from the ggplot function documentation. This illustrates the use of ddply within using the ggplot2 package. We first create a data.frame with demo data and use ddply to create some statistics for each group (gp). We then use ggplot to produce the graph. We can take this same code, push the data.frame df to the database and invoke this on the database server. The graph will be returned to the client window, as depicted below. # Example 6 with ggplot2 library(ggplot2) df <- data.frame(gp = factor(rep(letters[1:3], each = 10)),                  y = rnorm(30)) # Compute sample mean and standard deviation in each group library(plyr) ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y)) # Set up a skeleton ggplot object and add layers: ggplot() +   geom_point(data = df, aes(x = gp, y = y)) +   geom_point(data = ds, aes(x = gp, y = mean),              colour = 'red', size = 3) +   geom_errorbar(data = ds, aes(x = gp, y = mean,                                ymin = mean - sd, ymax = mean + sd),              colour = 'red', width = 0.4) DF <- ore.push(df) ore.tableApply(DF, function(df) {   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4) }) But let's take this one step further. Suppose we wanted to produce multiple graphs, partitioned on some index column. We replicate the data three times and add some noise to the y values, just to make the graphs a little different. We also create an index column to form our three partitions. Note that we've also specified that this should be executed in parallel, allowing Oracle Database to control and manage the server-side R engines. The result of ore.groupApply is an ore.list that contains the three graphs. Each graph can be viewed by printing the list element. df2 <- rbind(df,df,df) df2$y <- df2$y + rnorm(nrow(df2)) df2$index <- c(rep(1,300), rep(2,300), rep(3,300)) DF2 <- ore.push(df2) res <- ore.groupApply(DF2, DF2$index, function(df) {   df <- df[,1:2]   library(ggplot2)   library(plyr)   ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))   ggplot() +     geom_point(data = df, aes(x = gp, y = y)) +     geom_point(data = ds, aes(x = gp, y = mean),                colour = 'red', size = 3) +     geom_errorbar(data = ds, aes(x = gp, y = mean,                                  ymin = mean - sd, ymax = mean + sd),                   colour = 'red', width = 0.4)   }, parallel=TRUE) res[[1]] res[[2]] res[[3]] To recap, we've illustrated how various uses of ddply from the plyr package can be realized in ore.groupApply, which affords the user explicit control over the contents of the data.frame result in a straightforward manner. We've also highlighted how ddply can be used within an ore.groupApply call.

    Read the article

  • What statistics can be maintained for a set of numerical data without iterating?

    - by Dan Tao
    Update Just for future reference, I'm going to list all of the statistics that I'm aware of that can be maintained in a rolling collection, recalculated as an O(1) operation on every addition/removal (this is really how I should've worded the question from the beginning): Obvious Count Sum Mean Max* Min* Median** Less Obvious Variance Standard Deviation Skewness Kurtosis Mode*** Weighted Average Weighted Moving Average**** OK, so to put it more accurately: these are not "all" of the statistics I'm aware of. They're just the ones that I can remember off the top of my head right now. *Can be recalculated in O(1) for additions only, or for additions and removals if the collection is sorted (but in this case, insertion is not O(1)). Removals potentially incur an O(n) recalculation for non-sorted collections. **Recalculated in O(1) for a sorted, indexed collection only. ***Requires a fairly complex data structure to recalculate in O(1). ****This can certainly be achieved in O(1) for additions and removals when the weights are assigned in a linearly descending fashion. In other scenarios, I'm not sure. Original Question Say I maintain a collection of numerical data -- let's say, just a bunch of numbers. For this data, there are loads of calculated values that might be of interest; one example would be the sum. To get the sum of all this data, I could... Option 1: Iterate through the collection, adding all the values: double sum = 0.0; for (int i = 0; i < values.Count; i++) sum += values[i]; Option 2: Maintain the sum, eliminating the need to ever iterate over the collection just to find the sum: void Add(double value) { values.Add(value); sum += value; } void Remove(double value) { values.Remove(value); sum -= value; } EDIT: To put this question in more relatable terms, let's compare the two options above to a (sort of) real-world situation: Suppose I start listing numbers out loud and ask you to keep them in your head. I start by saying, "11, 16, 13, 12." If you've just been remembering the numbers themselves and nothing more, and then I say, "What's the sum?", you'd have to think to yourself, "OK, what's 11 + 16 + 13 + 12?" before responding, "52." If, on the other hand, you had been keeping track of the sum yourself while I was listing the numbers (i.e., when I said, "11" you thought "11", when I said "16", you thought, "27," and so on), you could answer "52" right away. Then if I say, "OK, now forget the number 16," if you've been keeping track of the sum inside your head you can simply take 16 away from 52 and know that the new sum is 36, rather than taking 16 off the list and them summing up 11 + 13 + 12. So my question is, what other calculations, other than the obvious ones like sum and average, are like this? SECOND EDIT: As an arbitrary example of a statistic that (I'm almost certain) does require iteration -- and therefore cannot be maintained as simply as a sum or average -- consider if I asked you, "how many numbers in this collection are divisible by the min?" Let's say the numbers are 5, 15, 19, 20, 21, 25, and 30. The min of this set is 5, which divides into 5, 15, 20, 25, and 30 (but not 19 or 21), so the answer is 5. Now if I remove 5 from the collection and ask the same question, the answer is now 2, since only 15 and 30 are divisible by the new min of 15; but, as far as I can tell, you cannot know this without going through the collection again. So I think this gets to the heart of my question: if we can divide kinds of statistics into these categories, those that are maintainable (my own term, maybe there's a more official one somewhere) versus those that require iteration to compute any time a collection is changed, what are all the maintainable ones? What I am asking about is not strictly the same as an online algorithm (though I sincerely thank those of you who introduced me to that concept). An online algorithm can begin its work without having even seen all of the input data; the maintainable statistics I am seeking will certainly have seen all the data, they just don't need to reiterate through it over and over again whenever it changes.

    Read the article

  • Problem measuring N times the execution time of a code block

    - by Nazgulled
    EDIT: I just found my problem after writing this long post explaining every little detail... If someone can give me a good answer on what I'm doing wrong and how can I get the execution time in seconds (using a float with 5 decimal places or so), I'll mark that as accepted. Hint: The problem was on how I interpreted the clock_getttime() man page. Hi, Let's say I have a function named myOperation that I need to measure the execution time of. To measure it, I'm using clock_gettime() as it was recommend here in one of the comments. My teacher recommends us to measure it N times so we can get an average, standard deviation and median for the final report. He also recommends us to execute myOperation M times instead of just one. If myOperation is a very fast operation, measuring it M times allow us to get a sense of the "real time" it takes; cause the clock being used might not have the required precision to measure such operation. So, execution myOperation only one time or M times really depends if the operation itself takes long enough for the clock precision we are using. I'm having trouble dealing with that M times execution. Increasing M decreases (a lot) the final average value. Which doesn't make sense to me. It's like this, on average you take 3 to 5 seconds to travel from point A to B. But then you go from A to B and back to A 5 times (which makes it 10 times, cause A to B is the same as B to A) and you measure that. Than you divide by 10, the average you get is supposed to be the same average you take traveling from point A to B, which is 3 to 5 seconds. This is what I want my code to do, but it's not working. If I keep increasing the number of times I go from A to B and back A, the average will be lower and lower each time, it makes no sense to me. Enough theory, here's my code: #include <stdio.h> #include <time.h> #define MEASUREMENTS 1 #define OPERATIONS 1 typedef struct timespec TimeClock; TimeClock diffTimeClock(TimeClock start, TimeClock end) { TimeClock aux; if((end.tv_nsec - start.tv_nsec) < 0) { aux.tv_sec = end.tv_sec - start.tv_sec - 1; aux.tv_nsec = 1E9 + end.tv_nsec - start.tv_nsec; } else { aux.tv_sec = end.tv_sec - start.tv_sec; aux.tv_nsec = end.tv_nsec - start.tv_nsec; } return aux; } int main(void) { TimeClock sTime, eTime, dTime; int i, j; for(i = 0; i < MEASUREMENTS; i++) { printf(" » MEASURE %02d\n", i+1); clock_gettime(CLOCK_REALTIME, &sTime); for(j = 0; j < OPERATIONS; j++) { myOperation(); } clock_gettime(CLOCK_REALTIME, &eTime); dTime = diffTimeClock(sTime, eTime); printf(" - NSEC (TOTAL): %ld\n", dTime.tv_nsec); printf(" - NSEC (OP): %ld\n\n", dTime.tv_nsec / OPERATIONS); } return 0; } Notes: The above diffTimeClock function is from this blog post. I replaced my real operation with myOperation() because it doesn't make any sense to post my real functions as I would have to post long blocks of code, you can easily code a myOperation() with whatever you like to compile the code if you wish. As you can see, OPERATIONS = 1 and the results are: » MEASURE 01 - NSEC (TOTAL): 27456580 - NSEC (OP): 27456580 For OPERATIONS = 100 the results are: » MEASURE 01 - NSEC (TOTAL): 218929736 - NSEC (OP): 2189297 For OPERATIONS = 1000 the results are: » MEASURE 01 - NSEC (TOTAL): 862834890 - NSEC (OP): 862834 For OPERATIONS = 10000 the results are: » MEASURE 01 - NSEC (TOTAL): 574133641 - NSEC (OP): 57413 Now, I'm not a math wiz, far from it actually, but this doesn't make any sense to me whatsoever. I've already talked about this with a friend that's on this project with me and he also can't understand the differences. I don't understand why the value is getting lower and lower when I increase OPERATIONS. The operation itself should take the same time (on average of course, not the exact same time), no matter how many times I execute it. You could tell me that that actually depends on the operation itself, the data being read and that some data could already be in the cache and bla bla, but I don't think that's the problem. In my case, myOperation is reading 5000 lines of text from an CSV file, separating the values by ; and inserting those values into a data structure. For each iteration, I'm destroying the data structure and initializing it again. Now that I think of it, I also that think that there's a problem measuring time with clock_gettime(), maybe I'm not using it right. I mean, look at the last example, where OPERATIONS = 10000. The total time it took was 574133641ns, which would be roughly 0,5s; that's impossible, it took a couple of minutes as I couldn't stand looking at the screen waiting and went to eat something.

    Read the article

  • CodePlex Daily Summary for Friday, March 18, 2011

    CodePlex Daily Summary for Friday, March 18, 2011Popular ReleasesCraig's Utility Library: Craig's Utility Library 2.1: This update contains the following functionality additions: Added Min and Max functions to MathHelper Added code for handling comments to BlogML code. Added WMI based file search ability (at present only searches based on file extension). Added CellularTexture class Added FaultFormation class Added PerlinNoise class Added Akismet helper classes Added Gravatar image link generator Added PipeDelimited class Added FaultFormation class Added FluvialErosion functions to Image c...DirectQ: Release 1.8.7 (RC3): Release Candidate 3 of 1.8.7 fixing many bugs and adding some new functionality.Catel - WPF, Silverlight and WP7 MVVM library: 1.3: Catel history ============= (+) Added (*) Changed (-) Removed (x) Error / bug (fix) For more information about issues or new feature requests, please visit: http://catel.codeplex.com =========== Version 1.3 =========== Release date: ============= 2011/03/18 Added/fixed: ============ (+) Added BindingHelper class to evaluate binding values manually (+) UserControl<TViewModel> now supports master-detail views where the detail view would have a nested UserControl<TViewModel> and no parent ...CBM-Command: Version 2.0 - 2011-03-17 - Final Release: This is the final release of CBM-Command Version 2.0. This version is intended to replace all prior versions you may have downloaded. Please see release notes for prior versions to get comprehensive list of changes. New features since RC1: - (C64) Added double-buffering when displaying the directory in a panel. This eliminates the flickering that users were experiencing when scrolling through long directories. Changes since RC1: - (All Machines) Changed the algorithm for displaying the d...Phalanger - The PHP Language Compiler for the .NET Framework: 2.1 (March 2011) for .NET 4.0: Introducing release of Phalanger 2.1 for .NET 4.0. This release brings big performance boost about 20% in most of the operations. This improvement can be expected also in an overall performance in many PHP applications, e.g. Wordpress. It is the first release that targets .NET Framework 4.0 which allows developers to move the project forward. To migrate your old Phalanger applications from Phalanger 2.0 to 2.1 please follow Migration to 2.1. Installation package also includes basic version o...NodeXL: Network Overview, Discovery and Exploration for Excel: NodeXL Excel Template, version 1.0.1.164: The NodeXL Excel template displays a network graph using edge and vertex lists stored in an Excel 2007 or Excel 2010 workbook. What's NewThis release adds new layout options for groups, makes some minor feature improvements, and fixes a few bugs. See the Complete NodeXL Release History for details. Installation StepsFollow these steps to install and use the template: Download the Zip file. Unzip it into any folder. Use WinZip or a similar program, or just right-click the Zip file in Wi...Leage of Legends Masteries Tool: LoLMasterSave_v1.6.1.274: -Addresses resent LoL update that interfered with the way MasterSave sets / reads masteries - Removed Shift windows since some people experiencing issues If your interested in this function i can provide it as small separate tool.ASP.NET Comet Ajax Library (Reverse Ajax - Server Push): ASP.NET Server Push Samples: This package contains 14 sample projects.LogExpert: 1.4 build 4092: TabControl: Tooltip on dropdown list shows full path names now New menu item "Lock instance" in Options menu. Only available when "Allow only one instance" is disabled in the settings. "Lock instance" will temporary enable the single instance mode. The locked instance will receive all new launched files Some NullPtrExceptions fixed (e.g. in the settings dialog) Note: The debug build is identical to the release build. But the debug version writes a log file. It also contains line numbers ...Facebook C# SDK: 5.0.6 (BETA): This is seventh BETA release of the version 5 branch of the Facebook C# SDK. Remember this is a BETA build. Some things may change or not work exactly as planned. We are absolutely looking for feedback on this release to help us improve the final 5.X.X release. New in this release: Version 5.0.6 is almost completely backward compatible with 4.2.1 and 5.0.3 (BETA) Bug fixes and helpers to simplify many common scenarios For more information about this release see the following blog posts: F...DotNetNuke® Community Edition: 06.00.00 CTP: CTP 1 (Build 155) is firmly focused around our conversion to C#. As many people have noted, this is a significant change to the platform and affects all areas of the product. This is one of the driving factors in why we felt it was important to get this release into your hands as soon as possible. We have already done quite a bit of testing on this feature internally and have fixed a number of issues in this area. We also recognize that there are probably still some more bugs to be found ...Kooboo CMS: Kooboo 3.0 RC: Bug fixes Inline editing toolbar positioning for websites with complicate CSS. Inline editing is turned on by default now for the samplesite template. MongoDB version content query for multiple filters. . Add a new 404 page to guide users to login and create first website. Naming validation for page name and datarule name. Files in this download kooboo_CMS.zip: The Kooboo application files Content_DBProvider.zip: Additional content database implementation of MSSQL,SQLCE, RavenDB ...SQL Monitor - tracking sql server activities: SQL Monitor 3.2: 1. introduce sql color syntax highlighting with http://www.codeproject.com/KB/edit/FastColoredTextBox_.aspxUmbraco CMS: Umbraco 4.7.0: Service release fixing 50+ issues! Getting Started A great place to start is with our Getting Started Guide: Getting Started Guide: http://umbraco.codeplex.com/Project/Download/FileDownload.aspx?DownloadId=197051 Make sure to check the free foundation videos on how to get started building Umbraco sites. They're available from: Introduction for webmasters: http://umbraco.tv/help-and-support/video-tutorials/getting-started Understand the Umbraco concepts: http://umbraco.tv/help-and-support...ProDinner - ASP.NET MVC EF4 Code First DDD jQuery Sample App: first release: ProDinner is an ASP.NET MVC sample application, it uses DDD, EF4 Code First for Data Access, jQuery and MvcProjectAwesome for Web UI, it has Multi-language User Interface Features: CRUD and search operations for entities Multi-Language User Interface upload and crop Images (make thumbnail) for meals pagination using "more results" button very rich and responsive UI (using Mvc Project Awesome) Multiple UI themes (using jQuery UI themes)BEPUphysics: BEPUphysics v0.15.1: Latest binary release. Version HistoryLiveChat Starter Kit: LCSK v1.1: This release contains couple of new features and bug fixes including: Features: Send chat transcript via email Operator can now invite visitor to chat (pro-active chat request) Bug Fixes: Operator management (Save and Delete) bug fixes Operator Console chat small fixesIronRuby: 1.1.3: IronRuby 1.1.3 is a servicing release that keeps on improving compatibility with Ruby 1.9.2 and includes IronRuby integration to Visual Studio 2010. We decided to drop 1.8.6 compatibility mode in all post-1.0 releases. We recommend using IronRuby 1.0 if you need 1.8.6 compatibility. The main purpose of this release is to sync with IronPython 2.7 release, i.e. to keep the Dynamic Language Runtime that both these languages build on top shareable. This release also fixes a few bugs: 5763 Use...SQL Server PowerShell Extensions: 2.3.2.1 Production: Release 2.3.2.1 implements SQLPSX as PowersShell version 2.0 modules. SQLPSX consists of 13 modules with 163 advanced functions, 2 cmdlets and 7 scripts for working with ADO.NET, SMO, Agent, RMO, SSIS, SQL script files, PBM, Performance Counters, SQLProfiler, Oracle and MySQL and using Powershell ISE as a SQL and Oracle query tool. In addition optional backend databases and SQL Server Reporting Services 2008 reports are provided with SQLServer and PBM modules. See readme file for details.IronPython: 2.7: On behalf of the IronPython team, I'm very pleased to announce the release of IronPython 2.7. This release contains all of the language features of Python 2.7, as well as several previously missing modules and numerous bug fixes. IronPython 2.7 also includes built-in Visual Studio support through IronPython Tools for Visual Studio. IronPython 2.7 requires .NET 4.0 or Silverlight 4. To download IronPython 2.7, visit http://ironpython.codeplex.com/releases/view/54498. Any bugs should be report...New Projects4711 Widget: Time Widget For 4711Baidu map api: Baidu map apiBookmooch.NET: A .NET wrapper for the Bookmooch.com API.Crash Cite - Electronic Crash Reporting Software: While we were in the electronic citation business - we wrote Indiana's eCWS - we decided to write our own crash reporting software. We exited from that biz because it's too politically charged. Here for you to use is an almost complete electronic crash reporting solution. Enjoy.Curso apb: Proyecto para gestionar un curso academico basado en apb. Ejemplo para aprendizaje personalEle: simple database changes log web applicationEmptor by Intrigue Deviation: An idea for collaboration and relations between customers and businesses.FixEd: FixEd is a text editor for working with text files that require enforcement of column widths and line lengths. Fixed-Width files...Typical from mainframe datasets or csv files. It provides a plugin parser system, so that files may be exported to and imported from any format.Free inventory: InventoryFurcadia Heimdall Tester: An application that helps Furcadia technicians test the integrity of the game server. It checks for availability of each heimdall, its connectivity to the rest of the system (horton/tribble) and how often it receives a user compared to the rest of them.GestOre: Software per gestire le ore di lavoro. Creazione di report mensili.Grauers Google Chart WebPart: SharePoint Foundation Google WebPart Chart. Create a custom list and connect Grauers Google Chart with the list. The Chart is interactive. Created by Ola GrauershOcr2Pdf.NET: hOcr2Pdf.NET is program/library to convert .hOcr files into a searchable pdf. It is currently being written and tested again the .hocr files products by the Tesseract ocr engine.Hyper-V Guest Console: This application provides a console for manage VMs use within a Hyper-V guest. With HVGC an Hyper-V guest can: - Start VMs - Shutdow VMs - Stop VMs - Monitoring VMs HVGC is develop in Visual Studio 2008 using VB.NET.JUpload.Net: JUpload.Net is an Asp Net control which encapsulates the popular Jupload java applet (http://jupload.sourceforge.net). myMvcBlog: my mvc blogNGI.Framework: NGI.FrameworkOnlineLicense: OnlineLicense is a PHP, mySQL and C# project. PHP and mySQL runing at Web Server side to store the online user information, and return the proper license for desktop applications which write by C#. SharePoint 2010 Managed Metadata Import Tool: Command-line SharePoint 2010 Managed Metadata import tool. Uses (almost) same CSV format as the Microsoft importer and supports static term GUIDs. Unlike the MS tool, this importer does not cause data corruption issues when taxonomy terms are used with Content Organizer rules.Sphere Layout Library: ItemsControl library targetting WPF, Silverlight for desktop and Silverlight for WP7, allowing to layout things on a 3d Sphere.TestProject for C#: Test project to test a Team Foundation Server.TfsUtils: Simple Project to store some TFS API utility code.This Is My New Project: FasdfTidy UI: Tidy UI is a "write less - do more" framework for .NET focusing on both client components and code for easy data access. The main goal of the framework is to enable ASP.NET developers to focus more on functionality and less on the technical aspects of the development process.TönnenKlapps: XNA game where you try to smash a 3D spinning barrel using the correct coloured buttons and the right timing.yihuaisha_Window: Glest research

    Read the article

  • CodePlex Daily Summary for Monday, March 07, 2011

    CodePlex Daily Summary for Monday, March 07, 2011Popular ReleasesDotNetAge -a lightweight Mvc jQuery CMS: DotNetAge 2: What is new in DotNetAge 2.0 ? Completely update DJME to DJME2, enhance user experience ,more beautiful and more interactively visit DJME project home to lean more about DJME http://www.dotnetage.com/sites/home/djme.html A new widget engine has came! Faster and easiler. Runtime performance enhanced. SEO enhanced. UI Designer enhanced. A new web resources explorer. Page manager enhanced. BlogML supports added that allows you import/export your blog data to/from dotnetage publishi...Master Data Services Manager: stable 1.0.3: Update 2011-03-07 : bug fixes added external configuration File : configuration.config added TreeView Display of model (still in dev) http://img96.imageshack.us/img96/5067/screenshot073l.jpg added Connection Parameters (username, domain, password, stored encrypted in configuration file) http://img402.imageshack.us/img402/5350/screenshot072qc.jpgSharePoint Content Inventory: Release 1.1: Release 1.1Menu and Context Menu for Silverlight 4.0: Silverlight Menu and Context Menu v2.4 Beta: - Moved the core of the PopupMenu class to the new PopupMenuBase class. - Renamed the MenuTriggerElement class to MenuTriggerRelationship. - Renamed the ApplicationMenus property to MenuTriggers. - Renamed the ImageLeftOpacity property to ImageOpacity. - Renamed the ImageLeftVisibility property to ImageVisibility. - Renamed the ImageLeftMinWidth property to ImageMinWidth. - Renamed the ImagePathForRightMargin property to ImageRightPath. - Renamed the ImageSourceForRightMargin property to Ima...Kooboo CMS: Kooboo CMS 3.0 Beta: Files in this downloadkooboo_CMS.zip: The kooboo application files Content_DBProvider.zip: Additional content database implementation of MSSQL,SQLCE, RavenDB and MongoDB. Default is XML based database. To use them, copy the related dlls into web root bin folder and remove old content provider dlls. Content provider has the name like "Kooboo.CMS.Content.Persistence.SQLServer.dll" View_Engines.zip: Supports of Razor, webform and NVelocity view engine. Copy the dlls into web root bin folder t...ASP.NET MVC Project Awesome, jQuery Ajax helpers (controls): 1.7.2: A rich set of helpers (controls) that you can use to build highly responsive and interactive Ajax-enabled Web applications. These helpers include Autocomplete, AjaxDropdown, Lookup, Confirm Dialog, Popup Form, Popup and Pager added fullscreen for the popup and popupformIronPython: 2.7 Release Candidate 2: On behalf of the IronPython team, I am pleased to announce IronPython 2.7 Release Candidate 2. The releases contains a few minor bug fixes, including a working webbrowser module. Please see the release notes for 61395 for what was fixed in previous releases.LINQ to Twitter: LINQ to Twitter Beta v2.0.20: Mono 2.8, Silverlight, OAuth, 100% Twitter API coverage, streaming, extensibility via Raw Queries, and added documentation.IIS Tuner: IIS Tuner 1.0: IIS and ASP.NET performance optimization toolMinemapper: Minemapper v0.1.6: Once again supports biomes, thanks to an updated Minecraft Biome Extractor, which added support for the new Minecraft beta v1.3 map format. Updated mcmap to support new biome format.CRM 2011 OData Query Designer: CRM 2011 OData Query Designer: The CRM 2011 OData Query Designer is a Silverlight 4 application that is packaged as a Managed CRM 2011 Solution. This tool allows you to build OData queries by selecting filter criteria, select attributes and order by attributes. The tool also allows you to Execute the query and view the ATOM and JSON data returned. The look and feel of this component will improve and new functionality will be added in the near future so please provide feedback on your experience. Import this solution int...Sandcastle Help File Builder: SHFB v1.9.3.0 Release: This release supports the Sandcastle June 2010 Release (v2.6.10621.1). It includes full support for generating, installing, and removing MS Help Viewer files. This new release is compiled under .NET 4.0, supports Visual Studio 2010 solutions and projects as documentation sources, and adds support for projects targeting the Silverlight Framework. This release uses the Sandcastle Guided Installation package used by Sandcastle Styles. Download and extract to a folder and then run SandcastleI...mytrip.mvc (CMS & e-Commerce): mytrip.mvc 1.0.53.0 beta 2: New SEO Optimisation WEB.mytrip.mvc 1.0.53.0 Web for install hosting System Requirements: NET 4.0, MSSQL 2008 or MySql (auto creation table to database) if .\SQLEXPRESS auto creation database (App_Data folder) SRC.mytrip.mvc 1.0.53.0 System Requirements: Visual Studio 2010 or Web Deweloper 2010 MSSQL 2008 or MySql (auto creation table to database) if .\SQLEXPRESS auto creation database (App_Data folder) Connector/Net 6.3.5, MVC3 RTM WARNING For run and debug SRC.mytrip.mvc 1.0.53.0 dow...AutoLoL: AutoLoL v1.6.4: It is now possible to run the clicker anyway when it can't detect the Masteries Window Fixed a critical bug in the open file dialog Removed the resize button Some UI changes 3D camera movement is now more intuitive (Trackball rotation) When an error occurs on the clicker it will attempt to focus AutoLoLYAF.NET (aka Yet Another Forum.NET): v1.9.5.5 RTW: YAF v1.9.5.5 RTM (Date: 3/4/2011 Rev: 4742) Official Discussion Thread here: http://forum.yetanotherforum.net/yaf_postsm47149_v1-9-5-5-RTW--Date-3-4-2011-Rev-4742.aspx Changes in v1.9.5.5 Rev. #4661 - Added "Copy" function to forum administration -- Now instead of having to manually re-enter all the access masks, etc, you can just duplicate an existing forum and modify after the fact. Rev. #4642 - New Setting to Enable/Disable Last Unread posts links Rev. #4641 - Added Arabic Language t...Snippet Designer: Snippet Designer 1.3.1: Snippet Designer 1.3.1 for Visual Studio 2010This is a bug fix release. Change logFixed bug where Snippet Designer would fail if you had the most recent Productivity Power Tools installed Fixed bug where "Export as Snippet" was failing in non-english locales Fixed bug where opening a new .snippet file would fail in non-english localesChiave File Encryption: Chiave 1.0: Final Relase for Chave 1.0 Stable: Application for file encryption and decryption using 512 Bit rijndael encyrption algorithm with simple to use UI. Its written in C# and compiled in .Net version 3.5. It incorporates features of Windows 7 like Jumplists, Taskbar progress and Aero Glass. Now with added support to Windows XP! Change Log from 0.9.2 to 1.0: ==================== Added: > Added Icon Overlay for Windows 7 Taskbar Icon. >Added Thumbnail Toolbar buttons to make the navigation easier...ASP.NET: Sprite and Image Optimization Preview 3: The ASP.NET Sprite and Image Optimization framework is designed to decrease the amount of time required to request and display a page from a web server by performing a variety of optimizations on the page’s images. This is the third preview of the feature and works with ASP.NET Web Forms 4, ASP.NET MVC 3, and ASP.NET Web Pages (Razor) projects. The binaries are also available via NuGet: AspNetSprites-Core AspNetSprites-WebFormsControl AspNetSprites-MvcAndRazorHelper It includes the foll...Network Monitor Open Source Parsers: Microsoft Network Monitor Parsers 3.4.2554: The Network Monitor Parsers packages contain parsers for more than 400 network protocols, including RFC based public protocols and protocols for Microsoft products defined in the Microsoft Open Specifications for Windows and SQL Server. NetworkMonitor_Parsers.msi is the base parser package which defines parsers for commonly used public protocols and protocols for Microsoft Windows. In this release, we have added 4 new protocol parsers and updated 79 existing parsers in the NetworkMonitor_Pa...Image Resizer for Windows: Image Resizer 3 Preview 1: Prepare to have your minds blown. This is the first preview of what will eventually become 39613. There are still a lot of rough edges and plenty of areas still under construction, but for your basic needs, it should be relativly stable. Note: You will need the .NET Framework 4 installed to use this version. Below is a status report of where this release is in terms of the overall goal for version 3. If you're feeling a bit technically ambitious and want to check out some of the features th...New ProjectsAppFactory: Die AppFactory Dient zur Vereinfachung der entwicklung von WPF Anwedungen. Es ist in C# entwickelt.Change the Default Playback Sound Device: ChangePlaybackDevice makes it easier for personal user to change the default playback sound device. You'll no longer have to change the default playback sound device by hand. It's developed in C#. Conectayas: Conectayas is an open source "Connect Four" alike game but transformable to "Tic-Tac-Toe" and to a lot of similar games that uses mouse. Written in DHTML (JavaScript, CSS and HTML). Very configurable. This cross-platform and cross-browser game was tested under BeOS, Linux, *BSD, Windows and others.Diamond: The all in one toolkit for WPF and Silverligth projects.Digital Disk File Format: Digital Disk is a File format that uses a simple key system, it is currently in development. It is written in vb.net, but will be expanded into other languagesdotnetMvcMalll: this is a asp.net mvc mallEasyCache .NET: EasyCache .NET is a simplified API over the ASP.NET Cache object. Its purpose is to offer a more concise syntax for adding and retrieving items from the cache.Eric Fang SharePoint workflow activities: Eric Fang SharePoint workflow activitiesExpert.NET: Expert.NET is an expert system framework for .NET applications. Written in F#, it provides constructs for defining probabilistic rulesets, as well as an inference engine. Expert.NET is ideal for encoding domain knowledge used by troubleshooting applications.GameGolem: The GameGolem is an XNA Casual Gamers portal. The purpose is to create a single ClickOnce deployed "Game Launcher" which exposes simple API for games to keep track of highscores, achivements, etc. GameGolem will become a Kongregate-like XNA-based casual games portal.Hundiyas: Hundiyas is an open source "Battleship" alike game totally written in DHTML (JavaScript, CSS and HTML) that uses mouse. This cross-platform and cross-browser game was tested under BeOS, Linux, *BSD, Windows and others.ISBC: Practicas ISBC 10/11maocaijun.database: databaseNCLI: A simple API for command line argument parsing, written in C#.nEMO: nEMO is a pure C# framework for Evolutionary Multiobjective Optimization.N-tier architecture sample: A sample on how to practically design a system following an n-tier (multitier) architecture in line with the patterns and practices presented by Microsofts Application Architectural Guide 2.0. Focus is on a service application and it´s client applications of various types.PostsByMonth Widget: This is a simple widget for the Graffiti CMS application that allows you to get a monthly list of new posts to the site. It's configurable to allow for the # of posts to display as well as the the format of the month/year header, the title and the individual line entries. This is written in .Net 3.5 with Vb.Net.Puzzle Pal: Smartphone assistant for all your puzzling events.RavenDB Notification: Notification plugin for RavenDB. With this plugin you are able to subscribe to insert and delete notifications from the RavenDB server. Very helpfull if you need to process new documents on the remote clients and you do not like to query DB for new changes.Teamwork by Intrigue Deviation: A feature-rich team collaboration and project management effort built around Scrum methodology with MVC/2.test_flow: test flowTicari Uygulama Paketi: Ticari Uygulama Paketi (TUP), Microsoft Ofis 2010 ürünleri için gelistirilmis eklenti yazilimidir.XpsViewer: XpsVieweryaphan: yaphan cms.

    Read the article

  • Guidance: A Branching strategy for Scrum Teams

    - by Martin Hinshelwood
    Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales.  Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released.  Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge  (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys.   As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching  structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft   A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft   I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note   Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia   In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft   Figure: Creating a single feature branch means you can isolate the development work on that branch.   Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation.   Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code.   Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client?   Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from.   Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years.   Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software.     Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client?   Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow.   Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching?   Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs.   Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established.   Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding.  Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible.   Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

    Read the article

< Previous Page | 1 2 3