Search Results

Search found 3047 results on 122 pages for 'subset sum'.

Page 31/122 | < Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38  | Next Page >

  • restrict duplicate rows in specific columns in mysql

    - by JPro
    I have a query like this : select testset, count(distinct results.TestCase) as runs, Sum(case when Verdict = "PASS" then 1 else 0 end) as pass, Sum(case when Verdict <> "PASS" then 1 else 0 end) as fails, Sum(case when latest_issue <> "NULL" then 1 else 0 end) as issues, Sum(case when latest_issue <> "NULL" and issue_type = "TC" then 1 else 0 end) as TC_issues from results join testcases on results.TestCase = testcases.TestCase where platform = "T1_PLATFORM" AND testcases.CaseType = "M2" and testcases.dummy <> "flag_1" group by testset order by results.TestCase The result set I get is : testset runs pass fails issues TC_issues T1 66 125 73 38 33 T2 18 19 16 16 15 T3 57 58 55 55 29 T4 52 43 12 0 0 T5 193 223 265 130 22 T6 23 12 11 0 0 My problem is, this is a result table which has testcases running multiple times. So, I am able to restrict the runs using the distinct TestCases but when I want the pass and fails, since I am using case I am unable to eliminate the duplicates. Is there any way to achieve what I want? any help please? thanks.

    Read the article

  • ggplot: showing % instead of counts in charts of categorical variables

    - by wishihadabettername
    I'm plotting a categorical variable and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozens of times and I hope to achieve that in one command. I was experimenting with something like qplot (mydataf) + stat_bin(aes(n=nrow(mydataf), y=..count../n)) + scale_y_continuous(formatter="percent") but I must be using it incorrectly, as I got errors. To easily reproduce the setup, here's a simplified example: mydata <- c ("aa", "bb", null, "bb", "cc", "aa", "aa", "aa", "ee", null, "cc"); mydataf <- factor(mydata); qplot (mydataf); #this shows the count, I'm looking to see % displayed. In the real case I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me. Thank you. UPDATE: I've also tried these four approaches: ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + scale_y_continuous(formatter = 'percent'); ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + scale_y_continuous(formatter = 'percent') + geom_bar(); ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + scale_y_continuous(formatter = 'percent'); ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + scale_y_continuous(formatter = 'percent') + geom_bar(); but all 4 give: Error: ggplot2 doesn't know how to deal with data of class factor The same error appears for the simple case of ggplot (data=mydataf, aes(levels(mydataf))) + geom_bar() so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head, googling for that error gives a single result.

    Read the article

  • Aggregate SQL column values by time period

    - by user305688
    I have some numerical data that comes in every 5 minutes (i.e. 288 values per day, and quite a few days worth of data). I need to write a query that can return the sums of all values for each day. So currently the table looks like this: 03/30/2010 00:01:00 -- 553 03/30/2010 00:06:00 -- 558 03/30/2010 00:11:00 -- 565 03/30/2010 00:16:00 -- 565 03/30/2010 00:21:00 -- 558 03/30/2010 00:26:00 -- 566 03/30/2010 00:31:00 -- 553 ... And this goes on for 'x' number of days, I'd like the query to return 'x' number of rows, each of which containing the sum of all the values on each day. Something like this: 03/30/2010 -- <sum> 03/31/2010 -- <sum> 04/01/2010 -- <sum> The query will go inside a Dundas webpart, so unfortunately I can't write custom user functions to assist it. All the logic needs to be in just the one big query. Any help would be appreciated, thanks. I'm trying to get it to work using GROUP BY and DATEPART at the moment, not sure if it's the right way to go about it.

    Read the article

  • Explaining a Ruby code snippet

    - by Michael Foukarakis
    I'm in that uncomfortable position again, where somebody has left me with a code snippet in a language I don't know and I have to maintain it. While I haven't introduced Ruby to myself some parts of it are quite simple, but I'd like to hear your explanations nonetheless. Here goes: words = File.open("lengths.txt") {|f| f.read }.split # read all lines of a file in 'words'? values = Array.new(0) words.each { |value| values << value.to_i } # looked this one up, it's supposed to convert to an array of integers, right? values.sort! values.uniq! diffs = Array.new(0) # this looks unused, unless I'm missing something obvious sum = 0 s = 0 # another unused variable # this looks like it's computing the sum of differences between successive # elements, but that sum also remains unused, or does it? values.each_index { |index| if index.to_i < values.length-1 then sum += values.at(index.to_i + 1) - values.at(index.to_i) end } # could you also explain the syntax here? puts "delta has the value of\n" # this will eventually print the minimum of the original values divided by 2 puts values.at(0) / 2 The above script was supposed to figure out the average of the differences between every two successive elements (integers, essentially) in a list. Am I right in saying this is nowhere near what it actually does, or am I missing something fundamental, which is likely considering I have no Ruby knowledge?

    Read the article

  • How to efficiently get highest & lowest values from a List<double?>, and then modify them?

    - by DaveDev
    I have to get the sum of a list of doubles. If the sum is 100, I have to decrement from the highest number until it's = 100. If the sum is < 100, I have to increment the lowest number until it's = 100. I can do this by looping though the list, assigning the values to placeholder variables and testing which is higher or lower but I'm wondering if any gurus out there could suggest a super cool & efficient way to do this? The code below basically outlines what I'm trying to achieve: var splitValues = new List<double?>(); splitValues.Add(Math.Round(assetSplit.EquityTypeSplit() ?? 0)); splitValues.Add(Math.Round(assetSplit.PropertyTypeSplit() ?? 0)); splitValues.Add(Math.Round(assetSplit.FixedInterestTypeSplit() ?? 0)); splitValues.Add(Math.Round(assetSplit.CashTypeSplit() ?? 0)); var listSum = splitValues.Sum(split => split.Value); if (listSum != 100) { if (listSum > 100) { // how to get highest value and decrement by 1 until listSum == 100 // then reassign back into the splitValues list? var highest = // ?? } else { // how to get lowest where value is > 0, and increment by 1 until listSum == 100 // then reassign back into the splitValues list? var lowest = // ?? } } update: the list has to remain in the same order as the items are added.

    Read the article

  • Misaligned Pointer Performance

    - by Elite Mx
    Aren't misaligned pointers (in the BEST possible case) supposed to slow down performance and in the worst case crash your program (assuming the compiler was nice enough to compile your invalid c program). Well, the following code doesn't seem to have any performance differences between the aligned and misaligned versions. Why is that? /* brutality.c */ #ifdef BRUTALITY xs = (unsigned long *) ((unsigned char *) xs + 1); #endif ... /* main.c */ #include <stdio.h> #include <stdlib.h> #define size_t_max ((size_t)-1) #define max_count(var) (size_t_max / (sizeof var)) int main(int argc, char *argv[]) { unsigned long sum, *xs, *itr, *xs_end; size_t element_count = max_count(*xs) >> 4; xs = malloc(element_count * (sizeof *xs)); if(!xs) exit(1); xs_end = xs + element_count - 1; sum = 0; for(itr = xs; itr < xs_end; itr++) *itr = 0; #include "brutality.c" itr = xs; while(itr < xs_end) sum += *itr++; printf("%lu\n", sum); /* we could free the malloc-ed memory here */ /* but we are almost done */ exit(0); } Compiled and tested on two separate machines using gcc -pedantic -Wall -O0 -std=c99 main.c for i in {0..9}; do time ./a.out; done

    Read the article

  • MySql: Is it reasonable to use 'view' or I would better denormalize my DB?

    - by Budda
    There is 'team_sector' table with following fields: Id, team_id, sect_id, size, level It contains few records for each 'team' entity (referenced with 'team_id' field). Each record represent sector of team's stadium (totally 8 sectors). Now it is necessary to implement few searches: by overall stadium size (SUM(size)); the best quality (SUM(level)/COUNT(*)). I could create query something like this: SELECT TS.team_id, SUM(TS.size) as OverallSize, SUM(TS.Level)/COUNT(TS.Id) AS QualityLevel FROM team_sector GROUP BY team_id ORDER BY OverallSize DESC / ORDER BY QualityLevel DESC But my concern here is that calculation for each team will be done each time on query performed. It is not too big overhead (at least now), but I would like to avoid performance issues later. I see 2 options here. The 1st one is to create 2 additional fields in 'team' table (for example) and store there OverallSize and QualityLevel fields. If information if 'sector' table is changed - update those table too (probably would be good to do that with triggers, as sector table doesn't change too often). The 2nd option is to create a view that will provide required data. The 2nd option seems much easier for me, but I don't have a lot of experience/knowledge of work with views. Q1: What is the best option from your perspective here and why? Probably you could suggest other options? Q2: Can I create view in such way that it will do calculations rarely (at least once per day)? If yes - how? Q3: Is it reasonable to use triggers for such purpose (1st option). P.S. MySql 5.1 is used, overall number of teams is around 1-2 thousand, overall number of records in sector table - overall 6-8 thousand. I understand, those numbers are pretty small, but I would like to implement the best practice here.

    Read the article

  • RiverTrail - JavaScript GPPGU Data Parallelism

    - by JoshReuben
    Where is WebCL ? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some protype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications via a familiar JavaScript programming paradigm. It extends JavaScript with simple deterministic data-parallel constructs that are translated at runtime into a low-level hardware abstraction layer. With its high-level JS API, programmers do not have to learn a new language or explicitly manage threads, orchestrate shared data synchronization or scheduling. It has been proposed as a draft specification to ECMA a (known as ECMA strawman). RiverTrail runs in all popular browsers (except I.E. of course). To get started, download a prebuilt version https://github.com/downloads/RiverTrail/RiverTrail/rivertrail-0.17.xpi , install Intel's OpenCL SDK http://www.intel.com/go/opencl and try out the interactive River Trail shell http://rivertrail.github.com/interactive For a video overview, see  http://www.youtube.com/watch?v=jueg6zB5XaM . ParallelArray the ParallelArray type is the central component of this API & is a JS object that contains ordered collections of scalars – i.e. multidimensional uniform arrays. A shape property describes the dimensionality and size– e.g. a 2D RGBA image will have shape [height, width, 4]. ParallelArrays are immutable & fluent – they are manipulated by invoking methods on them which produce new ParallelArray objects. ParallelArray supports several constructors over arrays, functions & even the canvas. // Create an empty Parallel Array var pa = new ParallelArray(); // pa0 = <>   // Create a ParallelArray out of a nested JS array. // Note that the inner arrays are also ParallelArrays var pa = new ParallelArray([ [0,1], [2,3], [4,5] ]); // pa1 = <<0,1>, <2,3>, <4.5>>   // Create a two-dimensional ParallelArray with shape [3, 2] using the comprehension constructor var pa = new ParallelArray([3, 2], function(iv){return iv[0] * iv[1];}); // pa7 = <<0,0>, <0,1>, <0,2>>   // Create a ParallelArray from canvas.  This creates a PA with shape [w, h, 4], var pa = new ParallelArray(canvas); // pa8 = CanvasPixelArray   ParallelArray exposes fluent API functions that take an elemental JS function for data manipulation: map, combine, scan, filter, and scatter that return a new ParallelArray. Other functions are scalar - reduce  returns a scalar value & get returns the value located at a given index. The onus is on the developer to ensure that the elemental function does not defeat data parallelization optimization (avoid global var manipulation, recursion). For reduce & scan, order is not guaranteed - the onus is on the dev to provide an elemental function that is commutative and associative so that scan will be deterministic – E.g. Sum is associative, but Avg is not. map Applies a provided elemental function to each element of the source array and stores the result in the corresponding position in the result array. The map method is shape preserving & index free - can not inspect neighboring values. // Adding one to each element. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.map(function inc(v) {     return v+1; }); //<2,3,4,5,6> combine Combine is similar to map, except an index is provided. This allows elemental functions to access elements from the source array relative to the one at the current index position. While the map method operates on the outermost dimension only, combine, can choose how deep to traverse - it provides a depth argument to specify the number of dimensions it iterates over. The elemental function of combine accesses the source array & the current index within it - element is computed by calling the get method of the source ParallelArray object with index i as argument. It requires more code but is more expressive. var source = new ParallelArray([1,2,3,4,5]); var plusOne = source.combine(function inc(i) { return this.get(i)+1; }); reduce reduces the elements from an array to a single scalar result – e.g. Sum. // Calculate the sum of the elements var source = new ParallelArray([1,2,3,4,5]); var sum = source.reduce(function plus(a,b) { return a+b; }); scan Like reduce, but stores the intermediate results – return a ParallelArray whose ith elements is the results of using the elemental function to reduce the elements between 0 and I in the original ParallelArray. // do a partial sum var source = new ParallelArray([1,2,3,4,5]); var psum = source.scan(function plus(a,b) { return a+b; }); //<1, 3, 6, 10, 15> scatter a reordering function - specify for a certain source index where it should be stored in the result array. An optional conflict function can prevent an exception if two source values are assigned the same position of the result: var source = new ParallelArray([1,2,3,4,5]); var reorder = source.scatter([4,0,3,1,2]); // <2, 4, 5, 3, 1> // if there is a conflict use the max. use 33 as a default value. var reorder = source.scatter([4,0,3,4,2], 33, function max(a, b) {return a>b?a:b; }); //<2, 33, 5, 3, 4> filter // filter out values that are not even var source = new ParallelArray([1,2,3,4,5]); var even = source.filter(function even(iv) { return (this.get(iv) % 2) == 0; }); // <2,4> Flatten used to collapse the outer dimensions of an array into a single dimension. pa = new ParallelArray([ [1,2], [3,4] ]); // <<1,2>,<3,4>> pa.flatten(); // <1,2,3,4> Partition used to restore the original shape of the array. var pa = new ParallelArray([1,2,3,4]); // <1,2,3,4> pa.partition(2); // <<1,2>,<3,4>> Get return value found at the indices or undefined if no such value exists. var pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]) pa.get([1,1]); // 11 pa.get([1]); // <10,11,12,13,14>

    Read the article

  • MapRedux - PowerShell and Big Data

    - by Dittenhafer Solutions
    MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

    Read the article

  • Guide to reduce TFS database growth using the Test Attachment Cleaner

    - by terje
    Recently there has been several reports on TFS databases growing too fast and growing too big.  Notable this has been observed when one has started to use more features of the Testing system.  Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process.  Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it.  She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them.   The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run.  Some of this data can grow rather large, like IntelliTrace logs and video recordings.   Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data.   In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean   Analyze the data Are your database( s) growing ?  Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some  more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them.  But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed.  I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order.  use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work   Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent.   Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case:   use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones.  Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this:   Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later.    Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized.  To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later).  This CU fixes a potential problem of hanging “ghost” files.  This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around:  If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process.  Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again.  (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1.  The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data.  The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool.  This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile.  The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project.  When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC   1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job.  We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below.  In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about  the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items.  Grant’s settings file is an example of the last one.  A settings file to remove only large attachments could look like this: <!-- Scenario : Remove large files --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section.  Without that section, all extensions will be deleted. <!-- Scenario : Remove large files of type dll's and pdb's --> <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches.  It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs.  Doing this every night ensures that only small amounts of data is deleted. <!-- Scenario : Remove old items except if they have active or resolved bugs --> <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done.  It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about.  Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data. <!-- Scenario : Remove specific files early --> <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete   Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted.  In this case you SHOULD do it, to free up all that space.  But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records.  In fact, it is regarded as a bad practice to shrink the database regularly.  So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards.  I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

    Read the article

  • Using Aggregate functions in DataView filters

    - by Shrewd Demon
    hi, i have a DataTable that has a column ("Profit"). What i want is to get the Sum of all the values in this table. I tried to do this in the following manner... DataTable dsTemp = new DataTable(); dsTemp.Columns.Add("Profit"); DataRow dr = null; dr = dsTemp.NewRow(); dr["Profit"] = 100; dsTemp.Rows.Add(dr); dr = dsTemp.NewRow(); dr["Profit"] = 200; dsTemp.Rows.Add(dr); DataView dvTotal = dsTemp.DefaultView; dvTotal.RowFilter = " SUM ( Profit ) "; DataTable dt = dvTotal.ToTable(); But i get an error while applying the filter... how can i get the Sum of the Profit column in a variable thank you...

    Read the article

  • [hibernate - jpa ] good practices and bad practices

    - by blow
    Hi all, i have some questions about interaction with hibernate. openSession or getCurrentSession (without jta, thread insted)? How mix session operations with swing gui? Is good have something like this in a javabean class? public void actionPerformed(ActionEvent event) { // session code } Can i add methods to my entities that contains hql queries or is a bad practice? For example: // This method is in an entity MyOtherEntity.java class public int getDuration(){ Session session=HibernateUtil.getSessionFactory().getCurrentSession(); session.beginTransaction(); int sum=(Integer)session.createQuery("select sum(e.duration) as duration from MyEntity as e where e.myOtherEntity.id=:id group by e.name"). .setLong("id", getId()); .uniqueResult(); return sum; } In alternative how can i do this in a better and elegant way? Thanks.

    Read the article

  • SumProduct over sets of cells (not contiguous)

    - by Craig
    I have a total data set that is for 4 different groupings. One of the values is the average time, the other is count. For the Total I have to multiply these and then divide by the total of the count. Currently I use: =SUM(D32*D2,D94*D64,D156*D126,D218*D188)/SUM(D32,D94,D156,D218) I would rather use a SumProduct if I can to make it more readable. I tried to do: =SUMPRODUCT((D2,D64,D126,D188),(D32,D94,D156,D218))/SUM(D32,94,D156,D218) But as you can tell by my posting here, that did not work. Is there a way to do SumProduct like I want? Thoughts, Answers, Questions, Comments? Craig

    Read the article

  • Recursive languages vs context-sensitive languages

    - by teehoo
    In Chomsky's hierarchy, the set of recursive languages is not defined. I know that recursive languages are a subset of recursively enumerable languages and that all recursive languages are decidable. What I'm curious about is how recursive languages compare to context-sensitive languages. Can I assume that context-sensitive languages are a strict subset of recursive languages, and therefore all context-sensitive languages are decidable?

    Read the article

  • Why unsigned int contained negative number

    - by Daziplqa
    Hi All, I am new to C, What I know about unsigned numerics (unsigned short, int and longs), that It contains positive numbers only, but the following simple program successfully assigned a negative number to an unsigned int: 1 /* 2 * ===================================================================================== 3 * 4 * Filename: prog4.c 5 * 6 * ===================================================================================== 7 */ 8 9 #include <stdio.h> 10 11 int main(void){ 12 13 int v1 =0, v2=0; 14 unsigned int sum; 15 16 v1 = 10; 17 v2 = 20; 18 19 sum = v1 - v2; 20 21 printf("The subtraction of %i from %i is %i \n" , v1, v2, sum); 22 23 return 0; 24 } The output is : The subtraction of 10 from 20 is -10

    Read the article

  • PHP New Line Help

    - by Jeremy Person
    I'm just starting PHP programming as you can tell. I want the results of each statement to be on their own line but can't get it to work yet. I tried referring to the code on this page and am obvsiously doing something wrong. Thank you very much. <? $mySentence="This is a sentence 123456789."; $myNumber1 = 9.5; $myNumber2 = .5; $sum = $myNumber1 + $myNumber2; echo "Hello, lets display our PHP variables: \r"; echo $mySentence; echo "The sum of $myNumber 1 and $myNumber2 = $sum "; echo "\"This text has double quotes\""; ?>

    Read the article

  • ASP.Net Custom Field From Query In DataSet

    - by boruchsiper
    I added a new query to a table adapter in a DataSet. This query adds another field to the query whcih is a sum from another table. Here is the full query: SELECT (SELECT COUNT(donationID) AS Expr1 FROM Donations AS da WHERE (dn.donorID = donorID)) AS Count, Solicitor, address1, address2, city, companyName, country, donorID, email, first, last, phoneHome, phoneMobile, phoneWork, state, webURL, zip, (select sum(amount) from Donations as dna where dna.donorID = dn.donorID) as SumDonations FROM Donors AS dn order by last The new field is represented in the last part of the query: (select sum(amount) from Donations as dna where dna.donorID = dn.donorID) as SumDonations I can preview the data in the xsd but the last field "SumDonations" is not showing up as a field I can add to my gridview. I rebuilt the website but no luck. What am I missing?

    Read the article

  • Reporting Services is displaying extra rows for minimised row groups

    - by Graphain
    Hi, I have a fairly basic SQL Server Reporting Services report that is using nested row groups. Each sub-group depends on expanding its parent to be visible which is all pretty standard. The layout is something like this: { Company { { Car SUM(Price) { { { Part Price My desired result when expanded is something like this (which I get fine): - SuperCarCompany - SuperCar 20 Door 20 - SuperCar2 70 Door 30 Window 40 - OtherCarCompany - SuperCar2 50 /* Same SuperCar2 */ Door 50 - MoreCarCompany - BestCarEver 535 Engine 500 Door 30 Window 5 And when opened initially something like this: + SuperCarCompany + OtherCarCompany + MoreCarCompany However, I'm getting this: + SuperCarCompany + SuperCar2 70 (i.e. sum of all SuperCar2) + OtherCarCompany + SuperCar 20 + MoreCarCompany + BestCarEver 535 and I can even expand these superfluous rows like this: + SuperCarCompany - SuperCar2 70 (i.e. sum of all SuperCar2) Door 30 (i.e. first child of any SuperCar2) The superflous rows dissapear immediately when I expand the expected row above it (i.e. I'd need to expand all expected rows to get rid of all superflous rows). Any idea on the cause?

    Read the article

  • how to find maximum frequent item sets from large transactional data file

    - by ANIL MANE
    Hi, I have the input file contains large amount of transactions like Transaction ID Items T1 Bread, milk, coffee, juice T2 Juice, milk, coffee T3 Bread, juice T4 Coffee, milk T5 Bread, Milk T6 Coffee, Bread T7 Coffee, Bread, Juice T8 Bread, Milk, Juice T9 Milk, Bread, Coffee, T10 Bread T11 Milk T12 Milk, Coffee, Bread, Juice i want the occurrence of every unique item like Item Name Count Bread 9 Milk 8 Coffee 7 Juice 6 and from that i want an a fp-tree now by traversing this tree i want the maximal frequent itemsets as follows The basic idea of method is to dispose nodes in each “layer” from bottom to up. The concept of “layer” is different to the common concept of layer in a tree. Nodes in a “layer” mean the nodes correspond to the same item and be in a linked list from the “Head Table”. For nodes in a “layer” NBN method will be used to dispose the nodes from left to right along the linked list. To use NBN method, two extra fields will be added to each node in the ordered FP-Tree. The field tag of node N stores the information of whether N is maximal frequent itemset, and the field count’ stores the support count information in the nodes at left. In Figure, the first node to be disposed is “juice: 2”. If the min_sup is equal to or less than 2 then “bread, milk, coffee, juice” is a maximal frequent itemset. Firstly output juice:2 and set the field tag of “coffee:3” as “false” (the field tag of each node is “true” initially ). Next check whether the right four itemsets juice:1 be the subset of juice:2. If the itemset one node “juice:1” corresponding to is the subset of juice:2 set the field tag of the node “false”. In the following process when the field tag of the disposed node is FALSE we can omit the node after the same tagging. If the min_sup is more than 2 then check whether the right four juice:1 is the subset of juice:2. If the itemset one node “juice:1” corresponding to is the subset of juice:2 then set the field count’ of the node with the sum of the former count’ and 2 After all the nodes “juice” disposed ,begin to dispose the node “coffee:3”. Any suggestions or available source code, welcome. thanks in advance

    Read the article

  • Optimization of running total calculation in SQL for multiple values per join condition

    - by Kiril
    I have the following table (test_table): date value --------------- d1 10.0 d1 20.0 d2 60.0 d2 10.0 d2 -20.0 d3 40.0 I calculate the running total as follows. I use the same query twice, because first I need to calculate the values for a specifi date, and afterwards I can calculate the running total. Otherwise, joining the two tables where date is not unique, I would get too many results from the join: SELECT t1.date, SUM(t2.value) AS total FROM (SELECT date, SUM(value) AS value FROM test_table GROUP BY date) AS t1 JOIN (SELECT date, SUM(value) AS value FROM test_table GROUP BY date) AS t2 ON t1.date >= t2.date GROUP BY t1.date ORDER BY t1.date This gives me (which is fine): date total ------------- d1 30.0 d2 80.0 d3 120.0 BUT, this query isn't very efficient, because I need to change conditions in two places, if necessary. In production, the test_table is a lot bigger ( 4 Mio. rows), and the query takes too much time to complete. Question: How can I avoid using the same query twice?

    Read the article

  • Odd GROUP BY output DB2 - Results not as expected

    - by CallCthulhu
    If I run the following query: select load_cyc_num , crnt_dnlq_age_cde , sum(cc_min_pymt_amt) as min_pymt , sum(ec_tot_bal) as budget , case when ec_tot_bal 0 then 'Y' else 'N' end as budget , case when ac_stat_cde in ('A0P','A1P','ARP','A3P') then 'Y' else 'N' end as arngmnt , sum(sn_close_bal) as st_bal from statements where (sn_close_bal 0 or ec_tot_bal 0) and load_cyc_num in (200911) group by load_cyc_num , crnt_dnlq_age_cde , case when ec_tot_bal 0 then 'Y' else 'N' end , case when ac_stat_cde in ('A0P','A1P','ARP','A3P') then 'Y' else 'N' end then I get the correct "BUDGET" grouping, but not the correct "ARRANGEMENT" grouping, only two rows have a "Y". If I change the order of the case statements in the GROUP BY, then I get the correct grouping (full Y-N breakdown for both columns). Am I missing something obvious?

    Read the article

  • whats the diference between train, validation and test set, in neural networks?

    - by Daniel
    Im using this library http://pastebin.com/raw.php?i=aMtVv4RZ to implement a learning agent. I have generated the train cases, but i dont know for sure what are the validation and test sets, the teacher says: 70% should be train cases, 10% will be test cases and the rest 20% should be validation cases. Thanks. edit i have this code, for training.. but i have no ideia when to stop training.. def train(self, train, validation, N=0.3, M=0.1): # N: learning rate # M: momentum factor accuracy = list() while(True): error = 0.0 for p in train: input, target = p self.update(input) error = error + self.backPropagate(target, N, M) print "validation" total = 0 for p in validation: input, target = p output = self.update(input) total += sum([abs(target - output) for target, output in zip(target, output)]) #calculates sum of absolute diference between target and output accuracy.append(total) print min(accuracy) print sum(accuracy[-5:])/5 #if i % 100 == 0: print 'error %-14f' % error if ? < ?: break

    Read the article

  • Excel > Microsoft Query > SQL Server > Multiple Parameters

    - by pojomx
    Hi, Im relatively new to sql server and excel/microsoft query, I have a query like this Select ...[data]...B1.b,B2.b,B3.b From TABLEA Inner join ( SELECT ---[data]...sum(...) as b From TABLEB WHERE Date between [startdate] and [enddate] ) as B1 Inner join ( SELECT ---[data]...sum(...) as b From TABLEB WHERE Date between [startdate-1week] and [enddate] ) as B2 Inner join ( SELECT ---[data]...sum(...) as b From TABLEB WHERE Date between [startdate-2weeks] and [enddate] ) as B3 Where Date between [startdate] and [enddate] It works, when i introduce the dates manually, but i need them to be "Dynamic" (introduced from excel) but, when I put the "?" (for parameters) on all the dates, it throws an error. "Invalid Parameter Number" :D How can i make this work, within excel? Im using SQL Server and Microsoft Query Connection Data.

    Read the article

  • Can't get multiple panel plots with chartSeries function from quantod package in R

    - by Milktrader
    Jeff Ryan's quantmod package is an excellent contribution to the R finance world. I like to use chartSeries() function, but when I try to get it to display multiple panes simultaneously, it doesn't work. par(mfrow=c(2,2)) chartSeries (SPX) chartSeries (SPX, subset="2010") chartSeries (NDX) chartSeries (NDX, subset="2010") would normally return a four-panel graphic as it does with the plot() function but in the chartSeries example it runs through all instances one at a time without creating a single four-panel graphic.

    Read the article

< Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38  | Next Page >