Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 16/87 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >

Can you configure the sample length for mod_status requests per second?

- by D-Rock

The Apache Module mod_status allows you to view the current average requests per second being served. It measures this average over a fairly long period of time however (I don't know exactly how long). Is it possible to modify this length to something shorter so that I can get a more accurate idea of the current load on the server?

Read the article
Memcached Lagging

- by Brad Dwyer

Let me preface this by saying that this is a followup question to this topic. That was "solved" by switching from Solaris (SmartOS) to Ubuntu for the memcached server. Now we've multiplied load by about 5x and are running into problems again. We are running a site that is doing about 1000 requests/minute, each request hits Memcached with approximately 3 reads and 1 write. So load is approximately 65 requests per second. Total data in the cache is about 37M, and each key contains a very small amount of data (a JSON-encoded array of integers amounting to less than 1K). We have setup a benchmarking script on these pages and fed the data into StatsD for logging. The problem is that there are spikes where Memcached takes a very long time to respond. These do not appear to correlate with spikes in traffic. What could be causing these spikes? Why would memcached take over a second to reply? We just booted up a second server to put in the pool and it didn't make any noticeable difference in the frequency or severity of the spikes. This is the output of getStats() on the servers: Array ( [-----------] => Array ( [pid] => 1364 [uptime] => 3715684 [threads] => 4 [time] => 1336596719 [pointer_size] => 64 [rusage_user_seconds] => 7924 [rusage_user_microseconds] => 170000 [rusage_system_seconds] => 187214 [rusage_system_microseconds] => 190000 [curr_items] => 12578 [total_items] => 53516300 [limit_maxbytes] => 943718400 [curr_connections] => 14 [total_connections] => 72550117 [connection_structures] => 165 [bytes] => 2616068 [cmd_get] => 450388258 [cmd_set] => 53493365 [get_hits] => 450388258 [get_misses] => 2244297 [evictions] => 0 [bytes_read] => 2138744916 [bytes_written] => 745275216 [version] => 1.4.2 ) [-----------:11211] => Array ( [pid] => 8099 [uptime] => 4687 [threads] => 4 [time] => 1336596719 [pointer_size] => 64 [rusage_user_seconds] => 7 [rusage_user_microseconds] => 170000 [rusage_system_seconds] => 290 [rusage_system_microseconds] => 990000 [curr_items] => 2384 [total_items] => 225964 [limit_maxbytes] => 943718400 [curr_connections] => 7 [total_connections] => 588097 [connection_structures] => 91 [bytes] => 562641 [cmd_get] => 1012562 [cmd_set] => 225778 [get_hits] => 1012562 [get_misses] => 125161 [evictions] => 0 [bytes_read] => 91270698 [bytes_written] => 350071516 [version] => 1.4.2 ) ) Edit: Here is the result of a set and retrieve of 10,000 values. Normal: Stored 10000 values in 5.6118 seconds. Average: 0.0006 High: 0.1958 Low: 0.0003 Fetched 10000 values in 5.1215 seconds. Average: 0.0005 High: 0.0141 Low: 0.0003 When Spiking: Stored 10000 values in 16.5074 seconds. Average: 0.0017 High: 0.9288 Low: 0.0003 Fetched 10000 values in 19.8771 seconds. Average: 0.0020 High: 0.9478 Low: 0.0003

Read the article
Working with PivotTables in Excel

- by Mark Virtue

PivotTables are one of the most powerful features of Microsoft Excel. They allow large amounts of data to be analyzed and summarized in just a few mouse clicks. In this article, we explore PivotTables, understand what they are, and learn how to create and customize them. Note: This article is written using Excel 2010 (Beta). The concept of a PivotTable has changed little over the years, but the method of creating one has changed in nearly every iteration of Excel. If you are using a version of Excel that is not 2010, expect different screens from the ones you see in this article. A Little History In the early days of spreadsheet programs, Lotus 1-2-3 ruled the roost. Its dominance was so complete that people thought it was a waste of time for Microsoft to bother developing their own spreadsheet software (Excel) to compete with Lotus. Flash-forward to 2010, and Excel’s dominance of the spreadsheet market is greater than Lotus’s ever was, while the number of users still running Lotus 1-2-3 is approaching zero. How did this happen? What caused such a dramatic reversal of fortunes? Industry analysts put it down to two factors: Firstly, Lotus decided that this fancy new GUI platform called “Windows” was a passing fad that would never take off. They declined to create a Windows version of Lotus 1-2-3 (for a few years, anyway), predicting that their DOS version of the software was all anyone would ever need. Microsoft, naturally, developed Excel exclusively for Windows. Secondly, Microsoft developed a feature for Excel that Lotus didn’t provide in 1-2-3, namely PivotTables. The PivotTables feature, exclusive to Excel, was deemed so staggeringly useful that people were willing to learn an entire new software package (Excel) rather than stick with a program (1-2-3) that didn’t have it. This one feature, along with the misjudgment of the success of Windows, was the death-knell for Lotus 1-2-3, and the beginning of the success of Microsoft Excel. Understanding PivotTables So what is a PivotTable, exactly? Put simply, a PivotTable is a summary of some data, created to allow easy analysis of said data. But unlike a manually created summary, Excel PivotTables are interactive. Once you have created one, you can easily change it if it doesn’t offer the exact insights into your data that you were hoping for. In a couple of clicks the summary can be “pivoted” – rotated in such a way that the column headings become row headings, and vice versa. There’s a lot more that can be done, too. Rather than try to describe all the features of PivotTables, we’ll simply demonstrate them… The data that you analyze using a PivotTable can’t be just any data – it has to be raw data, previously unprocessed (unsummarized) – typically a list of some sort. An example of this might be the list of sales transactions in a company for the past six months. Examine the data shown below: Notice that this is not raw data. In fact, it is already a summary of some sort. In cell B3 we can see $30,000, which apparently is the total of James Cook’s sales for the month of January. So where is the raw data? How did we arrive at the figure of $30,000? Where is the original list of sales transactions that this figure was generated from? It’s clear that somewhere, someone must have gone to the trouble of collating all of the sales transactions for the past six months into the summary we see above. How long do you suppose this took? An hour? Ten? Probably. If we were to track down the original list of sales transactions, it might look something like this: You may be surprised to learn that, using the PivotTable feature of Excel, we can create a monthly sales summary similar to the one above in a few seconds, with only a few mouse clicks. We can do this – and a lot more too! How to Create a PivotTable First, ensure that you have some raw data in a worksheet in Excel. A list of financial transactions is typical, but it can be a list of just about anything: Employee contact details, your CD collection, or fuel consumption figures for your company’s fleet of cars. So we start Excel… …and we load such a list… Once we have the list open in Excel, we’re ready to start creating the PivotTable. Click on any one single cell within the list: Then, from the Insert tab, click the PivotTable icon: The Create PivotTable box appears, asking you two questions: What data should your new PivotTable be based on, and where should it be created? Because we already clicked on a cell within the list (in the step above), the entire list surrounding that cell is already selected for us ($A$1:$G$88 on the Payments sheet, in this example). Note that we could select a list in any other region of any other worksheet, or even some external data source, such as an Access database table, or even a MS-SQL Server database table. We also need to select whether we want our new PivotTable to be created on a new worksheet, or on an existing one. In this example we will select a new one: The new worksheet is created for us, and a blank PivotTable is created on that worksheet: Another box also appears: The PivotTable Field List. This field list will be shown whenever we click on any cell within the PivotTable (above): The list of fields in the top part of the box is actually the collection of column headings from the original raw data worksheet. The four blank boxes in the lower part of the screen allow us to choose the way we would like our PivotTable to summarize the raw data. So far, there is nothing in those boxes, so the PivotTable is blank. All we need to do is drag fields down from the list above and drop them in the lower boxes. A PivotTable is then automatically created to match our instructions. If we get it wrong, we only need to drag the fields back to where they came from and/or drag new fields down to replace them. The Values box is arguably the most important of the four. The field that is dragged into this box represents the data that needs to be summarized in some way (by summing, averaging, finding the maximum, minimum, etc). It is almost always numerical data. A perfect candidate for this box in our sample data is the “Amount” field/column. Let’s drag that field into the Values box: Notice that (a) the “Amount” field in the list of fields is now ticked, and “Sum of Amount” has been added to the Values box, indicating that the amount column has been summed. If we examine the PivotTable itself, we indeed find the sum of all the “Amount” values from the raw data worksheet: We’ve created our first PivotTable! Handy, but not particularly impressive. It’s likely that we need a little more insight into our data than that. Referring to our sample data, we need to identify one or more column headings that we could conceivably use to split this total. For example, we may decide that we would like to see a summary of our data where we have a row heading for each of the different salespersons in our company, and a total for each. To achieve this, all we need to do is to drag the “Salesperson” field into the Row Labels box: Now, finally, things start to get interesting! Our PivotTable starts to take shape…. With a couple of clicks we have created a table that would have taken a long time to do manually. So what else can we do? Well, in one sense our PivotTable is complete. We’ve created a useful summary of our source data. The important stuff is already learned! For the rest of the article, we will examine some ways that more complex PivotTables can be created, and ways that those PivotTables can be customized. First, we can create a two-dimensional table. Let’s do that by using “Payment Method” as a column heading. Simply drag the “Payment Method” heading to the Column Labels box: Which looks like this: Starting to get very cool! Let’s make it a three-dimensional table. What could such a table possibly look like? Well, let’s see… Drag the “Package” column/heading to the Report Filter box: Notice where it ends up…. This allows us to filter our report based on which “holiday package” was being purchased. For example, we can see the breakdown of salesperson vs payment method for all packages, or, with a couple of clicks, change it to show the same breakdown for the “Sunseekers” package: And so, if you think about it the right way, our PivotTable is now three-dimensional. Let’s keep customizing… If it turns out, say, that we only want to see cheque and credit card transactions (i.e. no cash transactions), then we can deselect the “Cash” item from the column headings. Click the drop-down arrow next to Column Labels, and untick “Cash”: Let’s see what that looks like…As you can see, “Cash” is gone. Formatting This is obviously a very powerful system, but so far the results look very plain and boring. For a start, the numbers that we’re summing do not look like dollar amounts – just plain old numbers. Let’s rectify that. A temptation might be to do what we’re used to doing in such circumstances and simply select the whole table (or the whole worksheet) and use the standard number formatting buttons on the toolbar to complete the formatting. The problem with that approach is that if you ever change the structure of the PivotTable in the future (which is 99% likely), then those number formats will be lost. We need a way that will make them (semi-)permanent. First, we locate the “Sum of Amount” entry in the Values box, and click on it. A menu appears. We select Value Field Settings… from the menu: The Value Field Settings box appears. Click the Number Format button, and the standard Format Cells box appears: From the Category list, select (say) Accounting, and drop the number of decimal places to 0. Click OK a few times to get back to the PivotTable… As you can see, the numbers have been correctly formatted as dollar amounts. While we’re on the subject of formatting, let’s format the entire PivotTable. There are a few ways to do this. Let’s use a simple one… Click the PivotTable Tools/Design tab: Then drop down the arrow in the bottom-right of the PivotTable Styles list to see a vast collection of built-in styles: Choose any one that appeals, and look at the result in your PivotTable: Other Options We can work with dates as well. Now usually, there are many, many dates in a transaction list such as the one we started with. But Excel provides the option to group data items together by day, week, month, year, etc. Let’s see how this is done. First, let’s remove the “Payment Method” column from the Column Labels box (simply drag it back up to the field list), and replace it with the “Date Booked” column: As you can see, this makes our PivotTable instantly useless, giving us one column for each date that a transaction occurred on – a very wide table! To fix this, right-click on any date and select Group… from the context-menu: The grouping box appears. We select Months and click OK: Voila! A much more useful table: (Incidentally, this table is virtually identical to the one shown at the beginning of this article – the original sales summary that was created manually.) Another cool thing to be aware of is that you can have more than one set of row headings (or column headings): …which looks like this…. You can do a similar thing with column headings (or even report filters). Keeping things simple again, let’s see how to plot averaged values, rather than summed values. First, click on “Sum of Amount”, and select Value Field Settings… from the context-menu that appears: In the Summarize value field by list in the Value Field Settings box, select Average: While we’re here, let’s change the Custom Name, from “Average of Amount” to something a little more concise. Type in something like “Avg”: Click OK, and see what it looks like. Notice that all the values change from summed totals to averages, and the table title (top-left cell) has changed to “Avg”: If we like, we can even have sums, averages and counts (counts = how many sales there were) all on the same PivotTable! Here are the steps to get something like that in place (starting from a blank PivotTable): Drag “Salesperson” into the Column Labels Drag “Amount” field down into the Values box three times For the first “Amount” field, change its custom name to “Total” and it’s number format to Accounting (0 decimal places) For the second “Amount” field, change its custom name to “Average”, its function to Average and it’s number format to Accounting (0 decimal places) For the third “Amount” field, change its name to “Count” and its function to Count Drag the automatically created field from Column Labels to Row Labels Here’s what we end up with: Total, average and count on the same PivotTable! Conclusion There are many, many more features and options for PivotTables created by Microsoft Excel – far too many to list in an article like this. To fully cover the potential of PivotTables, a small book (or a large website) would be required. Brave and/or geeky readers can explore PivotTables further quite easily: Simply right-click on just about everything, and see what options become available to you. There are also the two ribbon-tabs: PivotTable Tools/Options and Design. It doesn’t matter if you make a mistake – it’s easy to delete the PivotTable and start again – a possibility old DOS users of Lotus 1-2-3 never had. We’ve included an Excel that should work with most versions of Excel, so you can download to practice your PivotTable skills. Download Our Practice Excel File Similar Articles Productive Geek Tips Magnify Selected Cells In Excel 2007Share Access Data with Excel in Office 2010Make Excel 2007 Print Gridlines In Workbook FileMake Excel 2007 Always Save in Excel 2003 FormatConvert Older Excel Documents to Excel 2007 Format TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 PCmover Professional Ben & Jerry’s Free Cone Day, 3/23/10 New Stinger from McAfee Helps Remove ‘FakeAlert’ Threats Google Apps Marketplace: Tools & Services For Google Apps Users Get News Quick and Precise With Newser Scan for Viruses in Ubuntu using ClamAV Replace Your Windows Task Manager With System Explorer

Read the article
General monitoring for SQL Server Analysis Services using Performance Monitor

- by Testas

A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs. File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor, a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used…… Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above 25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file

Read the article
Cost Comparison Hard Disk Drive to Solid State Drive on Price per Gigabyte - dispelling a myth!

- by tonyrogerson

It is often said that Hard Disk Drive storage is significantly cheaper per GiByte than Solid State Devices – this is wholly inaccurate within the database space. People need to look at the cost of the complete solution and not just a single component part in isolation to what is really required to meet the business requirement. Buying a single Hitachi Ultrastar 600GB 3.5” SAS 15Krpm hard disk drive will cost approximately £239.60 (http://scan.co.uk, 22nd March 2012) compared to an OCZ 600GB Z-Drive R4 CM84 PCIe costing £2,316.54 (http://scan.co.uk, 22nd March 2012); I’ve not included FusionIO ioDrive because there is no public pricing available for it – something I never understand and personally when companies do this I immediately think what are they hiding, luckily in FusionIO’s case the product is proven though is expensive compared to OCZ enterprise offerings. On the face of it the single 15Krpm hard disk has a price per GB of £0.39, the SSD £3.86; this is what you will see in the press and this is what sales people will use in comparing the two technologies – do not be fooled by this bullshit people! What is the requirement? The requirement is the database will have a static size of 400GB kept static through archiving so growth and trim will balance the database size, the client requires resilience, there will be several hundred call centre staff querying the database where queries will read a small amount of data but there will be no hot spot in the data so the randomness will come across the entire 400GB of the database, estimates predict that the IOps required will be approximately 4,000IOps at peak times, because it’s a call centre system the IO latency is important and must remain below 5ms per IO. The balance between read and write is 70% read, 30% write. The requirement is now defined and we have three of the most important pieces of the puzzle – space required, estimated IOps and maximum latency per IO. Something to consider with regard SQL Server; write activity requires synchronous IO to the storage media specifically the transaction log; that means the write thread will wait until the IO is completed and hardened off until the thread can continue execution, the requirement has stated that 30% of the system activity will be write so we can expect a high amount of synchronous activity. The hardware solution needs to be defined; two possible solutions: hard disk or solid state based; the real question now is how many hard disks are required to achieve the IO throughput, the latency and resilience, ditto for the solid state. Hard Drive solution On a test on an HP DL380, P410i controller using IOMeter against a single 15Krpm 146GB SAS drive, the throughput given on a transfer size of 8KiB against a 40GiB file on a freshly formatted disk where the partition is the only partition on the disk thus the 40GiB file is on the outer edge of the drive so more sectors can be read before head movement is required: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 3,733 IOps at an average latency of 34.06ms (34 MiB/s). The same test was done on the same disk but the test file was 130GiB: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 528 IOps at an average latency of 217.49ms (4 MiB/s). From the result it is clear random performance gets worse as the disk fills up – I’m currently writing an article on short stroking which will cover this in detail. Given the work load is random in nature looking at the random performance of the single drive when only 40 GiB of the 146 GB is used gives near the IOps required but the latency is way out. Luckily I have tested 6 x 15Krpm 146GB SAS 15Krpm drives in a RAID 0 using the same test methodology, for the same test above on a 130 GiB for each drive added the performance boost is near linear, for each drive added throughput goes up by 5 MiB/sec, IOps by 700 IOps and latency reducing nearly 50% per drive added (172 ms, 94 ms, 65 ms, 47 ms, 37 ms, 30 ms). This is because the same 130GiB is spread out more as you add drives 130 / 1, 130 / 2, 130 / 3 etc. so implicit short stroking is occurring because there is less file on each drive so less head movement required. The best latency is still 30 ms but we have the IOps required now, but that’s on a 130GiB file and not the 400GiB we need. Some reality check here: a) the drive randomness is more likely to be 50/50 and not a full 100% but the above has highlighted the effect randomness has on the drive and the more a drive fills with data the worse the effect. For argument sake let us assume that for the given workload we need 8 disks to do the job, for resilience reasons we will need 16 because we need to RAID 1+0 them in order to get the throughput and the resilience, RAID 5 would degrade performance. Cost for hard drives: 16 x £239.60 = £3,833.60 For the hard drives we will need disk controllers and a separate external disk array because the likelihood is that the server itself won’t take the drives, a quick spec off DELL for a PowerVault MD1220 which gives the dual pathing with 16 disks 146GB 15Krpm 2.5” disks is priced at £7,438.00, note its probably more once we had two controller cards to sit in the server in, racking etc. Minimum cost taking the DELL quote as an example is therefore: {Cost of Hardware} / {Storage Required} £7,438.60 / 400 = £18.595 per GB £18.59 per GiB is a far cry from the £0.39 we had been told by the salesman and the myth. Yes, the storage array is composed of 16 x 146 disks in RAID 10 (therefore 8 usable) giving an effective usable storage availability of 1168GB but the actual storage requirement is only 400 and the extra disks have had to be purchased to get the IOps up. Solid State Drive solution A single card significantly exceeds the IOps and latency required, for resilience two will be required. ( £2,316.54 * 2 ) / 400 = £11.58 per GB With the SSD solution only two PCIe sockets are required, no external disk units, no additional controllers, no redundant controllers etc. Conclusion I hope by showing you an example that the myth that hard disk drives are cheaper per GiB than Solid State has now been dispelled - £11.58 per GB for SSD compared to £18.59 for Hard Disk. I’ve not even touched on the running costs, compare the costs of running 18 hard disks, that’s a lot of heat and power compared to two PCIe cards!Just a quick note: I've left a fair amount of information out due to this being a blog! If in doubt, email me :)I'll also deal with the myth that SSD's wear out at a later date as well - that's just way over done still, yes, 5 years ago, but now - no.

Read the article
SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28

- by pinaldave

For any good system three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor of SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded for either improving the space, speed or throughput. Today we will look at another IO-related wait type. From Book On-Line: Occurs when a task is waiting for I/Os to finish. ASYNC_IO_COMPLETION Explanation: Any tasks are waiting for I/O to finish. If by any means your application that’s connected to SQL Server is processing the data very slowly, this type of wait can occur. Several long-running database operations like BACKUP, CREATE DATABASE, ALTER DATABASE or other operations can also create this wait type. Reducing ASYNC_IO_COMPLETION wait: When it is an issue related to IO, one should check for the following things associated to IO subsystem: Look at the programming and see if there is any application code which processes the data slowly (like inefficient loop, etc.). Note that it should be re-written to avoid this wait type. Proper placing of the files is very important. We should check the file system for proper placement of the files – LDF and MDF on separate drive, TempDB on another separate drive, hot spot tables on separate filegroup (and on separate disk), etc. Check the File Statistics and see if there is a higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats. Check event log and error log for any errors or warnings related to IO. If you are using SAN (Storage Area Network), check the throughput of the SAN system as well as configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly and so the SAN administrator did not accept it. After some investigations, he agreed to change the HBA Queue Depth on the development setup (test environment). As soon as we changed the HBA Queue Depth to quite a higher value, there was a sudden big improvement in the performance. It is very likely to happen that there are no proper indexes on the system and yet there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can effectively reduce lots of CPU, Memory and IO (considering cover index has lesser columns than cluster table and all other; it depends upon the situation). You can refer to the following two articles I wrote that talk about how to optimize indexes: Create Missing Indexes Drop Unused Indexes Checking Memory Related Perfmon Counters SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2) SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark) SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system) SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds) Memory: Available Mbytes (Information only) Memory: Page Faults/sec (Benchmark only) Memory: Pages/sec (Benchmark only) Checking Disk Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) Read all the post in the Wait Types and Queue series. Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article
SQL SERVER – IO_COMPLETION – Wait Type – Day 10 of 28

- by pinaldave

For any good system three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor of SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded for either improving the space, speed or throughput. Today we will look at an IO-related wait types. From Book On-Line: Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits. IO_COMPLETION Explanation: Any tasks are waiting for I/O to finish. This is a good indication that IO needs to be looked over here. Reducing IO_COMPLETION wait: When it is an issue concerning the IO, one should look at the following things related to IO subsystem: Proper placing of the files is very important. We should check the file system for proper placement of files – LDF and MDF on a separate drive, TempDB on another separate drive, hot spot tables on separate filegroup (and on separate disk),etc. Check the File Statistics and see if there is higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats. Check event log and error log for any errors or warnings related to IO. If you are using SAN (Storage Area Network), check the throughput of the SAN system as well as the configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly so the SAN administrator did not accept it. After some investigations, he agreed to change the HBA Queue Depth on development (test environment) set up and as soon as we changed the HBA Queue Depth to quite a higher value, there was a sudden big improvement in the performance. It is very possible that there are no proper indexes in the system and there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can effectively reduce lots of CPU, Memory and IO (considering cover index has lesser columns than cluster table and all other; it depends upon the situation). You can refer to the two articles that I wrote; they are about how to optimize indexes: Create Missing Indexes Drop Unused Indexes Checking Memory Related Perfmon Counters SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2) SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark) SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system) SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds) Memory: Available Mbytes (Information only) Memory: Page Faults/sec (Benchmark only) Memory: Pages/sec (Benchmark only) Checking Disk Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Types, SQL White Papers, T SQL, Technology

Read the article
Merge sort versus quick sort performance

- by Giorgio

I have implemented merge sort and quick sort using C (GCC 4.4.3 on Ubuntu 10.04 running on a 4 GB RAM laptop with an Intel DUO CPU at 2GHz) and I wanted to compare the performance of the two algorithms. The prototypes of the sorting functions are: void merge_sort(const char **lines, int start, int end); void quick_sort(const char **lines, int start, int end); i.e. both take an array of pointers to strings and sort the elements with index i : start <= i <= end. I have produced some files containing random strings with length on average 4.5 characters. The test files range from 100 lines to 10000000 lines. I was a bit surprised by the results because, even though I know that merge sort has complexity O(n log(n)) while quick sort is O(n^2), I have often read that on average quick sort should be as fast as merge sort. However, my results are the following. Up to 10000 strings, both algorithms perform equally well. For 10000 strings, both require about 0.007 seconds. For 100000 strings, merge sort is slightly faster with 0.095 s against 0.121 s. For 1000000 strings merge sort takes 1.287 s against 5.233 s of quick sort. For 5000000 strings merge sort takes 7.582 s against 118.240 s of quick sort. For 10000000 strings merge sort takes 16.305 s against 1202.918 s of quick sort. So my question is: are my results as expected, meaning that quick sort is comparable in speed to merge sort for small inputs but, as the size of the input data grows, the fact that its complexity is quadratic will become evident? Here is a sketch of what I did. In the merge sort implementation, the partitioning consists in calling merge sort recursively, i.e. merge_sort(lines, start, (start + end) / 2); merge_sort(lines, 1 + (start + end) / 2, end); Merging of the two sorted sub-array is performed by reading the data from the array lines and writing it to a global temporary array of pointers (this global array is allocate only once). After each merge the pointers are copied back to the original array. So the strings are stored once but I need twice as much memory for the pointers. For quick sort, the partition function chooses the last element of the array to sort as the pivot and scans the previous elements in one loop. After it has produced a partition of the type start ... {elements <= pivot} ... pivotIndex ... {elements > pivot} ... end it calls itself recursively: quick_sort(lines, start, pivotIndex - 1); quick_sort(lines, pivotIndex + 1, end); Note that this quick sort implementation sorts the array in-place and does not require additional memory, therefore it is more memory efficient than the merge sort implementation. So my question is: is there a better way to implement quick sort that is worthwhile trying out? If I improve the quick sort implementation and perform more tests on different data sets (computing the average of the running times on different data sets) can I expect a better performance of quick sort wrt merge sort? EDIT Thank you for your answers. My implementation is in-place and is based on the pseudo-code I have found on wikipedia in Section In-place version: function partition(array, 'left', 'right', 'pivotIndex') where I choose the last element in the range to be sorted as a pivot, i.e. pivotIndex := right. I have checked the code over and over again and it seems correct to me. In order to rule out the case that I am using the wrong implementation I have uploaded the source code on github (in case you would like to take a look at it). Your answers seem to suggest that I am using the wrong test data. I will look into it and try out different test data sets. I will report as soon as I have some results.

Read the article
Fun tips with Analytics

- by user12620172

If you read this blog, I am assuming you are at least familiar with the Analytic functions in the ZFSSA. They are basically amazing, very powerful and deep. However, you may not be aware of some great, hidden functions inside the Analytic screen. Once you open a metric, the toolbar looks like this: Now, I’m not going over every tool, as we have done that before, and you can hover your mouse over them and they will tell you what they do. But…. Check this out. Open a metric (CPU Percent Utilization works fine), and click on the “Hour” button, which is the 2nd clock icon. That’s easy, you are now looking at the last hour of data. Now, hold down your ‘Shift’ key, and click it again. Now you are looking at 2 hours of data. Hold down Shift and click it again, and you are looking at 3 hours of data. Are you catching on yet? You can do this with not only the ‘Hour’ button, but also with the ‘Minute’, ‘Day’, ‘Week’, and the ‘Month’ buttons. Very cool. It also works with the ‘Show Minimum’ and ‘Show Maximum’ buttons, allowing you to go to the next iteration of either of those. One last button you can Shift-click is the handy ‘Drill’ button. This button usually drills down on one specific aspect of your metric. If you Shift-click it, it will display a “Rainbow Highlight” of the current metric. This works best if this metric has many ‘Range Average’ items in the left-hand window. Give it a shot. Also, one will sometimes click on a certain second of data in the graph, like this: In this case, I clicked 4:57 and 21 seconds, and the 'Range Average' on the left went away, and was replaced by the time stamp. It seems at this point to some people that you are now stuck, and can not get back to an average for the whole chart. However, you can actually click on the actual time stamp of "4:57:21" right above the chart. Even though your mouse does not change into the typical browser finger that most links look like, you can click it, and it will change your range back to the full metric. Another trick you may like is to save a certain view or look of a group of graphs. Most of you know you can save a worksheet, but did you know you could Sync them, Pause them, and then Save it? This will save the paused state, allowing you to view it forever the way you see it now. Heatmaps. Heatmaps are cool, and look like this: Some metrics use them and some don't. If you have one, and wish to zoom it vertically, try this. Open a heatmap metric like my example above (I believe every metric that deals with latency will show as a heatmap). Select one or two of the ranges on the left. Click the "Change Outlier Elimination" button. Click it again and check out what it does. Enjoy. Perhaps my next blog entry will be the best Analytic metrics to keep your eyes on, and how you can use the Alerts feature to watch them for you. Steve

Read the article
Generic Adjacency List Graph implementation

- by DmainEvent

I am trying to come up with a decent Adjacency List graph implementation so I can start tooling around with all kinds of graph problems and algorithms like traveling salesman and other problems... But I can't seem to come up with a decent implementation. This is probably because I am trying to dust the cobwebs off my data structures class. But what I have so far... and this is implemented in Java... is basically an edgeNode class that has a generic type and a weight-in the event the graph is indeed weighted. public class edgeNode<E> { private E y; private int weight; //... getters and setters as well as constructors... } I have a graph class that has a list of edges a value for the number of Vertices and and an int value for edges as well as a boolean value for whether or not it is directed. The brings up my first question, if the graph is indeed directed, shouldn't I have a value in my edgeNode class? Or would I just need to add another vertices to my LinkedList? That would imply that a directed graph is 2X as big as an undirected graph wouldn't it? public class graph { private List<edgeNode<?>> edges; private int nVertices; private int nEdges; private boolean directed; //... getters and setters as well as constructors... } Finally does anybody have a standard way of initializing there graph? I was thinking of reading in a pipe-delimited file but that is so 1997. public graph GenereateGraph(boolean directed, String file){ List<edgeNode<?>> edges; graph g; try{ int count = 0; String line; FileReader input = new FileReader("C:\\Users\\derekww\\Documents\\JavaEE Projects\\graphFile"); BufferedReader bufRead = new BufferedReader(input); line = bufRead.readLine(); count++; edges = new ArrayList<edgeNode<?>>(); while(line != null){ line = bufRead.readLine(); Object edgeInfo = line.split("|")[0]; int weight = Integer.parseInt(line.split("|")[1]); edgeNode<String> e = new edgeNode<String>((String) edges.add(e); } return g; } catch(Exception e){ return null; } } I guess when I am adding edges if boolean is true I would be adding a second edge. So far, this all depends on the file I write. So if I wrote a file with the following Vertices and weights... Buffalo | 18 br Pittsburgh | 20 br New York | 15 br D.C | 45 br I would obviously load them into my list of edges, but how can I represent one vertices connected to the other... so on... I would need the opposite vertices? Say I was representing Highways connected to each city weighted and un-directed (each edge is bi-directional with weights in some fictional distance unit)... Would my implementation be the best way to do that? I found this tutorial online Graph Tutorial that has a connector object. This appears to me be a collection of vertices pointing to each other. So you would have A and B each with there weights and so on, and you would add this to a list and this list of connectors to your graph... That strikes me as somewhat cumbersome and a little dismissive of the adjacency list concept? Am I wrong and that is a novel solution? This is all inspired by steve skiena's Algorithm Design Manual. Which I have to say is pretty good so far. Thanks for any help you can provide.

Read the article
XML generation with java, trying to copy the whole node

- by Pawel Mysior

I've got an xml document that filled with people (parent node is "students", and there are 25+ "student" nodes). Each student looks like this: <student> <name></name> <surname></surname> <grades> <subject name=""> <small_grades></small_grades> <final_grade></final_grade> </subject> <subject name=""> <small_grades></small_grades> <final_grade></final_grade> </subject> </grades> <average></average> </student> Basically, what I want to do ('ve been asked to do) is to make a program that would get 3 students with the best average. While parsing the document and getting three best students isn't too difficult, the XML generation is a pain in the ass. Right now, what I'm doing is getting every single node from student and recreating it to a new file. Is there a way to copy the whole student node with everything that's in it? Regards, Paul

Read the article
My OpenCL kernel is slower on faster hardware.. But why?

- by matdumsa

Hi folks, As I was finishing coding my project for a multicore programming class I came up upon something really weird I wanted to discuss with you. We were asked to create any program that would show significant improvement in being programmed for a multi-core platform. I’ve decided to try and code something on the GPU to try out OpenCL. I’ve chosen the matrix convolution problem since I’m quite familiar with it (I’ve parallelized it before with open_mpi with great speedup for large images). So here it is, I select a large GIF file (2.5 MB) [2816X2112] and I run the sequential version (original code) and I get an average of 15.3 seconds. I then run the new OpenCL version I just wrote on my MBP integrated GeForce 9400M and I get timings of 1.26s in average.. So far so good, it’s a speedup of 12X!! But now I go in my energy saver panel to turn on the “Graphic Performance Mode” That mode turns off the GeForce 9400M and turns on the Geforce 9600M GT my system has. Apple says this card is twice as fast as the integrated one. Guess what, my timing using the kick-ass graphic card are 3.2 seconds in average… My 9600M GT seems to be more than two times slower than the 9400M.. For those of you that are OpenCL inclined, I copy all data to remote buffers before starting, so the actual computation doesn’t require roundtrip to main ram. Also, I let OpenCL determine the optimal local-worksize as I’ve read they’ve done a pretty good implementation at figuring that parameter out.. Anyone has a clue? edit: full source code with makefiles here http://www.mathieusavard.info/convolution.zip cd gimage make cd ../clconvolute make put a large input.gif in clconvolute and run it to see results

Read the article
Temperature anomaly calculation of time series data

- by neel

I have a time series like following: Data <- structure(list(Year = c(1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1991L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L), Month = c(8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L), Day = c(30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 28L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L, 10L, 20L, 30L), Hour = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), temperature = c(72.5, 64, 62.5, 64, 64, 53, 52, 52, 45.5, 49, 50, 50, 59, 63.5, 69.5, 61, 61, NaN, NaN, 39.5, 37, 45.5, 45, 39, 43.5, 52, 53, 56, 64, 66, 66.5, 73.5, 81, 85, 89.5, 87.5, 88.5, 83, 84.5, 74, 60.5, 59, 53, 60.5, 62.5, 64.5, 63, 62, 65.5)), .Names = c("Year", "Month", "Day", "Hour", "temperature"), class = "data.frame", row.names = c(NA, -49L)) and I have to calculate standardized anomaly. The steps to calculate the anomalies are following: Monthly premature departures from the long-term (1991-2007) average are obtained. Then standardized by dividing by the standard deviation of monthly temperature. The standardized monthly anomalies are then weighted by multiplying by the fraction of the average temperature for the given month. These weighted anomalies are then summed over 3 month time period. Can you please help me?

Read the article
Get percentiles of data-set with group by month

- by Cylindric

Hello, I have a SQL table with a whole load of records that look like this: | Date | Score | + -----------+-------+ | 01/01/2010 | 4 | | 02/01/2010 | 6 | | 03/01/2010 | 10 | ... | 16/03/2010 | 2 | I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely. Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix: SELECT YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month', MIN(SCORE) MinScore, AVG(SCORE) AverageScore, MAX(SCORE) MaxScore FROM SCORES GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE) ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE) That's no problem so far. The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is: A line on the chart for the score (easy) A line on the chart for the average (easy) A line on the chart showing the band that 95% of the scores occupy (stumped) It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly: SELECT MAX(SubQ.SCORE) FROM (SELECT TOP 45 PERCENT SCORE FROM SCORES WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1 ORDER BY SCORE ASC) AS SubQ SELECT MIN(SubQ.SCORE) FROM (SELECT TOP 45 PERCENT SCORE FROM SCORES WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1 ORDER BY SCORE DESC) AS SubQ But I can't work out how to get a table of all the months. | Date | Average | 45% | 55% | + -----------+---------+-----+-----+ | 01/01/2010 | 13 | 11 | 15 | | 02/01/2010 | 10 | 8 | 12 | | 03/01/2010 | 5 | 4 | 10 | ... | 16/03/2010 | 7 | 7 | 9 | At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.

Read the article
High Runtime for Dictionary.Add for a large amount of items

- by aaginor

Hi folks, I have a C#-Application that stores data from a TextFile in a Dictionary-Object. The amount of data to be stored can be rather large, so it takes a lot of time inserting the entries. With many items in the Dictionary it gets even worse, because of the resizing of internal array, that stores the data for the Dictionary. So I initialized the Dictionary with the amount of items that will be added, but this has no impact on speed. Here is my function: private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections) { Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count); foreach (NodeConnection con in connections) { ... resultSet.Add(nodeIdPair, newEdge); } return resultSet; } In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found, that the Average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (ALTHOUGH I initialized it with the desired size). For the first 20k items, the average time for Add is 0.03 ms for each item. Any idea, how to make the add-operation faster? Thanks in advance, Frank

Read the article
MySQL Ratings From Two Tables

- by DirtyBirdNJ

I am using MySQL and PHP to build a data layer for a flash game. Retrieving lists of levels is pretty easy, but I've hit a roadblock in trying to fetch the level's average rating along with it's pointer information. Here is an example data set: levels Table: level_id | level_name 1 | Some Level 2 | Second Level 3 | Third Level ratings Table: rating_id | level_id | rating_value 1 | 1 | 3 2 | 1 | 4 3 | 1 | 1 4 | 2 | 3 5 | 2 | 4 6 | 2 | 1 7 | 3 | 3 8 | 3 | 4 9 | 3 | 1 I know this requires a join, but I cannot figure out how to get the average rating value based on the level_id when I request a list of levels. This is what I'm trying to do: SELECT levels.level_id, AVG(ratings.level_rating WHERE levels.level_id = ratings.level_id) FROM levels I know my SQL is flawed there, but I can't figure out how to get this concept across. The only thing I can get to work is returning a single average from the entire ratings table, which is not very useful. Ideal Output from the above conceptually valid but syntactically awry query would be: level_id | level_rating 1| 3.34 2| 1.00 3| 4.54 My main issue is I can't figure out how to use the level_id of each response row before the query has been returned. It's like I want to use a placeholder... or an alias... I really don't know and it's very frustrating. The solution I have in place now is an EPIC band-aid and will only cause me problems long term... please help!

Read the article
C : files manipulation Can't figure out how to simplify this code with files manipulation.

- by Bon_chan

Hey guys, I have been working on this code but I can't find out what is wrong. This program does compile and run but it ends up having a fatal error. I have a file called myFile.txt, with the following content : James------ 07.50 Anthony--- 17.00 And here is the code : int main() { int n =2, valueTest=0,count=0; FILE* file = NULL; float temp= 00.00f, average= 00.00f, flTen = 10.00f; float *totalNote = (float*)malloc(n*sizeof(float)); int position = 0; char selectionNote[5+1], nameBuffer[10+1], noteBuffer[5+1]; file = fopen("c:\\myFile.txt","r"); fseek(file,10,SEEK_SET); while(valueTest<2) { fscanf(file,"%5s",&selectionNote); temp = atof(selectionNote); totalNote[position]= temp; position++; valeurTest++; } for(int counter=0;counter<2;counter++) { average += totalNote[counter]; } printf("The total is : %f \n",average); rewind(file); printf("here is the one with less than 10.00 :\n"); while(count<2) { fscanf(file,"%10s",&nameBuffer); fseek(file,10,SEEK_SET); fscanf(file,"%5s",&noteBuffer); temp = atof(noteBuffer); if(temp<flTen) { printf("%s who has %f\n",nameBuffer,temp); } fseek(file,1,SEEK_SET); count++; } fclose(file); } I am pretty new to c and find it more difficult than c# or java. And I woud like to get some suggestions to help me to get better. I think this code could be simplier. Do you think the same ?

Read the article
C++ new memory allocation fragmentation

- by tamulj

I was trying to look at the behavior of the new allocator and why it doesn't place data contiguously. My code: struct ci { char c; int i; } template <typename T> void memTest() { T * pLast = new T(); for(int i = 0; i < 20; ++i) { T * pNew = new T(); cout << (pNew - pLast) << " "; pLast = pNew; } } So I ran this with char, int, ci. Most allocations were a fixed length from the last, sometimes there were odd jumps from one available block to another. sizeof(char) : 1 Average Jump: 64 bytes sizeof(int): 4 Average Jump: 16 sizeof(ci): 8 (int has to be placed on a 4 byte align) Average Jump: 9 Can anyone explain why the allocator is fragmenting memory like this? Also why is the jump for char so much larger then ints and a structure that contains both an int and char.

Read the article
Array loading with doubles in C

- by user2892120

I am trying to load a 3x8 array of doubles but my code keeps outputting 0.00 for all of the values. The code should be outputting the array (same as the input) under the Read#1 Read#2 Read#3 lines, with the average under average. Here is my code: #include <stdio.h> double getAvg(double num1, double num2, double num3); int main() { int numJ,month,day,year,i,j; double arr[3][8]; scanf("%d %d %d %d",&numJ,&month,&day,&year); for (i = 0; i < 8; i++) { scanf("%f %f %f",&arr[i][0], &arr[i][1], &arr[i][2]); } printf("\nJob %d Date: %d/%d/%d",numJ,month,day,year); printf("\n\nLocation Read#1 Read#2 Read#3 Average"); for (j = 0; j < 8; j++) { printf("\n %d %.2f %.2f %.2f %.2f",j+1,arr[j][0],arr[j] [1],arr[j][2],getAvg(arr[j][0],arr[j][1],arr[j][2])); } return 0; } double getAvg(double num1, double num2, double num3) { double avg = (num1 + num2 + num3) / 3; return avg; } Input example: 157932 09 01 2013 0.00 0.00 0.00 0.36 0.27 0.23 0.18 0.16 0.26 0.27 0.00 0.34 0.24 0.00 0.31 0.16 0.33 0.36 0.29 0.36 0.00 0.21 0.36 0.00

Read the article
The most efficient way to calculate an integral in a dataset range

- by Annalisa

I have an array of 10 rows by 20 columns. Each columns corresponds to a data set that cannot be fitted with any sort of continuous mathematical function (it's a series of numbers derived experimentally). I would like to calculate the integral of each column between row 4 and row 8, then store the obtained result in a new array (20 rows x 1 column). I have tried using different scipy.integrate modules (e.g. quad, trpz,...). The problem is that, from what I understand, scipy.integrate must be applied to functions, and I am not sure how to convert each column of my initial array into a function. As an alternative, I thought of calculating the average of each column between row 4 and row 8, then multiply this number by 4 (i.e. 8-4=4, the x-interval) and then store this into my final 20x1 array. The problem is...ehm...that I don't know how to calculate the average over a given range. The question I am asking are: Which method is more efficient/straightforward? Can integrals be calculated over a data set like the one that I have described? How do I calculate the average over a range of rows?

Read the article
How to adjust and combine multiple lower quality photos into one better using FOSS?

- by Vi

I have multiple noisy photos (caputed without tripod) that needs to be adjusted (moved/rotated) and averaged. How it's better to do it in Linux with FOSS console-based programs? Current way is something like: mplayer mf://*.JPG -vo yuv4mpeg:file=qqq.yuv transcode -i qqq.yuv -y null -J stabilize=maxshift=500:fieldsize=100:fieldnum=6:stepsize=50:shakiness=10 transcode -i qqq.yuv -J transform=smoothing=100000:sharpen=0:optzoom=0 -y raw -o www.yuv mplayer www.yuv -vo pnm gm convert -average 0*.ppm q.ppm i.e.: Convert photos to video Apply Transcode's "Stabilize" filter Convert the video back to images Average the images. It works, but bad: photos still not perfectly adjusted and the whole sequence is very slow. What is better way of doing it?

Read the article
Brightness keeps changing in Windows 8.1 (on Macbook Pro Retina)

- by gzak

Before anyone gets too excited, it's not the "Adaptive Brightness" feature of the OS. I've already turned that off. Also it seems to have nothing to do with ambient light. It actually seems to do with the average "color" of the display. If I'm working in dark-themed Visual Studio, the brightness "pops" brighter. When I switch to the browser, it "pops" darker. So it's kind of adaptive brightness based on average pixel color (or something like that). What makes it rather annoying is that the brightness pops, rather than transitioning gradually. What is this feature, and how do I disable it (or at least make it smoother)?

Read the article
How to monitor current output/receive queue length in Linux

- by IZhen

I want to check the capacity and performance of my network. Besides checking the txkB/s and rxkB/s via Sar, I'd also like to see the average queue length of the network interface(so that the average queueing time in the interface can be calculated). It seems that netstat can give a per socket queue length, is it possible to get a per interface statics(a bit like Network Interface\Output Queue Length in Windows)? A related and kind of reverse questions is How do I view the TCP Send and Receive Queue sizes on Windows? Thanks

Read the article
Ping computername - result format

- by kamleshrao

Hi, I am trying PING command on my Windows 7 PC after many months. While doing this, I notice the following result: Ping using computer name: D:\>ping amdwin764 Pinging AMDWIN764 [fe80::ac53:546f:a730:8bd6%11] with 32 bytes of data: Reply from fe80::ac53:546f:a730:8bd6%11: time=1ms Reply from fe80::ac53:546f:a730:8bd6%11: time=1ms Reply from fe80::ac53:546f:a730:8bd6%11: time=1ms Reply from fe80::ac53:546f:a730:8bd6%11: time=1ms Ping statistics for fe80::ac53:546f:a730:8bd6%11: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 1ms, Maximum = 1ms, Average = 1ms Ping using IP address: D:\>ping 192.168.1.2 Pinging 192.168.1.2 with 32 bytes of data: Reply from 192.168.1.2: bytes=32 time=75ms TTL=128 Reply from 192.168.1.2: bytes=32 time=1ms TTL=128 Reply from 192.168.1.2: bytes=32 time=1ms TTL=128 Reply from 192.168.1.2: bytes=32 time=1ms TTL=128 Ping statistics for 192.168.1.2: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 1ms, Maximum = 75ms, Average = 19ms Why am I not getting the Ping results with Numeric IP address in my first example? Thanks, Kamlesh

Read the article
Configuring home wireless network

- by dvanaria

I'm new to setting up a home wireless network. I have Comcast tv/internet/phone service (modem included) as well as a wireless router. My question is pretty basic. How can I tell the performance of the following parts of the network? 1. incoming internet speed 2. speed of the modem 3. speed of the wireless router I basically want as fast an internet connection as possible, of course, but I'm not sure where to look for the bottleneck (and so, not sure where I can spend some money to speed things up). Right now I'm getting about 36 Mbps (as it shows in Windows). If I run an online speed test (xfinity has one) it shows Average download speed of 14.91 Mbps and Average upload speed of 5.72 Mbps. Thanks for your help.

Read the article

Search Results

Search found 2156 results on 87 pages for 'weighted average'.

Page 16/87 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >

- by D-Rock

- by Brad Dwyer

- by Mark Virtue

- by Testas

- by tonyrogerson

- by pinaldave

- by pinaldave

- by Giorgio

- by user12620172

- by DmainEvent

- by Pawel Mysior

- by matdumsa

- by neel

- by Cylindric

- by aaginor

- by DirtyBirdNJ

- by Bon_chan

- by tamulj

- by user2892120

- by Annalisa

- by Vi

- by gzak

- by IZhen

- by kamleshrao

- by dvanaria

< Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >