per magnusson - Page 28

Over gigabit connection, Teracopy does 31MB/s, but Windows 8 does it at ~109MB per second?

- by Gaurang

I got my brain-melting first taste of Gigabit networking today, between my 2011 MacMini and Windows 8 Pro desktop connected via Cat.5e to Linksys WRT320N(sporting dd-WRT). After making sure that the line speed on both systems showed 1Gbps, I proceeded to copying a 2.4GB MP4 from the Mini to the Win 8 desktop (SMB sharing). Although satisfied with the 30-34 MB/s that Teracopy was showing (that was a proper step-up for me from 10 MB/s), I still was curious about this massive difference in the advertised and real-world speed. 2 hours of Google had me believing that there were other factors that resulted in less speed, SMB being one. So just for the sake of doing it, I iPerf'd both the systems and guess what that showed - around 875mbps on both systems! I then stumbled upon this little piece of info after which I turned off Teracopy and copied the same file through Windows 8's regular copier. 109 MB/s. Molten brains :) What exactly is causing this? And can I enable such speeds via Teracopy? I really dig the extra features that Teracopy has, will surely miss them now :D

Read the article

Per-character-set font size in Firefox not working?

- by Coderer

Firefox has a setting (Preferences - Content - Fonts & Colors - Advanced) that is supposed to let you set font preferences for different character sets. I've tried setting larger minimum font sizes for some non-Western character sets (I'm still learning, and have to see extra detail to tell them apart!) and nothing seems to happen. For example, if there's Hangul on a page (like this one), it will show in the same size as the Latin characters around it, even if I set "minimum font size" to 24. Am I misunderstanding how that setting is supposed to work, or does it just not do anything? Is there any other way to blow up only non-Western characters while leaving the letters I know how to read intact?

Read the article

Amazon S3: allow users to upload on a restricted basis (per bucket maybe)?

- by Tom

Hi there, I'm thinking about signing up to the Amazon S3 storage service. What I want to do is create a service where other people can register their own bucket with a certain amount of storage. These users will install my software, which then uploads their files. Of course, the users may only upload what they have paid for. For this to work I would like to create a separate bucket for each customer, each with its own properties. Question 1: is this possible with the API? How? This means that the installed software must have the rights needed to upload to my Amazon S3 account. Question 2: can I create individual authentication IDs for each bucket or customer, so that they can only upload with restrictions I have set? Thanks in advance.

Read the article

How to convert a 1 page PDF to a 2 page per sheet PDF?

- by mokasin

I would like to print a PDF so that on the front of the first page are the first two pages, on the back the 3rd and 4th and so on. ----------------- ----------------- | | | | | | | | | | | | | 1 | 2 | | 3 | 4 | . . . | | | | | | |_______|_______| |_______|_______| page 1 - front page 1 - front Because my printer using Linux fails to support manual duplex printing I'd thought, maybe I could edit the pdf in a according way. But how?

Read the article

Can Apache be configured to specify more than one docroot per virtualhost?

- by syn4k

I have a vhost which specifies <VirtualHost *:80> DocumentRoot "/private/var/www/html/cms/sites/" ServerName localhost.com </VirtualHost> I would like to know if localhost.com can also point to /private/var/www/html/wordpress/. This seems like a no brainer but Apache is like black magic; these things are always possible. Anyway, I already know that I could specify a new ServerName entry and set a new docroot. The problem is, both directories need to be available as roots. If I need to provide more info, I will gladly do so.

Read the article

Exchange 2007: Is there a way to get average number of attachements per day by file type?

- by Bob

I'm curious about the file formats of attachments moving through an Exchange 2007 mail environment. For instance the number of .doc vs. .docx files on average. Exchange Profile Analyzer seems more geared to message size than anything else. Is there anything that can provide this information without being hugely intrusive?

Read the article

Can per-user randomized salts be replaced with iterative hashing?

- by Chas Emerick

In the process of building what I'd like to hope is a properly-architected authentication mechanism, I've come across a lot of materials that specify that: user passwords must be salted the salt used should be sufficiently random and generated per-user ...therefore, the salt must be stored with the user record in order to support verification of the user password I wholeheartedly agree with the first and second points, but it seems like there's an easy workaround for the latter. Instead of doing the equivalent of (pseudocode here): salt = random(); hashedPassword = hash(salt . password); storeUserRecord(username, hashedPassword, salt); Why not use the hash of the username as the salt? This yields a domain of salts that is well-distributed, (roughly) random, and each individual salt is as complex as your salt function provides for. Even better, you don't have to store the salt in the database -- just regenerate it at authentication-time. More pseudocode: salt = hash(username); hashedPassword = hash(salt . password); storeUserRecord(username, hashedPassword); (Of course, hash in the examples above should be something reasonable, like SHA-512, or some other strong hash.) This seems reasonable to me given what (little) I know of crypto, but the fact that it's a simplification over widely-recommended practice makes me wonder whether there's some obvious reason I've gone astray that I'm not aware of.

Read the article

Would it be possible to have a UTF-8-like encoding limited to 3 bytes per character?

- by dan04

UTF-8 requires 4 bytes to represent characters outside the BMP. That's not bad; it's no worse than UTF-16 or UTF-32. But it's not optimal (in terms of storage space). There are 13 bytes (C0-C1 and F5-FF) that are never used. And multi-byte sequences that are not used such as the ones corresponding to "overlong" encodings. If these had been available to encode characters, then more of them could have been represented by 2-byte or 3-byte sequences (of course, at the expense of making the implementation more complex). Would it be possible to represent all 1,114,112 Unicode code points by a UTF-8-like encoding with at most 3 bytes per character? If not, what is the maximum number of characters such an encoding could represent? By "UTF-8-like", I mean, at minimum: The bytes 0x00-0x7F are reserved for ASCII characters. Byte-oriented find / index functions work correctly. You can't find a false positive by starting in the middle of a character like you can in Shift-JIS.

Read the article

Best practices, PHP, tracking millions of impressions per day.

- by John

What do I have to do to make 20k mysql inserts per second possible (during peak hours around 1k/sec during slower times)? I've been doing some research and I've seen the "INSERT DELAYED" suggestion, writing to a flat file, "fopen(file,'a')", and then running a chron job to dump the "needed" data into mysql, etc. I've also heard you need multiple servers and "load balancers" which I've never heard of, to make something like this work. I've also been looking at these "cloud server" thing-a-ma-jigs, and their automatic scalability, but not sure about what's actually scalable. The application is just a tracker script, so if I have 100 websites that get 3 million page loads a day, there will be around 300 million inserts a day. The data will be ran through a script that will run every 15-30 minutes which will normalize the data and insert it into another mysql table. How do the big dogs do it? How do the little dogs do it? I can't afford a huge server anymore so any intuitive ways, if there are multiple ways of going at it, you smart people can think of.. please let me know :)

Read the article

Is there a way to parse XML via SAX/DOM with line numbers available per node.

- by Chris

I already have written a DOM parser for a large XML document format that contains a number of items that can be used to automatically generate Java code. This is limited to small expressions that are then merged into a dynamically generated Java source file. So far - so good. Everything works. BUT - I wish to be able to embed the line number of the XML node where the Java code was included from (so that if the configuration contains uncompilable code, each method will have a pointer to the source XML document and the line number for ease of debugging). I don't require the line number at parse-time and I don't need to validate the XML Source Document and throw an error at a particular line number. I need to be able to access the line number for each node and attribute in my DOM or per SAX event. Any suggestions on how I might be able to achieve this? P.S. Also, I read the StAX has a method to obtain line number whilst parsing, but ideally I would like to achieve the same result with regular SAX/DOM processing in Java 4/5 rather than become a Java 6+ application or take on extra .jar files.

Read the article

Using ASP.NET SQL Membership Provider, how do I store my own per-user data?

- by Gary McGill

I'm using the ASP.NET SQL Membership Provider. So, there's an aspnet_Users table that has details of each of my users. (Actually, the aspnet_Membership table seems to contain most of the actual data). I now want to store some per-user information in my database, so I thought I'd just create a new table with a UserId (GUID) column and an FK relationship to aspnet_Users. However, I then discovered that I can't easily get access to the UserId since it's not exposed via the membership API. (I know I can access it via the ProviderUserKey, but it seems like the API is abstracting away the internal UserID in favor of the UserName, and I don't want to go too far against the grain). So, I thought I should instead put a LoweredUserName column in my table, and create an FK relationship to aspnet_Users using that. Bzzzt. Wrong again, because while there is a unique index in aspnet_Users that includes the LoweredUserName, it also includes the ApplicationId - so in order to create my FK relationship, I'd need to have an ApplicationId column in my table too. At first I thought: fine, I'm only dealing with a single application, so I'll just add such a column and give it a default value. Then I realised that the ApplicationId is a GUID, so it'd be a pain to do this. Not hard exactly, but until I roll out my DB I can't predict what the GUID is going to be. I feel like I'm missing something, or going about things the wrong way. What am I supposed to do?

Read the article

How to disable MSBuild's <RegisterOutput> target on a per-user basis?

- by Roger Lipscombe

I like to do my development as a normal (non-Admin) user. Our VS2010 project build fails with "Failed to register output. Please try enabling Per-user Redirection or register the component from a command prompt with elevated permissions." Since I'm not at liberty to change the project file, is there any way that I can add user-specific MSBuild targets or properties that disable this step on a specific machine, or for a specific user? I'd prefer not to hack on the core MSBuild files. I don't want to change the project file because I might then accidentally check it back in. Nor do I want to hack on the MSBuild core files, because they might get overwritten by a service pack. Given that the Visual C++ project files (and associated .targets and .props files) have about a million places to alter the build order and to import arbitrary files, I was hoping for something along those lines. MSBuild imports/evaluates the project file as follows (I've only looked down the branches that interest me): Foo.vcxproj Microsoft.Cpp.Default.props Microsoft.Cpp.props $(UserRootDir)\Microsoft.Cpp.$(Platform).user.props Microsoft.Cpp.targets Microsoft.Cpp.$(Platform).targets ImportBefore\* Microsoft.CppCommon.targets The "RegisterOutput" target is defined in Microsoft.CppCommon.targets. I was hoping to replace this by putting a do-nothing "RegisterOutput" target in $(UserRootDir)\Microsoft.Cpp.$(Platform).user.props, which is %LOCALAPPDATA%\MSBuild\v4.0\Microsoft.Cpp.Win32.user.props (UserRootDir is set in Microsoft.Cpp.Default.props if it's not already set). Unfortunately, MSBuild uses the last-defined target, which means that mine gets overridden by the built-in one. Alternatively, I could attempt to set the %(Link.RegisterOutput) metadata, but I'd have to do that on all Link items. Any idea how to do that, or even if it'll work?

Read the article

In an Android application, should I have one content provider per table or only one for the entire a

- by Andrew Dyer

I have years of experience with Microsoft .NET development (primarily C#) and have been working to come up to speed on Android and Java. So far, I've built a small application with a couple screens and a working content provider. All of the examples I've seen for developing content providers typically work with a single table, so I got the impression that this was the convention. I built a couple more content providers for other tables and ran into the "Unknown URI" IllegalArgumentException when I tried to test them. The exception is being thrown by one of my content providers, but not the one I was intending to call. It appears that my application is using the first content provider in the AndroidManifest.xml file, which now has me wondering if I should only have a single content provider for the entire application. Are there any best practices and/or examples for working with multiple tables in an Android application? Should I have one content provider per table or only one for the entire application? If the former, how do I resolve URIs to the proper provider? If the latter, how do I keep my content provider code from being polluted with switch statements?

Read the article

Why are a visual studio project's command-line settings stored per user? Is it OK to check-in (and

- by DanO

We're creating an application that understands some command-line parameters. There are some default's we would like to supply on the command-line when debugging, and these are easily set in the project settings as explained here. The thing is visual studio stores these settings in a *.csproj.user file, and the default settings for integrated source control do not check-in *.user files. We would like to just have these default command-line parameters in everyone's IDE when debugging this project. Often (but not always) when visual studio guides you into doing things a certain way it is for good reason. We probably don't want to just check-in someone's .csproj.user file... right? This question is has a few parts: Why does Visual Studio store this particular setting per user? Is there a way to alter this behavior? - Would doing so bring bad juju? Under these circumstances is it OK to check-in and share a .user file? Is there a better way to accomplish what we are trying to do here? Thank you -

Read the article

Fluent NHibernate: Example of a one-to-many relationship on an abstract class of a table-per-subclas

- by BigTommy79

Hi All, I've been trying for ages to find an example (because I can't get it to work myself) of the correct mapping for a one-to-many relationship on an abstract class of a table-per-subclass implementation, in fluent nHibernate. An example below: I'm looking to map the list of Fines on the Debt abstract base class to the Fine class. if anyone knows of any tutorial or example they've come across before please let me know. Thanks, Tim public abstract class Entity { public int Id { get; set; } } public abstract class Debt : Entity { public decimal Balance { get; set; } public IList<Fine> Fines { get; set; } public Debt() { Fines = new List<Fine>(); } } public class CarLoan : Debt { } public class CreditCard : Debt { } public class LoanApplication : Entity { public IList<Debt> ExistingDebts { get; set; } public LoanApplication() { ExistingDebts = new List<Debt>(); } } public class Fine { public Int64 Cash { get; set; } }

Read the article

Is there a way to specify a per-host deploy_to path with Capistrano?

- by Chad Johnson

I have searched and searched and asked a question already and have not received a clear answer. I have the following deploy script (snippet): set :application, "testapplication" set :repository, "ssh://domain.com//srv/hg/#{application}" set :scm, :mercurial set :deploy_to, "/srv/www/#{application}" role :web, "domain1.com", "domain2.com" role :app, "domain1.com", "domain2.com" role :db, "domain1.com", :primary => true, :norelease => true role :db, "domain2.com", :norelease => true As you see, I have set deploy_to to a specific path. And, I also have specified multiple web servers. However, each web server should have a different deployment path. I want to be able to run "cap deploy" and deploy to all hosts in one shot. I am NOT trying to deploy to staging and then to production. This is all production. My question is: how exactly do I specify a path per server? I have read the "Roles" documentation for Capistrano, and this is unclear. Can someone please post a deploy file example? I have read the documentation, and it is unclear how to do this. Does anyone know? Am I crazy? Am I thinking of this wrong or something? No answers anywhere online. Nowhere. Nothing. Please, someone help.

Read the article

Backing up my Windows Home Server to the Cloud…

- by eddraper

Ok, here’s my scenario: Windows Home Server with a little over 3TB of storage. This includes many years of our home network’s PC backups, music, videos, etcetera. I’d like to get a backup off-site, and the existing APIs and apps such as CloudBerry Labs WHS Backup service are making it easy. Now, all it’s down to is vendor and the cost of the actual storage. So, I thought I’d take a lazy Saturday morning and do some research on this and get the ball rolling. What I discovered stunned me… First off, the pricing for just about everything was loaded with complexity. I learned that it wasn’t just about storage… it was about network usage, requests, sites, replication, and on and on. I really don’t see this as rocket science. I have a disk image. I want to put it in the cloud. I’m not going to be be using it but once daily for incremental backups. Sounds like a common scenario. Yes, if “things get real” and my server goes down, I will need to bring down a lot of data and utilize a fair amount of vendor infrastructure. However, this may never happen. Offsite storage is an insurance policy. The complexity of the cost structures, perhaps by design, create an environment where it’s incredibly hard to model bottom line costs and compare vendor all-up pricing. As it is a “lazy Saturday morning,” I’m not in the mood for such antics and I decide to shirk the endeavor entirely. Thus, I decided to simply fire up calc.exe and do some a simple arithmetic model based on price per GB. I shuddered at the results. Certainly something was wrong… did I misplace a decimal point? Then I discovered CloudBerry’s own calculator. Nope, I hadn’t misplaced those decimals after all. Check it out (pricing based on 3174 GB): Amazon S3 $398.00 per month $4761 per year Azure $396.75 per month $4761 per year Google $380.88 per month $4570.56 per year Conclusion: Rampant crack smoking at vendors. Seriously. Out. Of. Their. Minds. Now, to Amazon’s credit, vision, and outright common sense, they had one offering which directly addresses my scenario: Amazon Glacier $31.74 per month $380.88 per year hmmm… It’s on the table. Let’s see what it would cost to just buy some drives, an enclosure and cart them over to a friend’s house. 2 x 2TB Drives from NewEgg.com $199.99 Enclosure $39.99 $239.98 Carting data to back and forth to friend’s within walking distance pain Leave drive unplugged at friend’s $0 for electricity Possible data loss No way I can come and go every day. I think I’ll think on this a bit more…

Read the article

SQL 2012 Licensing Thoughts

- by Geoff N. Hiten

The only thing more controversial than new Federal Tax plans is new Licensing plans from Microsoft. In both cases, everyone calculates several numbers. First, will I pay more or less under this plan? Second, will my competition pay more or less than now? Third, will <insert interesting person/company here> pay more or less? Not that items 2 and 3 are meaningful, that is just how people think. Much like tax plans, the devil is in the details, so lets see how this looks. Microsoft shows it here: http://www.microsoft.com/sqlserver/en/us/future-editions/sql2012-licensing.aspx First up is a switch from per-socket to per-core licensing. Anyone who didn’t see something like this coming should rapidly search for a new line of work because you are not paying attention. The explosion of multi-core processors has made SQL Server a bargain. Microsoft is in business to make money and the old per-socket model was not going to do that going forward. Per-core licensing also simplifies virtualization licensing. Physical Core = Virtual Core, at least for licensing. Oversubscribe your processors, that’s your lookout. You still pay for what is exposed to the VM. The cool part is you can seamlessly move physical and virtual workloads around and the licenses follow. The catch is you have to have Software Assurance to make the licenses mobile. Nice touch there. Let’s have a moment of silence for the late, unlamented, largely ignored Workgroup Edition. To quote the Microsoft FAQ: “Standard becomes our sole edition for basic database needs”. Considering I haven’t encountered a singe instance of SQL Server Workgroup Edition in the wild, I don’t think this will be all that controversial. As for pricing, it looks like a wash with current per-socket pricing based on four core sockets. Interestingly, that is the minimum core count Microsoft proposes to swap to transition per-socket to per-core if you are on Software Assurance. Reading the fine print shows that if you are using more, you will get more core licenses: From the licensing FAQ. 15. How do I migrate from processor licenses to core licenses? What is the migration path? Licenses purchased with Software Assurance (SA) will upgrade to SQL Server 2012 at no additional cost. EA/EAP customers can continue buying processor licenses until your next renewal after June 30, 2012. At that time, processor licenses will be exchanged for core-based licenses sufficient to cover the cores in use by processor-licensed databases (minimum of 4 cores per processor for Standard and Enterprise, and minimum of 8 EE cores per processor for Datacenter). Looks like the folks who invested in the AMD 12-core chips will make out like bandits. Now, on to something new: SQL Server Business Intelligence Edition. Yep, finally a BI-specific SKU licensed for server+CAL configurations only. Note that Enterprise Edition still supports the complete feature set; the BI Edition is intended for smaller shops who want to use the full BI feature set but without needing Enterprise Edition scale (or costs). No, you don’t get ColumnStore, Compression, or Partitioning in the BI Edition. Those are Enterprise scale features, ThankYouVeryMuch. Then again, your starting licensing costs are about one sixth of an Enterprise Edition system (based on an 8 core server). The only part of the message I am missing is if the current Failover Licensing Policy will change. Do we need to fully or partially license failover servers? That is a detail I definitely want to know.

Read the article

What is my miniport's service name?

- by Ian Boyd

i am trying to query the physical sector size of my drive using fsutil: C:\Windows\system32>fsutil fsinfo ntfsinfo c: NTFS Volume Serial Number : 0x78cc11b2cc116c1e Version : 3.1 Number Sectors : 0x000000003a382fff Total Clusters : 0x00000000074705ff Free Clusters : 0x00000000022fc29b Total Reserved : 0x00000000000007d0 Bytes Per Sector : 512 Bytes Per Physical Sector : <Not Supported> Bytes Per Cluster : 4096 Bytes Per FileRecord Segment : 1024 Clusters Per FileRecord Segment : 0 Mft Valid Data Length : 0x00000000305c0000 Mft Start Lcn : 0x00000000000c0000 Mft2 Start Lcn : 0x0000000003a382ff Mft Zone Start : 0x0000000006951940 Mft Zone End : 0x0000000006951c80 RM Identifier: 19B22CBE-570D-19DE-9C72-CD758F800DDC You can see that the Bytes Per Physical Sector value is Not Supported: Bytes Per Physical Sector : <Not Supported> In KB Article Microsoft support policy for 4K sector hard drives in Windows, Microsoft says: If fsutil.exe continues to display "Bytes Per Physical Sector : " after you apply the latest storage driver and the required hotfixes, make sure that the following registry path exists: HKLM\CurrentControlSet\Services\<miniport’s service name>\Parameters\Device\ Name: EnableQueryAccessAlignment Type: REG_DWORD Value: 1: Enable The only thing i don't know is what my Miniport's service name is. What is my miniport's service name. i know that my SATA drives are in AHCI mode, and AHCI uses the msahci driver service: Is that my miniport service? "MSAHCI"? See also Hitachi - Advanced Format Technology Brief RMPrepUSB - Advanced Format (4K sector) hard disks Microsoft support policy for 4K sector hard drives in Windows OSR Online - Advance Disk Format support in Storport Virtual Mniport diver Default cluster size for NTFS, FAT, and exFAT Wikipedia - Advanced Format

Read the article

Windows 8.1 will not sleep after wake up

- by per

I have problem with sleep/screen saver on my new Windows 8.1 machine. It will go to to sleep or start screen saver after start (or restart). But if it goes to sleep (manually or automatically) and I wake it up, it wont start sleep or start screen saver again automatically. I updated chipset and graphic cards drivers. I don't have any homegroup. Does anyone have similar issue? Thanks for any advice, per

Read the article

SmS Gateways - How do other sites do it? [closed]

- by chobo2

Possible Duplicate: Send and Receive SMS from my Website I would love to have a feature on my site that sends Email reminders and SmS(text messages) to people mobile phones. I been searching around and all I am finding is api's that charge money per SmS message(as low as 1cent per message). However even at 1cent per message that is still too much. The amount of money I am charging per year could be servilely eroded by just the Sms messages along. I could of course charge more money for my service or have an add on for SmS messages but I don't think either would work as most people expect it to be free feature and if they have to pay anything that is because of their carrier charging them not the website. How do other sites do it? I guessing companies like google have their own gateway providers or something like that. But how about smaller sites what do they do? I can't see them paying per sms text message.

Read the article

Init.d script gets return code 1 when calling itself, how can I get output?

- by Per

My question is, how can I modify the script so that it will tell me what goes wrong? The scenario is this: I'm trying to get Sonatype Nexus to start as a service in Ubuntu 10.04, and it just will not work. (I'm not looking for help on how to run Nexus, but on how to get some useful output from a script) It works when invoking it with sudo /etc/init.d/nexus start but fails when using sudo service nexus start I have run the update-rc.d command on it, and done everything according to instructions. The nexus init.d-script has a point where it calls itself when it detects that it should run as another user ('nexus'): su -m $RUN_AS_USER -c "\"$REALPATH\" $2" which expands to su -m nexus -c '"/opt/nexus-2.0.2/bin/jsw/linux-x86-64/nexus" start' when adding the -x debug flag to the script. This command results in return code 1. It never executes - I've set -x debug flag on the script, placed echo commands with redirect to file at the start of script to trace, etc. I cannot get any output telling me why the command will not execute. I've tried appending redirect to file after the above script line, inside the quotes, outside, any way I could imagine. All info I can get is by inserting a line echo $? after the su line, which outputs '1'. Is there a way I can see what happens when the su command runs?

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Apache2 benchmarks - very poor performance

- by andrzejp

I have two servers on which I test the configuration of apache2. The first server: 4GB of RAM, AMD Athlon (tm) 64 X2 Dual Core Processor 5600 + Apache 2.2.3, mod_php, mpm prefork: Settings: Timeout 100 KeepAlive On MaxKeepAliveRequests 150 KeepAliveTimeout 4 <IfModule Mpm_prefork_module> StartServers 7 MinSpareServers 15 MaxSpareServers 30 MaxClients 250 MaxRequestsPerChild 2000 </ IfModule> Compiled in modules: core.c mod_log_config.c mod_logio.c prefork.c http_core.c mod_so.c Second server: 8GB of RAM, Intel (R) Core (TM) i7 CPU [email protected] Apache 2.2.9, **fcgid, mpm worker, suexec** PHP scripts are running via fcgi-wrapper Settings: Timeout 100 KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 4 <IfModule Mpm_worker_module> StartServers 10 MaxClients 200 MinSpareThreads 25 MaxSpareThreads 75 ThreadsPerChild 25 MaxRequestsPerChild 1000 </ IfModule> Compiled in modules: core.c mod_log_config.c mod_logio.c worker.c http_core.c mod_so.c The following test results, which are very strange! New server (dynamic content - php via fcgid+suexec): Server Software: Apache/2.2.9 Server Hostname: XXXXXXXX Server Port: 80 Document Path: XXXXXXX Document Length: 179512 bytes Concurrency Level: 10 Time taken for tests: 0.26276 seconds Complete requests: 1000 Failed requests: 0 Total transferred: 179935000 bytes HTML transferred: 179512000 bytes Requests per second: 38.06 Transfer rate: 6847.88 kb/s received Connnection Times (ms) min avg max Connect: 2 4 54 Processing: 161 257 449 Total: 163 261 503 Old server (dynamic content - mod_php): Server Software: Apache/2.2.3 Server Hostname: XXXXXX Server Port: 80 Document Path: XXXXXX Document Length: 187537 bytes Concurrency Level: 10 Time taken for tests: 173.073 seconds Complete requests: 1000 Failed requests: 22 (Connect: 0, Length: 22, Exceptions: 0) Total transferred: 188003372 bytes HTML transferred: 187546372 bytes Requests per second: 5777.91 Transfer rate: 1086267.40 kb/s received Connnection Times (ms) min avg max Connect: 3 3 28 Processing: 298 1724 26615 Total: 301 1727 26643 Old server: Static content (jpg file) Server Software: Apache/2.2.3 Server Hostname: xxxxxxxxx Server Port: 80 Document Path: /images/top2.gif Document Length: 40486 bytes Concurrency Level: 100 Time taken for tests: 3.558 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 40864400 bytes HTML transferred: 40557482 bytes Requests per second: 281.09 [#/sec] (mean) Time per request: 355.753 [ms] (mean) Time per request: 3.558 [ms] (mean, across all concurrent requests) Transfer rate: 11217.51 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 3 11 4.5 12 23 Processing: 40 329 61.4 339 1009 Waiting: 6 282 55.2 293 737 Total: 43 340 63.0 351 1020 New server - static content (jpg file) Server Software: Apache/2.2.9 Server Hostname: XXXXX Server Port: 80 Document Path: /images/top2.gif Document Length: 40486 bytes Concurrency Level: 100 Time taken for tests: 3.571531 seconds Complete requests: 1000 Failed requests: 0 Write errors: 0 Total transferred: 41282792 bytes HTML transferred: 41030080 bytes Requests per second: 279.99 [#/sec] (mean) Time per request: 357.153 [ms] (mean) Time per request: 3.572 [ms] (mean, across all concurrent requests) Transfer rate: 11287.88 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 2 63 24.8 66 119 Processing: 124 278 31.8 282 391 Waiting: 3 70 28.5 66 164 Total: 126 341 35.9 350 443 I noticed that in the apache error.log is a lot of entries: [notice] mod_fcgid: call /www/XXXXX/public_html/forum/index.php with wrapper /www/php-fcgi-scripts/XXXXXX/php-fcgi-starter What I have omitted, or do not understand? Such a difference in requests per second? Is it possible? What could be the cause?

Read the article

Benchmark MySQL Cluster using flexAsynch: No free node id found for mysqld(API)?

- by quanta

I am going to benchmark MySQL Cluster using flexAsynch follow this guide, details as below: mkdir /usr/local/mysqlc732/ cd /usr/local/src/mysql-cluster-gpl-7.3.2 cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysqlc732/ -DWITH_NDB_TEST=ON make make install Everything works fine until this step: # /usr/local/mysqlc732/bin/flexAsynch -t 1 -p 80 -l 2 -o 100 -c 100 -n FLEXASYNCH - Starting normal mode Perform benchmark of insert, update and delete transactions 1 number of concurrent threads 80 number of parallel operation per thread 100 transaction(s) per round 2 iterations Load Factor is 80% 25 attributes per table 1 is the number of 32 bit words per attribute Tables are with logging Transactions are executed with hint provided No force send is used, adaptive algorithm used Key Errors are disallowed Temporary Resource Errors are allowed Insufficient Space Errors are disallowed Node Recovery Errors are allowed Overload Errors are allowed Timeout Errors are allowed Internal NDB Errors are allowed User logic reported Errors are allowed Application Errors are disallowed Using table name TAB0 NDBT_ProgramExit: 1 - Failed ndb_cluster.log: WARNING -- Failed to allocate nodeid for API at 127.0.0.1. Returned eror: 'No free node id found for mysqld(API).' I also have recompiled with -DWITH_DEBUG=1 -DWITH_NDB_DEBUG=1. How can I run flexAsynch in the debug mode? # /usr/local/mysqlc732/bin/flexAsynch -h FLEXASYNCH Perform benchmark of insert, update and delete transactions Arguments: -t Number of threads to start, default 1 -p Number of parallel transactions per thread, default 32 -o Number of transactions per loop, default 500 -l Number of loops to run, default 1, 0=infinite -load_factor Number Load factor in index in percent (40 -> 99) -a Number of attributes, default 25 -c Number of operations per transaction -s Size of each attribute, default 1 (PK is always of size 1, independent of this value) -simple Use simple read to read from database -dirty Use dirty read to read from database -write Use writeTuple in insert and update -n Use standard table names -no_table_create Don't create tables in db -temp Create table(s) without logging -no_hint Don't give hint on where to execute transaction coordinator -adaptive Use adaptive send algorithm (default) -force Force send when communicating -non_adaptive Send at a 10 millisecond interval -local 1 = each thread its own node, 2 = round robin on node per parallel trans 3 = random node per parallel trans -ndbrecord Use NDB Record -r Number of extra loops -insert Only run inserts on standard table -read Only run reads on standard table -update Only run updates on standard table -delete Only run deletes on standard table -create_table Only run Create Table of standard table -drop_table Only run Drop Table on standard table -warmup_time Warmup Time before measurement starts -execution_time Execution Time where measurement is done -cooldown_time Cooldown time after measurement completed -table Number of standard table, default 0

Search Results

Search found 8692 results on 348 pages for 'per magnusson'.

Page 28/348 | < Previous Page | 24 25 26 27 28 29 30 31 32 33 34 35 | Next Page >

- by Gaurang

- by Coderer

- by Tom

- by mokasin

- by syn4k

- by Bob

- by Chas Emerick

- by dan04

- by John

- by Chris

- by Gary McGill

- by Roger Lipscombe

- by Andrew Dyer

- by DanO

- by BigTommy79

- by Chad Johnson

- by eddraper

- by Geoff N. Hiten

- by Ian Boyd

- by per

- by chobo2

- by Per

- by user12620111

- by andrzejp

- by quanta

< Previous Page | 24 25 26 27 28 29 30 31 32 33 34 35 | Next Page >