Search Results

Search found 10417 results on 417 pages for 'large'.

Page 104/417 | < Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111 | Next Page >

Nginx + PHPBB3 reverse proxy images problem

- by siberiano

Hello all I have a problem with my Nginx Frontend + Apache2 backend + PHPBB3 software. It doesn't load the CSS and the images neither. I get constant errors like these: 2010/04/14 16:57:25 [error] 13365#0: *69 open() "/var/www/foo/styles/styles/coffee_time/theme/large.css" failed (2: No such file or directory), client: 83.44.175.237, server: www.foo.com, request: "GET /styles/coffee_time/theme/large.css HTTP/1.1", host: "www.foo.com", referrer: "http://www.foo.com/viewforum.php?f=43" This is my config of the site: server { listen 80; server_name www.foo.com; access_log /var/log/nginx/foo.access.log; # serve static files directly location ~* ^.+.(jpg|jpeg|gif|css|png|js|ico)$ { access_log off; expires 30d; root /var/www/trasteando/; } location / { root /var/www/foo/; index /var/www/foo/index.php; } # proxy the PHP scripts to predefined upstream .apache. # location ~ .php$ { proxy_pass http://apache; } location /styles/ { root /var/www/foo/styles/; }

Read the article
Deploying multiple identical copies of a virtual machine for compute tasks

- by Reid

I have a compute task which has a large number of library dependencies. I would like to deploy it on some of my company's large Linux clusters, where I do not have root. I could probably track down, compile, and install the right versions of all the libraries, but this looks to be quite tedious and would have to be repeated if I deployed it again somewhere else. On the other hand, it's pretty easy to install on current Ubuntu. This led me to wonder about a virtual machine approach. Could I put together a virtual machine which booted up, ran the computation (with parameters from and results to the host), and then shut down? In other words, I'd like a command like this that I could run on the host: $ ./run-vm --ram N --task /path/on/host/foo.sh --results /another/host/dir/ This would boot the VM, run foo.sh, and put the (relatively small) results of the computation in /another/host/dir/. It's important to start up many instances of the VM simultaneously, both on a single node and multiple nodes of the cluster. So it would be nice if I didn't have to make many copies of the VM virtual disk and metadata. As the task instances are completely independent, the VMs would not need any network support once deployed, or any outside communications beyond reading and writing the host filesystem. Is this possible, and if so, how might I go about doing it? Are there assumptions I've made above which are bogus?

Read the article
WD Caviar Green Extremely Slow

- by Steven

I am encountering a really weird problem on my WD Caviar Green HDD. Well first of all I have 2 HDDs on my Desktop, one 160GB Seagate holding my Win7 Ultimate x64 and the problematic one, WD 1.5 Caviar Green for storage purpose. My problem is kinda weird, when I transfer files from my Seagate(C:) to my WD (D:) the speed is good (50-60MB/s). Then the problem arises when I transfer too "many" large files, the transfer speed would go straight down to kilobytes/s. Well after I cancelled the transfer and access my D:, even entering a folder requires loading for like 10 seconds. Such problem not only arises when I am transferring files to my D:, it seems like my WD can't handle much activities. For instance, last time I installed my game on D: and I would face much lag after playing for some time. When the same game is installed on C: no problem arises. Does anyone knows what is the problem? P/S: There was one temporary solution that I used to tried. After the "situation" occurs, I tried to access as many folders on D: as I can and let it load, repeating such actions and giving it some time bring the D: back to speedy transfer. However, large transfers would causes the situation to happen again. Does it have something to do with cache whatsoever?

Read the article
How is the extra mSATA SSD disk used/configured in a Dell XPS laptop?

- by Mark

Some machines in the new XPS laptop range from Dell come with a regular, large (500GB+) HDD and an additional 32GB m-SATA SSD. The only detail I can find about this extra drive on the Dell site is this: Store your important files, multimedia and photos with XPS 15’s large hard drive options. To get instant access to your media, choose an optional mSATA solid-state drive (SSD) that can boot up to twice as fast as a regular hard drive and resumes in less than 1 second. I'd like to know more about how this extra drive is set up and used, specifically: Is anything installed on it (e.g. OS files or a boot loader) or is it just used as swap space? Is the m-SATA drive visible as a lettered drive in Windows? (I'd guess not if it's used for swap file only.) Is this unusual configuration likely to cause any problems later down the line - e.g. when upgrading to Windows 8? As usual, Dell's sales team haven't been able to help. If anyone's actually got a Dell machine with this or a similar hard drive set-up and can give a definitive answer rather than speculation I'll accept the answer.

Read the article
Seagate 3TB hard drive loses format information

- by Victor Bugarin

I have a Windows 7x64 Ultimate, 6 GB memory, 1 TB HD. 3TB Barracuda XT HDD. The HDD is installed on a StarTech 4 bays external enclosure I had troubles so I converted to a GPT, created 1 partition and formatted as NTFS. The hard drive I can write and read to and from the hard drive but it will become unreadable at some point while I am copying files or after I have copied files to it. I have copied large Bluray movies and diverse video files, I have also copied 32 GB of pictures, and I have copied about 86 thousand music files in different formats. At some point the partition becomes unreadable and I have to format the partition again (all files lost) and I have to start the whole process again. At some point I have been unable to copy large ISO (Bluray movies) file images. I have partitioned the HDD in 2 partitions P1 - 2TB, P2 - 1TB and I have lost every single file in either partition the same way. I reformat the HDD and it seems fine. I have run seatools to check the hard drive and it reports to be OK. What gives?

Read the article
How do you update without cutting off users?

- by Griffin

I searched around and I was surprised that I couldn't find an answer to this question. My assumption is that you have multiple servers. Normally they both will be doing their specific take (for the rest of this I will assume a simple website). Now lets say server A & B need updates. Do you update server A while server B keeps pushing out the webpage and then when server A is okay you update server B? This seems like it would work in small scale but seems horrible in large scale due to the fact that you'd need twice the power that you normally have. When dealing with a large number of servers do you update small sections at a time? I thought the problem with this would be if server A can't work alongside server B C D E or F any-longer that's not that bad. But when you start updating you slowly lose this small percentage. What is the proper way to deal with updates like this?

Read the article
Nginx Multiple If Statements Cause Memory Usage to Jump

- by Justin Kulesza

We need to block a large number of requests by IP address with nginx. The requests are proxied by a CDN, and so we cannot block with the actual client IP address (it would be the IP address of the CDN, not the actual client). So, we have $http_x_forwarded_for which contains the IP which we need to block for a given request. Similarly, we cannot use IP tables, as blocking the IP address of the proxied client will have no effect. We need to use nginx to block the requested based on the value of $http_x_forwarded_for. Initially, we tried multiple, simple if statements: http://pastie.org/5110910 However, this caused our nginx memory usage to jump considerably. We went from somewhere around a 40MB resident size to over a 200MB resident size. If we changed things up, and created one large regex that matched the necessary IP addresses, memory usage was fairly normal: http://pastie.org/5110923 Keep in mind that we're trying to block many more than 3 or 4 IP addresses... more like 50 to 100, which may be included in several (20+) nginx server configuration blocks. Thoughts? Suggestions? I'm interested both in why memory usage would spike so greatly using multiple if blocks, and also if there are any better ways to achieve our goal.

Read the article
Why is piping dd through gzip so much faster than a direct copy?

- by Foo Bar

I wanted to backup a path from a computer in my network to another computer in the same network over a 100MBit/s line. For this I did dd if=/local/path of=/remote/path/in/local/network/backup.img which gave me a very low network transfer speed of something about 50 to 100 kB/s, which would have taken forever. So I stopped it and decided to try gzipping it on the fly to make it much smaller so that the amount to transfer is less. So I did dd if=/local/folder | gzip > /remote/path/in/local/network/backup.img.gz But now I get something like 1 MB/s network transfer speed, so a factor of 10 to 20 faster. After noticing this, I tested this on several paths and files and it was always the same. Why does piping dd through gzip also increase the transfer rates by a large factor instead of only reducing the bytelength of the stream by a large factor? I'd expected even a small decrease in transfer rates instead, due to the higher CPU consumption while compressing, but now I get a double plus. Not that I'm not happy, but just wondering. ;)

Read the article
DVD/CD burning .zip: is it more reliable, faster, longer lasting to burn a zip of files rather than the files as a folder?

- by Rob

Is it more reliable, faster, longer lasting to burn to CD/DVD a zip (or a few large zips) of files rather than the files as a folder? Just thinking if 1000s of small files would not be as efficiently recorded compared with one or a few large zips. Also, even after the burning program verifies the disc, I also use Beyond Compare to compare the files with those on the disc. Always binary compares as identical but I hear the drive stuttering presumably as the head is being shifted only slightly each time to seek the next file, which leads me to think that its best to make one or more zips and copy those locally to compare. Or is it that burning invidual files to the disc is not as readable which causes the head to stutter. There aren't any problems, my disc burns are reliable, just thinking more of efficiency and longevity, the discs burn and verify fast enough on my 18x DVD burner. I'm using ImgBurn mostly. Also used Nero in the past. I burn whole discs closed, finalised. Not sure which write mode but would think Disc At Once from a temporary cached image made by the burning program would be the most reliable.

Read the article
What can be done to improve time synchronization on networks with sporadic internet access?

- by anregen

I'm looking for advice setting up time servers for a very non-typical network. I support many closed networks that have occasional access to the internet. A network would get access most days for a few hours, but would frequently go 1-3 weeks blacked-out. The computers/servers on this network are mostly *nix-based, but not all the same flavor. The entire network is mobile, so when it connects, it will have very different hops/latency to internet time servers. The servers on the closed network are powered-off frequently (at least daily). Right now, my gut tells me to use NTP (because I hate re-learning all the stuff that someone else already got working pretty well). But I have several issues, and am looking for someone with experience in this type of strange situation. I currently have no solution in place, I'm simply letting the internal clocks drift. This results in errors of ~600s in a majority of networks. I have seen mismatch worse than 10,000s. Is there something "better" than NTP in this situation? I know NTP likes to have very frequent, consistent access to servers that give nearly identical answers. I won't have that. How many internal NTP servers should I configure, so that during periods of internet blackout, I have internal time that is consistent within the closed network? There is no human access. No matter how large the mismatch, the server(s) must attempt to correct itself. Discrete steps are very bad. No matter how large the mismatch, the correction must be "slewed", not "stepped". I understand that this could take many hours to correct.

Read the article
Modifying value of "Rating" column within Explorer for arbitrary file types

- by Fake Name

Basically, I have a large body of assorted media (text, images, flash files, archives, folders, etc...) and I'm attempting to organize it. Windows Explorer has a rating column, but there seems to be no way to modify the rating of the files short of opening them in their type-specific software (e.g. Media player, or Photo viewer). However, this does not work when the file is of an unsupported type (.rar, .swf ...), or a directory. I'd be more than willing to consider a file-manager replacement (I've alreadly looked at quite a few, Directory Opus, Total Commander, etc...), or even a solution that stores the rating metadata in a hidden file in each folder, or a separate database. The one real critical requirement is the ability to sort by rating, and being filetype-agnostic. Basically, is there any way to categorize a large collection of assorted files by rating that will work with any file type, including directories? - Ideally, there would be an easy way to add arbitrary columns to windows explorer, and edit them directly. However, there seems to be no way to do this. The rating column is the next best thing.

Read the article
Mod_pagespeed, Varnish and Apache cache issues after new code pushes

- by WerkkreW

I have a rather strange issue. In my environment we are running a load balanced cluster of 8 apache servers with a master-master MySQL backend. In front of apache we have Varnish in the cache layer. We have been running Apache mod_pagespeed for several weeks now and for the most part it has been working great. The issue arises when we do fresh code updates from Git, and and/all of the JS/CSS assets change. Basically the problem appears to be two fold. One, after the code push we generally take the opportunity to flush varnish, restart apache, and restart varnish. In doing this all of the mod_pagespeed combinied/minified files are cleared out ensuring that all of the new JS/CSS assets are fresh. The problem is, upon doing this the file names that mod_pagespeed creates change, but the old files (appear) to be still cached for many people client side leading to very unexpected results. However, if we do not restart apache, the changes to the files may or may not appear client side due to the cached minified assets. The simple solution is to disable mod_pagespeed, however I would rather not do that as it has made a fairly large impact in performance. I feel as if there must be a better way to deal with the inconsistencies in cache between the client and server to prevent having people to go to great lengths or perform a large number of page refreshes to see a working page. I can provide configuration snippets if anyone needs them. If you would like to inspect the site, source, headers, or anything try the following addresses: http://wellplayed.org http://wellplayed.org/tv Thanks in advance!

Read the article
Which components should I invest in.. for a backup machine.

- by Senthil

I am a freelance developer. I have a PC, a laptop and an old testing and file server machine. I might add one or two in future. I want to have an on-site backup machine that can handle backups of ALL these machines - file backups, MySQL backups, backup of subversion repository, etc.. When building the machine, which components should I invest more in? Examples: The cabinet should have lots of room for expansion. Hard disk size should be large. But I guess hard disk speed need not be high (?) But other components like, RAM, PSU, Processor, Network card, Cooling, etc.. how much relative importance do these have in a backup machine? Which of these components should be high-end or large, and which ones need not be? Some Idea of the load: There will TBs of data. File backups and subversion repository backups will at least be done daily. MySQL backups done weekly. assume 3 machines at the moment and somewhere around 10 machines in the future.

Read the article
Adding a transaction ID to ruby-on-rails logs

- by Blue Warrior NFB

We have a RoR app (rails version 3.2.15 right now). As it has been getting busier, the log-files it's producing are becoming less and less useful for troubleshooting. When they come in like this, it's not a problem: Started GET "/accounts/28088166/kittens/22894/rendered_png?file_id=5d3eaec77954a489b5ddd75143091767&kitten_store_id=9970569bbacf7b6dbeb4eb9295960d69&size=large" for 172.16.202.30 at 2013-11-12 13:45:00 +0000 Processing by KittenController#rendered_png as HTML Parameters: {"file_id"="5d3eaec77954a489b5ddd75143091767", "kitten_store_id"="9970569bbacf7b6dbeb4eb9295960d69", "size"="large", "kitten_cam_id"="280941", "id"="kjlak357aw479607t"} Rendered text template (0.0ms) Sent data (1.8ms) Completed 200 OK in 1037.4ms (Views: 1.4ms | ActiveRecord: 98.4ms) Short request, quickly assembled, all the relevant log-lines are in one block. However, not all of our code renders in 1037ms. There are a few calls that can exceed several seconds, and during that time several of these quicker ones can come in. When that happens, its very, very hard to identify which log-lines belong to which GET. Sent data (4.1ms) Completed 200 OK in 767.4ms (Views: 3.2ms | ActiveRecord: 72.2ms) Completed 200 OK in 2338.0ms (Views: 0.2ms | ActiveRecord: 0.0ms) Ooookaaaay... which goes to what? Is it possible to add something like a transaction-ID to these log-lines? The log-spam would be interspersed, but at least grep-magic would give me the unified entries that I need.

Read the article
RAIDs with a lot of spindles - how to safely put to use the "wasted" space

- by kubanczyk

I have a fairly large number of RAID arrays (server controllers as well as midrange SAN storage) that all suffer from the same problem: barely enough spindles to keep the peak I/O performance, and tons of unused disk space. I guess it's a universal issue since vendors offer the smallest drives of 300 GB capacity but the random I/O performance hasn't really grown much since the time when the smallest drives were 36 GB. One example is a database that has 300 GB and needs random performance of 3200 IOPS, so it gets 16 disks (4800 GB minus 300 GB and we have 4.5 TB wasted space). Another common example are redo logs for a OLTP database that is sensitive in terms of response time. The redo logs get their own 300 GB mirror, but take 30 GB: 270 GB wasted. What I would like to see is a systematic approach for both Linux and Windows environment. How to set up the space so sysadmin team would be reminded about the risk of hindering the performance of the main db/app? Or, even better, to be protected from that risk? The typical situation that comes to my mind is "oh, I have this very large zip file, where do I uncompress it? Umm let's see the df -h and we figure something out in no time..." I don't put emphasis on strictness of the security (sysadmins are trusted to act in good faith), but on overall simplicity of the approach. For Linux, it would be great to have a filesystem customized to cap I/O rate to a very low level - is this possible?

Read the article
Multiple Screen - Keyboard Sharing

- by nhbdesign

I run a small architectural firm with several drafters employed, I'm currently setting up a new office space and one of the things on top of my list is a figuring out a way to keep tabs on my drafters work and being able to collaborate in real time. Here's the challenge, they sit in a separate large cubicle room and I'm on the other end of the hallway, the way it is now; every time they’ve got some question on how to proceed on a certain design, they would come all the way to my office, I'd open their file (in read only) give some ideas, save-as new file, they go back copy paste... in short, nonsense. What I've been thinking of is to setup a hardwired solution that should enable me to have an extra monitor on my desktop which is hardwired (through KVM or something) to each of my employees workstations serving as a secondary display, so that I can watch live what they do, interact with them just as if they would have an extra keyboard and monitor in my office, except; I don’t want to have on my desk a separate monitor for each employee.. so I'd want them to be tiled on a single large screen, watching all screens alive, and whenever they ask me (or I just decide..) to step in, I just click on any tile and hurray, I'm in, editing and saving in real time on their workstation. I also want to reserve the option when I want to, to just use that monitor as just an extra screen for my workstation. Is something like that possible in 2013? P.S. I know of TeamViewer and similar internet/software based stuff, but I'm specifically looking for something solid hardwired and maintenance free, and also something that would allow to watch without my employees getting notified every time I do so (I’m not a tough boss though...).

Read the article
Modifying value of "Rating" column within Explorer for arbitrary file types.

- by Fake Name

Basically, I have a large body of assorted media (text, images, flash files, archives, folders, etc...) and I'm attempting to organize it. Windows Explorer has a rating column, but there seems to be no way to modify the rating of the files short of opening them in their type-specific software (e.g. Media player, or Photo viewer). However, this does not work when the file is of an unsupported type (.rar, .swf ...), or a directory. I'd be more than willing to consider a file-manager replacement (I've alreadly looked at quite a few, Directory Opus, Total Commander, etc...), or even a solution that stores the rating metadata in a hidden file in each folder, or a separate database. The one real critical requirement is the ability to sort by rating, and being filetype-agnostic. Basically, is there any way to categorize a large collection of assorted files by rating that will work with any file type, including directories? - Ideally, there would be an easy way to add arbitrary columns to windows explorer, and edit them directly. However, there seems to be no way to do this. The rating column is the next best thing.

Read the article
Online FTP or file sharing service [on hold]

- by Frede

We need to share large files with clients, e.g. clients upload a large file, we modify it and later make it available for download. Up until now we've used FTP but this has a number of drawbacks. A lot of management of files and setting up accounts etc. We are therefore considering online alternatives. Requirements: Cheap, 8-) Easy to use, ideally just requiring a web browser, but also possible for power users to connect e.g. via FTPS/SFTP No registration requried for users to upload/download files. We ourselves of course need to be able to login an view uploaded files and upload new files. No per user fee High bandwidth. As files may be GBs in size both upload and download speed cannot be too slow Secure. Encryption during upload/download. No way for users to access uploaded files. Once a user has uploaded a file they (or anyone else besides us) should be able to access the file. To download files users get a link with a password. Ideally the link expires after a set time. No software installation We do NOT need any sync features, backup, versioning etc. Just a quick, easy, secure way for us to share files with our clients. Services like JustCloud, DriveHQ etc seems bloated and "too much" for what we need. What other alternatives exist? Thanks!

Read the article
Outlook 2010: Can I search Only My: Inbox, All Inbox Subfolders, and Specified Archive File Folders all at once

- by JLH

The setup is a user that has a laptop with Outlook 2010. We have Outlook hosted by Sherweb. The user that has a large number of emails (40,000) in a single Inbox subfolder. (I believe) Having such a large number of emails in an inox is slowing the users laptop down and I want to start moving old emails to a seperate pst file on a machine on our network. The problem I have is the user needs to be able to search all 40,000 emails. Right now he can can search do a search on the single subfolder. I would like to be able to move some of the emails to a seperate pst so I can compact the Inbox and still give them a 'one-click' search function that is still fairly quick. I don't think the 'Search All Outlook Items' is the soltuion because this will search all outlook folders -- sent items, other public folders. P.S. I'm not a expericenced outlook administrator, so there may be some assumptions in my questions that are wrong. I have no problem with somebody showing the error of my ways.

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Loading PNGs into OpenGL performance issues - Java & JOGL much slower than C# & Tao.OpenGL

- by Edward Cresswell

I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL when both loading PNGs from storage into memory, and when loading that BufferedImage (java) or Bitmap (C# - both are PNGs on hard drive) 'into' OpenGL. This difference is quite large, so I assumed I was doing something wrong, however after quite a lot of searching and trying different loading techniques I've been unable to reduce this difference. With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms The same on C# takes 54ms to load the image, and 34ms to load/create texture. The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing). On the Java side the code looks like this: BufferedImage image = ImageIO.read(new File(fileName)); texture = TextureIO.newTexture(image, false); texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR); texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR); The C# code uses: Bitmap t = new Bitmap(fileName); t.RotateFlip(RotateFlipType.RotateNoneFlipY); Rectangle r = new Rectangle(0, 0, t.Width, t.Height); BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb); Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID); Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR); t.UnlockBits(bd); t.Dispose(); After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong. Thanks. Edit2: I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it. Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool, the following results come from loading a set of 33 pngs, most are very wide, 5 times. testStart: ImageIO.read(file) -> TextureIO.newTexture(image) result: avg = 10250ms, total = 51251 testStart: ImageIO.read(bis) -> TextureIO.newTexture(image) result: avg = 10029ms, total = 50147 testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage) result: avg = 5343ms, total = 26717 testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage) result: avg = 5534ms, total = 27673 testStart: TextureIO.newTexture(file) result: avg = 10395ms, total = 51979 ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit: img = ImageIO.read(file); argbImg = new BufferedImage(img.getWidth(), img.getHeight(), TYPE_INT_ARGB_PRE); g = argbImg.createGraphics(); g.drawImage(img, 0, 0, null); texture = TextureIO.newTexture(argbImg, false); Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.

Read the article
Odd tcp deadlock under windows

- by John Robertson

We are moving large amounts of data on a LAN and it has to happen very rapidly and reliably. Currently we use windows TCP as implemented in C++. Using large (synchronous) sends moves the data much faster than a bunch of smaller (synchronous) sends but will frequently deadlock for large gaps of time (.15 seconds) causing the overall transfer rate to plummet. This deadlock happens in very particular circumstances which makes me believe it should be preventable altogether. More importantly if we don't really know the cause we don't really know it won't happen some time with smaller sends anyway. Can anyone explain this deadlock? Deadlock description (OK, zombie-locked, it isn't dead, but for .15 or so seconds it stops, then starts again) The receiving side sends an ACK. The sending side sends a packet containing the end of a message (push flag is set) The call to socket.recv takes about .15 seconds(!) to return About the time the call returns an ACK is sent by the receiving side The the next packet from the sender is finally sent (why is it waiting? the tcp window is plenty big) The odd thing about (3) is that typically that call doesn't take much time at all and receives exactly the same amount of data. On a 2Ghz machine that's 300 million instructions worth of time. I am assuming the call doesn't (heaven forbid) wait for the received data to be acked before it returns, so the ack must be waiting for the call to return, or both must be delayed by something else. The problem NEVER happens when there is a second packet of data (part of the same message) arriving between 1 and 2. That part very clearly makes it sound like it has to do with the fact that windows TCP will not send back a no-data ACK until either a second packet arrives or a 200ms timer expires. However the delay is less than 200 ms (its more like 150 ms). The third unseemly character (and to my mind the real culprit) is (5). Send is definitely being called well before that .15 seconds is up, but the data NEVER hits the wire before that ack returns. That is the most bizarre part of this deadlock to me. Its not a tcp blockage because the TCP window is plenty big since we set SO_RCVBUF to something like 500*1460 (which is still under a meg). The data is coming in very fast (basically there is a loop spinning out data via send) so the buffer should fill almost immediately. According to msdn the buffer being full and at least one pending send should cause the data to be sent (though in another place it mentions that there various "heuristics" used in deciding when a send hits the wire). Anway, why the sender doesn't actually send more data during that .15 second pause is the most bizarre part to me. The information above was captured on the receiving side via wireshark (except of course the socket.recv return times which were logged in a text file). We tried changing the send buffer to zero and turning off Nagle on the sender (yes, I know Nagle is about not sending small packets - but we tried turning Nagle off in case that was part of the unstated "heuristics" affecting whether the message would be posted to the wire. Technically microsoft's Nagle is that a small packet isn't sent if the buffer is full and there is an outstanding ACK, so it seemed like a possibility).

Read the article
JOptionPane opening another JFrame

- by mike_hornbeck

So I'm continuing my fight with this : http://stackoverflow.com/questions/2923545/creating-java-dialogs/2926126 task. Now my JOptionPane opens new window with envelope overfiew, but I can't change size of this window. Also I wanted to have sender's data in upper left corner, and receiver's data in bottom right. How can I achieve that ? There is also issue with OptionPane itself. After I click 'OK' it opens small window in the upper left corner of the screen. What is this and why it's appearing ? My code: import java.awt.*; import java.awt.Font; import javax.swing.*; public class Main extends JFrame { private static JTextField nameField = new JTextField(20); private static JTextField surnameField = new JTextField(); private static JTextField addr1Field = new JTextField(); private static JTextField addr2Field = new JTextField(); private static JComboBox sizes = new JComboBox(new String[] { "small", "medium", "large", "extra-large" }); public Main(){ JPanel mainPanel = new JPanel(); mainPanel.setLayout(new BoxLayout(mainPanel, BoxLayout.Y_AXIS)); getContentPane().add(mainPanel); JPanel addrPanel = new JPanel(new GridLayout(0, 1)); addrPanel.setBorder(BorderFactory.createTitledBorder("Receiver")); addrPanel.add(new JLabel("Name")); addrPanel.add(nameField); addrPanel.add(new JLabel("Surname")); addrPanel.add(surnameField); addrPanel.add(new JLabel("Address 1")); addrPanel.add(addr1Field); addrPanel.add(new JLabel("Address 2")); addrPanel.add(addr2Field); mainPanel.add(addrPanel); mainPanel.add(new JLabel(" ")); mainPanel.add(sizes); String[] buttons = { "OK", "Cancel"}; int c = JOptionPane.showOptionDialog( null, mainPanel, "My Panel", JOptionPane.DEFAULT_OPTION, JOptionPane.PLAIN_MESSAGE, null, buttons, buttons[0] ); if(c ==0){ new Envelope(nameField.getText(), surnameField.getText(), addr1Field.getText() , addr2Field.getText(), sizes.getSelectedIndex()); } setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); pack(); setVisible(true); } public static void main(String[] args) { new Main(); } } class Envelope extends JFrame { private final int SMALL=0; private final int MEDIUM=1; private final int LARGE=2; private final int XLARGE=3; public Envelope(String n, String s, String a1, String a2, int i){ Container content = getContentPane(); JPanel mainPanel = new JPanel(); mainPanel.setLayout(new BoxLayout(mainPanel, BoxLayout.Y_AXIS)); mainPanel.add(new JLabel("John Doe")); mainPanel.add(new JLabel("FooBar str 14")); mainPanel.add(new JLabel("Newark, 45-99")); JPanel dataPanel = new JPanel(); dataPanel.setFont(new Font("sansserif", Font.PLAIN, 32)); //set size from i mainPanel.setLayout(new BoxLayout(mainPanel, BoxLayout.Y_AXIS)); mainPanel.setBackground(Color.ORANGE); mainPanel.add(new JLabel("Mr "+n+" "+s)); mainPanel.add(new JLabel(a1)); mainPanel.add(new JLabel(a2)); content.setSize(450, 600); content.setBackground(Color.ORANGE); content.add(mainPanel, BorderLayout.NORTH); content.add(dataPanel, BorderLayout.SOUTH); setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); pack(); setVisible(true); } }

Read the article
CSS optimization - extra classes in dom or preprocessor-repetitive styling in css file?

- by anna.mi

I'm starting on a fairly large project and I'm considering the option of using LESS for pre-processing my css. the useful thing about LESS is that you can define a mixin that contains for example: .border-radius(@radius) { -webkit-border-radius: @radius; -moz-border-radius: @radius; -o-border-radius: @radius; -ms-border-radius: @radius; border-radius: @radius; } and then use it in a class declaration as .rounded-div { .border-radius(10px); } to get the outputted css as: .rounded-div { -webkit-border-radius: 10px; -moz-border-radius: 10px; -o-border-radius: 10px; -ms-border-radius: 10px; border-radius: 10px; } this is extremely useful in the case of browser prefixes. However this same concept could be used to encapsulate commonly-used css, for example: .column-container { overflow: hidden; display: block; width: 100%; } .column(@width) { float: left; width: @width; } and then use this mixin whenever i need columns in my design: .my-column-outer { .column-container(); background: red; } .my-column-inner { .column(50%); font-color: yellow; } (of course, using the preprocessor we could easily expand this to be much more useful, eg. pass the number of columns and the container width as variables and have LESS determine the width of each column depending on the number of columns and container width!) the problem with this is that when compliled, my final css file would have 100 such declarations, copy&pasted, making the file huge and bloated and repetitive. The alternative to this would be to use a grid system which has predefined classes for each column-layout option, eg .c-50 ( with a "float: left; width:50%;" definition ), .c-33, .c-25 to accomodate for a 2-column, 3-column and 4-column layout and then use these classes to my dom. i really mislike the idea of the extra classes, from experience it results to bloated dom (creating extra divs just to attach the grid classes to). Also the most basic tutorial for html/css would tell you that the dom should be separated from the styling - grid classes are styling related! to me, its the same as attaching a "border-radius-10" class to the .rounded-div example above! on the other hand, the large css file that would result from the repetitive code is also a disadvantage so i guess my question is, which one would you recommend? and which do you use? and, which solution is best for optimization? apart from the larger file size, has there even been any research on whether browser renders multiple classes faster than a large css file, or the other way round? tnx! i'd love to hear your opinion!

Read the article
Ada and 'The Book'

- by Phil Factor

The long friendship between Charles Babbage and Ada Lovelace created one of the most exciting and mysterious of collaborations ever to have resulted in a technological breakthrough. The fireworks that created by the collision of two prodigious mathematical and creative talents resulted in an invention, the Analytical Engine, which went on to change society fundamentally. However, beyond that, we just don't know what the bulk of their collaborative work was about:; it was done in strictest secrecy. Even the known outcome of their friendship, the first programmable computer, was shrouded in mystery. At the time, nobody, except close friends and family, had any idea of Ada Byron's contribution to the invention of the ‘Engine’, and how to program it. Her great insight was published in August 1843, under the initials AAL, standing for Ada Augusta Lovelace, her title then being the Countess of Lovelace. It was contained in a lengthy ‘note’ to her translation of a publication that remains the best description of Babbage's amazing Analytical Engine. The secret identity of the person behind those enigmatic initials was finally revealed by Prince de Polignac who, seventy years later, wrote to Ada's daughter to seek confirmation that her mother had, indeed, been the author of the brilliant sentences that described so accurately how Babbage's mechanical computer could be programmed with punch-cards. L.F. Menabrea's paper on the Analytical Engine first appeared in the 'Bibliotheque Universelle de Geneve' in October 1842, and Ada translated it anonymously for Taylor's 'Scientific Memoirs'. Charles Babbage was surprised that she had not written an original paper as she already knew a surprising amount about the way the machine worked. He persuaded her to at least write some explanatory notes. These notes ended up extending to four times the length of the original article and represented the first published account of how a machine could be programmed to perform any calculation. Her example of programming the Bernoulli sequence would have worked on the Analytical engine had the device’s construction been completed, and gave Ada an unassailable claim to have invented the art of programming. What was the reason for Ada's secrecy? She was the only legitimate child of Lord Byron, who was probably the best known celebrity of the age, so she was already famous. She was a senior aristocrat, with titles, a fortune in money and vast estates in the Midlands. She had political influence, and was the cousin of Lord Melbourne, who was the Prime Minister at that time. She was friendly with the young Queen Victoria. Her mathematical activities were a pastime, and not one that would be considered by others to be in keeping with her roles and responsibilities. You wouldn't dare to dream up a fictional heroine like Ada. She was dazzlingly beautiful and talented. She could speak several languages fluently, and play some musical instruments with professional skill. Contemporary accounts refer to her being 'accomplished in science, art and literature'. On top of that, she was a brilliant mathematician, a talent inherited from her mother, Annabella Milbanke. In her mother's circle of literary and scientific friends was Charles Babbage, and Ada's friendship with him dates from her teenage zest for Mathematics. She was one of the first people he'd ever met who understood what he had attempted to achieve with the 'Difference Engine', and with whom he could converse as intellectual equals. He arranged for her to have an education from the most talented academics in the country. Ada melted the heart of the cantankerous genius to the point that he became a faithful and loyal father-figure to her. She was one of the very few who could grasp the principles of the later, and very different, ‘Analytical Engine’ which was designed from the start to tackle a variety of tasks. Sadly, Ada Byron's life ended less than a decade after completing the work that assured her long-term fame, in November 1852. She was dying of cancer, her gambling habits had caused her to run up huge debts, she'd had more than one affairs, and she was being blackmailed. Her brilliant but unempathic mother was nursing her in her final illness, destroying her personal letters and records, and repaying her debts. Her husband was distraught but helpless. Charles Babbage, however, maintained his steadfast paternalistic friendship to the end. She appointed her loyal friend to be her executor. For years, she and Babbage had been working together on a secret project, known only as 'The Book'. We have a clue to what it was in a letter written by her nine years earlier, on 11th August 1843. It was a joint project by herself and Lord Lovelace, her husband, and was intended to involve Babbage's 'undivided energies'. It involved 'consulting your Engine' (it required Babbage’s computer). The letter gives no hint about the project except for the high-minded nature of its purpose, and its highly mathematical nature. From then on, the surviving correspondence between the two gives only veiled references to 'The Book'. There isn't much, since Babbage later destroyed any letters that could have damaged her reputation within the Establishment. 'I cannot spare the book today, which I am very sorry for. At the moment I want it for constant reference, but I think you can have it tomorrow' (Oct 1844) And 'I will send you the book directly, and you can say, when you receive it, how long you will want to keep it'. (Nov 1844) The two of them were obviously intent on the work: She writes, four years later, 'I have an engagement for Wednesday which will prevent me from attending to your wishes about the book' (Dec 1848). This was something that they both needed to work on, but could not do in parallel: 'I will send the book on Tuesday, and it can be left with you till Friday' (11 Feb 1849). After six years work, it had been so well-handled that it was beginning to fall apart: 'Don't forget the new cover you promised for the book. The poor book is very shabby and wants one' (20 Sept 1849). So what was going on? The word 'book' was not a code-word: it was a real book, probably a 'printer's blank', plain paper, but properly bound so printers and publishers could show off how the published work might look. The hints from the correspondence are of advanced mathematics. It is obvious that the book was travelling between them, back and forth, each one working on it for less than a week before passing it back. Ada and her husband were certainly involved in gambling large sums of money on the horses, and so most biographers have concluded that the three of them were trying to calculate the mathematical odds on the horses. This theory has three large problems. Firstly, Ada's original letter proposing the project refers to its high-minded nature. Babbage was temperamentally opposed to gambling and would scarcely have given so much time to the project, even though he was devoted to Ada. Secondly, Babbage would have very soon have realized the hopelessness of trying to beat the bookies. This sort of betting never attracts his type of intellectual background. The third problem is that any work on calculating the odds on horses would not need a well-thumbed book to pass back and forth between them; they would have not had to work in series. The original project was instigated by Ada, along with her husband, William King-Noel, 1st Earl of Lovelace. Charles Babbage was invited to join the project after the couple had come up with the idea. What could William have contributed? One might assume that William was a Bertie Wooster character, addicted only to the joys of the turf, but this was far from the truth. He was a scientist, a Cambridge graduate who was later elected to be a Fellow of the Royal Society. After Eton, he went to Trinity College, Cambridge. On graduation, he entered the diplomatic service and acted as secretary under Lord Nugent, who was Lord Commissioner of the Ionian Islands. William was very friendly with Babbage too, able to discuss scientific matters on equal terms. He was a capable engineer who invented a process for bending large timbers by the application of steam heat. He delivered a paper to the Institution of Civil Engineers in 1849, and received praise from the great engineer, Isambard Kingdom Brunel. As well as being Lord Lieutenant of the County of Surrey for most of Victoria's reign, he had time for a string of scientific and engineering achievements. Whatever the project was, it is unlikely that William was a junior partner. After Ada's death, the project disappeared. Then, two years later, Babbage, through one of his occasional outbursts of temper, demonstrated that he was able to decrypt one of the most powerful of secret codes, Vigenère's autokey cipher. All contemporary diplomatic and military messages used a variant of this cipher. Babbage had made three important discoveries, namely, the mathematical law of this cipher, the principle of the key periodicity, and the technique of the symmetry of position. The technique is now known as the Kasiski examination, also called the Kasiski test, but Babbage got there first. At one time, he listed amongst his future projects, the writing of a book 'The Philosophy of Decyphering', but it never came to anything. This discovery was going to change the course of history, since it was used to decipher the Russians’ military dispatches in the Crimean war. Babbage himself played a role during the Crimean War as a cryptographical adviser to his friend, Rear-Admiral Sir Francis Beaufort of the Admiralty. This is as much as we can be certain about in trying to make sense of the bulk of the time that Charles Babbage and Ada Lovelace worked together. Nine years of intensive work, involving the 'Engine' and a great deal of mathematics and research seems to have been lost: or has it? I've argued in the past http://www.simple-talk.com/community/blogs/philfactor/archive/2008/06/13/59614.aspx that the cracking of the Vigenère autokey cipher, was a fundamental motive behind the British Government's support and funding of the 'Difference Engine'. The Duke of Wellington, whose understanding of the military significance of being able to read enemy dispatches, was the most steadfast advocate of the project. If the three friends were actually doing the work of cracking codes by mathematical techniques that used the techniques of key periodicity, and symmetry of position (the use of a book being passed quickly to and fro is very suggestive), intending to then use the 'Engine' to do the routine cracking of each dispatch, then this is a rather different story. The project was Ada and William's idea. (William had served in the diplomatic service and would be familiar with the use of codes). This makes Ada Lovelace the initiator of a project which, by giving both Britain, and probably the USA, a diplomatic and military advantage in the second part of the Nineteenth century, changed world history. Ada would never have wanted any credit for cracking the cipher, and developing the method that rendered all contemporary military and diplomatic ciphering techniques nugatory; quite the reverse. And it is clear from the gaps in the record of the letters between the collaborators that the evidence was destroyed, probably on her request by her irascible but intensely honorable executor, Charles Babbage. Charles Babbage toyed with the idea of going public, but the Crimean war put an end to that. The British Government had a valuable secret, and intended to keep it that way. Ada and Charles had quite often discussed possible moneymaking projects that would fund the development of the Analytic Engine, the first programmable computer, but their secret work was never in the running as a potential cash cow. I suspect that the British Government was, even then, working on the concealment of a discovery whose value to the nation depended on it remaining so. The success of code-breaking in the Crimean war, and the American Civil war, led to the British and Americans subsequently giving much more weight and funding to the science of decryption. Paradoxically, this makes Ada's contribution even closer to the creation of Colossus, the first digital computer, at Bletchley Park, specifically to crack the Nazi’s secret codes.

Read the article

Search Results

Search found 10417 results on 417 pages for 'large'.

Page 104/417 | < Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111 | Next Page >

- by siberiano

- by Reid

- by Steven

- by Mark

- by Victor Bugarin

- by Griffin

- by Justin Kulesza

- by Foo Bar

- by Rob

- by anregen

- by Fake Name

- by WerkkreW

- by Senthil

- by Blue Warrior NFB

- by kubanczyk

- by nhbdesign

- by Fake Name

- by Frede

- by JLH

- by user12620111

- by Edward Cresswell

- by John Robertson

- by mike_hornbeck

- by anna.mi

- by Phil Factor

< Previous Page | 100 101 102 103 104 105 106 107 108 109 110 111 | Next Page >