ostringstream - Page 2 - Developer IT

C++ string sort like a human being?

- by Walter Nissen

I would like to sort alphanumeric strings the way a human being would sort them. I.e., "A2" comes before "A10", and "a" certainly comes before "Z"! Is there any way to do with without writing a mini-parser? Ideally it would also put "A1B1" before "A1B10". I see the question "Natural (human alpha-numeric) sort in Microsoft SQL 2005" with a possible answer, but it uses various library functions, as does "Sorting Strings for Humans with IComparer". Below is a test case that currently fails: #include <set> #include <iterator> #include <iostream> #include <vector> #include <cassert> template <typename T> struct LexicographicSort { inline bool operator() (const T& lhs, const T& rhs) const{ std::ostringstream s1,s2; s1 << toLower(lhs); s2 << toLower(rhs); bool less = s1.str() < s2.str(); std::cout<<s1.str()<<" "<<s2.str()<<" "<<less<<"\n"; return less; } inline std::string toLower(const std::string& str) const { std::string newString(""); for (std::string::const_iterator charIt = str.begin(); charIt!=str.end();++charIt) { newString.push_back(std::tolower(*charIt)); } return newString; } }; int main(void) { const std::string reference[5] = {"ab","B","c1","c2","c10"}; std::vector<std::string> referenceStrings(&(reference[0]), &(reference[5])); //Insert in reverse order so we know they get sorted std::set<std::string,LexicographicSort<std::string> > strings(referenceStrings.rbegin(), referenceStrings.rend()); std::cout<<"Items:\n"; std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, "\n")); std::vector<std::string> sortedStrings(strings.begin(), strings.end()); assert(sortedStrings == referenceStrings); }

Read the article

Can't insert a number into a C++ custom streambuf/ostream

- by 0xbe5077ed

I have written a custom std::basic_streambuf and std::basic_ostream because I want an output stream that I can get a JNI string from in a manner similar to how you can call std::ostringstream::str(). These classes are quite simple. namespace myns { class jni_utf16_streambuf : public std::basic_streambuf<char16_t> { JNIEnv * d_env; std::vector<char16_t> d_buf; virtual int_type overflow(int_type); public: jni_utf16_streambuf(JNIEnv *); jstring jstr() const; }; typedef std::basic_ostream<char16_t, std::char_traits<char16_t>> utf16_ostream; class jni_utf16_ostream : public utf16_ostream { jni_utf16_streambuf d_buf; public: jni_utf16_ostream(JNIEnv *); jstring jstr() const; }; // ... } // namespace myns In addition, I have made four overloads of operator<<, all in the same namespace: namespace myns { // ... utf16_ostream& operator<<(utf16_ostream&, jstring) throw(std::bad_cast); utf16_ostream& operator<<(utf16_ostream&, const char *); utf16_ostream& operator<<(utf16_ostream&, const jni_utf16_string_region&); jni_utf16_ostream& operator<<(jni_utf16_ostream&, jstring); // ... } // namespace myns The implementation of jni_utf16_streambuf::overflow(int_type) is trivial. It just doubles the buffer width, puts the requested character, and sets the base, put, and end pointers correctly. It is tested and I am quite sure it works. The jni_utf16_ostream works fine inserting unicode characters. For example, this works fine and results in the stream containing "hello, world": myns::jni_utf16_ostream o(env); o << u"hello, wor" << u'l' << u'd'; My problem is as soon as I try to insert an integer value, the stream's bad bit gets set, for example: myns::jni_utf16_ostream o(env); if (o.badbit()) throw "bad bit before"; // does not throw int32_t x(5); o << x; if (o.badbit()) throw "bad bit after"; // throws :( I don't understand why this is happening! Is there some other method on std::basic_streambuf I need to be implementing????

Read the article

c++ program debugged well with Cygwin4 (under Netbeans 7.2) but not with MinGW (under QT 4.8.1)

- by GoldenAxe

I have a c++ program which take a map text file and output it to a graph data structure I have made, I am using QT as I needed cross-platform program and GUI as well as visual representation of the map. I have several maps in different sizes (8x8 to 4096x4096). I am using unordered_map with a vector as key and vertex as value, I'm sending hash(1) and equal functions which I wrote to the unordered_map in creation. Under QT I am debugging my program with QT 4.8.1 for desktop MinGW (QT SDK), the program works and debug well until I try the largest map of 4096x4096, then the program stuck with the following error: "the inferior stopped because it received a signal from operating system", when debugging, the program halt at the hash function which used inside the unordered_map and not as part of the insertion state, but at a getter(2). Under Netbeans IDE 7.2 and Cygwin4 all works fine (debug and run). some code info: typedef std::vector<double> coordinate; typedef std::unordered_map<coordinate const*, Vertex<Element>*, container_hash, container_equal> vertexsContainer; vertexsContainer *m_vertexes (1) hash function: struct container_hash { size_t operator()(coordinate const *cord) const { size_t sum = 0; std::ostringstream ss; for ( auto it = cord->begin() ; it != cord->end() ; ++it ) { ss << *it; } sum = std::hash<std::string>()(ss.str()); return sum; } }; (2) the getter: template <class Element> Vertex<Element> *Graph<Element>::getVertex(const coordinate &cord) { try { Vertex<Element> *v = m_vertexes->at(&cord); return v; } catch (std::exception& e) { return NULL; } } I was thinking maybe it was some memory issue at the beginning, so before I was thinking of trying Netbeans I checked it with QT on my friend pc with a 16GB RAM and got the same error. Thanks.

Read the article

boost::asio::async_read_until problem

- by user368831

Hi again, I'm modify the boost asio echo example to use async_read_until to read the input word by word. Even though I am using async_read_until all the data sent seems to be read from the socket. Could someone please advise: #include <cstdlib> #include <iostream> #include <boost/bind.hpp> #include <boost/asio.hpp> using boost::asio::ip::tcp; class session { public: session(boost::asio::io_service& io_service) : socket_(io_service) { } tcp::socket& socket() { return socket_; } void start() { std::cout<<"starting"<<std::endl; boost::asio::async_read_until(socket_, buffer, ' ', boost::bind(&session::handle_read, this, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred)); } void handle_read(const boost::system::error_code& error, size_t bytes_transferred) { std::ostringstream ss; ss<<&buffer; std::string s = ss.str(); std::cout<<s<<std::endl; if (!error) { boost::asio::async_write(socket_, boost::asio::buffer(s), boost::bind(&session::handle_write, this, boost::asio::placeholders::error)); } else { delete this; } } void handle_write(const boost::system::error_code& error) { std::cout<<"handling write"<<std::endl; if (!error) { } else { delete this; } } private: tcp::socket socket_; boost::asio::streambuf buffer; }; class server { public: server(boost::asio::io_service& io_service, short port) : io_service_(io_service), acceptor_(io_service, tcp::endpoint(tcp::v4(), port)) { session* new_session = new session(io_service_); acceptor_.async_accept(new_session->socket(), boost::bind(&server::handle_accept, this, new_session, boost::asio::placeholders::error)); } void handle_accept(session* new_session, const boost::system::error_code& error) { if (!error) { new_session->start(); new_session = new session(io_service_); acceptor_.async_accept(new_session->socket(), boost::bind(&server::handle_accept, this, new_session, boost::asio::placeholders::error)); } else { delete new_session; } } private: boost::asio::io_service& io_service_; tcp::acceptor acceptor_; }; int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr << "Usage: async_tcp_echo_server <port>\n"; return 1; } boost::asio::io_service io_service; using namespace std; // For atoi. server s(io_service, atoi(argv[1])); io_service.run(); } catch (std::exception& e) { std::cerr << "Exception: " << e.what() << "\n"; } return 0; } Thanks!

Read the article

boost::asio::async_write problem

- by user368831

Hi, I'm trying to figure out how asynchronous reads and writes work in boost asio by manipulating the echo example. Currently, I have a server that should, when sent a sentence, respond with only the first word. However, the boost::asio::async_write never seems to complete even though the write handler is being called. Can someone please explain what's going on? Here's the code: #include <cstdlib> #include <iostream> #include <boost/bind.hpp> #include <boost/asio.hpp> using boost::asio::ip::tcp; class session { public: session(boost::asio::io_service& io_service) : socket_(io_service) { } tcp::socket& socket() { return socket_; } void start() { std::cout<<"starting"<<std::endl; boost::asio::async_read_until(socket_, buffer, ' ', boost::bind(&session::handle_read, this, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred)); } void handle_read(const boost::system::error_code& error, size_t bytes_transferred) { // std::ostringstream ss; // ss<<&buffer; char* c = new char[bytes_transferred]; //std::string s; buffer.sgetn(c,bytes_transferred); std::cout<<"data: "<< c<<" bytes: "<<bytes_transferred<<std::endl; if (!error) { boost::asio::async_write(socket_, boost::asio::buffer(c,bytes_transferred), boost::bind(&session::handle_write, this, boost::asio::placeholders::error)); } else { delete this; } } void handle_write(const boost::system::error_code& error) { std::cout<<"handling write"<<std::endl; if (!error) { } else { delete this; } } private: tcp::socket socket_; boost::asio::streambuf buffer; }; class server { public: server(boost::asio::io_service& io_service, short port) : io_service_(io_service), acceptor_(io_service, tcp::endpoint(tcp::v4(), port)) { session* new_session = new session(io_service_); acceptor_.async_accept(new_session->socket(), boost::bind(&server::handle_accept, this, new_session, boost::asio::placeholders::error)); } void handle_accept(session* new_session, const boost::system::error_code& error) { if (!error) { new_session->start(); new_session = new session(io_service_); acceptor_.async_accept(new_session->socket(), boost::bind(&server::handle_accept, this, new_session, boost::asio::placeholders::error)); } else { delete new_session; } } private: boost::asio::io_service& io_service_; tcp::acceptor acceptor_; }; int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr << "Usage: async_tcp_echo_server <port>\n"; return 1; } boost::asio::io_service io_service; using namespace std; // For atoi. server s(io_service, atoi(argv[1])); io_service.run(); } catch (std::exception& e) { std::cerr << "Exception: " << e.what() << "\n"; } return 0; } Thanks!

Read the article

42 passed to TerminateProcess, sometimes GetExitCodeProcess returns 0

- by Emil

After I get a handle returned by CreateProcess, I call TerminateProcess, passing 42 for the process exit code. Then, I use WaitForSingleObject for the process to terminate, and finally I call GetExitCodeProcess. None of the function calls report errors. The child process is an infinite loop and does not terminate on its own. The problem is that sometimes GetExitCodeProcess returns 42 for the exit code (as it should) and sometimes it returns 0. Any idea why? #include <string> #include <sstream> #include <iostream> #include <assert.h> #include <windows.h> void check_call( bool result, char const * call ); #define CHECK_CALL(call) check_call(call,#call); int main( int argc, char const * argv[] ) { if( argc>1 ) { assert( !strcmp(argv[1],"inf") ); for(;;) { } } int err=0; for( int i=0; i!=200; ++i ) { STARTUPINFO sinfo; ZeroMemory(&sinfo,sizeof(STARTUPINFO)); sinfo.cb=sizeof(STARTUPINFO); PROCESS_INFORMATION pe; char cmd_line[32768]; strcat(strcpy(cmd_line,argv[0])," inf"); CHECK_CALL((CreateProcess(0,cmd_line,0,0,TRUE,0,0,0,&sinfo,&pe)!=0)); CHECK_CALL((CloseHandle(pe.hThread)!=0)); CHECK_CALL((TerminateProcess(pe.hProcess,42)!=0)); CHECK_CALL((WaitForSingleObject(pe.hProcess,INFINITE)==WAIT_OBJECT_0)); DWORD ec=0; CHECK_CALL((GetExitCodeProcess(pe.hProcess,&ec)!=0)); CHECK_CALL((CloseHandle(pe.hProcess)!=0)); err += (ec!=42); } std::cout << err; return 0; } std::string get_last_error_str( DWORD err ) { std::ostringstream s; s << err; LPVOID lpMsgBuf=0; if( FormatMessageA( FORMAT_MESSAGE_ALLOCATE_BUFFER|FORMAT_MESSAGE_FROM_SYSTEM|FORMAT_MESSAGE_IGNORE_INSERTS, 0, err, MAKELANGID(LANG_NEUTRAL,SUBLANG_DEFAULT), (LPSTR)&lpMsgBuf, 0, 0) ) { assert(lpMsgBuf!=0); std::string msg; try { std::string((LPCSTR)lpMsgBuf).swap(msg); } catch( ... ) { } LocalFree(lpMsgBuf); if( !msg.empty() && msg[msg.size()-1]=='\n' ) msg.resize(msg.size()-1); if( !msg.empty() && msg[msg.size()-1]=='\r' ) msg.resize(msg.size()-1); s << ", \"" << msg << '"'; } return s.str(); } void check_call( bool result, char const * call ) { assert(call && *call); if( !result ) { std::cerr << call << " failed.\nGetLastError:" << get_last_error_str(GetLastError()) << std::endl; exit(2); } }

Read the article

Can you help me get my head around openssl public key encryption with rsa.h in c++?

- by Ben

Hi there, I am trying to get my head around public key encryption using the openssl implementation of rsa in C++. Can you help? So far these are my thoughts (please do correct if necessary) Alice is connected to Bob over a network Alice and Bob want secure communications Alice generates a public / private key pair and sends public key to Bob Bob receives public key and encrypts a randomly generated symmetric cypher key (e.g. blowfish) with the public key and sends the result to Alice Alice decrypts the ciphertext with the originally generated private key and obtains the symmetric blowfish key Alice and Bob now both have knowledge of symmetric blowfish key and can establish a secure communication channel Now, I have looked at the openssl/rsa.h rsa implementation (since I already have practical experience with openssl/blowfish.h), and I see these two functions: int RSA_public_encrypt(int flen, unsigned char *from, unsigned char *to, RSA *rsa, int padding); int RSA_private_decrypt(int flen, unsigned char *from, unsigned char *to, RSA *rsa, int padding); If Alice is to generate *rsa, how does this yield the rsa key pair? Is there something like rsa_public and rsa_private which are derived from rsa? Does *rsa contain both public and private key and the above function automatically strips out the necessary key depending on whether it requires the public or private part? Should two unique *rsa pointers be generated so that actually, we have the following: int RSA_public_encrypt(int flen, unsigned char *from, unsigned char *to, RSA *rsa_public, int padding); int RSA_private_decrypt(int flen, unsigned char *from, unsigned char *to, RSA *rsa_private, int padding); Secondly, in what format should the *rsa public key be sent to Bob? Must it be reinterpreted in to a character array and then sent the standard way? I've heard something about certificates -- are they anything to do with it? Sorry for all the questions, Best Wishes, Ben. EDIT: Coe I am currently employing: /* * theEncryptor.cpp * * * Created by ben on 14/01/2010. * Copyright 2010 __MyCompanyName__. All rights reserved. * */ #include "theEncryptor.h" #include <iostream> #include <sys/socket.h> #include <sstream> theEncryptor::theEncryptor() { } void theEncryptor::blowfish(unsigned char *data, int data_len, unsigned char* key, int enc) { // hash the key first! unsigned char obuf[20]; bzero(obuf,20); SHA1((const unsigned char*)key, 64, obuf); BF_KEY bfkey; int keySize = 16;//strlen((char*)key); BF_set_key(&bfkey, keySize, obuf); unsigned char ivec[16]; memset(ivec, 0, 16); unsigned char* out=(unsigned char*) malloc(data_len); bzero(out,data_len); int num = 0; BF_cfb64_encrypt(data, out, data_len, &bfkey, ivec, &num, enc); //for(int i = 0;i<data_len;i++)data[i]=out[i]; memcpy(data, out, data_len); free(out); } void theEncryptor::generateRSAKeyPair(int bits) { rsa = RSA_generate_key(bits, 65537, NULL, NULL); } int theEncryptor::publicEncrypt(unsigned char* data, unsigned char* dataEncrypted,int dataLen) { return RSA_public_encrypt(dataLen, data, dataEncrypted, rsa, RSA_PKCS1_OAEP_PADDING); } int theEncryptor::privateDecrypt(unsigned char* dataEncrypted, unsigned char* dataDecrypted) { return RSA_private_decrypt(RSA_size(rsa), dataEncrypted, dataDecrypted, rsa, RSA_PKCS1_OAEP_PADDING); } void theEncryptor::receivePublicKeyAndSetRSA(int sock, int bits) { int max_hex_size = (bits / 4) + 1; char keybufA[max_hex_size]; bzero(keybufA,max_hex_size); char keybufB[max_hex_size]; bzero(keybufB,max_hex_size); int n = recv(sock,keybufA,max_hex_size,0); n = send(sock,"OK",2,0); n = recv(sock,keybufB,max_hex_size,0); n = send(sock,"OK",2,0); rsa = RSA_new(); BN_hex2bn(&rsa->n, keybufA); BN_hex2bn(&rsa->e, keybufB); } void theEncryptor::transmitPublicKey(int sock, int bits) { const int max_hex_size = (bits / 4) + 1; long size = max_hex_size; char keyBufferA[size]; char keyBufferB[size]; bzero(keyBufferA,size); bzero(keyBufferB,size); sprintf(keyBufferA,"%s\r\n",BN_bn2hex(rsa->n)); sprintf(keyBufferB,"%s\r\n",BN_bn2hex(rsa->e)); int n = send(sock,keyBufferA,size,0); char recBuf[2]; n = recv(sock,recBuf,2,0); n = send(sock,keyBufferB,size,0); n = recv(sock,recBuf,2,0); } void theEncryptor::generateRandomBlowfishKey(unsigned char* key, int bytes) { /* srand( (unsigned)time( NULL ) ); std::ostringstream stm; for(int i = 0;i<bytes;i++){ int randomValue = 65 + rand()% 26; stm << (char)((int)randomValue); } std::string str(stm.str()); const char* strs = str.c_str(); for(int i = 0;bytes;i++)key[i]=strs[i]; */ int n = RAND_bytes(key, bytes); if(n==0)std::cout<<"Warning key was generated with bad entropy. You should not consider communication to be secure"<<std::endl; } theEncryptor::~theEncryptor(){}

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Templated << friend not working when in interrelationship with other templated union types

- by Dwight

While working on my basic vector library, I've been trying to use a nice syntax for swizzle-based printing. The problem occurs when attempting to print a swizzle of a different dimension than the vector in question. In GCC 4.0, I originally had the friend << overloaded functions (with a body, even though it duplicated code) for every dimension in each vector, which caused the code to work, even if the non-native dimension code never actually was called. This failed in GCC 4.2. I recently realized (silly me) that only the function declaration was needed, not the body of the code, so I did that. Now I get the same warning on both GCC 4.0 and 4.2: LINE 50 warning: friend declaration 'std::ostream& operator<<(std::ostream&, const VECTOR3<TYPE>&)' declares a non-template function Plus the five identical warnings more for the other function declarations. The below example code shows off exactly what's going on and has all code necessary to reproduce the problem. #include <iostream> // cout, endl #include <sstream> // ostream, ostringstream, string using std::cout; using std::endl; using std::string; using std::ostream; // Predefines template <typename TYPE> union VECTOR2; template <typename TYPE> union VECTOR3; template <typename TYPE> union VECTOR4; typedef VECTOR2<float> vec2; typedef VECTOR3<float> vec3; typedef VECTOR4<float> vec4; template <typename TYPE> union VECTOR2 { private: struct { TYPE x, y; } v; struct s1 { protected: TYPE x, y; }; struct s2 { protected: TYPE x, y; }; struct s3 { protected: TYPE x, y; }; struct s4 { protected: TYPE x, y; }; struct X : s1 { operator TYPE() const { return s1::x; } }; struct XX : s2 { operator VECTOR2<TYPE>() const { return VECTOR2<TYPE>(s2::x, s2::x); } }; struct XXX : s3 { operator VECTOR3<TYPE>() const { return VECTOR3<TYPE>(s3::x, s3::x, s3::x); } }; struct XXXX : s4 { operator VECTOR4<TYPE>() const { return VECTOR4<TYPE>(s4::x, s4::x, s4::x, s4::x); } }; public: VECTOR2() {} VECTOR2(const TYPE& x, const TYPE& y) { v.x = x; v.y = y; } X x; XX xx; XXX xxx; XXXX xxxx; // Overload for cout friend ostream& operator<<(ostream& os, const VECTOR2<TYPE>& toString) { os << "(" << toString.v.x << ", " << toString.v.y << ")"; return os; } friend ostream& operator<<(ostream& os, const VECTOR3<TYPE>& toString); friend ostream& operator<<(ostream& os, const VECTOR4<TYPE>& toString); }; template <typename TYPE> union VECTOR3 { private: struct { TYPE x, y, z; } v; struct s1 { protected: TYPE x, y, z; }; struct s2 { protected: TYPE x, y, z; }; struct s3 { protected: TYPE x, y, z; }; struct s4 { protected: TYPE x, y, z; }; struct X : s1 { operator TYPE() const { return s1::x; } }; struct XX : s2 { operator VECTOR2<TYPE>() const { return VECTOR2<TYPE>(s2::x, s2::x); } }; struct XXX : s3 { operator VECTOR3<TYPE>() const { return VECTOR3<TYPE>(s3::x, s3::x, s3::x); } }; struct XXXX : s4 { operator VECTOR4<TYPE>() const { return VECTOR4<TYPE>(s4::x, s4::x, s4::x, s4::x); } }; public: VECTOR3() {} VECTOR3(const TYPE& x, const TYPE& y, const TYPE& z) { v.x = x; v.y = y; v.z = z; } X x; XX xx; XXX xxx; XXXX xxxx; // Overload for cout friend ostream& operator<<(ostream& os, const VECTOR3<TYPE>& toString) { os << "(" << toString.v.x << ", " << toString.v.y << ", " << toString.v.z << ")"; return os; } friend ostream& operator<<(ostream& os, const VECTOR2<TYPE>& toString); friend ostream& operator<<(ostream& os, const VECTOR4<TYPE>& toString); }; template <typename TYPE> union VECTOR4 { private: struct { TYPE x, y, z, w; } v; struct s1 { protected: TYPE x, y, z, w; }; struct s2 { protected: TYPE x, y, z, w; }; struct s3 { protected: TYPE x, y, z, w; }; struct s4 { protected: TYPE x, y, z, w; }; struct X : s1 { operator TYPE() const { return s1::x; } }; struct XX : s2 { operator VECTOR2<TYPE>() const { return VECTOR2<TYPE>(s2::x, s2::x); } }; struct XXX : s3 { operator VECTOR3<TYPE>() const { return VECTOR3<TYPE>(s3::x, s3::x, s3::x); } }; struct XXXX : s4 { operator VECTOR4<TYPE>() const { return VECTOR4<TYPE>(s4::x, s4::x, s4::x, s4::x); } }; public: VECTOR4() {} VECTOR4(const TYPE& x, const TYPE& y, const TYPE& z, const TYPE& w) { v.x = x; v.y = y; v.z = z; v.w = w; } X x; XX xx; XXX xxx; XXXX xxxx; // Overload for cout friend ostream& operator<<(ostream& os, const VECTOR4& toString) { os << "(" << toString.v.x << ", " << toString.v.y << ", " << toString.v.z << ", " << toString.v.w << ")"; return os; } friend ostream& operator<<(ostream& os, const VECTOR2<TYPE>& toString); friend ostream& operator<<(ostream& os, const VECTOR3<TYPE>& toString); }; // Test code int main (int argc, char * const argv[]) { vec2 my2dVector(1, 2); cout << my2dVector.x << endl; cout << my2dVector.xx << endl; cout << my2dVector.xxx << endl; cout << my2dVector.xxxx << endl; vec3 my3dVector(3, 4, 5); cout << my3dVector.x << endl; cout << my3dVector.xx << endl; cout << my3dVector.xxx << endl; cout << my3dVector.xxxx << endl; vec4 my4dVector(6, 7, 8, 9); cout << my4dVector.x << endl; cout << my4dVector.xx << endl; cout << my4dVector.xxx << endl; cout << my4dVector.xxxx << endl; return 0; } The code WORKS and produces the correct output, but I prefer warning free code whenever possible. I followed the advice the compiler gave me (summarized here and described by forums and StackOverflow as the answer to this warning) and added the two things that supposedly tells the compiler what's going on. That is, I added the function definitions as non-friends after the predefinitions of the templated unions: template <typename TYPE> ostream& operator<<(ostream& os, const VECTOR2<TYPE>& toString); template <typename TYPE> ostream& operator<<(ostream& os, const VECTOR3<TYPE>& toString); template <typename TYPE> ostream& operator<<(ostream& os, const VECTOR4<TYPE>& toString); And, to each friend function that causes the issue, I added the <> after the function name, such as for VECTOR2's case: friend ostream& operator<< <> (ostream& os, const VECTOR3<TYPE>& toString); friend ostream& operator<< <> (ostream& os, const VECTOR4<TYPE>& toString); However, doing so leads to errors, such as: LINE 139: error: no match for 'operator<<' in 'std::cout << my2dVector.VECTOR2<float>::xxx' What's going on? Is it something related to how these templated union class-like structures are interrelated, or is it due to the unions themselves? Update After rethinking the issues involved and listening to the various suggestions of Potatoswatter, I found the final solution. Unlike just about every single cout overload example on the internet, I don't need access to the private member information, but can use the public interface to do what I wish. So, I make a non-friend overload functions that are inline for the swizzle parts that call the real friend overload functions. This bypasses the issues the compiler has with templated friend functions. I've added to the latest version of my project. It now works on both versions of GCC I tried with no warnings. The code in question looks like this: template <typename SWIZZLE> inline typename EnableIf< Is2D< typename SWIZZLE::PARENT >, ostream >::type& operator<<(ostream& os, const SWIZZLE& printVector) { os << (typename SWIZZLE::PARENT(printVector)); return os; } template <typename SWIZZLE> inline typename EnableIf< Is3D< typename SWIZZLE::PARENT >, ostream >::type& operator<<(ostream& os, const SWIZZLE& printVector) { os << (typename SWIZZLE::PARENT(printVector)); return os; } template <typename SWIZZLE> inline typename EnableIf< Is4D< typename SWIZZLE::PARENT >, ostream >::type& operator<<(ostream& os, const SWIZZLE& printVector) { os << (typename SWIZZLE::PARENT(printVector)); return os; }

Developer IT