awk formatting - Page 55

cygwin: svn does not work anymore

- by mtim

All of the sudden svn stopped working in cygwin installation on windows xp. when I execute svn binary, nothing happens, svn process does not even show up in the Task Manager. I've reinstalled svn but it did not help (the last resort would be to uninstall cygwin itself). Everything else in cygwin works fine: awk,python,sed,more,less,tail and etc. here is what is happening ... mt@s022 ~ $ which svn /usr/bin/svn mt@s022 ~ $ svn --version mt@s022 ~ $ svn status mt@s022 ~ $ svn info mt@s022 ~ $

Read the article

how to get list of databases that a user owns?

- by chiggsy

Id like to find out ( and delete ) all the databases owned by an owner in postgres 8.4.3 I'm new to postgres also, and although I can , and will , read the whole manual today i was forced to use for i in $(psql -l |grep novicedba | awk '{print $1}') psql -d postgres -c " drop database \"$i\"" out of desperation. What's the postgresql way to do this?

Read the article

Equivalent of print "-"x20 of perl in Bash or command line

- by khmarbaise

In perl you can simply write print "-" x 20 and you get a line with dashes...but i need the same thing in bash/commandline on linux without perl/(g)awk etc. any ideas? The intention is to use it in the -exec of the find command and i want to prevent using simple echo "---------" ...

Read the article

Shell Script - print selected columns

- by teepusink

Hi, I have a txt file with columns separated by tabs and based on that file, I want to create a new file that only contains information from some of the columns. This is what I have now awk '{ print $1, $5 }' filename newfilename That works except that when column 5 contains spaces e.g 123 Street, only 123 shows up and the street is considered as another column. How can I achieve what I'm trying to do? Thanks, Tee

Read the article

How can I check for a file size and add that result in an Excel spreadsheet in Perl?

- by Nano Taboada

Currently I monitoring a particular file with a simple shell one-liner: filesize=$(ls -lah somefile | awk '{print $5}') I'm aware that Perl has some nice modules to deal with Excel files so the idea is to, let's say, run that check daily, perhaps with cron, and write the result on a spreadsheet for further statistical use.

Read the article

Perl chomp backwording the string

- by joe

my $cmd = "grep -h $text $file2 $file1 | tail -1 | awk '{print \$NF }' "; my $port_number; $port_number =`$cmd`; print "port No : ==$port_number=="; the output is : "port No :== 2323 == and i tried chomp its not working

Read the article

Unexpected variable update when using bash's $(( )) operator for arithmetic

- by philo

I'm trying to trim a few lines from a file. I know exactly how many lines to remove (say, 2 from the top), but not how many total lines are in the file. So I tried this straightforward solution: $ wc -l $FILENAME 119559 my_filename.txt $ LINES=$(wc -l $FILENAME | awk '{print $1}') $ tail -n $(($LINES - 2)) $FILENAME > $OUTPUT_FILE The output is fine, but what happened to LINES?? $ wc -l $OUTPUT_FILE 119557 my_output_file.txt $ echo $LINES 107 Hoping someone can help me understand what's going on.

Read the article

Linux shell sum columns

- by Yassin

How can I sum (using Linux shell) numbers in a column? If possible, I don't want to use powerful tools like awk or perl. I want something like giveMeNumber | sum

Read the article

UNIX Programs (Shell Scripting) [closed]

- by atif089

Hi, I have an exam tomorrow and I need some help with these programs. Or if you can tell me where I can get these. Write a program which uses grep to search a file for a pattern and display search patterns on standard output Write an awk program to print only odd numbered lines of a file. Write a program to open the command ls and give the output to the command through which we count the number of files Thank You :)

Read the article

Is it possible to change names of Doxygen generated html files?

- by Dmitriy

We are going to publish API documentation on our web site. The documentation is generated by Doxygen from sources. The problem is that Doxygen generate weird file names (which is no so good for SEO). For example, for source file RO4_Languages.h Doxygen generate _r_o4___languages_8h.htm. Is it possible to change name of generated files? PS: I know that it possible to change output using 3rd party tools/scripts (awk/sed/perl/etc).

Read the article

I need to cut a portion of a string in linux

- by Abeed Salam

I have a file in a folder like this: installer-x86_64-XXX.XX-diagnostic.run The XXX.XX is a version number and I need the version number only. How to do it in linux? I have this code: #!/bin/bash current_ver=$(find /mnt/builds/current -name '*.run'|awk -F/ '{print $NF}') So this gives me just the name of the file correctly (minus the location, which I dont want). But how do I only get the XXX.XX version number into a variable such as $version

Read the article

Filtering Linux command output

- by Raajkumar

Hi, I need to get a row based on column value just like querying a database. I have a command output like this, Name ID Mem VCPUs State Time(s) Domain-0 0 15485 16 r----- 1779042.1 prime95-01 512 1 -b---- 61.9 Here I need to list only those rows where state is "r". Something like this, Domain-0 0 15485 16 r----- 1779042.1 I have tried using "grep" and "awk" but still I am not able to succeed. Please help me on this issue. Regards, Raaj

Read the article

Removing spaces from columns of a CSV file in bash

- by vikas ramnani

I have a CSV file in which every column contains unnecessary spaces(or tabs) after the actual value. I want to create a new CSV file removing all the spaces using bash. For example One line in input CSV file value1 ;value2 ;value3 ;value4 same line in output csv file should be value1;value2;value3;value4 I tried using awk to trim each column but it didnt work. Can anyone please help me on this ? Thanks in advance :)

Read the article

What's the simplest way to git conflicted files?

- by inger

I just need a plain list of conflicted files. Is there anything simpler than: git ls-files -u | cut -f 2 | sort | uniq or git ls-files -u | awk '{print $4}' | sort | uniq ?

Read the article

Ubuntu Bash Script changing file names chronologicaly

- by Manifold

I have this bash script where I am trying to change all *.txt files in a directory to their date of last modification. This is the script: #!/bin/bash # Renames the .txt files to the date modified # FROM: foo.txt Created on: 2012-04-18 18:51:44 # TO: 20120418_185144.txt for i in *.txt do mod_date=$(stat --format %y "$i"|awk '{print $1"_"$2}'|cut -f1 -d'.'|sed 's/[: -]//g') mv "$i" "$mod_date".txt done The error I am getting is: renamer.sh: 6: renamer.sh: Syntax error: word unexpected (expecting "do") Any help would be greatly appreciated. Thank you for your time.

Read the article

Modify an exact line number of a text file and add certain string after the end of the line

- by Werner

Hi, given a plain text file, how can I do, using bash, awk, sed, etc, to, at the line number NLINE, add the string STR, just n spaces after the end of the line? So, for instance, if this is the line NLINE: date march 13th for 5 spaces, we get date march 13th STR and one gets the modification in a new file. Thanks

Read the article

Extract IDs from CSS

- by nosuchip

I've the CSS file with many entry like id1, #id2, #id3, #id4 { ... } id3, #id2 { ... } id2, #id4 { ... } I want to extract list of unique IDs using command line tools (msys). Unique means any entry in list presented only once. How? PS: I know how doing it using python, but what about awk/sed/cat?

Read the article

Blogging: MacJournal & Windows Live Writer

- by Jeff Julian

One thing I have learned about using a Mac is that Apple does not produce very many free applications. The ones they do are typically not full featured and to get the full feature you need to buy their upgraded version. For example, when it comes to Photo editing and cataloging, iPhoto is not a solution for large files or RAW processing, you need Aperture which is a couple hundred dollars. I am not complaining because I like it when an application has a product team who generates revenue with it, because the chance of them being around longer seems to be higher. What is my point in all of this? Apple does not produce a product for blogging/journaling like Microsoft does with Windows Live Writer. I love Windows Live Writer. If you are on a Windows box, it is a required tool in your toolbox if you publish to a blog. The cleanness of the interface, integration with most blog APIs and ability to Save Local or Publish as a Draft make capturing your thoughts for publishing now or later a very easy task. My hope is that Microsoft will port it to the Mac, but I don’t believe that will ever happen as it is not a revenue generating product and Microsoft doesn’t often port to a Mac besides Remote Desktop Connection and MSN Messenger. For my configuration I used to use only Boot Camp on my two MacBook Pros I have owned in the past three years because I’m a PC, but after four different rebuilds (not typically due to Windows, but Boot Camp or Parallels) I decided to move off the Boot Camp platform and to VMWare Fusion. This is a complete separate blog post that I should spec out in MacJournal, but I now always boot into the Mac OS and use Fusion for my AJI Software VM or my client’s VMs. It just seems to work better for me and I have a very nice way to backup my Windows environments with VMWare.Needless to say, there was need in my new laptop configuration for a blogging tool that worked natively on a Mac. I don’t like to power up my machine for writing a document or working on an image and need to boot up a VM just so I can use Windows. Some would say why not just use a Windows laptop and put the MBP on eBay? It is just a preference and right now, I like the Mac OS for day to day work. So in comes MacJournal, part of the current MacHeist package for $19.95 (MacJournal is normally $39.95). This product is definitely not WLW, but WLW is missing some features I like in MacJournal. I hope the price point comes down on MacJournal cause I could see paying $19.95 for it, but it is always hard for me to buy a piece of software for $39.95 when I can use something else. But I am a cheapskate when it comes to software packages. I suggest if you are using a Mac to drop what you are doing pick up the MacHeist bundle today before it is over, but if you are reading this later, than download the trial and see if MacJournal is a solution for you. If you have any other suggestions that are as nice or cheaper, please comment.Product LinksMacJournal by Mariners Software $39.95 (part of MacHeist bundle for $19.95 with only one day left)Windows Live Writer by MicrosoftThis post was created using MacJournal.[Update: The joys of formatting. Make sure if you are a Geekswithblogs.net member that you use this configuration to setup the Metablog formatting of paragraphs correctly]

Read the article

SQL SERVER – Auto Complete and Format T-SQL Code – Devart SQL Complete

- by pinaldave

Some people call it laziness, some will call it efficiency, some think it is the right thing to do. At any rate, tools are meant to make a job easier, and I like to use various tools. If we consider the history of the world, if we all wanted to keep traditional practices, we would have never invented the wheel. But as time progressed, people wanted convenience and efficiency, which then led to laziness. Wanting a more efficient way to do something is not inherently lazy. That’s how I see any efficiency tools. A few days ago I found Devart SQL Complete. It took less than a minute to install, and after installation it just worked without needing any tweaking. Once I started using it I was impressed with how fast it formats SQL code – you can write down any terms or even copy and paste. You can start typing right away, and it will complete keywords, object names, and fragmentations. It completes statement expressions. How many times do we write insert, update, delete? Take this example: to alter a stored procedure name, we don’t remember the code written in it, you have to write it over again, or go back to SQL Server Studio Manager to create and alter which is very difficult. With SQL Complete , you can write “alter stored procedure,” and it will finish it for you, and you can modify as needed. I love to write code, and I love well-written code. When I am working with clients, and I find people whose code have not been written properly, I feel a little uncomfortable. It is difficult to deal with code that is in the wrong case, with no line breaks, no white spaces, improper indents, and no text wrapping. The worst thing to encounter is code that goes all the way to the right side, and you have to scroll a million times because there are no breaks or indents. SQL Complete will take care of this for you – if a developer is too lazy for proper formatting, then Devart’s SQL formatter tool will make them better, not lazier. SQL Management Studio gives information about your code when you hover your mouse over it, however SQL Complete goes further in it, going into the work table, and the current rate idea, too. It gives you more information about the parameters; and last but not least, it will just take you to the help file of code navigation. It will open object explorer in a document viewer. You can start going through the various properties of your code – a very important thing to do. Here are are interesting Intellisense examples: 1) We are often very lazy to expand *however, when we are using SQL Complete we can just mouse over the * and it will give us all the the column names and we can select the appropriate columns. 2) We can put the cursor after * and it will give us option to expand it to all the column names by pressing the Tab key. 3) Here is one more Intellisense feature I really liked it. I always alias my tables and I always select the alias with special logic. When I was using SQL Complete I selected just a tablename (without schema name) and…(just like below image) … and it autocompleted the schema and alias name (the way I needed it). I believe using SQL Complete we can work faster. It supports all versions of SQL Server, and works SQL formatting. Many businesses perform code review and have code standards, so why not use an efficiency tool on everyone’s computer and make sure the code is written correctly from the first time? If you’re interested in this tool, there are free editions available. If you like it, you can buy it. I bought it because it works. I love it, and I want to hear all your opinions on it, too. You can get the product for FREE. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article

MSMQ messages using HTTP just won't get delivered

- by John Breakwell

I'm starting off the blog with a discussion of an unusual problem that has hit a couple of my customers this month. It's not a problem you'd expect to bump into and the solution is potentially painful. Scenario You want to make use of the HTTP protocol to send MSMQ messages from one machine to another. You have installed HTTP support for MSMQ and have addressed your messages correctly but they will not leave the outgoing queue. There is no configuration for HTTP support - setup has already done all that for you (although you may want to check the most recent "Installation of the MSMQ HTTP Support Subcomponent" section of MSMQINST.LOG to see if anything DID go wrong) - so you can't tweak anything. Restarting services and servers makes no difference - the messages just will not get delivered. The problem is documented and resolved by Knowledgebase article 916699 "The message may not be delivered when you use the HTTP protocol to send a message to a server that is running Message Queuing 3.0". It is unlikely that you would be able to resolve the problem without the assistance of PSS because there are no messages that can be seen to assist you and only access to the source code exposes the root cause. As this communication is over HTTP, the IIS logs would be a good place to start. POST entries are logged which show that connectivity is working and message delivery is being attempted: #Software: Microsoft Internet Information Services 6.0 #Version: 1.0 #Date: 2006-09-12 12:11:29 #Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status 2006-09-12 12:12:12 W3SVC1 10.1.17.219 POST /msmq/private$/test - 80 - 10.2.200.3 - 200 0 0 If you capture the traffic with Network Monitor you can see the POST being sent to the server but you also see a response being returned to the client: HTTP: Response to Client; HTTP/1.1; Status Code = 500 - Internal Server Error "Internal Server Error" means we can probably stop looking at IIS and instead focus on the Message Queuing ISAPI extension (Mqise.dll). MSMQ 3.0 (Windows XP and Windows Server 2003) comes with error logging enabled by default but the log files are in binary format - MSMQ 2.0 generated logging in plain text. The symbolic information needed for formatting the files is not currently publicly available so log files have to be sent in to Microsoft PSS. Although this does mean raising a support case, formatting the log files to text and returning them to the customer shouldn't take long. Obviously the engineer analyses them for you - I just want to point out that you can see the logging output in text format if you want it. The important entries in the log for this problem are: [7]b48.928 09/12/2006-13:20:44.552 [mqise GetNetBiosNameFromIPAddr] ERROR:Failed to get the NetBios name from the DNS name, error = 0xea [7]b48.928 09/12/2006-13:20:44.552 [mqise RPCToServer] ERROR:RPC call R_ProcessHTTPRequest failed, error code = 1702(RPC_S_INVALID_BINDING) which allow a Microsoft escalation engineer to check the MQISE source code to see what is going wrong. This problem according to the article occurs when the extension tries to bind to the local MSMQ service after the extension receives a POST request that contains an MSMQ message. MSMQ resolves the server name by using the DNS host name but the extension cannot bind to the service because the buffer that MSMQ uses to resolve the server name is too small - server names that are exactly 15 characters long will not fit. RPC exception 0x6a6 (RPC_S_INVALID_BINDING) occurs in the W3wp.exe process but the exception is handled and so you do not receive an error message. The workaround is to rename the MSMQ server to something less than 15 characters. If the problem has only just been noticed in a production environment - an application may have been modified to get through a newly-implemented firewall, for example - then renaming is going to be an issue. Other applications may need to be reinstalled or modified if server names are hard-coded or stored in the registry. The renaming may also break a company naming convention where the name is built up from something like location+department+number. If you want to learn more about MSMQ logging then check out Chapter 15 of the MSMQ FAQ. In fact, even if you DON'T want to learn anything about MSMQ logging you should read the FAQ anyway as there is a huge amount of useful information on known issues and the like.

Read the article

Error installing pkgconfig via macports

- by Greg K

I installed Macports 1.8.2 from a DMG. That seemed to install fine. I ran sudo port selfupdate to make sure my ports tree was current. I then tried to install bindfs as I want to mount some directories in my OS X file system (like you can do with mount --bind in linux). pkgconfig and macfuse are two dependencies of bindfs. I had trouble installing bindfs due to errors installing pkgconfig, so I tried to just install pkgconfig, here's the debug output from sudo port install pkgconfig: $ sudo port -d install pkgconfig DEBUG: Found port in file:///opt/local/var/macports/sources/rsync.macports.org/release/ports/devel/pkgconfig DEBUG: Changing to port directory: /opt/local/var/macports/sources/rsync.macports.org/release/ports/devel/pkgconfig DEBUG: OS Platform: darwin DEBUG: OS Version: 10.3.0 DEBUG: Mac OS X Version: 10.6 DEBUG: System Arch: i386 DEBUG: setting option os.universal_supported to yes DEBUG: org.macports.load registered provides 'load', a pre-existing procedure. Target override will not be provided DEBUG: org.macports.unload registered provides 'unload', a pre-existing procedure. Target override will not be provided DEBUG: org.macports.distfiles registered provides 'distfiles', a pre-existing procedure. Target override will not be provided DEBUG: adding the default universal variant DEBUG: Reading variant descriptions from /opt/local/var/macports/sources/rsync.macports.org/release/ports/_resources/port1.0/variant_descriptions.conf DEBUG: Requested variant darwin is not provided by port pkgconfig. DEBUG: Requested variant i386 is not provided by port pkgconfig. DEBUG: Requested variant macosx is not provided by port pkgconfig. ---> Computing dependencies for pkgconfig DEBUG: Executing org.macports.main (pkgconfig) DEBUG: Skipping completed org.macports.fetch (pkgconfig) DEBUG: Skipping completed org.macports.checksum (pkgconfig) DEBUG: Skipping completed org.macports.extract (pkgconfig) DEBUG: Skipping completed org.macports.patch (pkgconfig) ---> Configuring pkgconfig DEBUG: Using compiler 'Mac OS X gcc 4.2' DEBUG: Executing org.macports.configure (pkgconfig) DEBUG: Environment: CFLAGS='-O2 -arch x86_64' CPPFLAGS='-I/opt/local/include' CXXFLAGS='-O2 -arch x86_64' MACOSX_DEPLOYMENT_TARGET='10.6' CXX='/usr/bin/g++-4.2' F90FLAGS='-O2 -m64' LDFLAGS='-L/opt/local/lib' OBJC='/usr/bin/gcc-4.2' FCFLAGS='-O2 -m64' INSTALL='/usr/bin/install -c' OBJCFLAGS='-O2 -arch x86_64' FFLAGS='-O2 -m64' CC='/usr/bin/gcc-4.2' DEBUG: Assembled command: 'cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_devel_pkgconfig/work/pkg-config-0.23" && ./configure --prefix=/opt/local --enable-indirect-deps --with-pc-path=/opt/local/lib/pkgconfig:/opt/local/share/pkgconfig' checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for gawk... no checking for mawk... no checking for nawk... no checking for awk... awk checking whether make sets $(MAKE)... no checking whether to enable maintainer-specific portions of Makefiles... no checking build system type... i386-apple-darwin10.3.0 checking host system type... i386-apple-darwin10.3.0 checking for style of include used by make... none checking for gcc... /usr/bin/gcc-4.2 checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. Error: Target org.macports.configure returned: configure failure: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_devel_pkgconfig/work/pkg-config-0.23" && ./configure --prefix=/opt/local --enable-indirect-deps --with-pc-path=/opt/local/lib/pkgconfig:/opt/local/share/pkgconfig " returned error 77 DEBUG: Backtrace: configure failure: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_devel_pkgconfig/work/pkg-config-0.23" && ./configure --prefix=/opt/local --enable-indirect-deps --with-pc-path=/opt/local/lib/pkgconfig:/opt/local/share/pkgconfig " returned error 77 while executing "$procedure $targetname" Warning: the following items did not execute (for pkgconfig): org.macports.activate org.macports.configure org.macports.build org.macports.destroot org.macports.install Error: Status 1 encountered during processing. I have only recently installed Xcode 3.2.2 (prior to installing macports). Am I right in thinking this the issue here: configure: error: C compiler cannot create executables

Read the article

Excel Template Teaser

- by Tim Dexter

In lieu of some official documentation I'm in the process of putting together some posts on the new 10.1.3.4.1 Excel templates. No more HTML, maskerading as Excel; far more flexibility than Excel Analyzer and no need to write complex XSL templates to create the same output. Multi sheet outputs with macros and embeddable XSL commands are here. Their capabilities are pretty extensive and I have not worked on them for a few years since I helped put them together for EBS FSG users, so Im back on the learning curve. Let me say up front, there is no template builder, its a completely manual process to build them but, the results can be fantastic and provide yet another 'superstar' opportunity for you. The templates can take hierarchical XML data and walk the structure much like an RTF template. They use named cells/ranges and a hidden sheet to provide the rendering engine the hooks to drop the data in. As a taster heres the data and output I worked with on my first effort: <EMPLOYEES> <LIST_G_DEPT> <G_DEPT> <DEPARTMENT_ID>10</DEPARTMENT_ID> <DEPARTMENT_NAME>Administration</DEPARTMENT_NAME> <LIST_G_EMP> <G_EMP> <EMPLOYEE_ID>200</EMPLOYEE_ID> <EMP_NAME>Jennifer Whalen</EMP_NAME> <EMAIL>JWHALEN</EMAIL> <PHONE_NUMBER>515.123.4444</PHONE_NUMBER> <HIRE_DATE>1987-09-17T00:00:00.000-06:00</HIRE_DATE> <SALARY>4400</SALARY> </G_EMP> </LIST_G_EMP> <TOTAL_EMPS>1</TOTAL_EMPS> <TOTAL_SALARY>4400</TOTAL_SALARY> <AVG_SALARY>4400</AVG_SALARY> <MAX_SALARY>4400</MAX_SALARY> <MIN_SALARY>4400</MIN_SALARY> </G_DEPT> ... </LIST_G_DEPT> </EMPLOYEES> Structured XML coming from a data template, check out the data template progression post. I can then generate the following binary XLS file. There are few cool things to notice in this output. DEPARTMENT-EMPLOYEE master detail output. Not easy to do in the Excel analyzer. Date formatting - this is using an Excel function. Remember BIP generates XML dates in the canonical format. I have formatted the other data in the template using native Excel functionality Salary Total - although in the data I have calculated this in the template Conditional formatting - this is handled by Excel based on the incoming data Bursting department data across sheets and using the department name for the sheet name. This alone is worth the wait! there's more, but this is surely enough to whet your appetite. These new templates are already tucked away in EBS R12 under controlled release by the GL team and have now come to the BIEE and standalone releases in the 10.1.3.4.1+ rollup patch. For the rest of you, its going to be a bit of a waiting game for the relevant teams to uptake the latest BIP release. Look out for more soon with some explanation of how they work and how to put them together!

Read the article

Silverlight Cream for March 08, 2011 -- #1056

- by Dave Campbell

In this Issue: Joost van Schaik, Manas Patnaik, Kevin Hoffman, Jesse Liberty, Deborah Kurata, Dhananjay Kumar, Dennis Delimarsky, Samuel Jack, Peter Kuhn, WindowsPhoneGeek, and Jfo. Above the Fold: Silverlight: "How I let the trees grow" Peter Kuhn WP7: "Simple Windows Phone 7 / Silverlight drag/flick behavior" Joost van Schaik Shoutouts: SilverlightShow has their top 5 from last week posted, plus the ECOContest is ready to be voted on: SilverlightShow for Feb 28 - March 06, 2011 Drew DeVault is a young man involved with the Microsoft Student Insiders. He gave a WP7 presentation at RMTT and has posted his material: Post-Session: Windows Phone 7 @ RMTT Rui Marinho has an app in the ECO Contest called Forest Findr. is based on the BIng Map Control for silverlight and Sql Spatial data, and helps you find Forests and get geolocated pictures and wikipedia information, and has a post up with a bunch of info on it here: Forest Findr. my entry on the SilverlightShow EcoContest From SilverlightCream.com: Simple Windows Phone 7 / Silverlight drag/flick behavior Joost van Schaik has a behavior that makes *anything* draggable and 'flickable' in WP7 ... read the intro, scroll to the bottom to watch the demo, and then grab up the code... cool stuff, Joost! Data Aggregation Using Presentation Model in RIA and Silverlight 4 Manas Patnaik sent me a link to his blog, and it appears he's got lots of Silverlight goodness out there so you'll be hearing more about him. This first post is on the Presentation Model in RIA and Silverlight 4... good discussion, diagrams and code... good job, Manas! WP7 for iPhone and Android Developers - Advanced UI Kevin Hoffman has part 3 of an ambitious 12-part tutorial series up on WP7 development ... this go-around is concentrating on Advanced UI - Panorama/Pivot controls, DataBinding, ObservableCollections, and Converters... whew! Sterling DB on top of Isolated Storage – 2 Jesse Liberty has part 2 of his Sterling series up... this time setting up the database in App.xaml so it can be used for dealing with tombstoning. Silverlight Charting: Formatting the Tick Marks Deborah Kurata's next chart tutorial is all about showing you how to continue to dress up your charts.. this time by formatting the tick marks... if you don't know what that is... check out the first image in the post. Stored Procedure in WCF Data Service Dhananjay Kumar has a very nice tutorial up on using a stored proc with WCF Data Services... I happen to know someone working on just that at this time. If you have this in mind, here's a step-by-step guide to getting it done. Windows Phone 7 – Episode 5 – Pages Dennis Delimarsky has part 5 of his WP7 tutorial series up and is discussing Pages in this 17 minute video. Unpacking Simon Squared: My mini framework-independent animation library Samuel Jack has not only Open-Sourced the WP7 game he built and blogged about, but he's now explaining some of the structure of the game in posts such as this one about the animation library he wrote that his game is built on. How I let the trees grow Peter Kuhn shares with us the code he used for the tree animation in his ECO Contest entry. There's a lot to learn in this post about performance ... the fully-animated tree has about 20K elements... 5K branches and 20K leaves... check it out. WP7 ToastPrompt in depth WindowsPhoneGeek takes a deep dive into the ToastPrompt control in the Coding4fun Toolkit... everything you need to completely use the control including sample code. Beware the loaded event Jfo talks about another frustration point she had with WP7 development, and that is around the use of the loaded event... read these tips from someone that's been there. Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight Silverlight 3 Silverlight 4 Windows Phone MIX10

Read the article

Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article

Converting .docx to pdf (or .doc to pdf, or .doc to odt, etc.) with libreoffice on a webserver on the fly using php

- by robertphyatt

Ok, so I needed to convert .docx files to .pdf files on the fly, but none of the free php libraries that were available let me do it on my server (a webservice was not good enough). Basically either I needed to pay for a library (and have it maybe suck) or just deal with the free ones that didn't convert the formatting well enough. Not good enough! I found that LibreOffice (OpenOffice's successor) allows command line conversion using the LibreOffice conversion engine (which DID preserve the formatting like I wanted and generally worked great). I loaded the latest version of Ubuntu (http://www.ubuntu.com/download/ubuntu/download) onto my Virtual Box (https://www.virtualbox.org/wiki/Downloads) on my computer and found that I was able to easily convert files using the commandline like this: libreoffice --headless -convert-to pdf fileToConvert.docx -outdir output/path/for/pdf I thought: sweet...but I don't have admin rights on my host's web server. I tried to use a "portable" version of LibreOffice that I obtained from http://portablelinuxapps.org/ but I was unable to get it to work on my host's webserver, because my host's webserver didn't have all the dependencies (Dependency Hell! http://en.wikipedia.org/wiki/Dependency_hell) I was at a loss of how to make it work, until I ran across a cool project made by a Ph.D. student (Philip J. Guo) at Stanford called CDE: http://www.stanford.edu/~pgbovine/cde.html I will let you look at his explanations of how it works (I followed what he did in http://www.youtube.com/watch?feature=player_embedded&v=6XdwHo1BWwY, starting at about 32:00 as well as the directions on his site), but in short, it allows one to avoid dependency hell by copying all the files used when you run certain commands, recreating the linux environment where the command worked. I was able to use this to run LibreOffice without having to resort to someone's portable version of it, and it worked just like it did when I did it on Ubuntu with the command above, with a tweak: I needed to run the wrapper of LibreOffice the CDE generated. So, below is my PHP code that calls it. In this code snippet, the filename to be copied is passed in as $_POST["filename"]. I copy the file to the same spot where I originally converted the file, convert it, copy it back and then delete all the files (so that it doesn't start growing exponentially). I did it this way because I wasn't able to make it work otherwise on the webserver. If there is a linux + webserver ninja out there that can figure out how to make it work without doing this, I would be interested to know what you did. Please post a comment or something if you did that. <?php //first copy the file to the magic place where we can convert it to a pdf on the fly copy($time.$_POST["filename"], "../LibreOffice/cde-package/cde-root/home/robert/Desktop/".$_POST["filename"]); //change to that directory chdir('../LibreOffice/cde-package/cde-root/home/robert'); //the magic command that does the conversion $myCommand = "./libreoffice.cde --headless -convert-to pdf Desktop/".$_POST["filename"]." -outdir Desktop/"; exec ($myCommand); //copy the file back copy("Desktop/".str_replace(".docx", ".pdf", $_POST["filename"]), "../../../../../documents/".str_replace(".docx", ".pdf", $_POST["filename"])); //delete all the files out of the magic place where we can convert it to a pdf on the fly $files1 = scandir('Desktop'); //my files that I generated all happened to start with a number. $pattern = '/^[0-9]/'; foreach ($files1 as $value) { preg_match($pattern, $value, $matches); if(count($matches) ?> 0) { unlink("Desktop/".$value); } } //changing the header to the location of the file makes it work well on androids header( 'Location: '.str_replace(".docx", ".pdf", $_POST["filename"]) ); ?> And here is the tar.gz file I generated I generated with CDE. To duplicate what I did exactly, put the tar.gz file in a folder somewhere. I will call that folder the "root". Make a new folder called "documents" in the "root" folder. Unpack the tar.gz and run the php script above from the "documents" folder. Success! I made a truly portable version of LibreOffice that can convert files on the fly on a webserver using 100% free, open source software!

Search Results

Search found 2743 results on 110 pages for 'awk formatting'.

Page 55/110 | < Previous Page | 51 52 53 54 55 56 57 58 59 60 61 62 | Next Page >

- by mtim

- by chiggsy

- by khmarbaise

- by teepusink

- by Nano Taboada

- by joe

- by philo

- by Yassin

- by atif089

- by Dmitriy

- by Abeed Salam

- by Raajkumar

- by vikas ramnani

- by inger

- by Manifold

- by Werner

- by nosuchip

- by Jeff Julian

- by pinaldave

- by John Breakwell

- by Greg K

- by Tim Dexter

- by Dave Campbell

- by user12620111

- by robertphyatt

< Previous Page | 51 52 53 54 55 56 57 58 59 60 61 62 | Next Page >