Search Results

Search found 8268 results on 331 pages for 'difference'.

Page 259/331 | < Previous Page | 255 256 257 258 259 260 261 262 263 264 265 266 | Next Page >

What happened to 'Copy Public Gallery Link' for folders in Photos in Dropbox?

- by Transgenic

I've been trying to figure out how to use the 'Photos' folder properly, but it does not seem to contain the functionality that is mentioned on various websites. From what I've read, you are supposed to create a folder within the 'Photos' folder. At that point, it becomes a public gallery for which you can access the context menu item Copy Public Gallery Link. However, I do not have this context menu item. I found a topic on the DropBox forum, but it is 2 years old. Even the various website information mentioned is over a year old. I found a topic from this year on SuperUser that mentions about the Share link menu item. I understand that this functionality is basically the same as Copy Public Gallery Link, however, the key difference is that it opens the website for which I need to click 'Copy link to this page' link, which then adds it to the clipboard. My primary questions are: Did they phase out the Copy Public Gallery Link from the context menu entirely? If not, is there a way to re-enable this menu item? If not, am I using the Photos folder properly? Lastly, if there is no way to get this menu item, is there some kind of script or hack-around that will let me simplify the Shark link + Copy link to this page functionality without requiring it to open my web browser? Note: I had to edit my post to remove other website links that I mention since I'm considered a 'New User' here and am only able to post 2 hyperlinks. Sorry.

Read the article
(squid): failed to find or read error text file.

- by adam

There is something in our ERR_NO_RELAY that is causing this error to be logged and for the squid process to fail on start up. I can't show you the exact content of the file but I can tell you It has several lines of JavaScript When we remove the JavaScript, the problem goes away. This same file does not cause any issues other 3 instances of squid that we have running internally. All instances of squid came from the same VM images so they should be the same. We are unable to reproduce this issue except on the one box and we are unable to debug more on this box right now because it is running in production. I know these files are interpreted so squid can replace certain values available in the session so it may be that a syntax error caused this issue. That does not explain why we cannot reproduce it on other (virtually the same) images. One difference is that the instance of squid that has the issue was under load when the issue occurred. Any suggestions/insight would be appreciated. thanks!

Read the article
Running two graphics cards (non-SLI) to power 3D on two different monitors?

- by Delameko

Hi, I'm a bit clueless about this, so excuse my naivety. I have two video cards, a Nvidia 8800 and a GT120, powering three monitors. I run two 3D game instances (two Everquest 2 clients), one on each of my first two monitors. It's been running fine, although sometimes it sounds like the computer is trying to take off. Today I realised that I was actually playing them on the two monitors that are both powered by the 8800. Thinking that I might as well make use of the power of both cards I tried switching the monitor cables over so that each card would be "powering" one of the clients. (Is it silly to assume this is how it works?) This doesn't seem to have had the desired effect, as the client running on the 8800 screen is running worse than it was before. Is it even possible to run two clients on separate GPUs? Is SLI the only way to utilise 2 GPUs? Is there something special I have to do? Or do I have to set the client to use a particular GPU (an option I can't seem to find in EQ2)? I run the clients in window mode if that makes any difference, and I'm running Win 7. Thanks.

Read the article
Need recommendation: which USB card (PCMCIA) works with external hard drive?

- by Carl

I have an HT-Link Cardbus/PCMCIA USB 2.0 2-port card (NEC / 32-bit). My external hard drive w/USB adapter won't work with it, and it will work plugged directly into a USB port on a different laptop. (My USB ports got fried.) I got the card off E-Bay. My MP3 player works plugged into that card. The drivers for the card say "Known limitations: High Speed Isochronus, USB Composite Devices." (No other details provided.) I don't know if the hard drive adapter is "isochronous" or "composite." I've read there are problems with too little power being supplied to the drive. The cable to the drive has two USB plugs on one end, and it doesn't make any difference if I plug both of them into the Cardbus card. What card should I get? I see many different brands on E-Bay. I need one that supplies sufficient power for an external hard drive, and doesn't have any "known limitations" in the way.

Read the article
Easiest way to find out if user has either Windows 7 or Vista (through telephone support)?

- by Rabarberski

If you have to provide some initial troubleshooting support by phone [or email], and you don't have access to the PC itself, what is the easiest and most foolproof question to find out if the 'dumb' user is using either Windows 7 or Windows Vista? For example: determining if the user has either Windows XP or Windows Vista/7 is easy. Just ask the user if the button at the left bottom corner is (a) either square with the word 'Start' on it, or (b) it is a round button. But how to determine the difference between Vista and 7? Edit: For all the existing answers the user has to type something, and do it correctly. Sometimes even that is already hard for a computer illiterate user. My XP example just requires looking. If it exists (although I am afraid it doesn't), I think a solution that is just based on something this is visually different between Vista and 7 would stand above all others. (Which makes Dan's suggestion to turn over the box and look at the label" not so stupid). Perhaps the small 'show desktop' rectangle at the right side of the task bar (was that present in Vista)?

Read the article
very slow connection to ssh server from client (but not other servers)

- by AntonOfTheWoods

I have an Ubuntu 12.04 laptop that is taking so long to connect to various servers (in different data centres) that it seems like a bit of a lottery whether I'll actually get a connection. If I connect to the servers between themselves it's instantaneous, and I've set UseDNS no AddressFamily inet On the servers I'm connecting to (and rebooted for good measure). I also put in the reverse DNS+IP of the cable connection I'm connecting from. If I connect from the laptop via telnet: telnet my.server 22 Then the connection is also instantaneous, so it doesn't appear to be a problem with an intervening firewall. I have the same behaviour whether I connect with the IP, a short name in my hosts or the FQDN. I'm connecting with a 50mbps (cable, sync) connection so that doesn't appear to be the problem, and when I do finally get a connection then it's a good, quick, stable one. I have tried listening on another port (8000) and that makes no difference. Web and other connections from the laptop to the machine are also very good. Does anyone have any ideas here?

Read the article
Using Plesk for webhosting on Ubuntu - Security risk or reasonably safe?

- by user66952

Sorry for this newb-question I'm pretty clueless about Plesk, only have limited debian (without Plesk) experience. If the question is too dumb just telling me how to ask a smarter one or what kind of info I should read first to improve the question would be appreciated as well. I want to offer a program for download on my website hosted on an Ubuntu 8.04.4 VPS using Plesk 9.3.0 for web-hosting. I have limited the ssh-access to the server via key only. When setting up the webhosting with Plesk it created an FTP-login & user is that a potential security risk that could bypass the key-only access? I think Plesk itself (even without the ftp-user-account) through it's web-interface could be a risk is that correct or are my concerns exaggerated? Would you say this solution makes a difference if I'm just using it for the next two weeks and then change servers to a system where I know more about security. 3.In other words is one less likely to get hacked within the first two weeks of having a new site up and running than in week 14&15? (due to occurring in less search results in the beginning perhaps, or for whatever reason... )

Read the article
Why is my HP TouchSmart tx2 1025dx shutting down?

- by Kristy

I'm having an issue with my HP tx2 1025dx laptop. Sometimes it will spontaneously shut itself off. First, the screen will go out. Next, the computer itself will shut down. This usually occurs when I am doing something that requires a lot of resources, but lately can happen when I am doing something as simple as using an Office app. I think it may be heat related, if I try to boot the laptop after it shuts itself down, it will shut itself down again before startup is complete. If I let it cool down for a while and then come back to it, the computer will boot properly. These shutdowns are always accompanied by high temperatures on the bottom right side, as well as the right side of the keyboard. I am using a laptop cooler with two fans, and it is not making a bit of difference. This is the second TouchSmart we have owned, the first one had the same issues, we sent it in to HP after it finally refused to reboot. They “fixed” it, and it worked fine for about a month before doing the exact same thing. Unfortunately we had bought the second one before the issues began with the original, if I had known this would be a problem, I would not have purchased another TouchSmart. This particular laptop we have had for more than a year and it didn’t start these issues until a couple of months ago. Any suggestions as to how to fix this problem?

Read the article
Printer irregularly producing garbage output

- by John Gardeniers

Every now and then instead of getting the proper output we get numerous pages, mostly with just a single line, of output which appears to be the raw PCL. My theory is that this happens when the first byte or two of the document is somehow not received by the printer, which then doesn't know how to interpret the rest and does it's best by spitting it out as text. This is a problem I've seen many times over the years but has been popping up more often since we upgraded to Win 7 64 bit, which introduced a number of headaches because of the HP lack of real support for 64 bits. It also appears to happen most often when printing PDF files. We have tried several different PDF readers in addition to Adobe's own but that hasn't helped. While we mainly use HP printers, and the problem is not limited to any particular model, I've also seen it happen on other brands, albeit to a lesser extent. I've also been unable to discern a difference between printers used via a print server or those connected directly by IP address. It also happens to USB attached printers. Because of the erratic nature of this problem there is precious little I can think of to try and debug it, so I'm after any ideas that might help to eliminate it.

Read the article
Too many connections to Sql Server 2008

- by Luis Forero

I have an application in C# Framework 4.0. Like many app this one connects to a data base to get information. In my case this database is SqlServer 2008 Express. The database is in my machine In my data layer I’m using Enterprise Library 5.0 When I publish my app in my local machine (App Pool Classic) Windows Professional IIS 7.5 The application works fine. I’m using this query to check the number of connections my application is creating when I’m testing it. SELECT db_name(dbid) as DatabaseName, count(dbid) as NoOfConnections, loginame as LoginName FROM sys.sysprocesses WHERE dbid > 0 AND db_name(dbid) = 'MyDataBase' GROUP BY dbid, loginame When I start testing the number of connection start growing but at some point the max number of connection is 26. I think that’s ok because the app works When I publish the app to TestMachine1 • XP Mode Virtual Machine (Windows XP Professional) • IIS 5.1 It works fine, the behavior is the same, the number of connections to the database increment to 24 or 26, after that they stay at that point no matter what I do in the application. The problem: When I publish to TestMachine2 (App Pool Classic) • Windows Server 2008 R2 • IIS 7.5 I start to test the application the number of connection to the database start to grow but this time they grow very rapidly and don’t stop growing at 24 or 26, the number of connections grow till the get to be 100 and the application stop working at that point. I have check for any difference on the publications, especially in Windows Professional and Windows Server and they seem with the same parameters and configurations. Any clues why this could be happening? , any suggestions?

Read the article
What can cause two "identical" setups of IE8 and XP to display things differently?

- by ccornet

This has been baffling me for a while. Of the machines I use in visitting Stack Overflow, two of them are machines with the same setup: Dell with Windows XP with IE8. Since they were issued to me by the same company (one to use in the office, one to use at home), they have identical setups as well. But they display certain page elements differently! One is an Optiplex GX620 desktop, the other is an Inspiron 9100 laptop, but somehow the hardware doesn't seem like something that should be overriding how my browser displays things. Nevertheless, the laptop seems to display things differently than what is expected. Differences have included the following: This issue persisted on the laptop after Jeff fixed it, but was repaired for everyone else and on the Desktop. When viewing Vote Counts on a post, the grey line is left immediately beneath the upvotes but a number-sized white space is below that before the downvotes. On the desktop, it displays properly with the two adjacent and divided by a grey line. Code blocks seem to have a blank line at the end on the laptop. The following image illustrates how the last two elements look on the laptop. So, considering that as far as I can tell, these two setups are identical (I have not messed with any settings and they were both initialized identically as well), what else could be causing the display difference?

Read the article
Problems restricting access using Apache

- by Tola Odejayi

I've set up XAMPP on a Windows 7 machine, and I want to restrict access to the htdocs folder to only requests from the local machine. C:\Xampp\htdocs is the web root folder. I have the following in my apache/conf/httpd.conf file: <Directory /> Options FollowSymLinks AllowOverride None Order deny,allow Deny from all </Directory> <Directory "C:/Xampp/htdocs"> Options Indexes FollowSymLinks Includes ExecCGI AllowOverride All order deny,allow deny from all allow from 127.0.0.1 allow from localhost </Directory> All my .htaccess files are blank. But when I navigate to the web root folder via a browser, I get the following message: Access forbidden! You don't have permission to access the requested directory. There is either no index document or the directory is read-protected. I tried adding the IP restrictions to the <Directory>...</Directory>, but it made no difference. What am I doing wrong here?

Read the article
Is is possible to guide installation of new programs using %ProgramFiles%? [closed]

- by ??????? ???????????

The purpose of this is to have the default "program files" (32 and 64 bit) folders located under an arbitrary path, possibly on a drive separate from where windows lives. Initially I thought that this may be done using a system environment variable through the dialog located under Control Panel - System - Advanced - Environment Variables. These variables turned out to be set in the registry under the key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion. However, one particular entry is confusing. The ProgramFilesPath entry seems to point at an environment variable that is not defined under the same registry key. I could assume that the difference between ProgramFilesDir and ProgramFilesPath is none and that one of them exists as a backwards compatibility, but having some legitimate resource from Microsoft to look at would be better than guessing. After receiving some worrying feedback about having both 32 and 64bit applications in the same folder, I have decided not to ask about the feasibility of this to avoid discussion. The real question is if the desired effect is possible to attain by "cutting into" the windows setup process and modifying those registry entries as early as possible. These settings should be system wide and not only for software installed by a particular user. If this is indeed something that can be done, I wonder if there are any subtle pitfalls. Programs that expect libraries and other resources to be in default locations can probably be dealt with using the same technique as employed by Windows to re-map the "Documents and Settings" folders and the like (i.e. breaking legacy applications is not real concern).

Read the article
Huge discrepancy in Inkscape file size

- by Keyran

When using Inkscape to create many pictures with common elements across them, I tend to copy the first SVG file I have created as many times as I need pictures, and then edit the copies. If I reuse files across projects, it can result in a file being copied and modified tens to hundreds of files. I have recently realized that the latest copies have a size between 29 and 60 MB, slowing my computer down significatively. My pictures are very simple, nothing that would normally go over 1 MB in size. As an experiment, I copied the entire content of one of the latest files into a new Inkscape file. I am certain that I have copied the content of the file entirely (I have only one layer and I used the "Select All" option). The new file has a size of 102,2 KB. This would indicate that about 30 MB of data per file is irrelevant to me. What could be the cause of this size difference ? Is there a way to reduce the size of a file without having to copy the content into a new file ? I am using Inkscape 0.48.4 on Debian Unstable. Thanks for any input you might be able to provide !

Read the article
IE8 page reload hangs

- by Rod

When a 7mb HTM file is first opened (by clicking on the file icon or using Open With), both IE8 and Firefox display this browser file quickly. After the file is closed, Firefox will reopen this file quickly, but IE8 appears to hang during the reopen. Clearing the IE cache does not help. However, IE will reopen the file quickly again only if the File/Open/Browse feature of the menu bar is used (clicking on the file icon can be used only once between computer reboots). Testing suggests that the problem relates to the number of HTML hyperlinks pointing to another part of the file. There are many hyperlinks, but they are not a problem during the first load of the document (between computer reboots). What needs to be fixed to avoid use of the workaround? Using Windows XP SP3 Update 6/23/12 - Controlled testing shows that the number of hyperlinks is not the problem. The way this large file is opened is the difference: 1) from the IE menu bar, File/Open/Browse is consistent and fast (but not as fast as FF). 2) clicking on the file name in the folder (even when IE is the default program for this file type) causes a much delayed load of the file. Creating a smaller file demonstrates the delayed load, but verifies that the load eventually occurs.

Read the article
Can’t connect to SQL Server 2008 - looks like Shared Memory problem

- by user38556

I am unable to connect to my local instance of SQL Server 2008 Express using SQL Server Management Studio. I believe the problem is related to a change I made to the connection protocols. Before the error occurred, I had Shared Memory enabled and Named Pipes and TCP/IP disabled. I then enabled both Named Pipes and TCP/IP, and this is when I started experiencing the problem. When I try to connect to the server with SSMS (with either my SQL server sysadmin login or with windows authentication), I get the following error message: A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) Why is it returning a Named Pipes error? Why would it not just use Shared Memory, as this has a higher priority order in the list of connection protocols? It seems like it is not listening on Shared Memory for some reason? When I set Named Pipes to enabled and try to connect, I get the same error message. My windows account is does not have administrator priviliges on my computer - perhaps this is making a difference in some way (as some of the discussions in this post about an "SuperSocketNetLib\Lpc" registry key seems to suggest). I have tried restarting the SQL Server service, by the way, and also tried to get someone to log onto the machine with an admin account to restart the SQL Server service. Still no luck.

Read the article
Deploying Memcached as 32bit or 64bit?

- by rlotun

I'm curious about how people deploy memcached on 64 bit machines. Do you compile a 64bit (standard) memcached binary and run that, or do people compile it in 32bit mode and run N instances (where N = machine_RAM / 4GB)? Consider a recommended deployment of Redis (from the Redis FAQ): Redis uses a lot more memory when compiled for 64 bit target, especially if the dataset is composed of many small keys and values. Such a database will, for instance, consume 50 MB of RAM when compiled for the 32 bit target, and 80 MB for 64 bit! That's a big difference. You can run 32 bit Redis binaries in a 64 bit Linux and Mac OS X system without problems. For OS X just use make 32bit. For Linux instead, make sure you have libc6-dev-i386 installed, then use make 32bit if you are using the latest Git version. Instead for Redis <= 1.2.2 you have to edit the Makefile and replace "-arch i386" with "-m32". If your application is already able to perform application-level sharding, it is very advisable to run N instances of Redis 32bit against a big 64 bit Redis box (with more than 4GB of RAM) instead than a single 64 bit instance, as this is much more memory efficient. Would not the same recommendation also apply to a memcached cluster?

Read the article
Fedora 13 becomes unresponsive when too many applications running.

- by user61766

I am using Fedora 13 64bit on Dell Vostro full 4GB RAM system with default Gnome GUI. This is very annoying problem that I don't how to fix except by rebooting the whole PC. When I have too many applications running ( like browser windows), the system start acting sluggish. The fist symptoms appear in Eclipse IDE which becomes so terrible it just becomes frozen for sometime one whole minute after I try to edit something in the editor. Then Firefox seems like it has crashed. Google Chrome becomes very unresponsive as well. All GUI applications including File manager becomes unresponsive. When I check System Monitor, the CPU is still around 20% and memory is at 80% but the system seems getting fried up. This progressively becomes worse until I soft reboot it or if I dont do it evetually the whole system is fried, no response to any keyboard key or mouse and I have to hit the hardware turn off button. I regularly yum update the system but this makes no difference. Please don't tell not to run too many applications because I need those for my work. I thought Linux is well designed Operating System but I am very disappointed so far. Can some one here help ?

Read the article
Randomly unusable wireless connection. Oddest issue ever...

- by Hallucynogenyc

I have two desktop computers a laptop and a smartphone. All of them connect perfectly to my router (WRT54GL with Tomato firmware), but randomly (once a week maybe) something odd happens: For over an hour or so one of the desktop computers (Windows 7 Pro x64) will just refuse to connect properly to the router. Only that computer and only to that network. I can connect all the other machines to the router perfectly and I can connect properly to other networks with that machine. What I mean by "not being able to connect properly" I mean that the OS will just tell me "not able to connect to.." or that it will connect but then say that it has no Internet access or it connects but will then take minutes to load any website, even the router web interface. I've tried to change from WPA2 to WEP and back to WPA2, I've tried different network adapters (one internal PCI card and one USB external card), removing networks from Windows and adding again... without making any difference. It works for some days but at the end, it ends happening at some time. I just have no clue on what the hell could be going on here. I first thought it was the network adapter, then the router, then Windows... Thanks in advance

Read the article
Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article
A way of doing real-world test-driven development (and some thoughts about it)

- by Thomas Weller

Lately, I exchanged some arguments with Derick Bailey about some details of the red-green-refactor cycle of the Test-driven development process. In short, the issue revolved around the fact that it’s not enough to have a test red or green, but it’s also important to have it red or green for the right reasons. While for me, it’s sufficient to initially have a NotImplementedException in place, Derick argues that this is not totally correct (see these two posts: Red/Green/Refactor, For The Right Reasons and Red For The Right Reason: Fail By Assertion, Not By Anything Else). And he’s right. But on the other hand, I had no idea how his insights could have any practical consequence for my own individual interpretation of the red-green-refactor cycle (which is not really red-green-refactor, at least not in its pure sense, see the rest of this article). This made me think deeply for some days now. In the end I found out that the ‘right reason’ changes in my understanding depending on what development phase I’m in. To make this clear (at least I hope it becomes clear…) I started to describe my way of working in some detail, and then something strange happened: The scope of the article slightly shifted from focusing ‘only’ on the ‘right reason’ issue to something more general, which you might describe as something like 'Doing real-world TDD in .NET , with massive use of third-party add-ins’. This is because I feel that there is a more general statement about Test-driven development to make: It’s high time to speak about the ‘How’ of TDD, not always only the ‘Why’. Much has been said about this, and me myself also contributed to that (see here: TDD is not about testing, it's about how we develop software). But always justifying what you do is very unsatisfying in the long run, it is inherently defensive, and it costs time and effort that could be used for better and more important things. And frankly: I’m somewhat sick and tired of repeating time and again that the test-driven way of software development is highly preferable for many reasons - I don’t want to spent my time exclusively on stating the obvious… So, again, let’s say it clearly: TDD is programming, and programming is TDD. Other ways of programming (code-first, sometimes called cowboy-coding) are exceptional and need justification. – I know that there are many people out there who will disagree with this radical statement, and I also know that it’s not a description of the real world but more of a mission statement or something. But nevertheless I’m absolutely sure that in some years this statement will be nothing but a platitude. Side note: Some parts of this post read as if I were paid by Jetbrains (the manufacturer of the ReSharper add-in – R#), but I swear I’m not. Rather I think that Visual Studio is just not production-complete without it, and I wouldn’t even consider to do professional work without having this add-in installed... The three parts of a software component Before I go into some details, I first should describe my understanding of what belongs to a software component (assembly, type, or method) during the production process (i.e. the coding phase). Roughly, I come up with the three parts shown below: First, we need to have some initial sort of requirement. This can be a multi-page formal document, a vague idea in some programmer’s brain of what might be needed, or anything in between. In either way, there has to be some sort of requirement, be it explicit or not. – At the C# micro-level, the best way that I found to formulate that is to define interfaces for just about everything, even for internal classes, and to provide them with exhaustive xml comments. The next step then is to re-formulate these requirements in an executable form. This is specific to the respective programming language. - For C#/.NET, the Gallio framework (which includes MbUnit) in conjunction with the ReSharper add-in for Visual Studio is my toolset of choice. The third part then finally is the production code itself. It’s development is entirely driven by the requirements and their executable formulation. This is the delivery, the two other parts are ‘only’ there to make its production possible, to give it a decent quality and reliability, and to significantly reduce related costs down the maintenance timeline. So while the first two parts are not really relevant for the customer, they are very important for the developer. The customer (or in Scrum terms: the Product Owner) is not interested at all in how the product is developed, he is only interested in the fact that it is developed as cost-effective as possible, and that it meets his functional and non-functional requirements. The rest is solely a matter of the developer’s craftsmanship, and this is what I want to talk about during the remainder of this article… An example To demonstrate my way of doing real-world TDD, I decided to show the development of a (very) simple Calculator component. The example is deliberately trivial and silly, as examples always are. I am totally aware of the fact that real life is never that simple, but I only want to show some development principles here… The requirement As already said above, I start with writing down some words on the initial requirement, and I normally use interfaces for that, even for internal classes - the typical question “intf or not” doesn’t even come to mind. I need them for my usual workflow and using them automatically produces high componentized and testable code anyway. To think about their usage in every single situation would slow down the production process unnecessarily. So this is what I begin with: namespace Calculator { /// <summary> /// Defines a very simple calculator component for demo purposes. /// </summary> public interface ICalculator { /// <summary> /// Gets the result of the last successful operation. /// </summary> /// <value>The last result.</value> /// <remarks> /// Will be <see langword="null" /> before the first successful operation. /// </remarks> double? LastResult { get; } } // interface ICalculator } // namespace Calculator So, I’m not beginning with a test, but with a sort of code declaration - and still I insist on being 100% test-driven. There are three important things here: Starting this way gives me a method signature, which allows to use IntelliSense and AutoCompletion and thus eliminates the danger of typos - one of the most regular, annoying, time-consuming, and therefore expensive sources of error in the development process. In my understanding, the interface definition as a whole is more of a readable requirement document and technical documentation than anything else. So this is at least as much about documentation than about coding. The documentation must completely describe the behavior of the documented element. I normally use an IoC container or some sort of self-written provider-like model in my architecture. In either case, I need my components defined via service interfaces anyway. - I will use the LinFu IoC framework here, for no other reason as that is is very simple to use. The ‘Red’ (pt. 1) First I create a folder for the project’s third-party libraries and put the LinFu.Core dll there. Then I set up a test project (via a Gallio project template), and add references to the Calculator project and the LinFu dll. Finally I’m ready to write the first test, which will look like the following: namespace Calculator.Test { [TestFixture] public class CalculatorTest { private readonly ServiceContainer container = new ServiceContainer(); [Test] public void CalculatorLastResultIsInitiallyNull() { ICalculator calculator = container.GetService<ICalculator>(); Assert.IsNull(calculator.LastResult); } } // class CalculatorTest } // namespace Calculator.Test This is basically the executable formulation of what the interface definition states (part of). Side note: There’s one principle of TDD that is just plain wrong in my eyes: I’m talking about the Red is 'does not compile' thing. How could a compiler error ever be interpreted as a valid test outcome? I never understood that, it just makes no sense to me. (Or, in Derick’s terms: this reason is as wrong as a reason ever could be…) A compiler error tells me: Your code is incorrect, but nothing more. Instead, the ‘Red’ part of the red-green-refactor cycle has a clearly defined meaning to me: It means that the test works as intended and fails only if its assumptions are not met for some reason. Back to our Calculator. When I execute the above test with R#, the Gallio plugin will give me this output: So this tells me that the test is red for the wrong reason: There’s no implementation that the IoC-container could load, of course. So let’s fix that. With R#, this is very easy: First, create an ICalculator - derived type: Next, implement the interface members: And finally, move the new class to its own file: So far my ‘work’ was six mouse clicks long, the only thing that’s left to do manually here, is to add the Ioc-specific wiring-declaration and also to make the respective class non-public, which I regularly do to force my components to communicate exclusively via interfaces: This is what my Calculator class looks like as of now: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get { throw new NotImplementedException(); } } } } Back to the test fixture, we have to put our IoC container to work: [TestFixture] public class CalculatorTest { #region Fields private readonly ServiceContainer container = new ServiceContainer(); #endregion // Fields #region Setup/TearDown [FixtureSetUp] public void FixtureSetUp() { container.LoadFrom(AppDomain.CurrentDomain.BaseDirectory, "Calculator.dll"); } ... Because I have a R# live template defined for the setup/teardown method skeleton as well, the only manual coding here again is the IoC-specific stuff: two lines, not more… The ‘Red’ (pt. 2) Now, the execution of the above test gives the following result: This time, the test outcome tells me that the method under test is called. And this is the point, where Derick and I seem to have somewhat different views on the subject: Of course, the test still is worthless regarding the red/green outcome (or: it’s still red for the wrong reasons, in that it gives a false negative). But as far as I am concerned, I’m not really interested in the test outcome at this point of the red-green-refactor cycle. Rather, I only want to assert that my test actually calls the right method. If that’s the case, I will happily go on to the ‘Green’ part… The ‘Green’ Making the test green is quite trivial. Just make LastResult an automatic property: [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { public double? LastResult { get; private set; } } One more round… Now on to something slightly more demanding (cough…). Let’s state that our Calculator exposes an Add() method: ... /// <summary> /// Adds the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the additon.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Add(double operand1, double operand2); } // interface ICalculator A remark: I sometimes hear the complaint that xml comment stuff like the above is hard to read. That’s certainly true, but irrelevant to me, because I read xml code comments with the CR_Documentor tool window. And using that, it looks like this: Apart from that, I’m heavily using xml code comments (see e.g. here for a detailed guide) because there is the possibility of automating help generation with nightly CI builds (using MS Sandcastle and the Sandcastle Help File Builder), and then publishing the results to some intranet location. This way, a team always has first class, up-to-date technical documentation at hand about the current codebase. (And, also very important for speeding up things and avoiding typos: You have IntelliSense/AutoCompletion and R# support, and the comments are subject to compiler checking…). Back to our Calculator again: Two more R# – clicks implement the Add() skeleton: ... public double Add(double operand1, double operand2) { throw new NotImplementedException(); } } // class Calculator As we have stated in the interface definition (which actually serves as our requirement document!), the operands are not allowed to be negative. So let’s start implementing that. Here’s the test: [Test] [Row(-0.5, 2)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } As you can see, I’m using a data-driven unit test method here, mainly for these two reasons: Because I know that I will have to do the same test for the second operand in a few seconds, I save myself from implementing another test method for this purpose. Rather, I only will have to add another Row attribute to the existing one. From the test report below, you can see that the argument values are explicitly printed out. This can be a valuable documentation feature even when everything is green: One can quickly review what values were tested exactly - the complete Gallio HTML-report (as it will be produced by the Continuous Integration runs) shows these values in a quite clear format (see below for an example). Back to our Calculator development again, this is what the test result tells us at the moment: So we’re red again, because there is not yet an implementation… Next we go on and implement the necessary parameter verification to become green again, and then we do the same thing for the second operand. To make a long story short, here’s the test and the method implementation at the end of the second cycle: // in CalculatorTest: [Test] [Row(-0.5, 2)] [Row(295, -123)] public void AddThrowsOnNegativeOperands(double operand1, double operand2) { ICalculator calculator = container.GetService<ICalculator>(); Assert.Throws<ArgumentException>(() => calculator.Add(operand1, operand2)); } // in Calculator: public double Add(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } throw new NotImplementedException(); } So far, we have sheltered our method from unwanted input, and now we can safely operate on the parameters without further caring about their validity (this is my interpretation of the Fail Fast principle, which is regarded here in more detail). Now we can think about the method’s successful outcomes. First let’s write another test for that: [Test] [Row(1, 1, 2)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } Again, I’m regularly using row based test methods for these kinds of unit tests. The above shown pattern proved to be extremely helpful for my development work, I call it the Defined-Input/Expected-Output test idiom: You define your input arguments together with the expected method result. There are two major benefits from that way of testing: In the course of refining a method, it’s very likely to come up with additional test cases. In our case, we might add tests for some edge cases like ‘one of the operands is zero’ or ‘the sum of the two operands causes an overflow’, or maybe there’s an external test protocol that has to be fulfilled (e.g. an ISO norm for medical software), and this results in the need of testing against additional values. In all these scenarios we only have to add another Row attribute to the test. Remember that the argument values are written to the test report, so as a side-effect this produces valuable documentation. (This can become especially important if the fulfillment of some sort of external requirements has to be proven). So your test method might look something like that in the end: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 2)] [Row(0, 999999999, 999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, double.MaxValue)] [Row(4, double.MaxValue - 2.5, double.MaxValue)] public void TestAdd(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Add(operand1, operand2); Assert.AreEqual(expectedResult, result); } And this will produce the following HTML report (with Gallio): Not bad for the amount of work we invested in it, huh? - There might be scenarios where reports like that can be useful for demonstration purposes during a Scrum sprint review… The last requirement to fulfill is that the LastResult property is expected to store the result of the last operation. I don’t show this here, it’s trivial enough and brings nothing new… And finally: Refactor (for the right reasons) To demonstrate my way of going through the refactoring portion of the red-green-refactor cycle, I added another method to our Calculator component, namely Subtract(). Here’s the code (tests and production): // CalculatorTest.cs: [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtract(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); double result = calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, result); } [Test, Description("Arguments: operand1, operand2, expectedResult")] [Row(1, 1, 0)] [Row(0, 999999999, -999999999)] [Row(0, 0, 0)] [Row(0, double.MaxValue, -double.MaxValue)] [Row(4, double.MaxValue - 2.5, -double.MaxValue)] public void TestSubtractGivesExpectedLastResult(double operand1, double operand2, double expectedResult) { ICalculator calculator = container.GetService<ICalculator>(); calculator.Subtract(operand1, operand2); Assert.AreEqual(expectedResult, calculator.LastResult); } ... // ICalculator.cs: /// <summary> /// Subtracts the specified operands. /// </summary> /// <param name="operand1">The operand1.</param> /// <param name="operand2">The operand2.</param> /// <returns>The result of the subtraction.</returns> /// <exception cref="ArgumentException"> /// Argument <paramref name="operand1"/> is < 0.<br/> /// -- or --<br/> /// Argument <paramref name="operand2"/> is < 0. /// </exception> double Subtract(double operand1, double operand2); ... // Calculator.cs: public double Subtract(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } return (this.LastResult = operand1 - operand2).Value; } Obviously, the argument validation stuff that was produced during the red-green part of our cycle duplicates the code from the previous Add() method. So, to avoid code duplication and minimize the number of code lines of the production code, we do an Extract Method refactoring. One more time, this is only a matter of a few mouse clicks (and giving the new method a name) with R#: Having done that, our production code finally looks like that: using System; using LinFu.IoC.Configuration; namespace Calculator { [Implements(typeof(ICalculator))] internal class Calculator : ICalculator { #region ICalculator public double? LastResult { get; private set; } public double Add(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 + operand2).Value; } public double Subtract(double operand1, double operand2) { ThrowIfOneOperandIsInvalid(operand1, operand2); return (this.LastResult = operand1 - operand2).Value; } #endregion // ICalculator #region Implementation (Helper) private static void ThrowIfOneOperandIsInvalid(double operand1, double operand2) { if (operand1 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand1"); } if (operand2 < 0.0) { throw new ArgumentException("Value must not be negative.", "operand2"); } } #endregion // Implementation (Helper) } // class Calculator } // namespace Calculator But is the above worth the effort at all? It’s obviously trivial and not very impressive. All our tests were green (for the right reasons), and refactoring the code did not change anything. It’s not immediately clear how this refactoring work adds value to the project. Derick puts it like this: STOP! Hold on a second… before you go any further and before you even think about refactoring what you just wrote to make your test pass, you need to understand something: if your done with your requirements after making the test green, you are not required to refactor the code. I know… I’m speaking heresy, here. Toss me to the wolves, I’ve gone over to the dark side! Seriously, though… if your test is passing for the right reasons, and you do not need to write any test or any more code for you class at this point, what value does refactoring add? Derick immediately answers his own question: So why should you follow the refactor portion of red/green/refactor? When you have added code that makes the system less readable, less understandable, less expressive of the domain or concern’s intentions, less architecturally sound, less DRY, etc, then you should refactor it. I couldn’t state it more precise. From my personal perspective, I’d add the following: You have to keep in mind that real-world software systems are usually quite large and there are dozens or even hundreds of occasions where micro-refactorings like the above can be applied. It’s the sum of them all that counts. And to have a good overall quality of the system (e.g. in terms of the Code Duplication Percentage metric) you have to be pedantic on the individual, seemingly trivial cases. My job regularly requires the reading and understanding of ‘foreign’ code. So code quality/readability really makes a HUGE difference for me – sometimes it can be even the difference between project success and failure… Conclusions The above described development process emerged over the years, and there were mainly two things that guided its evolution (you might call it eternal principles, personal beliefs, or anything in between): Test-driven development is the normal, natural way of writing software, code-first is exceptional. So ‘doing TDD or not’ is not a question. And good, stable code can only reliably be produced by doing TDD (yes, I know: many will strongly disagree here again, but I’ve never seen high-quality code – and high-quality code is code that stood the test of time and causes low maintenance costs – that was produced code-first…) It’s the production code that pays our bills in the end. (Though I have seen customers these days who demand an acceptance test battery as part of the final delivery. Things seem to go into the right direction…). The test code serves ‘only’ to make the production code work. But it’s the number of delivered features which solely counts at the end of the day - no matter how much test code you wrote or how good it is. With these two things in mind, I tried to optimize my coding process for coding speed – or, in business terms: productivity - without sacrificing the principles of TDD (more than I’d do either way…). As a result, I consider a ratio of about 3-5/1 for test code vs. production code as normal and desirable. In other words: roughly 60-80% of my code is test code (This might sound heavy, but that is mainly due to the fact that software development standards only begin to evolve. The entire software development profession is very young, historically seen; only at the very beginning, and there are no viable standards yet. If you think about software development as a kind of casting process, where the test code is the mold and the resulting production code is the final product, then the above ratio sounds no longer extraordinary…) Although the above might look like very much unnecessary work at first sight, it’s not. With the aid of the mentioned add-ins, doing all the above is a matter of minutes, sometimes seconds (while writing this post took hours and days…). The most important thing is to have the right tools at hand. Slow developer machines or the lack of a tool or something like that - for ‘saving’ a few 100 bucks - is just not acceptable and a very bad decision in business terms (though I quite some times have seen and heard that…). Production of high-quality products needs the usage of high-quality tools. This is a platitude that every craftsman knows… The here described round-trip will take me about five to ten minutes in my real-world development practice. I guess it’s about 30% more time compared to developing the ‘traditional’ (code-first) way. But the so manufactured ‘product’ is of much higher quality and massively reduces maintenance costs, which is by far the single biggest cost factor, as I showed in this previous post: It's the maintenance, stupid! (or: Something is rotten in developerland.). In the end, this is a highly cost-effective way of software development… But on the other hand, there clearly is a trade-off here: coding speed vs. code quality/later maintenance costs. The here described development method might be a perfect fit for the overwhelming majority of software projects, but there certainly are some scenarios where it’s not - e.g. if time-to-market is crucial for a software project. So this is a business decision in the end. It’s just that you have to know what you’re doing and what consequences this might have… Some last words First, I’d like to thank Derick Bailey again. His two aforementioned posts (which I strongly recommend for reading) inspired me to think deeply about my own personal way of doing TDD and to clarify my thoughts about it. I wouldn’t have done that without this inspiration. I really enjoy that kind of discussions… I agree with him in all respects. But I don’t know (yet?) how to bring his insights into the described production process without slowing things down. The above described method proved to be very “good enough” in my practical experience. But of course, I’m open to suggestions here… My rationale for now is: If the test is initially red during the red-green-refactor cycle, the ‘right reason’ is: it actually calls the right method, but this method is not yet operational. Later on, when the cycle is finished and the tests become part of the regular, automated Continuous Integration process, ‘red’ certainly must occur for the ‘right reason’: in this phase, ‘red’ MUST mean nothing but an unfulfilled assertion - Fail By Assertion, Not By Anything Else!

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
How to enable gzip HTTP compression on Windows Azure dynamic content

- by Steven

Hi all, I've been trying unsuccessfully to enable gzip HTTP compression on my Windows Azure hosted WCF Restful service which returns JSON only from GET and POST requests. I have tried so many things that I would have a hard time listing all of them, and I now realise I have been working with conflicting information (regarding old version of azure etc) so think it best to start with a clean slate! I am working with Visual Studio 2008, using the February 2010 tools for Visual Studio. So, according to the following link, HTTP compression has now been enabled .. http://msdn.microsoft.com/en-us/library/ff436045.aspx ... and I've used the advice at the following page (the URL compression advice only), but I get no compression. http://blog.smarx.com/posts/iis-compression-in-windows-azure <urlCompression doStaticCompression="true" doDynamicCompression="false" dynamicCompressionBeforeCache="true" /> It doesn't help that I don't know what the difference is between urlCompression and httpCompression. I've tried to find out but to no avail! Could the fact that the tools for Visual Studio were released before the version of Azure which supports compression be a problem? I read somewhere that with the latest tools, you can choose which version of Azure OS you want to use when you publish ... but I don't know if that's true, and if it is, I can't find where to choose. Could I be using a pre-http enabled version? I've also tried blowery http compression module, but no results. Does any one have any up-to-date advice on how to achieve this? i.e. advice that relates to the current version of the Azure OS. Cheers! Steven

Read the article
C# WebBrowser.ShowPrintDialog() not showing

- by jeah_wicer

I have this peculiar problem while wanting to print a html-report. The file itself is a normal local html file, located on my hard drive. To do this, I have tried the following: public static void PrintReport(string path) { WebBrowser wb = new WebBrowser(); wb.Navigate(path); wb.ShowPrintDialog() } And I have this form with a button with the click event: private void button1_Click(object sender, EventArgs e) { string path = @"D:\MyReport.html"; PrintReport(path); } This does absolutely nothing. Which is kind of strange... but things get stranger... When editing the print function to do the following: public static void PrintReport(string path) { WebBrowser wb = new WebBrowser(); wb.Navigate(path); MessageBox.Show("TEST"); wb.ShowPrintDialog() } It works. Yes, only adding a MessageBox. The MessageBox is showing and after it comes the print dialog. I have also tried with Thread.Sleep(1000) instead, which doesn't work. Can anyone explain to me what's going on here? Why would a messagebox make any difference? Can it be some kind of permission problem? I've reproduced this on both Windows 7 and 8, same thing. I made this small application with only the above code to isolate the problem. I am quite sure it works on windows XP though, since an older version of the application I'm working on runs on it. When trying to do this directly with the mshtml-dll instead I also get problems. Any input or clarification is greatly appreciated!

Read the article

Search Results

Search found 8268 results on 331 pages for 'difference'.

Page 259/331 | < Previous Page | 255 256 257 258 259 260 261 262 263 264 265 266 | Next Page >

- by Transgenic

- by adam

- by Delameko

- by Carl

- by Rabarberski

- by AntonOfTheWoods

- by user66952

- by Kristy

- by John Gardeniers

- by Luis Forero

- by ccornet

- by Tola Odejayi

- by ??????? ???????????

- by Keyran

- by Rod

- by user38556

- by rlotun

- by user61766

- by Hallucynogenyc

- by Michael Williamson

- by Thomas Weller

- by john.rose

- by user12620111

- by Steven

- by jeah_wicer

< Previous Page | 255 256 257 258 259 260 261 262 263 264 265 266 | Next Page >