Search Results

Search found 1544 results on 62 pages for 'heap corruption'.

Page 12/62 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >

  • Hard drives (SATA/ATA) corrupting

    - by JC Denton
    Hello All, 2 years ago I bought relatives a new computer for christmas and installed Ubuntu on it. Ever since then it has been experiencing problems with the hard drives. The hard drive supplied with the machine was a SATA drive. When it appeared it was having problems (files and folders with invalid encoding started appearing) I replaced the SATA drive with the drive of their previous computer. I replaced it (The replacement) later on, the drive being rather old and thus becoming more prone to the risk of failure. The replacement drive is a IDE drive but the same problems started to appear (files and folders showing up in nautilus with invalid encoding). I fear the files and folders that are showing are existing FS entries, starting to corrupt. As it's happening to both the IDE and SATA drive it's unlikely to be the drives themselves or the IDE/SATA controller, I believe. Any ideas as to what could be causing the (assumed) corruption? EDIT: You're right about the paragraphs. They were there in edit mode but I'm still getting to grips with the whitespace format codes. The system is a "Primo Pro" AMD Phenom II X4 Quad Core 920 2.80GHz SILENT DDR2, ordered from overclockers.co.uk and nothing has been added to it except for the replacement of the SATA drive with an AMD drive. It would seem unlikely for a barebones system to be underpowered.

    Read the article

  • Files on ext4 on Drobo with corrupt, zero-ed out blocks

    - by Patrick
    I have a 2TB ext4 file system (Ubuntu running Linux kernel 2.6.31-22-server x86_64). This file system is the second drive on a Drobo box plugged in via USB. We've not had problems on the first drive (Drobo limits drive size to 2TB due to some OS limitations, so if you have more space than that it appears as two separate drives). I am sharing this files with Samba (smbd 3.4.0) with a mix of Windows and Linux workstations. Recently we've been experiencing some data corruption in multiple files. In many cases I have an un-corrupt original file stored on one of the workstations. These are binary files of various formats, (e.g. SQLite, but others as well). I used "split" to split a corrupt and uncorrupt file into 4096 byte chunks (this is the block size of the ext4 file system). I then ran md5sum on pairs of chunks and discovered that the chunks matched in many cases and in every case where they did not match, the corrupt chunk was a solid chunk of zeroes (620f0b67a91f7f74151bc5be745b7110 for what it's worth). I'm trying to track down a culprit but am a bit at a loss. I don't believe Samba is at fault since I'm using it without issue on the first drive exported by the Drobo. What can I do to narrow this down and find out what's going on?

    Read the article

  • What are "Excess Fragments" in defragmenting a hard drive?

    - by Andrew Swift
    I'm defragmenting my hard drive (XP SP3) with PerfectDisk 7.0, and it finds 816,659 excess fragments when I ask for an analysis. [update] Specifically, it shows that the 1TB disk is 14% fragmented with 19693 fragments and 816,659 excess fragments. About 20% of the disk is still free space. What does excess fragments refer to? What is the difference between fragments and excess fragments? I have had problems in the past where I defragmented a fragmented disk and many files were corrupted. It seemed as though "excess fragments" referred to orphan pieces, where the program couldn't find out where to put them. If that was true, then defragmenting a disk resulted in many incomplete files, and in fact I defragmented a disk full of MP3's and got a lot of corrupted files as a result. Instead, I started to simply format a separate disk and copy everything from one to the other. That way there were no orphan bits, and no file corruption. Does anybody know what "excess fragments" really are?

    Read the article

  • Automated Scanning for Corrupted Archive Files

    - by Synetech inc.
    Hi, I have all kinds of files that I have downloaded from everywhere over the years scattered around my hard drives. I’m in the process of trying to organize them all and have run into a problem. Sometimes a file was not downloaded correctly (this was a big issue when Chromium first came out) and is thus corrupt. For media files, this is relatively simple to determine (it requires actually opening and examining the file). For executables or other binaries, it is relatively difficult (executables may or may not crash, other binaries could be completely unknown). Archive files (eg zip, rar, 7z, exe, ace, etc.) however should be pretty simple, they have a built-in corruption detection facility. My problem now is that I really don’t want to open and test each and every single archived file throughout my drive; that would be a nightmare. I’m looking for a utility that can automate the process. Is there a program that can scan archive files on a drive and list the ones that are corrupt? Thanks a lot.

    Read the article

  • Sharepoint AD imported users are becomming sporadically corrupted, causing us to have to create a ne

    - by TrevJen
    Sharepoint 2007 MOSS with AD imported users. All servers are 2008. I have around 50 users, over the past 2 months, I have had a handful of the users suddenly unable to login to Sharepoint. When they login, they either get a blank screen or they are repropmted. These users are using accounts that have been used for many months, sometimes the problem originates with a password change. In all cases, the users account works on every other Active Directory authenticated resource (domain, exchange, LDAP). In the most recent case, last night I was forced deleted a user ("John smith") because of corruption. The orifinal account name was jsmith. I deleted him from active directory, then deleted him from the profile list in Sharepoint Shared Services. I could not find a way to delete him from the Sharepoint user list, but I reran the import after recreating his account (renamed it too just to be sure to "smithj"). At first, this did not wor, the user could still access all other resources but Sharepoint. then, some 30 minutes later it inexplicably started working. This morning, the user changed passwords, which immediatly broke the login on Sharepoint again. I am at a loss on how to troubleshoot this.

    Read the article

  • logfile deleted on Oracle database how to re-create it?

    - by Daniel
    for my database assignment we were looking into 'database corruption' and I was asked to delete the second redo log file which I have done with the command: rm log02a.rdo this was in the $HOME/ORADATA/u03 directory. Now I started up my database using startup pfile=$PFILE nomount then I mounted it using the command alter database mount; now when I try to open it alter database open; it gives me the error: ORA-03113: end-of-file on communication channel Process ID: 22125 Session ID: 25 Serial number: 1 I am assuming this is because the second redo log file is missing. There is still log01a.rdo, but not the one I have deleted. How can I go about recovering this now so that I can open my database again? I have looked into the database created scripts, and it specified the log02a.rdo file to be size 10M and part of group 2. If I do select group#, member from v$logfile; I get: 1 /oradata/student_db/user06/ORADATA/u03/log01a.rdo 2 /oradata/student_db/user06/ORADATA/u03/log02a.rdo 3 /oradata/student_db/user06/ORADATA/u03/log03a.rdo 4 /oradata/student_db/user06/ORADATA/u03/log04a.rdo So it is part of group 2. If I try to add the log02a.rdo file again "already part of the database". If I drop group 2 and then add it again with these commands: ALTER DATABASE ADD LOGFILE GROUP 2 ('$HOME/ORADATA/u03/log02a.rdo') SIZE 10M; Nothing. Supposedly alters the database, but it still won't start up. Any ideas what I can do to re-create this and be able to open my database again?

    Read the article

  • disk write cache buffer and separate power supply

    - by HugoRune
    Windows has a setting to turn off the write-cache buffer (see image) Turn off Windows write-cache buffer flushing on the device To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure. Is it feasible and economical to get such a "separate power supply" for the internal sata drives of a non-server PC? Under what name is such a power supply sold? I know that there are UPS devices that can be connected to external drives,but what is required to be able to switch this setting safely on for an internal disk? The setting has different descriptions in different version of windows Windows XP: Enable write caching on the disk This setting enables write caching in Windows to improve disk performance, but a power outage or equipment failure might result in data loss or corruption. Windows Server 2003: Enable write caching on the disk Recommended only for disks with a backup power supply. This setting further improves disk performance, but it also increases the risk of data loss if the disk loses power. Windows Vista: Enable advanced performance Recommended only for disks with a backup power supply. This setting further improves disk performance, but it also increases the risk of data loss if the disk loses power. Windows 7 and 8: Turn off Windows write-cache buffer flushing on the device To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure. This article by Raymond Chen has some more detailed information about what the setting does.

    Read the article

  • MySQL Tables Missing/Corrupt After Recreation

    - by Synetech inc.
    Hi, Yesterday I dumped my MySQL databases to an SQL file and renamed the ibdata1 file. I then recreated it and imported the SQL file and moved the new ibdata1 file to my MySQL data directory, deleting the old one. I’ve done it before without issue, however this time something is not right. When I examine the (personal, not MySQL config) databases, they are all there, but they are empty… sort of. The data directory still has the .ibd files with the correct content in them and I can view the table list in the databases, but not the tables themselves. (I have file-per-table enabled, and am using InnoDB as default for everything.) For example with the urls database and its urls table, I can successfully open mysql.exe or phpMyAdmin and use urls;. I can even show tables; to see the expected table, but then when I try to describe urls; or select * from urls;, it complains that the table does not exist (even though it just listed it). (The MySQL Administrator lists the databases, but does not even list the tables, it indicates that the dbs are completely empty.) The problem now is that I have already deleted the SQL file (and cannot recover it even after scouring my hard-drive). So I am trying to figure out a way to repair these databases/tables. I can’t use the table repair function since it complains that the table does not exist, and I can’t dump them because again, it complains that the tables don’t exist. Like I’ve said, the data itself is still present in the .ibd files and the table names are present. I just need a way to get MySQL to recognize that the tables exist in the databases (I can find the column names of the tables in question in the ibdata1 file using a hex-editor). Any idea how I can repair this type of corruption? I don’t mind rolling up my sleeves, digging in, and taking a bunch of steps to fix it. Thanks a lot.

    Read the article

  • Computer is dying--what should I be looking for?

    - by Will
    Okay, I'm a bit knowledgeable with pooters and such, but i'm confused. My computer is dying slowly, and I'm not sure what part is causing this. Computer details: Vista, dell machine, intel Q6600, 2.4 Core Duo (quad core), standard memory and drive (unknown manufacturer). Symptoms: I would best describe the symptoms as memory corruption. After a couple days on, I start getting applications crashing or failing to open for a lack of "resources". Sounds are corrupted. Onscreen text gets corrupted; the characters of text are garbled, not the pixels on the screen. Video memory seems untouched as I haven't seen any misplaced pixels. Recently I've lost files on disk. I've also experienced errors reporting a supposed lack of disk space, even though I have fifty gigs free. There was one point where I couldn't get to the POST when booting up. After I cleaned everything (see next) this hasn't happened. Diagnostic steps: First thing I did was clean the case. There was a lot of dust buildup on heatsinks, so I cleaned all that up. No help. Next, I disconnected and reconnected everything, from power cables to memory (did not reseat cpu). No change. Last, I ran the standard vista memory diagnostics and ran checkdisk. Both reported no errors found. I have not run any POST tests, now that I think about it. I'm at a loss at this point. Disk appears fine, memory too. I'd expect motherboard issues to result in the thing not booting up, yet it does every time. What should I be looking at? What more can I do?

    Read the article

  • Tracking EXC_BAD_ACCESS on iPad

    - by Aleks
    I've been using this code to create my UIWindow UIMyWindow* win = [[UIMyWindow alloc] initWithFrame:[[UIScreen mainScreen] applicationFrame]]; UIMyWindow isn't anything special it just has a pointer to a C++ class that does some wrapping of ObjectiveC. Recently my application start crashing after adding some line of code that doesn't have to do anything with the error. The line of code that I added is just allocating a C++ object but the program execution never reaches this line. Interesting enough my code works in Release. My only guess is that I made some memory corruption on a completely different place. My questions are: What type of memory corruption that can be? And is there some good practices to track them down?

    Read the article

  • Symfony/Propel NestedSet left/right ID corruption/adjustment

    - by Mike Crowe
    Hi folks, I have a nested set application that seems to be getting corrupted. Here's what I'm seeing: We're using nested sets for a binary tree (any node can have 2 children). It appears to be working fine, but some event causes a discrepancy. For instance, when I do a getNumberOfDescendants() for the root node, it will slowly increase as this event happens. However, displaying the tree works fine, as does inserting (apparently). Has anybody seen anything like this before? For instance, my repair program shows this as the repairs that it makes: User pxxxxx left 0=>0, right 145=>129 User axxxxx left 1=>1, right 124=>106 User mxxxxx left 119=>117, right 120=>118 User fxxxxx left 125=>107, right 144=>128 User fxxxxx left 126=>108, right 131=>113 User rxxxxx left 127=>109, right 128=>110 User mxxxxx left 129=>111, right 130=>112 User mxxxxx left 132=>114, right 143=>127 User cxxxxx left 133=>115, right 142=>126 User gxxxxx left 134=>116, right 137=>121 User mxxxxx left 135=>119, right 136=>120 User jxxxxx left 138=>122, right 141=>125 User axxxxx left 139=>123, right 140=>124 I thought at first it was when I deleted a user, but it has since occurred w/o that event. Anybody know of a cause that might generate this? I've tested ad nauseum on my local machine, but I can't duplicate it. I do have an issue where my production box is PHP 5.2.0, whereas my test device is 5.2.10. Could that be an issue? TIA Mike

    Read the article

  • Strange problem with NSMutableArray - Possibly some memory corruption

    - by user210504
    Hi! I am trying to update data in a table view using a NSMutableArray. Quite simple :( What is happening is that I get my data from a NSURLConnection Callback, which I parse and store it in an array and call reload data on the table view. The problem is that when cellForRowAtIndexPath is called back by the framework. The array still shows the correct count of the elements but all string objects I had stored earlier are shown as invalid. Any pointers

    Read the article

  • Memorystream and Large Object Heap

    - by Flo
    I have to transfer large files between computers on via unreliable connections using WCF. Because I want to be able to resume the file and I don't want to be limited in my filesize by WCF, I am chunking the files into 1MB pieces. These "chunk" are transported as stream. Which works quite nice, so far. My steps are: open filestream read chunk from file into byet[] and create memorystream transfer chunk back to 2. until the whole file is sent My problem is in step 2. I assume that when I create a memory stream from a byte array, it will end up on the LOH and ultimately cause an outofmemory exception. I could not actually create this error, maybe I am wrong in my assumption. Now, I don't want to send the byte[] in the message, as WCF will tell me the array size is too big. I can change the max allowed array size and/or the size of my chunk, but I hope there is another solution. My actual question(s): Will my current solution create objects on the LOH and will that cause me problem? Is there a better way to solve this? Btw.: On the receiving side I simple read smaller chunks from the arriving stream and write them directly into the file, so no large byte arrays involved.

    Read the article

  • Download and write .tar.gz files without corruption.

    - by arbales
    I've tried numerous ways of downloading files, specifically .zip and .tar.gz, with Ruby and write them to the disk. I've found that the file appears to be the same as the reference (in size), but the archives refuse to extract. What I'm attempting now is: Thanks! def download_request(url, filePath:path, progressIndicator:progressBar) file = File.open(path, "w+") begin Net::HTTP.get_response URI.parse(url) do |response| if response['Location']!=nil puts 'Direct to: ' + response['Location'] return download_request(response['Location'], filePath:path, progressIndicator:progressBar) end # some stuff response.read_body do |segment| file.write(segment) # some progress stuff. end end ensure file.close end end download_request("http://github.com/jashkenas/coffee-script/tarball/master", filePath:"tarball.tar.gz", progressIndicator:nil)

    Read the article

  • SQL Compact Edition database corruption

    - by jdv
    Hi, Our product is using MS SQL Compact Edition on a Windows machine (laptop). It's basically a metadata index for files we have on the filesystem. Recently we have seen databases getting corrupted. This happens when the machine is very busy moving files around and has to do a tiny bit of database changes at the same time. I was somewhat shocked that was at all possible. It was my expectation that the database would stay coherent whatever the circumstances. Of course we are doing something wrong. Things we have checked so far are: Use of only one db connection per thread specify the maximum size when opening the database The database is accessed only by one application, a .net based windows service. Are there other gotcha's?

    Read the article

  • Unusual heap size limitations in VS2003 C++

    - by Shane MacLaughlin
    I have a C++ app that uses large arrays of data, and have noticed while testing that it is running out of memory, while there is still plenty of memory available. I have reduced the code to a sample test case as follows; void MemTest() { size_t Size = 500*1024*1024; // 512mb if (Size > _HEAP_MAXREQ) TRACE("Invalid Size"); void * mem = malloc(Size); if (mem == NULL) TRACE("allocation failed"); } If I create a new MFC project, include this function, and run it from InitInstance, it works fine in debug mode (memory allocated as expected), yet fails in release mode (malloc returns NULL). Single stepping through release into the C run times, my function gets inlined I get the following // malloc.c void * __cdecl _malloc_base (size_t size) { void *res = _nh_malloc_base(size, _newmode); RTCCALLBACK(_RTC_Allocate_hook, (res, size, 0)); return res; } Calling _nh_malloc_base void * __cdecl _nh_malloc_base (size_t size, int nhFlag) { void * pvReturn; // validate size if (size > _HEAP_MAXREQ) return NULL; ' ' And (size _HEAP_MAXREQ) returns true and hence my memory doesn't get allocated. Putting a watch on size comes back with the exptected 512MB, which suggests the program is linking into a different run-time library with a much smaller _HEAP_MAXREQ. Grepping the VC++ folders for _HEAP_MAXREQ shows the expected 0xFFFFFFE0, so I can't figure out what is happening here. Anyone know of any CRT changes or versions that would cause this problem, or am I missing something way more obvious?

    Read the article

  • java.lang.OutOfMemoryError: Java heap space

    - by houlahan
    i get this error when calling a mysql Prepared Statement every 30 seconds this is the code which is been called: public static int getUserConnectedatId(Connection conn, int i) throws SQLException { pstmt = conn.prepareStatement("SELECT UserId from connection where ConnectionId ='" + i + "'"); ResultSet rs = pstmt.executeQuery(); int id = -1; if (rs.next()) { id = rs.getInt(1); } pstmt = null; rs = null; return id; } not sure what the problem is :s thanks in advanced.

    Read the article

  • Copying a IO stream results in corruption.

    - by StackedCrooked
    I have a small Mongrel webserver that sends the stdout of a process to a http response: response.start(200) do |head,out| head["Content-Type"] = "text/html" status = POpen4::popen4(command) do |stdout, stderr, stdin, pid| stdin.close() FileUtils.copy_stream(stdout, out) FileUtils.copy_stream(stderr, out) puts "Sent response." end end This works well most of the time, but sometimes characters get duplicated. For example this is what I get from the "man ls" command: LS(1) User Commands LS(1) NNAAMMEE ls - list directory contents SSYYNNOOPPSSIISS llss [_O_P_T_I_O_N]... [_F_I_L_E]... DDEESSCCRRIIPPTTIIOONN List information about the FILEs (the current directory by default). Sort entries alphabetically if none of --ccffttuuvvSSUUXX nor ----ssoorrtt. Mandatory arguments to long options are mandatory for short options For some mysterious reason capital letters get duplicated. Can anyone explain what's going on?

    Read the article

  • Sharepoint AD imported users are becomming sporadically corrupted, causing us to have to create a new account

    - by TrevJen
    Sharepoint 2007 MOSS with AD imported users. All servers are 2008. ***UPDATE More details in testing. This Sharepoint is in an AD Child domain (clients.mycompany.local), which is sub to the root of the AD tree (mycompany.local). The user is in the parent tree (as are half of the other functional users. I have elevated the user rights to Domain. In looking at the logs, it seems that the Sharepoint server is trying to authenticate them by querying the DC for the clients domain (which is the way it normally works and still works for all existing identically configured users). I think if I could force it to authenticate up to the top domain DC then it would be ok?? I have around 50 users, over the past 2 months, I have had a handful of the users suddenly unable to login to Sharepoint. When they login, they either get a blank screen or they are repropmted. These users are using accounts that have been used for many months, sometimes the problem originates with a password change. In all cases, the users account works on every other Active Directory authenticated resource (domain, exchange, LDAP). In the most recent case, last night I was forced deleted a user ("John smith") because of corruption. The orifinal account name was jsmith. I deleted him from active directory, then deleted him from the profile list in Sharepoint Shared Services. I could not find a way to delete him from the Sharepoint user list, but I reran the import after recreating his account (renamed it too just to be sure to "smithj"). At first, this did not wor, the user could still access all other resources but Sharepoint. then, some 30 minutes later it inexplicably started working. This morning, the user changed passwords, which immediatly broke the login on Sharepoint again. Logs by request from matt b Office SharePoint Server Date: 4/13/2010 2:00:00 PM Event ID: 7888 Task Category: Office Server General Level: Error Keywords: Classic User: N/A Computer: nb-portal-01.clients.netboundary.local Description: A runtime exception was detected. Details follow. Message: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)) – TrevJen 19 hours ago Techinal Details: System.UnauthorizedAccessException: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)) at Microsoft.SharePoint.SPGlobal.HandleUnauthorizedAccessException(UnauthorizedAccessException ex) at Microsoft.SharePoint.Library.SPRequest.UpdateField(String bstrUrl, String bstrListName, String bstrXML) at Microsoft.SharePoint.SPField.UpdateCore(Boolean bToggleSealed) – TrevJen 19 hours ago at Microsoft.SharePoint.SPField.Update() at Microsoft.Office.Server.UserProfiles.SiteSynchronizer.UserSynchronizer.PushSchemaToList(Boolean& bAddedColumn) at Microsoft.Office.Server.UserProfiles.SiteSynchronizer.UserSynchronizer.SynchFull() at Microsoft.Office.Server.UserProfiles.SiteSynchronizer.Synch() at Microsoft.Office.Server.Diagnostics.FirstChanceHandler.ExceptionFilter(Boolean fRethrowException, TryBlock tryBlock, FilterBlock filter, CatchBlock catchBlock, FinallyBlock finallyBlock) – TrevJen 19 hours ago Log Name: Application Source: Office SharePoint Server Date: 4/13/2010 2:00:00 PM Event ID: 5553 Task Category: User Profiles Level: Error Keywords: Classic User: N/A Computer: nb-portal-01.clients.netboundary.local Description: failure trying to synch site 6fea15e2-0899-4c19-9016-44d77834c018 for ContentDB b2002b0b-3d4c-411a-8c4f-3d047ca9322c WebApp 3aff7051-455d-4a70-a377-5b1c36df618e. Exception message was Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)). – TrevJen 18 hours ago

    Read the article

  • Causes of sudden massive filesystem damage? ("root inode is not a directory")

    - by poolie
    I have a laptop running Maverick (very happily until yesterday), with a Patriot Torx SSD; LUKS encryption of the whole partition; one lvm physical volume on top of that; then home and root in ext4 logical volumes on top of that. When I tried to boot it yesterday, it complained that it couldn't mount the root filesystem. Running fsck, basically every inode seems to be wrong. Both home and root filesystems show similar problems. Checking a backup superblock doesn't help. e2fsck 1.41.12 (17-May-2010) lithe_root was not cleanly unmounted, check forced. Resize inode not valid. Recreate? no Pass 1: Checking inodes, blocks, and sizes Root inode is not a directory. Clear? no Root inode has dtime set (probably due to old mke2fs). Fix? no Inode 2 is in use, but has dtime set. Fix? no Inode 2 has a extra size (4730) which is invalid Fix? no Inode 2 has compression flag set on filesystem without compression support. Clear? no Inode 2 has INDEX_FL flag set but is not a directory. Clear HTree index? no HTREE directory inode 2 has an invalid root node. Clear HTree index? no Inode 2, i_size is 9581392125871137995, should be 0. Fix? no Inode 2, i_blocks is 40456527802719, should be 0. Fix? no Reserved inode 3 (<The ACL index inode>) has invalid mode. Clear? no Inode 3 has compression flag set on filesystem without compression support. Clear? no Inode 3 has INDEX_FL flag set but is not a directory. Clear HTree index? no .... Running strings across the filesystems, I can see there are what look like filenames and user data there. I do have sufficiently good backups (touch wood) that it's not worth grovelling around to pull back individual files, though I might save an image of the unencrypted disk before I rebuild, just in case. smartctl doesn't show any errors, neither does the kernel log. Running a write-mode badblocks across the swap lv doesn't find problems either. So the disk may be failing, but not in an obvious way. At this point I'm basically, as they say, fscked? Back to reinstalling, perhaps running badblocks over the disk, then restoring from backup? There doesn't even seem to be enough data to file a meaningful bug... I don't recall that this machine crashed last time I used it. At this point I suspect a bug or memory corruption caused it to write garbage across the disks when it was last running, or some kind of subtle failure mode for the SSD. What do you think would have caused this? Is there anything else you'd try?

    Read the article

  • Can you force a crash if a write occurs to a given memory location with finer than page granularity?

    - by Joseph Garvin
    I'm writing a program that for performance reasons uses shared memory (alternatives have been evaluated, and they are not fast enough for my task, so suggestions to not use it will be downvoted). In the shared memory region I am writing many structs of a fixed size. There is one program responsible for writing the structs into shared memory, and many clients that read from it. However, there is one member of each struct that clients need to write to (a reference count, which they will update atomically). All of the other members should be read only to the clients. Because clients need to change that one member, they can't map the shared memory region as read only. But they shouldn't be tinkering with the other members either, and since these programs are written in C++, memory corruption is possible. Ideally, it should be as difficult as possible for one client to crash another. I'm only worried about buggy clients, not malicious ones, so imperfect solutions are allowed. I can try to stop clients from overwriting by declaring the members in the header they use as const, but that won't prevent memory corruption (buffer overflows, bad casts, etc.) from overwriting. I can insert canaries, but then I have to constantly pay the cost of checking them. Instead of storing the reference count member directly, I could store a pointer to the actual data in a separate mapped write only page, while keeping the structs in read only mapped pages. This will work, the OS will force my application to crash if I try to write to the pointed to data, but indirect storage can be undesirable when trying to write lock free algorithms, because needing to follow another level of indirection can change whether something can be done atomically. Is there any way to mark smaller areas of memory such that writing them will cause your app to blow up? Some platforms have hardware watchpoints, and maybe I could activate one of those with inline assembly, but I'd be limited to only 4 at a time on 32-bit x86 and each one could only cover part of the struct because they're limited to 4 bytes. It'd also make my program painful to debug ;)

    Read the article

  • Why did my flash drive become "read only" and (how) can I fix it?

    - by Bob
    I have a brand new flash drive (one week old) that has become marked as read only, by Windows, Kubuntu and a bootable partitioner. Why did this happen? Is it fixable? If it is, how can I fix this? The problem Firstly, this drive is new. It's certainly not been used enough to die from normal wear and tear, though I would not discount defective components. The drive itself has somehow become locked in a read only state. Windows' Disk management: Diskpart: Generic Flash Disk USB Device Disk ID: 33FA33FA Type : USB Status : Online Path : 0 Target : 0 LUN ID : 0 Location Path : UNAVAILABLE Current Read-only State : Yes Read-only : No Boot Disk : No Pagefile Disk : No Hibernation File Disk : No Crashdump Disk : No Clustered Disk : No What really confuses me is Current Read-only State : Yes and Read-only : No. Attempted solutions So far, I've tried: Formatting it in Windows (in Disk management, the format options are greyed out when right clicking). DiskPart Clean (CLEAN - Clear the configuration information, or all information, off the disk.): DISKPART> clean DiskPart has encountered an error: The media is write protected. See the System Event Log for more information. There was nothing in the event log. Windows command line format >format G: Insert new disk for drive G: and press ENTER when ready... The type of the file system is FAT32. Verifying 7740M Cannot format. This volume is write protected. Windows chkdsk: see below for details Kubuntu fsck (through VirtualBox USB passthrough): see below for details Acronis True Image to format, to convert to GPT, to destroy and rebuild MBR, basically anything: failed (could not write to MBR) Details (and a nice story) Background This was a brand new, generic, 8GB flash drive I wanted to create a multiboot flash drive with. It came formatted as FAT32, though oddly a little larger than most 8 GIGAbyte flash drives I've come across. Approximately 127MB was listed as "used" by Windows. I never discovered why. The end usable space was about what I normally expect from a 8GB drive (approx 7.4 GIBIbytes). I had thrown quite a few Linux distros on, along with a copy of Hiren's. They would all boot perfectly. They were put on with YUMI. When I tried to put the Knoppix DVD on, YUMI added an odd video option to its boot comman which caused Knoppix to boot with a black screen on X. ttys 1 through 6 still worked as text only interfaces. A few days later, I took some time to take that odd video option off, making the boot command match the one that comes with Knoppix. On the attempt to boot, Knoppix reported some form of LZMA corruption. Leading up to the current issue I was thinking the Knoppix files may have been corrupted somehow, so I tried reloading it. The drive was nearly full (45MB free), so I deleted a generic ISO that also was not booting. That went fine. I then went through YUMI to 'uninstall' Knoppix, i.e. delete files and remove from the menus. The files went first, then the menus were cleared successfully. However, the free space was stuck at about 700MB, same as it was before removing Knoppix. In the old Knoppix folder, there was a 0 byte file named KNOPPIX that could not be deleted. I tried reinserting the drive to delete this file - without safely removing, if that made a difference (hey, first time for everything). Running the standard Windows chkdsk scan without /r or /f reported errors found. Running with /r just got it stuck. I decided to give fsck a shot, so I loaded up my Kubuntu VM and attached the drive to it with VirtualBox's USB 2.0 passthrough. I umounted it (/dev/sda1) and ran a fsck. There are differences between boot sector and its backup. I chose No action. It told me FATs differ and asked me to select either the first or second FAT. Whichever I selected, I got a notice of Free cluster summary wrong. If I chose Correct, it gave a list of incorrect file names. To try to fix something, at least, I ran it with the -p option. Halfway through fixing the files, the VM froze - I ended its process about ten minutes later. Cause? My next attempt was to use YUMI, again, to rebuild the whole drive. I used YUMI's built in reformat (to FAT32) option and installed a Kubuntu ISO (700MB). The format was successful, however, the extract and copy of Kubuntu (which YUMI uses a 7zip binary for) froze at about 60% done. After waiting for about fifteen minutes (longer than the 3.5GB Knoppix ISO took last time), I pulled the drive out. The drive at this point was already formatted, SYSLINUX already installed, just waiting on the unpacking of an ISO and the modifying of the boot menus. Plugging it back in, it came up as normal - however, any write action would fail. Disk management reported it as read only. On reconnect, it would come up as normal but a write operation would cause it to go read only again. After a few attempts, it started coming up as read only on insertion. Attempts to fix This is when I ran through the attempts listed above, to try and reformat it in case of a faulty format. However the inability to do so even on a bootable disk indicated something more serious is wrong. chkdsk now reports nothing is wrong, and fsck still reports MBR inconsistencies, but now always chooses first FAT automatically after telling me FATs differ. It still does the same Free cluster summary wrong afterwards. I cannot run with -p anymore because it is now marked as read only. It also managed to corrupt my VM's disk somehow on the first attempt (yes, I'm sure I chose sda, which is mapped to a 7.4GB drive - I triple checked). Thank god for snapshots? I'm just about out of ideas. To my inexperienced mind it looks like something in the drive's firmware set it to read only "permanently" somehow - is there any way to reset this? I don't particularly care about keeping data, considering I've reformatted it twice. Also, fixes that keep me in Windows are better; it reduces the risk of me accidentally nuking my main hard drive. Update 1: I pulled apart the drive out of curiosity. As you can see, there are no obvious write protect switches. There is an IC on the other side, ALCOR branded labelled AU6989HL, if that matters. If there appears to be no way to fix this, I'll probably pull out the (glued down) card and put it in a card reader to check if it's the card or the controller that died. Update 2: I've pulled the card off, Windows detects the drive as a card reader now. The contacts on the card don't appear to be used, and there are several rows of holes on the card itself. Putting it into the card reader only detects about 30MB total, RAW. It's probably either the reader incorrectly reporting the card as faulty (as if a real SD card's write protect was switched on) or a bad contact somewhere. If nothing else, I have a spare 8GB Micro SD card now... as soon as I figure out how to format it as 8GB.

    Read the article

  • Can I switch the Visual C++ runtime to another heap?

    - by sharptooth
    My program uses a third party dynamic link library that has huge memory leaks inside. Both my program and the library are Visual C++ native code. Both link to the Visual C++ runtime dynamically. I'd like to force the library into another heap so that all allocations that are done through the Visual C++ runtime while the library code is running are done on that heap. I can call HeapCreate() and later HeapDestroy(). If I somehow ensure that all allocations are done in the new heap I don't care of the leaks anymore - they all go when I destroy the second heap. Is it possible to force the Visual C++ runtime to make all allocations on a specified heap?

    Read the article

  • Using R to Analyze G1GC Log Files

    - by user12620111
    Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=||   Using R to Analyze G1GC Log Files   Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z\(\),]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\\( " , " " ); gsub ( "\ \)" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period:     par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight.   plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

    Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >