Search Results

Search found 10536 results on 422 pages for 'cpu usage'.

Page 99/422 | < Previous Page | 95 96 97 98 99 100 101 102 103 104 105 106  | Next Page >

  • Generate a list of file names based on month and year arithmetic

    - by MacUsers
    How can I list the numbers 01 to 12 (one for each of the 12 months) in such a way so that the current month always comes last where the oldest one is first. In other words, if the number is grater than the current month, it's from the previous year. e.g. 02 is Feb, 2011 (the current month right now), 03 is March, 2010 and 09 is Sep, 2010 but 01 is Jan, 2011. In this case, I'd like to have [09, 03, 01, 02]. This is what I'm doing to determine the year: for inFile in os.listdir('.'): if inFile.isdigit(): month = months[int(inFile)] if int(inFile) <= int(strftime("%m")): year = strftime("%Y") else: year = int(strftime("%Y"))-1 mnYear = month + ", " + str(year) I don't have a clue what to do next. What should I do here? Update: I think, I better upload the entire script for better understanding. #!/usr/bin/env python import os, sys from time import strftime from calendar import month_abbr vGroup = {} vo = "group_lhcb" SI00_fig = float(2.478) months = tuple(month_abbr) print "\n%-12s\t%10s\t%8s\t%10s" % ('VOs','CPU-time','CPU-time','kSI2K-hrs') print "%-12s\t%10s\t%8s\t%10s" % ('','(in Sec)','(in Hrs)','(*2.478)') print "=" * 58 for inFile in os.listdir('.'): if inFile.isdigit(): readFile = open(inFile, 'r') lines = readFile.readlines() readFile.close() month = months[int(inFile)] if int(inFile) <= int(strftime("%m")): year = strftime("%Y") else: year = int(strftime("%Y"))-1 mnYear = month + ", " + str(year) for line in lines[2:]: if line.find(vo)==0: g, i = line.split() s = vGroup.get(g, 0) vGroup[g] = s + int(i) sumHrs = ((vGroup[g]/60)/60) sumSi2k = sumHrs*SI00_fig print "%-12s\t%10s\t%8s\t%10.2f" % (mnYear,vGroup[g],sumHrs,sumSi2k) del vGroup[g] When I run the script, I get this: [root@serv07 usage]# ./test.py VOs CPU-time CPU-time kSI2K-hrs (in Sec) (in Hrs) (*2.478) ================================================== Jan, 2011 211201372 58667 145376.83 Dec, 2010 5064337 1406 3484.07 Feb, 2011 17506049 4862 12048.04 Sep, 2010 210874275 58576 145151.33 As I said in the original post, I like the result to be in this order instead: Sep, 2010 210874275 58576 145151.33 Dec, 2010 5064337 1406 3484.07 Jan, 2011 211201372 58667 145376.83 Feb, 2011 17506049 4862 12048.04 The files in the source directory reads like this: [root@serv07 usage]# ls -l total 3632 -rw-r--r-- 1 root root 1144972 Feb 9 19:23 01 -rw-r--r-- 1 root root 556630 Feb 13 09:11 02 -rw-r--r-- 1 root root 443782 Feb 11 17:23 02.bak -rw-r--r-- 1 root root 1144556 Feb 14 09:30 09 -rw-r--r-- 1 root root 370822 Feb 9 19:24 12 Did I give a better picture now? Sorry for not being very clear in the first place. Cheers!! Update @Mark Ransom This is the result from Mark's suggestion: [root@serv07 usage]# ./test.py VOs CPU-time CPU-time kSI2K-hrs (in Sec) (in Hrs) (*2.478) ========================================================== Dec, 2010 5064337 1406 3484.07 Sep, 2010 210874275 58576 145151.33 Feb, 2011 17506049 4862 12048.04 Jan, 2011 211201372 58667 145376.83 As I said before, I'm looking for the result to b printed in this order: Sep, 2010 - Dec, 2010 - Jan, 2011 - Feb, 2011 Cheers!!

    Read the article

  • Adding Runtime Intelligence Application Analytics for a library and not an application

    - by brickner
    I want to add usage statistics for a .NET 4.0 library I write on CodePlex. I try to follow the step described here but my problem lies with the fact that what I write is a library and not an application. One of the steps is put the Setup and Teardown attributes. I thought about adding the Setup attribute on a static constructor or a different place that will run once per usage of the library. My problem lies with the Teardown attribute that should be placed on code that ends the usage. I don't know where to put this attribute. Is it possible to get usage statistics on a library? Maybe I can register on an event that will fire when the application unloads the dll?

    Read the article

  • How can I position QDockWidgets as the screen shot shows using code?

    - by Nathan
    I want a Qt window to come up with the following arrangement of dock widgets on the right. Qt allows you to provide an argument to the addDockWidget method of QMainWindow to specify the position (top, bottom, left or right) but apparently not how two QDockWidgets placed on the same side will be arranged. Here is the code that adds the dock widgets. this uses PyQt4 but it should be the same for Qt with C++ self.memUseGraph = mem_use_widget(self) self.memUseDock = QDockWidget("Memory Usage") self.memUseDock.setObjectName("Memory Usage") self.memUseDock.setWidget(self.memUseGraph) self.addDockWidget(Qt.DockWidgetArea(Qt.RightDockWidgetArea),self.memUseDock) self.diskUsageGraph = disk_usage_widget(self) self.diskUsageDock = QDockWidget("Disk Usage") self.diskUsageDock.setObjectName("Disk Usage") self.diskUsageDock.setWidget(self.diskUsageGraph) self.addDockWidget(Qt.DockWidgetArea(Qt.RightDockWidgetArea),self.diskUsageDock) When this code is used to add both of them to the right side, one is above the other, not like the screen shot I made. The way I made that shot was to drag them there with the mouse after starting the program, but I need it to start that way.

    Read the article

  • converting bash script to .bat

    - by Robokop
    #!/bin/bash function usage(){ cat <<EOF USAGE: $0 [strategylist] valid strategies are: ALLD ALLC TitForTat JOSS WeightedRandom Tester EOF exit 1 } [ -z $1 ] && usage javac robsAgents/*.java robsAgents/behaviours/*.java agentlist='leader:robsAgents.TournamentLeader' agentlist=$agentlist";$1:robsAgents.Contestant" while shift; do agentlist=$agentlist";$1:robsAgents.Contestant" done java jade.Boot -gui -host 127.0.0.1 "$agentlist" i have above bash script and have no access to a windows computer and i need to convert it to a .bat file, but don't even know how to do the shift and argument parsing

    Read the article

  • How to detect .NET WPF memory leak or GC long run?

    - by Néstor Sánchez A.
    I have the next very strange situation and problem: .NET 4.0 application for diagram editing (WPF). Runs ok in my PC: 8GM RAM, 3.0GHz, i7 quad-core. While creating objects (mostly diagram nodes and connectors, plus all the undo/redo information) the TaskManager show, as expected, some memory usage "jumps" (up and down). These mem-usage "jumps" also remains executing AFTER user interaction ended. Maybe this is the GC cleaning/regorganizing memory? To see what is going on, I've used the Ants mem profiler, but somewhat it prevents those "jumps" to happen after user interaction. PROBLEM: It Freezes/Hangs after seconds or minutes of usage in some slow/weak laptos/netbooks of my beta testers (under 2GHz of speed and under 2GB of RAM). I was thinking of a memory leak, but... EDIT: Also, there is the case that the memory usage grows and grows until collapse (only in slow machines). In a Windows XP Mode machine (VM in Win 7) with only 512MB of RAM Assigned it works fine without mem-usage "jumps" after user interaction (no GC cleaning?!). So, I really have a big trouble because I cannot reproduce the error, only see these strange behaviour (mem jumps), and the tool supposed to show me what is happening is hiding the problem (like the "observer's paradox"). Any ideas on what's happening and how to solve it?

    Read the article

  • Execute a command using php under ssh2 in php

    - by Mervyn
    Using Mint terminal my script connects using ssh2_connect and ssh2_auth-password. When am logged in successfully I want to run a command which will give me the hardware cpu. Is there a way I can use to exec the command in my script then show the results. I have used system and exec for pinging. if i was in the terminal i do the login. then type "get hardware cpu" in the terminal it would look like this: Test~ $ get hardware cpu

    Read the article

  • Java: "cannot find symbol" error of a String[] defined within a while-loop

    - by David
    Here's the relevant code: public static String[] runTeams (String CPUcolor) { boolean z = false ; //String[] a = new String[6] ; boolean CPU = false ; while (z == false) { while (CPU==false) { String[] a = assignTeams () ; printOrder (a) ; for (int i = 1; i<a.length; i++) { if (a[i].equals(CPUcolor)) CPU = true ; } if (CPU==false) { System.out.println ("ERROR YOU NEED TO INCLUDE THE COLOR OF THE CPU IN THE TURN ORDER") ; } } System.out.println ("is this turn order correct? (Y/N)") ; String s = getIns () ; while (!((s.equals ("y")) || (s.equals ("Y")) || (s.equals ("n")) || (s.equals ("N")))) { System.out.println ("try again") ; s = getIns () ; } if (s.equals ("y") || s.equals ("Y") ) z = true ; } return a ; } the error i get is: Risk.java:416: cannot find symbol symbol : variable a location: class Risk return a ; ^ Why did i get this error? It seems that a is clearly defined in the line String[] a = assignTeams () ; and if anything is used by the lineprintOrder (a) ;` it seems to me that if the symbol a really couldn't be found then the compiler should blow up there and not at the return statment. (also the method assignTeams returns an array of Strings.)

    Read the article

  • Unity not Working 14.04

    - by Back.Slash
    I am using Ubuntu 14.04 LTS x64. I did a sudo apt-get upgrade yesterday and restarted my PC. Now my taskbar and panel are missing. When I try to restart Unity using unity --replace Then I get error: unity-panel-service stop/waiting compiz (core) - Info: Loading plugin: core compiz (core) - Info: Starting plugin: core unity-panel-service start/running, process 3906 compiz (core) - Info: Loading plugin: ccp compiz (core) - Info: Starting plugin: ccp compizconfig - Info: Backend : gsettings compizconfig - Info: Integration : true compizconfig - Info: Profile : unity compiz (core) - Info: Loading plugin: composite compiz (core) - Info: Starting plugin: composite compiz (core) - Info: Loading plugin: opengl compiz (core) - Info: Unity is fully supported by your hardware. compiz (core) - Info: Unity is fully supported by your hardware. compiz (core) - Info: Starting plugin: opengl libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/i965_dri.so failed (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so: undefined symbol: _glapi_tls_Dispatch) libGL error: dlopen ${ORIGIN}/dri/i965_dri.so failed (${ORIGIN}/dri/i965_dri.so: cannot open shared object file: No such file or directory) libGL error: dlopen /usr/lib/dri/i965_dri.so failed (/usr/lib/dri/i965_dri.so: cannot open shared object file: No such file or directory) libGL error: unable to load driver: i965_dri.so libGL error: driver pointer missing libGL error: failed to load driver: i965 libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so failed (/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so: undefined symbol: _glapi_tls_Dispatch) libGL error: dlopen ${ORIGIN}/dri/swrast_dri.so failed (${ORIGIN}/dri/swrast_dri.so: cannot open shared object file: No such file or directory) libGL error: dlopen /usr/lib/dri/swrast_dri.so failed (/usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory) libGL error: unable to load driver: swrast_dri.so libGL error: failed to load driver: swrast compiz (core) - Info: Loading plugin: compiztoolbox compiz (core) - Info: Starting plugin: compiztoolbox compiz (core) - Info: Loading plugin: decor compiz (core) - Info: Starting plugin: decor compiz (core) - Info: Loading plugin: vpswitch compiz (core) - Info: Starting plugin: vpswitch compiz (core) - Info: Loading plugin: snap compiz (core) - Info: Starting plugin: snap compiz (core) - Info: Loading plugin: mousepoll compiz (core) - Info: Starting plugin: mousepoll compiz (core) - Info: Loading plugin: resize compiz (core) - Info: Starting plugin: resize compiz (core) - Info: Loading plugin: place compiz (core) - Info: Starting plugin: place compiz (core) - Info: Loading plugin: move compiz (core) - Info: Starting plugin: move compiz (core) - Info: Loading plugin: wall compiz (core) - Info: Starting plugin: wall compiz (core) - Info: Loading plugin: grid compiz (core) - Info: Starting plugin: grid compiz (core) - Info: Loading plugin: regex compiz (core) - Info: Starting plugin: regex compiz (core) - Info: Loading plugin: imgpng compiz (core) - Info: Starting plugin: imgpng compiz (core) - Info: Loading plugin: session compiz (core) - Info: Starting plugin: session I/O warning : failed to load external entity "/home/sumeet/.compiz/session/10de541a813cc1a8fc140170575114755000000020350005" compiz (core) - Info: Loading plugin: gnomecompat compiz (core) - Info: Starting plugin: gnomecompat compiz (core) - Info: Loading plugin: animation compiz (core) - Info: Starting plugin: animation compiz (core) - Info: Loading plugin: fade compiz (core) - Info: Starting plugin: fade compiz (core) - Info: Loading plugin: unitymtgrabhandles compiz (core) - Info: Starting plugin: unitymtgrabhandles compiz (core) - Info: Loading plugin: workarounds compiz (core) - Info: Starting plugin: workarounds compiz (core) - Info: Loading plugin: scale compiz (core) - Info: Starting plugin: scale compiz (core) - Info: Loading plugin: expo compiz (core) - Info: Starting plugin: expo compiz (core) - Info: Loading plugin: ezoom compiz (core) - Info: Starting plugin: ezoom compiz (core) - Info: Loading plugin: unityshell compiz (core) - Info: Starting plugin: unityshell WARN 2014-06-02 18:46:23 unity.glib.dbus.server GLibDBusServer.cpp:579 Can't register object 'org.gnome.Shell' yet as we don't have a connection, waiting for it... ERROR 2014-06-02 18:46:23 unity.debug.interface DebugDBusInterface.cpp:216 Unable to load entry point in libxpathselect: libxpathselect.so.1.4: cannot open shared object file: No such file or directory compiz (unityshell) - Error: GL_ARB_vertex_buffer_object not supported ERROR 2014-06-02 18:46:23 unity.shell.compiz unityshell.cpp:3850 Impossible to delete the unity locked stamp file compiz (core) - Error: Plugin initScreen failed: unityshell compiz (core) - Error: Failed to start plugin: unityshell compiz (core) - Info: Unloading plugin: unityshell X Error of failed request: BadWindow (invalid Window parameter) Major opcode of failed request: 3 (X_GetWindowAttributes) Resource id in failed request: 0x3e000c9 Serial number of failed request: 10115 Current serial number in output stream: 10116 Any help would be highly appreciated. EDIT : My PC configuration description: Portable Computer product: Dell System XPS L502X (System SKUNumber) vendor: Dell Inc. version: 0.1 serial: 1006ZP1 width: 64 bits capabilities: smbios-2.6 dmi-2.6 vsyscall32 configuration: administrator_password=unknown boot=normal chassis=portable family=HuronRiver System frontpanel_password=unknown keyboard_password=unknown power-on_password=unknown sku=System SKUNumber uuid=44454C4C-3000-1030-8036-B1C04F5A5031 *-core description: Motherboard product: 0YR8NN vendor: Dell Inc. physical id: 0 version: A00 serial: .1006ZP1.CN4864314C0560. slot: Part Component *-firmware description: BIOS vendor: Dell Inc. physical id: 0 version: A11 date: 05/29/2012 size: 128KiB capacity: 2496KiB capabilities: pci pnp upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy360 int13floppy1200 int13floppy720 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot smartbattery biosbootspecification netboot *-cpu description: CPU product: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz vendor: Intel Corp. physical id: 19 bus info: cpu@0 version: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz serial: Not Supported by CPU slot: CPU size: 800MHz capacity: 800MHz width: 64 bits clock: 100MHz capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid cpufreq configuration: cores=4 enabledcores=4 threads=8 *-cache:0 description: L1 cache physical id: 1a slot: L1-Cache size: 64KiB capacity: 64KiB capabilities: synchronous internal write-through data *-cache:1 description: L2 cache physical id: 1b slot: L2-Cache size: 256KiB capacity: 256KiB capabilities: synchronous internal write-through data *-cache:2 description: L3 cache physical id: 1c slot: L3-Cache size: 6MiB capacity: 6MiB capabilities: synchronous internal write-back unified *-memory description: System Memory physical id: 1d slot: System board or motherboard size: 6GiB *-bank:0 description: SODIMM DDR3 Synchronous 1333 MHz (0.8 ns) product: M471B5273DH0-CH9 vendor: Samsung physical id: 0 serial: 450F1160 slot: ChannelA-DIMM0 size: 4GiB width: 64 bits clock: 1333MHz (0.8ns) *-bank:1 description: SODIMM DDR3 Synchronous 1333 MHz (0.8 ns) product: HMT325S6BFR8C-H9 vendor: Hynix/Hyundai physical id: 1 serial: 0CA0E8E2 slot: ChannelB-DIMM0 size: 2GiB width: 64 bits clock: 1333MHz (0.8ns) *-pci description: Host bridge product: 2nd Generation Core Processor Family DRAM Controller vendor: Intel Corporation physical id: 100 bus info: pci@0000:00:00.0 version: 09 width: 32 bits clock: 33MHz *-pci:0 description: PCI bridge product: Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port vendor: Intel Corporation physical id: 1 bus info: pci@0000:00:01.0 version: 09 width: 32 bits clock: 33MHz capabilities: pci pm msi pciexpress normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:40 ioport:3000(size=4096) memory:f0000000-f10fffff ioport:c0000000(size=301989888) *-generic UNCLAIMED description: Unassigned class product: Illegal Vendor ID vendor: Illegal Vendor ID physical id: 0 bus info: pci@0000:01:00.0 version: ff width: 32 bits clock: 66MHz capabilities: bus_master vga_palette cap_list configuration: latency=255 maxlatency=255 mingnt=255 resources: memory:f0000000-f0ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:3000(size=128) memory:f1000000-f107ffff *-display description: VGA compatible controller product: 2nd Generation Core Processor Family Integrated Graphics Controller vendor: Intel Corporation physical id: 2 bus info: pci@0000:00:02.0 version: 09 width: 64 bits clock: 33MHz capabilities: msi pm vga_controller bus_master cap_list rom configuration: driver=i915 latency=0 resources: irq:52 memory:f1400000-f17fffff memory:e0000000-efffffff ioport:4000(size=64) *-communication description: Communication controller product: 6 Series/C200 Series Chipset Family MEI Controller #1 vendor: Intel Corporation physical id: 16 bus info: pci@0000:00:16.0 version: 04 width: 64 bits clock: 33MHz capabilities: pm msi bus_master cap_list configuration: driver=mei_me latency=0 resources: irq:50 memory:f1c05000-f1c0500f *-usb:0 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 vendor: Intel Corporation physical id: 1a bus info: pci@0000:00:1a.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci-pci latency=0 resources: irq:16 memory:f1c09000-f1c093ff *-multimedia description: Audio device product: 6 Series/C200 Series Chipset Family High Definition Audio Controller vendor: Intel Corporation physical id: 1b bus info: pci@0000:00:1b.0 version: 05 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list configuration: driver=snd_hda_intel latency=0 resources: irq:53 memory:f1c00000-f1c03fff *-pci:1 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 1 vendor: Intel Corporation physical id: 1c bus info: pci@0000:00:1c.0 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode cap_list configuration: driver=pcieport resources: irq:16 *-pci:2 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 2 vendor: Intel Corporation physical id: 1c.1 bus info: pci@0000:00:1c.1 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:17 memory:f1b00000-f1bfffff *-network description: Wireless interface product: Centrino Wireless-N 1030 [Rainbow Peak] vendor: Intel Corporation physical id: 0 bus info: pci@0000:03:00.0 logical name: mon.wlan0 version: 34 serial: bc:77:37:14:47:e5 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list logical wireless ethernet physical configuration: broadcast=yes driver=iwlwifi driverversion=3.13.0-27-generic firmware=18.168.6.1 latency=0 link=no multicast=yes wireless=IEEE 802.11bgn resources: irq:51 memory:f1b00000-f1b01fff *-pci:3 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 4 vendor: Intel Corporation physical id: 1c.3 bus info: pci@0000:00:1c.3 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:19 memory:f1a00000-f1afffff *-usb description: USB controller product: uPD720200 USB 3.0 Host Controller vendor: NEC Corporation physical id: 0 bus info: pci@0000:04:00.0 version: 04 width: 64 bits clock: 33MHz capabilities: pm msi msix pciexpress xhci bus_master cap_list configuration: driver=xhci_hcd latency=0 resources: irq:19 memory:f1a00000-f1a01fff *-pci:4 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 5 vendor: Intel Corporation physical id: 1c.4 bus info: pci@0000:00:1c.4 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:16 memory:f1900000-f19fffff *-pci:5 description: PCI bridge product: 6 Series/C200 Series Chipset Family PCI Express Root Port 6 vendor: Intel Corporation physical id: 1c.5 bus info: pci@0000:00:1c.5 version: b5 width: 32 bits clock: 33MHz capabilities: pci pciexpress msi pm normal_decode bus_master cap_list configuration: driver=pcieport resources: irq:17 ioport:2000(size=4096) ioport:f1800000(size=1048576) *-network description: Ethernet interface product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller vendor: Realtek Semiconductor Co., Ltd. physical id: 0 bus info: pci@0000:06:00.0 logical name: eth0 version: 06 serial: 14:fe:b5:a3:ac:40 size: 1Gbit/s capacity: 1Gbit/s width: 64 bits clock: 33MHz capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl_nic/rtl8168e-2.fw ip=172.19.167.151 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s resources: irq:49 ioport:2000(size=256) memory:f1804000-f1804fff memory:f1800000-f1803fff *-usb:1 description: USB controller product: 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 vendor: Intel Corporation physical id: 1d bus info: pci@0000:00:1d.0 version: 05 width: 32 bits clock: 33MHz capabilities: pm debug ehci bus_master cap_list configuration: driver=ehci-pci latency=0 resources: irq:23 memory:f1c08000-f1c083ff *-isa description: ISA bridge product: HM67 Express Chipset Family LPC Controller vendor: Intel Corporation physical id: 1f bus info: pci@0000:00:1f.0 version: 05 width: 32 bits clock: 33MHz capabilities: isa bus_master cap_list configuration: driver=lpc_ich latency=0 resources: irq:0 *-ide:0 description: IDE interface product: 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller vendor: Intel Corporation physical id: 1f.2 bus info: pci@0000:00:1f.2 version: 05 width: 32 bits clock: 66MHz capabilities: ide pm bus_master cap_list configuration: driver=ata_piix latency=0 resources: irq:19 ioport:40b8(size=8) ioport:40cc(size=4) ioport:40b0(size=8) ioport:40c8(size=4) ioport:4090(size=16) ioport:4080(size=16) *-serial UNCLAIMED description: SMBus product: 6 Series/C200 Series Chipset Family SMBus Controller vendor: Intel Corporation physical id: 1f.3 bus info: pci@0000:00:1f.3 version: 05 width: 64 bits clock: 33MHz configuration: latency=0 resources: memory:f1c04000-f1c040ff ioport:efa0(size=32) *-ide:1 description: IDE interface product: 6 Series/C200 Series Chipset Family 2 port SATA IDE Controller vendor: Intel Corporation physical id: 1f.5 bus info: pci@0000:00:1f.5 version: 05 width: 32 bits clock: 66MHz capabilities: ide pm bus_master cap_list configuration: driver=ata_piix latency=0 resources: irq:19 ioport:40a8(size=8) ioport:40c4(size=4) ioport:40a0(size=8) ioport:40c0(size=4) ioport:4070(size=16) ioport:4060(size=16) *-scsi:0 physical id: 1 logical name: scsi0 capabilities: emulated *-disk description: ATA Disk product: SAMSUNG HN-M640M physical id: 0.0.0 bus info: scsi@0:0.0.0 logical name: /dev/sda version: 2AR1 serial: S2T3J1KBC00006 size: 596GiB (640GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 sectorsize=512 signature=6b746d91 *-volume:0 description: Windows NTFS volume physical id: 1 bus info: scsi@0:0.0.0,1 logical name: /dev/sda1 version: 3.1 serial: 0272-3e7f size: 348MiB capacity: 350MiB capabilities: primary bootable ntfs initialized configuration: clustersize=4096 created=2013-09-18 12:20:45 filesystem=ntfs label=System Reserved modified_by_chkdsk=true mounted_on_nt4=true resize_log_file=true state=dirty upgrade_on_mount=true *-volume:1 description: Extended partition physical id: 2 bus info: scsi@0:0.0.0,2 logical name: /dev/sda2 size: 116GiB capacity: 116GiB capabilities: primary extended partitioned partitioned:extended *-logicalvolume:0 description: Linux swap / Solaris partition physical id: 5 logical name: /dev/sda5 capacity: 6037MiB capabilities: nofs *-logicalvolume:1 description: Linux filesystem partition physical id: 6 logical name: /dev/sda6 logical name: / capacity: 110GiB configuration: mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro,data=ordered state=mounted *-volume:2 description: Windows NTFS volume physical id: 3 bus info: scsi@0:0.0.0,3 logical name: /dev/sda3 logical name: /media/os version: 3.1 serial: 4e7853ec-5555-a74d-82e0-9f49798d3772 size: 156GiB capacity: 156GiB capabilities: primary ntfs initialized configuration: clustersize=4096 created=2013-09-19 09:19:00 filesystem=ntfs label=OS mount.fstype=fuseblk mount.options=ro,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 state=mounted *-volume:3 description: Windows NTFS volume physical id: 4 bus info: scsi@0:0.0.0,4 logical name: /dev/sda4 logical name: /media/data version: 3.1 serial: 7666d55f-e1bf-e645-9791-2a1a31b24b9a size: 322GiB capacity: 322GiB capabilities: primary ntfs initialized configuration: clustersize=4096 created=2013-09-17 23:27:01 filesystem=ntfs label=Data modified_by_chkdsk=true mount.fstype=fuseblk mount.options=rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 mounted_on_nt4=true resize_log_file=true state=mounted upgrade_on_mount=true *-scsi:1 physical id: 2 logical name: scsi1 capabilities: emulated *-cdrom description: DVD-RAM writer product: DVD+-RW GT32N vendor: HL-DT-ST physical id: 0.0.0 bus info: scsi@1:0.0.0 logical name: /dev/cdrom logical name: /dev/sr0 version: A201 capabilities: removable audio cd-r cd-rw dvd dvd-r dvd-ram configuration: ansiversion=5 status=nodisc *-battery product: DELL vendor: SANYO physical id: 1 version: 2008 serial: 1.0 slot: Rear capacity: 57720mWh configuration: voltage=11.1V `

    Read the article

  • array and array_view from amp.h

    - by Daniel Moth
    This is a very long post, but it also covers what are probably the classes (well, array_view at least) that you will use the most with C++ AMP, so I hope you enjoy it! Overview The concurrency::array and concurrency::array_view template classes represent multi-dimensional data of type T, of N dimensions, specified at compile time (and you can later access the number of dimensions via the rank property). If N is not specified, it is assumed that it is 1 (i.e. single-dimensional case). They are rectangular (not jagged). The difference between them is that array is a container of data, whereas array_view is a wrapper of a container of data. So in that respect, array behaves like an STL container, whereas the closest thing an array_view behaves like is an STL iterator (albeit with random access and allowing you to view more than one element at a time!). The data in the array (whether provided at creation time or added later) resides on an accelerator (which is specified at creation time either explicitly by the developer, or set to the default accelerator at creation time by the runtime) and is laid out contiguously in memory. The data provided to the array_view is not stored by/in the array_view, because the array_view is simply a view over the real source (which can reside on the CPU or other accelerator). The underlying data is copied on demand to wherever the array_view is accessed. Elements which differ by one in the least significant dimension of the array_view are adjacent in memory. array objects must be captured by reference into the lambda you pass to the parallel_for_each call, whereas array_view objects must be captured by value (into the lambda you pass to the parallel_for_each call). Creating array and array_view objects and relevant properties You can create array_view objects from other array_view objects of the same rank and element type (shallow copy, also possible via assignment operator) so they point to the same underlying data, and you can also create array_view objects over array objects of the same rank and element type e.g.   array_view<int,3> a(b); // b can be another array or array_view of ints with rank=3 Note: Unlike the constructors above which can be called anywhere, the ones in the rest of this section can only be called from CPU code. You can create array objects from other array objects of the same rank and element type (copy and move constructors) and from other array_view objects, e.g.   array<float,2> a(b); // b can be another array or array_view of floats with rank=2 To create an array from scratch, you need to at least specify an extent object, e.g. array<int,3> a(myExtent);. Note that instead of an explicit extent object, there are convenience overloads when N<=3 so you can specify 1-, 2-, 3- integers (dependent on the array's rank) and thus have the extent created for you under the covers. At any point, you can access the array's extent thought the extent property. The exact same thing applies to array_view (extent as constructor parameters, incl. convenience overloads, and property). While passing only an extent object to create an array is enough (it means that the array will be written to later), it is not enough for the array_view case which must always wrap over some other container (on which it relies for storage space and actual content). So in addition to the extent object (that describes the shape you'd like to be viewing/accessing that data through), to create an array_view from another container (e.g. std::vector) you must pass in the container itself (which must expose .data() and a .size() methods, e.g. like std::array does), e.g.   array_view<int,2> aaa(myExtent, myContainerOfInts); Similarly, you can create an array_view from a raw pointer of data plus an extent object. Back to the array case, to optionally initialize the array with data, you can pass an iterator pointing to the start (and optionally one pointing to the end of the source container) e.g.   array<double,1> a(5, myVector.begin(), myVector.end()); We saw that arrays are bound to an accelerator at creation time, so in case you don’t want the C++ AMP runtime to assign the array to the default accelerator, all array constructors have overloads that let you pass an accelerator_view object, which you can later access via the accelerator_view property. Note that at the point of initializing an array with data, a synchronous copy of the data takes place to the accelerator, and then to copy any data back we'll see that an explicit copy call is required. This does not happen with the array_view where copying is on demand... refresh and synchronize on array_view Note that in the previous section on constructors, unlike the array case, there was no overload that accepted an accelerator_view for array_view. That is because the array_view is simply a wrapper, so the allocation of the data has already taken place before you created the array_view. When you capture an array_view variable in your call to parallel_for_each, the copy of data between the non-CPU accelerator and the CPU takes place on demand (i.e. it is implicit, versus the explicit copy that has to happen with the array). There are some subtleties to the on-demand-copying that we cover next. The assumption when using an array_view is that you will continue to access the data through the array_view, and not through the original underlying source, e.g. the pointer to the data that you passed to the array_view's constructor. So if you modify the data through the array_view on the GPU, the original pointer on the CPU will not "know" that, unless one of two things happen: you access the data through the array_view on the CPU side, i.e. using indexing that we cover below you explicitly call the array_view's synchronize method on the CPU (this also gets called in the array_view's destructor for you) Conversely, if you make a change to the underlying data through the original source (e.g. the pointer), the array_view will not "know" about those changes, unless you call its refresh method. Finally, note that if you create an array_view of const T, then the data is copied to the accelerator on demand, but it does not get copied back, e.g.   array_view<const double, 5> myArrView(…); // myArrView will not get copied back from GPU There is also a similar mechanism to achieve the reverse, i.e. not to copy the data of an array_view to the GPU. copy_to, data, and global copy/copy_async functions Both array and array_view expose two copy_to overloads that allow copying them to another array, or to another array_view, and these operations can also be achieved with assignment (via the = operator overloads). Also both array and array_view expose a data method, to get a raw pointer to the underlying data of the array or array_view, e.g. float* f = myArr.data();. Note that for array_view, this only works when the rank is equal to 1, due to the data only being contiguous in one dimension as covered in the overview section. Finally, there are a bunch of global concurrency::copy functions returning void (and corresponding concurrency::copy_async functions returning a future) that allow copying between arrays and array_views and iterators etc. Just browse intellisense or amp.h directly for the full set. Note that for array, all copying described throughout this post is deep copying, as per other STL container expectations. You can never have two arrays point to the same data. indexing into array and array_view plus projection Reading or writing data elements of an array is only legal when the code executes on the same accelerator as where the array was bound to. In the array_view case, you can read/write on any accelerator, not just the one where the original data resides, and the data gets copied for you on demand. In both cases, the way you read and write individual elements is via indexing as described next. To access (or set the value of) an element, you can index into it by passing it an index object via the subscript operator. Furthermore, if the rank is 3 or less, you can use the function ( ) operator to pass integer values instead of having to use an index object. e.g. array<float,2> arr(someExtent, someIterator); //or array_view<float,2> arr(someExtent, someContainer); index<2> idx(5,4); float f1 = arr[idx]; float f2 = arr(5,4); //f2 ==f1 //and the reverse for assigning, e.g. arr(idx[0], 7) = 6.9; Note that for both array and array_view, regardless of rank, you can also pass a single integer to the subscript operator which results in a projection of the data, and (for both array and array_view) you get back an array_view of rank N-1 (or if the rank was 1, you get back just the element at that location). Not Covered In this already very long post, I am not going to cover three very cool methods (and related overloads) that both array and array_view expose: view_as, section, reinterpret_as. We'll revisit those at some point in the future, probably on the team blog. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • 256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

    - by Alan Smith
    For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour.  This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue   static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005>     define wooden texture { noise surface { ambient 0.2  diffuse 0.7  specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1  lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7  } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } }   viewpoint {     from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60     resolution 640, 480 aspect 1.6 image_format 0 }       light <-10, 30, 20> light <-10, 30, -20>   object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden }   object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome }   After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue }   In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000   As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72   Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: ·         On-Premise o   Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o   Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o   Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o   Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. ·         Windows Azure o   Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment.   The architecture of each worker role is shown below.   The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" >   <WorkerRole name="CloudRayWorkerRole" vmsize="Small">     <Imports>     </Imports>     <ConfigurationSettings>       <Setting name="DataConnectionString" />     </ConfigurationSettings>     <LocalResources>       <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" />     </LocalResources>   </WorkerRole> </ServiceDefinition>     The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1.       The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2.       PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3.       DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4.       The JPG file is uploaded from local storage to the images blob container. 5.       A message is placed on the images queue to indicate a new image is available for download. 6.       The job message is deleted from the job queue. 7.       The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() {     // Set environment variables     string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation);     string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation);       LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder");     string localStorageRootPath = rayStorage.RootPath;       JobQueue jobQueue = new JobQueue("renderjobs");     JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs");     CloudRayBlob sceneBlob = new CloudRayBlob("scenes");     CloudRayBlob imageBlob = new CloudRayBlob("images");     RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource();       Frames = 0;       while (true)     {         // Get the render job from the queue         CloudQueueMessage jobMsg = jobQueue.Get();           if (jobMsg != null)         {             // Get the file details             string sceneFile = jobMsg.AsString;             string tgaFile = sceneFile.Replace(".pi", ".tga");             string jpgFile = sceneFile.Replace(".pi", ".jpg");               string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile);             string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile);             string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile);               // Copy the scene file to local storage             sceneBlob.DownloadFile(sceneFilePath);               // Run the ray tracer.             string polyrayArguments =                 string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath);             Process polyRayProcess = new Process();             polyRayProcess.StartInfo.FileName =                 Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath);             polyRayProcess.StartInfo.Arguments = polyrayArguments;             polyRayProcess.Start();             polyRayProcess.WaitForExit();               // Convert the image             string dtaArguments =                 string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath));             Process dtaProcess = new Process();             dtaProcess.StartInfo.FileName =                 Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath);             dtaProcess.StartInfo.Arguments = dtaArguments;             dtaProcess.Start();             dtaProcess.WaitForExit();               // Upload the image to blob storage             imageBlob.UploadFile(jpgFilePath);               // Add a download job.             downloadQueue.Add(jpgFile);               // Delete the render job message             jobQueue.Delete(jobMsg);               Frames++;         }         else         {             Thread.Sleep(1000);         }           // Log the worker role activity.         roleLifecycleDataSource.Alive             ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames);     } }     Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage.   public class RoleLifecycle : TableServiceEntity {     public string ServerName { get; set; }     public string Status { get; set; }     public DateTime StartTime { get; set; }     public DateTime EndTime { get; set; }     public long SecondsRunning { get; set; }     public DateTime LastActiveTime { get; set; }     public int Frames { get; set; }     public string Comment { get; set; }       public RoleLifecycle()     {     }       public RoleLifecycle(string roleName)     {         PartitionKey = roleName;         RowKey = Utils.GetAscendingRowKey();         Status = "Started";         StartTime = DateTime.UtcNow;         LastActiveTime = StartTime;         EndTime = StartTime;         SecondsRunning = 0;         Frames = 0;     } }     A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*">   <Role name="CloudRayWorkerRole">     <Instances count="16" />     <ConfigurationSettings>       <Setting name="DataConnectionString"         value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." />     </ConfigurationSettings>   </Role> </ServiceConfiguration>     About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames.   Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation.   With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes.     At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances.   <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*">   <Role name="CloudRayWorkerRole">     <Instances count="256" />     <ConfigurationSettings>       <Setting name="DataConnectionString"         value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." />     </ConfigurationSettings>   </Role> </ServiceConfiguration>     Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames.   Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds.   We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes.   The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each.   Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles   The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. ·         Windows Azure can provide massive compute power, on demand, in a matter of minutes. ·         The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. ·         Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. ·         No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. ·         Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. ·         Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. ·         Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. ·         Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. ·         If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. ·         Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. ·         Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

    Read the article

  • Using R to Analyze G1GC Log Files

    - by user12620111
    Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=||   Using R to Analyze G1GC Log Files   Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z\(\),]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\\( " , " " ); gsub ( "\ \)" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period:     par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight.   plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

    Read the article

  • Wireless AAA for a small, bandwidth-limited hotel.

    - by Anthony Hiscox
    We (the tech I work with and myself) live in a remote northern town where Internet access is somewhat of a luxury, and bandwidth is quite limited. Here, overage charges ranging from few hundreds, to few thousands of dollars a month, is not uncommon. I myself incur regular monthly charges just through my regular Internet usage at home (I am allowed 10G for $60CAD!) As part of my work, I have found myself involved with several hotels that are feeling this. I know that I can come up with something to solve this problem, but I am relatively new to system administration and I don't want my dreams to overcome reality. So, I pass these ideas on to you, those with much more experience than I, in hopes you will share some of your thoughts and concerns. This system must be cost effective, yes the charges are high here, but the trust in technology is the lowest I've ever seen. Must be capable of helping client reduce their usage (squid) Allow a limited (throughput and total usage) amount of free Internet, as this is often franchise policy. Allow a user to track their bandwidth usage Allow (optional) higher speed and/or usage for an additional charge. This fee can be obtained at the front desk on checkout and should not require the use of PayPal or Credit Card. Unfortunately some franchises have ridiculous policies that require the use of a third party remote service to authenticate guests to your network. This means WPA is out, and it also means that I do not auth before Internet usage, that will be their job. However, I do require the ABILITY to perform authentication for Internet access if a hotel does not have this policy. I will still have to track bandwidth (under a guest account by default) and provide the same limiting, however the guest often will require a complete 'unlimited' access, in terms of existence, not throughput. Provide firewalling capabilities for hotels that have nothing, Office, and Guest network segregation (some of these guys are running their office on the guest network, with no encryption, and a simple TOS to get on!) Prevent guests from connecting to other guests, however provide a means to allow this to happen. IE. Each guest connects to a page and allows the other guest, this writes a iptables rule (with python-netfilter) and allows two rooms to play a game, for instance. My thoughts on how to implement this. One decent box (we'll call it a router now) with a lot of ram, and 3 NIC's: Internet Office Guests (AP's + In Room Ethernet) Router Firewall Rules Guest can talk to router only, through which they are routed to where they need to go, including Internet services. Office can be used to bridge Office to Internet if an existing solution is not in place, otherwise, it simply works for a network accessible web (webmin+python-webmin?) interface. Router Software: OpenVZ provides virtualization for a few services I don't really trust. Squid, FreeRADIUS and Apache. The only service directly accessible to guests is Apache. Apache has mod_wsgi and django, because I can write quickly using django and my needs are low. It also potentially has the FreeRADIUS mod, but there seems to be some caveats with this. Firewall rules are handled on the router with iptables. Webmin (or a custom django app maybe) provides abstracted control over any features that the staff may need to access. Python, if you haven't guessed it's the language I feel most comfortable in, and I use it for almost everything. And finally, has this been done, is it a overly massive project not worth taking on for one guy, and/or is there some tools I'm missing that could be making my life easier? For the record, I am fairly good with Python, but not very familiar with many other languages (I can struggle through PHP, it's a cosmetic issue there). I am also an avid linux user, and comfortable with config files and command line. Thank you for your time, I look forward to reading your responses. Edit: My apologies if this is not a Q&A in the sense that some were expecting, I'm just looking for ideas and to make sure I'm not trying to do something that's been done. I'm looking at pfSense now as a possible start for what I need.

    Read the article

  • Norton Ghost usage, Linux? ISO? Server? MBR?

    - by overtherainbow
    Before evaluating Symantec/Norton Ghost to image partitions, I have a couple of questions about using this tool: In the product page, it only mentions Windows: Can Norton image Linux partitions as well? Can I burn an ISO to create/recover images? The ISO's I found seem only able to restore an image but not create one. Does it mean that images can only be created from within a running Windows? For Windows partitions: Does it support both regular and Server versions? Acronis doesn't image Server partitions in the regular version When restoring an image, does Norton give the option of including/excluding the MBR? Thank you.

    Read the article

  • Debian virtual memory reaching limit

    - by Gregor
    As a relative newbie to systems, I inherited a Debian server and I've noticed that virtual memory is very high (around 95%!). The server has been running slow for around 6 months, and I was wondering if any of you had any tips on things I could try, particularly on freeing up memory. The server hosts various websites and also a Postit email server. Here are the details: Operating system Debian Linux 5.0 Webmin version 1.580 Time on system Thu Apr 12 11:12:21 2012 Kernel and CPU Linux 2.6.18-6-amd64 on x86_64 Processor information Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz, 2 cores System uptime 229 days, 12 hours, 50 minutes Running processes 138 CPU load averages 0.10 (1 min) 0.28 (5 mins) 0.36 (15 mins) CPU usage 14% user, 1% kernel, 0% IO, 85% idle Real memory 2.94 GB total, 1.69 GB used Virtual memory 3.93 GB total, 3.84 GB used Local disk space 142.84 GB total, 116.13 GB used Free m output: free -m total used free shared buffers cached Mem: 3010 2517 492 0 107 996 -/+ buffers/cache: 1413 1596 Swap: 4024 3930 93 Top output: top - 11:59:57 up 229 days, 13:38, 1 user, load average: 0.26, 0.24, 0.26 Tasks: 136 total, 2 running, 134 sleeping, 0 stopped, 0 zombie Cpu(s): 3.8%us, 0.5%sy, 0.0%ni, 95.0%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3082544k total, 2773160k used, 309384k free, 111496k buffers Swap: 4120632k total, 4024712k used, 95920k free, 1036136k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28796 www-data 16 0 304m 68m 6188 S 8 2.3 0:03.13 apache2 1 root 15 0 10304 592 564 S 0 0.0 0:00.76 init 2 root RT 0 0 0 0 S 0 0.0 0:04.06 migration/0 3 root 34 19 0 0 0 S 0 0.0 0:05.67 ksoftirqd/0 4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 5 root RT 0 0 0 0 S 0 0.0 0:00.06 migration/1 6 root 34 19 0 0 0 S 0 0.0 0:01.26 ksoftirqd/1 7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 8 root 10 -5 0 0 0 S 0 0.0 0:00.12 events/0 9 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1 10 root 10 -5 0 0 0 S 0 0.0 0:00.00 khelper 11 root 10 -5 0 0 0 S 0 0.0 0:00.02 kthread 16 root 10 -5 0 0 0 S 0 0.0 0:15.51 kblockd/0 17 root 10 -5 0 0 0 S 0 0.0 0:01.32 kblockd/1 18 root 15 -5 0 0 0 S 0 0.0 0:00.00 kacpid 127 root 10 -5 0 0 0 S 0 0.0 0:00.00 khubd 129 root 10 -5 0 0 0 S 0 0.0 0:00.00 kseriod 180 root 10 -5 0 0 0 S 0 0.0 70:09.05 kswapd0 181 root 17 -5 0 0 0 S 0 0.0 0:00.00 aio/0 182 root 17 -5 0 0 0 S 0 0.0 0:00.00 aio/1 780 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata/0 782 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata/1 783 root 16 -5 0 0 0 S 0 0.0 0:00.00 ata_aux 802 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_0 803 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_1 804 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_2 805 root 10 -5 0 0 0 S 0 0.0 0:00.00 scsi_eh_3 1013 root 10 -5 0 0 0 S 0 0.0 49:27.78 kjournald 1181 root 15 -4 16912 452 448 S 0 0.0 0:00.05 udevd 1544 root 14 -5 0 0 0 S 0 0.0 0:00.00 kpsmoused 1706 root 13 -5 0 0 0 S 0 0.0 0:00.00 kmirrord 1995 root 18 0 193m 3324 1688 S 0 0.1 8:52.77 rsyslogd 2031 root 15 0 48856 732 608 S 0 0.0 0:01.86 sshd 2071 root 25 0 17316 1072 1068 S 0 0.0 0:00.00 mysqld_safe 2108 mysql 15 0 320m 72m 4368 S 0 2.4 1923:25 mysqld 2109 root 18 0 3776 500 496 S 0 0.0 0:00.00 logger 2180 postgres 15 0 99504 3016 2880 S 0 0.1 1:24.15 postgres 2184 postgres 15 0 99504 3596 3420 S 0 0.1 0:02.08 postgres 2185 postgres 15 0 99504 696 628 S 0 0.0 0:00.65 postgres 2186 postgres 15 0 99640 892 648 S 0 0.0 0:01.18 postgres

    Read the article

  • Norton Ghost usage, Linux? ISO? Server? MBR?

    - by OverTheRainbow
    Before evaluating Symantec/Norton Ghost to image partitions, I have a couple of questions about using this tool: In the product page, it only mentions Windows: Can Norton image Linux partitions as well? Can I burn an ISO to create/recover images? The ISO's I found seem only able to restore an image but not create one. Does it mean that images can only be created from within a running Windows? For Windows partitions: Does it support both regular and Server versions? Acronis doesn't image Server partitions in the regular version When restoring an image, does Norton give the option of including/excluding the MBR? Thank you.

    Read the article

  • What is my peak total memory usage? How much of my RAM is actually being actively used?

    - by William C
    Hi, Is there an app or utility in Windows that shows me the peak value of in-use memory (i.e., the total of the shareable and private memory of processes, drivers and the operating system) (not standby nor free)? I've got lots of memory and I'm installing an in-memory file cache, called "eBoostr", and would like an idea how much memory to allocate for it and still avoid deteriorating page faults. Essentially, I want the answer to the question, "How much of my RAM is actually being actively used?" W

    Read the article

  • Hard drive write speed - finding a lighter antivirus?

    - by Shingetsu
    I recently have been getting a lot of system lag here (for example, the mouse and the display in general take about 15 seconds to react in the worst cases). After a lot of monitoring the resources, I found that the problem mainly happens when too much Disk I/O is being done. Three culprits have been identified: My browser had the highest write I/O with 35,000,000 I/O Write Bytes. Steam had the highest read I/O (when IDLE!!!) with 106,000,000 I/O Read Bytes. My antivirus (in both cases I will soon mention) was the runner up in both cases with: 30,000,000ish write and 80,000,000ish read. The first AV I had was Avast! which I had liked on my previous system. After noticing it taking so much I/O I switched to Panda (supposing it wouldn't use TOO much during idle phase). However it only used a bit less I/O. Just a lot less memory and cpu and somewhat more network. My browser at the moment is Maxthon 3 (which I like a lot). Before this I was running chrome which had similar data and much higher cpu when running in the background was enabled. I'm not going to be running steam all the time and there aren't many alternatives to it. I like my browser very much, but I AM willing to switch if there's an obvious problem (I'm in programming, however I'm not a very good sysadmin, especially not when it comes to windows). Finally, my system almost stops lagging when I turn off the antivirus (and preferably steam) (some remains but once in every 5-6 hours for a few seconds so it isn't a big problem). My question (has a few parts): Is it possible to configure steam to lower it's I/O usage? (and maybe network while we're at it?) Which antivirus (very preferably free) uses lowest I/O while idle (I leave PC alone during active scans so that isn't a problem). Is there an obvious problem with my current browser and, if so, is there a way to fix it or should I switch and, if so, to what? (P.S. I've been on FFox for some time too). Info on system: Windows 7 (32 bit T_T, I am getting a new one in a few months but I want to keep using system during that time though). Hard Drive (main) is a Raid0. (Also have an external 1TB one which contains steam (and steam alone). As such it doesn't get used by much anything other than steam and isn't a very large problem. However steam still uses some I/O of registry) CPU: Intel(R) Core(TM)2 CPU [email protected] RAM: 6GB (3.25GB usable) (this and CPU have little effect as shown in next section) Additional info: Memory usage during problematic times: 44% CPU usage during problematic times: 35% Page File: main drive: system managed. 1TB drive: none. The current system I'm using is about 6 years old and is mainly a place holder while I await the new one in a few months. Final words: this is my 1st post on Super User (this question wouldn't feel right on Stack Overflow where I usually stay). If it doesn't have it's place here please tell me. If anything is wrong with it, same. Edit Technically I'm looking for a live thread detection program with minimal IO usage. I already have good active scan capability: Kaspersky (the free scanner uses the paid database) and MalwareBytes. Edit 2 Noticed another one, it seems that windows media player has been using stuff even when off! Turning it off and restarting now. If the problem is fixed I'll tell you guys. The reason I didn't notice it before was because I didn't have resource manager in front of me at the MOMENT of the problem. Now I did and it was at the very top of the list!

    Read the article

  • How to get more information from the system crash

    - by viraptor
    I'd like to debug an issue I'm having with a linux (debian stable) server, but I'm running out of ideas of how to confirm any diagnosis. Some background: The servers are running DL160 class with hardware raid between two disks. They're running a lot of services, mostly utilising network interface and CPU. There are 8 cpus and 7 "main" most cpu-hungry processes are bound to one core each via cpu affinity. Other random background scripts are not forced anywhere. The filesystem is writing ~1.5k blocks/s the whole time (goes up above 2k/s in peak times). Normal CPU usage for those servers is ~60% on 7 cores and some minimal usage on the last (whatever's running on shells usually). What actually happens is that the "main" services start using 100% CPU at some point, mainly stuck in kernel time. After a couple of seconds, LA goes over 400 and we lose any way to connect to the box (KVM is on it's way, but not there yet). Sometimes we see a kernel reporting hung task (but not always): [118951.272884] INFO: task zsh:15911 blocked for more than 120 seconds. [118951.272955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [118951.273037] zsh D 0000000000000000 0 15911 1 [118951.273093] ffff8101898c3c48 0000000000000046 0000000000000000 ffffffffa0155e0a [118951.273183] ffff8101a753a080 ffff81021f1c5570 ffff8101a753a308 000000051f0fd740 [118951.273274] 0000000000000246 0000000000000000 00000000ffffffbd 0000000000000001 [118951.273335] Call Trace: [118951.273424] [<ffffffffa0155e0a>] :ext3:__ext3_journal_dirty_metadata+0x1e/0x46 [118951.273510] [<ffffffff804294f6>] schedule_timeout+0x1e/0xad [118951.273563] [<ffffffff8027577c>] __pagevec_free+0x21/0x2e [118951.273613] [<ffffffff80428b0b>] wait_for_common+0xcf/0x13a [118951.273692] [<ffffffff8022c168>] default_wake_function+0x0/0xe .... This would point at raid / disk failure, however sometimes the tasks are hung on kernel's gettsc which would indicate some general weird hardware behaviour. It's also running mysql (almost read-only, 99% cache hit), which seems to spawn a lot more threads during the system problems. During the day it does ~200kq/s (selects) and ~10q/s (writes). The host is never running out of memory or swapping, no oom reports are spotted. We've got many boxes with similar/same hardware and they all seem to behave that way, but I'm not sure which part fails, so it's probably not a good idea to just grab something more powerful and hope the problem goes away. Applications themselves don't really report anything wrong when they're running. I can run anything safely on the same hardware in an isolated environment. What can I do to narrow down the problem? Where else should I look for explanation?

    Read the article

  • Linux per-process resource limits - a deep Red Hat Mystery

    - by BobBanana
    I have my own multithreaded C program which scales in speed smoothly with the number of CPU cores.. I can run it with 1, 2, 3, etc threads and get linear speedup.. up to about 5.5x speed on a 6-core CPU on a Ubuntu Linux box. I had an opportunity to run the program on a very high end Sunfire x4450 with 4 quad-core Xeon processors, running Red Hat Enterprise Linux. I was eagerly anticipating seeing how fast the 16 cores could run my program with 16 threads.. But it runs at the same speed as just TWO threads! Much hair-pulling and debugging later, I see that my program really is creating all the threads, they really are running simultaneously, but the threads themselves are slower than they should be. 2 threads runs about 1.7x faster than 1, but 3, 4, 8, 10, 16 threads all run at just net 1.9x! I can see all the threads are running (not stalled or sleeping), they're just slow. To check that the HARDWARE wasn't at fault, I ran SIXTEEN copies of my program independently, simultaneously. They all ran at full speed. There really are 16 cores and they really do run at full speed and there really is enough RAM (in fact this machine has 64GB, and I only use 1GB per process). So, my question is if there's some OPERATING SYSTEM explanation, perhaps some per-process resource limit which automatically scales back thread scheduling to keep one process from hogging the machine. Clues are: My program does not access the disk or network. It's CPU limited. Its speed scales linearly on a single CPU box in Ubuntu Linux with a hexacore i7 for 1-6 threads. 6 threads is effectively 6x speedup. My program never runs faster than 2x speedup on this 16 core Sunfire Xeon box, for any number of threads from 2-16. Running 16 copies of my program single threaded runs perfectly, all 16 running at once at full speed. top shows 1600% of CPUs allocated. /proc/cpuinfo shows all 16 cores running at full 2.9GHz speed (not low frequency idle speed of 1.6GHz) There's 48GB of RAM free, it is not swapping. What's happening? Is there some process CPU limit policy? How could I measure it if so? What else could explain this behavior? Thanks for your ideas to solve this, the Great Xeon Slowdown Mystery of 2010!

    Read the article

  • Reducing Oracle LOB Memory Use in PHP, or Paul's Lesson Applied to Oracle

    - by christopher.jones
    Paul Reinheimer's PHP memory pro tip shows how re-assigning a value to a variable doesn't release the original value until the new data is ready. With large data lengths, this unnecessarily increases the peak memory usage of the application. In Oracle you might come across this situation when dealing with LOBS. Here's an example that selects an entire LOB into PHP's memory. I see this being done all the time, not that that is an excuse to code in this style. The alternative is to remove OCI_RETURN_LOBS to return a LOB locator which can be accessed chunkwise with LOB->read(). In this memory usage example, I threw some CLOB rows into a table. Each CLOB was about 1.5M. The fetching code looked like: $s = oci_parse ($c, 'SELECT CLOBDATA FROM CTAB'); oci_execute($s); echo "Start Current :" . memory_get_usage() . "\n"; echo "Start Peak : " .memory_get_peak_usage() . "\n"; while(($r = oci_fetch_array($s, OCI_RETURN_LOBS)) !== false) { echo "Current :" . memory_get_usage() . "\n"; echo "Peak : " . memory_get_peak_usage() . "\n"; // var_dump(substr($r['CLOBDATA'],0,10)); // do something with the LOB // unset($r); } echo "End Current :" . memory_get_usage() . "\n"; echo "End Peak : " . memory_get_peak_usage() . "\n"; Without "unset" in loop, $r retains the current data value while new data is fetched: Start Current : 345300 Start Peak : 353676 Current : 1908092 Peak : 2958720 Current : 1908092 Peak : 4520972 End Current : 345668 End Peak : 4520972 When I uncommented the "unset" line in the loop, PHP's peak memory usage is much lower: Start Current : 345376 Start Peak : 353676 Current : 1908168 Peak : 2958796 Current : 1908168 Peak : 2959108 End Current : 345744 End Peak : 2959108 Even if you are using LOB->read(), unsetting variables in this manner will reduce the PHP program's peak memory usage. With LOBS in Oracle DB there is also DB memory use to consider. Using LOB->free() is worthwhile for locators. Importantly, the OCI8 1.4.1 extension (from PECL or included in PHP 5.3.2) has a LOB fix to free up Oracle's locators earlier. For long running scripts using lots of LOBS, upgrading to OCI8 1.4.1 is recommended.

    Read the article

  • ASP.NET Asynchronous Pages and when to use them

    - by rajbk
    There have been several articles posted about using  asynchronous pages in ASP.NET but none of them go into detail as to when you should use them. I finally found a great post by Thomas Marquardt that explains the process in depth. He addresses a key misconception also: So, in your ASP.NET application, when should you perform work asynchronously instead of synchronously? Well, only 1 thread per CPU can execute at a time.  Did you catch that?  A lot of people seem to miss this point...only one thread executes at a time on a CPU. When you have more than this, you pay an expensive penalty--a context switch. However, if a thread is blocked waiting on work...then it makes sense to switch to another thread, one that can execute now.  It also makes sense to switch threads if you want work to be done in parallel as opposed to in series, but up until a certain point it actually makes much more sense to execute work in series, again, because of the expensive context switch. Pop quiz: If you have a thread that is doing a lot of computational work and using the CPU heavily, and this takes a while, should you switch to another thread? No! The current thread is efficiently using the CPU, so switching will only incur the cost of a context switch. Ok, well, what if you have a thread that makes an HTTP or SOAP request to another server and takes a long time, should you switch threads? Yes! You can perform the HTTP or SOAP request asynchronously, so that once the "send" has occurred, you can unwind the current thread and not use any threads until there is an I/O completion for the "receive". Between the "send" and the "receive", the remote server is busy, so locally you don't need to be blocking on a thread, but instead make use of the asynchronous APIs provided in .NET Framework so that you can unwind and be notified upon completion. Again, it only makes sense to switch threads if the benefit from doing so out weights the cost of the switch. Read more about it in these posts: Performing Asynchronous Work, or Tasks, in ASP.NET Applications http://blogs.msdn.com/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx ASP.NET Thread Usage on IIS 7.0 and 6.0 http://blogs.msdn.com/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx   PS: I generally do not write posts that simply link to other posts but think it is warranted in this case.

    Read the article

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • Java FAQ: Tudo o que você precisa saber

    - by Bruno.Borges
    Com frequência recebo e-mails de clientes com dúvidas sobre "quando sairá a próxima versão do Java?", ou então "quando vai expirar o Java?" ou ainda "quais as mudanças da próxima versão?". Por isso resolvi escrever aqui um FAQ, respondendo estas dúvidas e muitas outras. Este post estará sempre atualizado, então se você possui alguma dúvida, envie para mim no Twitter @brunoborges. Qual a diferença entre o Oracle JDK e o OpenJDK?O projeto OpenJDK funciona como a implementação de referência Open Source do Java Standard Edition. Empresas como a Oracle, IBM, e Azul Systems suportam e investem no projeto OpenJDK para continuar evoluindo a plataforma Java. O Oracle JDK é baseado no OpenJDK, mas traz outras ferramentas como o Mission Control, e a máquina virtual traz algumas features avançadas como por exemplo o Flight Recorder. Até a versão 6, a Oracle oferecia duas máquinas virtuais: JRockit (BEA) e HotSpot (Sun). A partir da versão 7 a Oracle unificou as máquinas virtuais, e levou as features avançadas do JRockit para dentro da VM HotSpot. Leia também o OpenJDK FAQ. Onde posso obter binários beta Early Access do JDK 7, JDK 8, JDK 9 para testar?A partir do projeto OpenJDK, existe um projeto específico para cada versão do Java. Nestes projetos você pode encontrar binários beta Early Access, além do código-fonte. JDK 6 - http://jdk6.java.net/ JDK 7 - http://jdk7.java.net/ JDK 8 - http://jdk8.java.net/ JDK 9 - http://jdk9.java.net/ Quando acaba o suporte do Oracle Java SE 6, 7, 8? Somente produtos e versões com release oficial são suportados pela Oracle (exemplo: não há suporte para binários beta do JDK 7, JDK 8, ou JDK 9). Existem duas categorias de datas que o usuriário do Java deve estar ciente:  EOPU - End of Public UpdatesMomento em que a Oracle não mais disponibiliza publicamente atualizações Oracle SupportPolítica de suporte da Oracle para produtos, incluindo o Oracle Java SE O Oracle Java SE é um produto e portando os períodos de suporte são regidos pelo Oracle Lifetime Support Policy. Consulte este documento para datas atualizadas e específicas para cada versão do Java. O Oracle Java SE 6 já atingiu EOPU (End of Public Updates) e agora é mantido e atualizado somente para clientes através de contrato comercial de suporte. Para maiores informações, consulte a página sobre Oracle Java SE Support.  O mais importante aqui é você estar ciente sobre as datas de EOPU para as versões do Java SE da Oracle.Consulte a página do Oracle Java SE Support Roadmap e busque nesta página pela tabela com nome Java SE Public Updates. Nela você encontrará a data em que determinada versão do Java irá atingir EOPU. Como funciona o versionamento do Java?Em 2013, a Oracle divulgou um novo esquema de versionamento do Java para facilmente identificar quando é um release CPU e quando é um release LFR, e também para facilitar o planejamento e desenvolvimento de correções e features para futuras versões. CPU - Critical Patch UpdateAtualizações com correções de segurança. Versão será múltipla de 5, ou com soma de 1 para manter o número ímpar. Exemplos: 7u45, 7u51, 7u55. LFR - Limited Feature ReleaseAtualizações com correções de funcionalidade, melhorias de performance, e novos recursos. Versões de números pares múltiplos de 20, com final 0. Exemplos: 7u40, 7u60, 8u20. Qual a data da próxima atualização de segurança (CPU) do Java SE?Lançamentos do tipo CPU são controlados e pré-agendados pela Oracle e se aplicam a todos os produtos, inclusive o Oracle Java SE. Estes releases acontecem a cada 3 meses, sempre na Terça-feira mais próxima do dia 17 dos meses de Janeiro, Abril, Julho, e Outubro. Consulte a página Critical Patch Updates, Security Alerts and Third Party Bulleting para saber das próximas datas. Caso tenha interesse, você pode acompanhar através de recebimentos destes boletins diretamente no seu email. Veja como assinar o Boletim de Segurança da Oracle. Qual a data da próxima atualização de features (LFR) do Java SE?A Oracle reserva o direito de não divulgar estas datas, assim como o faz para todos os seus produtos. Entretanto é possível acompanhar o desenvolvimento da próxima versão pelos sites do projeto OpenJDK. A próxima versão do JDK 7 será o update 60 e binários beta Early Access já estão disponíveis para testes. A próxima versão doJDK 8 será o update 20 e binários beta Early Access já estão disponíveis para testes. Onde posso ver as mudanças e o que foi corrigido para a próxima versão do Java?A Oracle disponibiliza um changelog para cada binário beta Early Access divulgado no portal Java.net. JDK 7 update 60 changelogs JDK 8 update 20 changelogs Quando o Java da minha máquina (ou do meu usuário) vai expirar?Conheçendo o sistema de versionamento do Java e a periodicidade dos releases de CPU, o usuário pode determinar quando que um update do Java irá expirar. De todo modo, a cada novo update, a Oracle já informa quando que este update deverá expirar diretamente no release notes da versão. Por exemplo, no release notes da versão Oracle Java SE 7 update 55, está escrito na seção JRE Expiration Date o seguinte: The JRE expires whenever a new release with security vulnerability fixes becomes available. Critical patch updates, which contain security vulnerability fixes, are announced one year in advance on Critical Patch Updates, Security Alerts and Third Party Bulletin. This JRE (version 7u55) will expire with the release of the next critical patch update scheduled for July 15, 2014. For systems unable to reach the Oracle Servers, a secondary mechanism expires this JRE (version 7u55) on August 15, 2014. After either condition is met (new release becoming available or expiration date reached), the JRE will provide additional warnings and reminders to users to update to the newer version. For more information, see JRE Expiration Date.Ou seja, a versão 7u55 irá expirar com o lançamento do próximo release CPU, pré-agendado para o dia 15 de Julho de 2014. E caso o computador do usuário não possa se comunicar com o servidor da Oracle, esta versão irá expirar forçadamente no dia 15 de Agosto de 2014 (através de um mecanismo embutido na versão 7u55). O usuário não é obrigado a atualizar para versões LFR e portanto, mesmo com o release da versão 7u60, a versão atual 7u55 não irá expirar.Veja o release notes do Oracle Java SE 8 update 5. Encontrei um bug. Como posso reportar bugs ou problemas no Java SE, para a Oracle?Sempre que possível, faça testes com os binários beta antes da versão final ser lançada. Qualquer problema que você encontrar com estes binários beta, por favor descreva o problema através do fórum de Project Feebdack do JDK.Caso você encontre algum problema em uma versão final do Java, utilize o formulário de Bug Report. Importante: bugs reportados por estes sistemas não são considerados Suporte e portanto não há SLA de atendimento. A Oracle reserva o direito de manter o bug público ou privado, e também de informar ou não o usuário sobre o progresso da resolução do problema. Tenho uma dúvida que não foi respondida aqui. Como faço?Se você possui uma pergunta que não foi respondida aqui, envie para bruno.borges_at_oracle.com e caso ela seja pertinente, tentarei responder neste artigo. Para outras dúvidas, entre em contato pelo meu Twitter @brunoborges.

    Read the article

< Previous Page | 95 96 97 98 99 100 101 102 103 104 105 106  | Next Page >