Search Results

Search found 5572 results on 223 pages for 'cpu'.

Page 215/223 | < Previous Page | 211 212 213 214 215 216 217 218 219 220 221 222 | Next Page >

Interesting articles and blogs on SPARC T4

- by mv

Interesting articles and blogs on SPARC T4 processor I have consolidated all the interesting information I could get on SPARC T4 processor and its hardware cryptographic capabilities. Hope its useful. 1. Advantages of SPARC T4 processor Most important points in this T4 announcement are : "The SPARC T4 processor was designed from the ground up for high speed security and has a cryptographic stream processing unit (SPU) integrated directly into each processor core. These accelerators support 16 industry standard security ciphers and enable high speed encryption at rates 3 to 5 times that of competing processors. By integrating encryption capabilities directly inside the instruction pipeline, the SPARC T4 processor eliminates the performance and cost barriers typically associated with secure computing and makes it possible to deliver high security levels without impacting the user experience." Data Sheet has more details on these : "New on-chip Encryption Instruction Accelerators with direct non-privileged support for 16 industry-standard cryptographic algorithms plus random number generation in each of the eight cores: AES, Camellia, CRC32c, DES, 3DES, DH, DSA, ECC, Kasumi, MD5, RSA, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512" I ran "isainfo -v" command on Solaris 11 Sparc T4-1 system. It shows the new instructions as expected : $ isainfo -v 64-bit sparcv9 applications crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc 32-bit sparc applications crc32c cbcond pause mont mpmul sha512 sha256 sha1 md5 camellia kasumi des aes ima hpc vis3 fmaf asi_blk_init vis2 vis popc v8plus div32 mul32 2. Dan Anderson's Blog have some interesting points about how these can be used : "New T4 crypto instructions include: aes_kexpand0, aes_kexpand1, aes_kexpand2, aes_eround01, aes_eround23, aes_eround01_l, aes_eround_23_l, aes_dround01, aes_dround23, aes_dround01_l, aes_dround_23_l. Having SPARC T4 hardware crypto instructions is all well and good, but how do we access it ? The software is available with Solaris 11 and is used automatically if you are running Solaris a SPARC T4. It is used internally in the kernel through kernel crypto modules. It is available in user space through the PKCS#11 library." 3. Dans' Blog on Where's the Crypto Libraries? Although this was written in 2009 but still is very useful "Here's a brief tour of the major crypto libraries shown in the digraph: The libpkcs11 library contains the PKCS#11 API (C_\*() functions, such as C_Initialize()). That in turn calls library pkcs11_softtoken or pkcs11_kernel, for userland or kernel crypto providers. The latter is used mostly for hardware-assisted cryptography (such as n2cp for Niagara2 SPARC processors), as that is performed more efficiently in kernel space with the "kCF" module (Kernel Crypto Framework). Additionally, for Solaris 10, strong crypto algorithms were split off in separate libraries, pkcs11_softtoken_extra libcryptoutil contains low-level utility functions to help implement cryptography. libsoftcrypto (OpenSolaris and Solaris Nevada only) implements several symmetric-key crypto algorithms in software, such as AES, RC4, and DES3, and the bignum library (used for RSA). libmd implements MD5, SHA, and SHA2 message digest algorithms" 4. Difference in T3 and T4 Diagram in this blog is good and self explanatory. Jeff's blog also highlights the differences "The T4 servers have improved crypto acceleration, described at https://blogs.oracle.com/DanX/entry/sparc_t4_openssl_engine. It is "just built in" so administrators no longer have to assign crypto accelerator units to domains - it "just happens". Every physical or virtual CPU on a SPARC-T4 has full access to hardware based crypto acceleration at all times. .... For completeness sake, it's worth noting that the T4 adds more crypto algorithms, and accelerates Camelia, CRC32c, and more SHA-x." 5. About performance counters In this blog, performance counters are explained : "Note that unlike T3 and before, T4 crypto doesn't require kernel modules like ncp or n2cp, there is no visibility of crypto hardware with kstats or cryptoadm. T4 does provide hardware counters for crypto operations. You can see these using cpustat: cpustat -c pic0=Instr_FGU_crypto 5 You can check the general crypto support of the hardware and OS with the command "isainfo -v". Since T4 crypto's implementation now allows direct userland access, there are no "crypto units" visible to cryptoadm. " For more details refer Martin's blog as well. 6. How to turn off SPARC T4 or Intel AES-NI crypto acceleration I found this interesting blog from Darren about how to turn off SPARC T4 or Intel AES-NI crypto acceleration. "One of the new Solaris 11 features of the linker/loader is the ability to have a single ELF object that has multiple different implementations of the same functions that are selected at runtime based on the capabilities of the machine. The alternate to this is having the application coded to call getisax(2) system call and make the choice itself. We use this functionality of the linker/loader when we build the userland libraries for the Solaris Cryptographic Framework (specifically libmd.so and libsoftcrypto.so) The Solaris linker/loader allows control of a lot of its functionality via environment variables, we can use that to control the version of the cryptographic functions we run. To do this we simply export the LD_HWCAP environment variable with values that tell ld.so.1 to not select the HWCAP section matching certain features even if isainfo says they are present. This will work for consumers of the Solaris Cryptographic Framework that use the Solaris PKCS#11 libraries or use libmd.so interfaces directly. For SPARC T4 : export LD_HWCAP="-aes -des -md5 -sha256 -sha512 -mont -mpul" .. For Intel systems with AES-NI support: export LD_HWCAP="-aes"" Note that LD_HWCAP is explained in http://docs.oracle.com/cd/E23823_01/html/816-5165/ld.so.1-1.html "LD_HWCAP, LD_HWCAP_32, and LD_HWCAP_64 - Identifies an alternative hardware capabilities value... A “-” prefix results in the capabilities that follow being removed from the alternative capabilities." 7. Whitepaper on SPARC T4 Servers—Optimized for End-to-End Data Center Computing This Whitepaper on SPARC T4 Servers—Optimized for End-to-End Data Center Computing explains more details. It has DTrace scripts which may come in handy : "To ensure the hardware-assisted cryptographic acceleration is configured to use and working with the security scenarios, it is recommended to use the following Solaris DTrace script. #!/usr/sbin/dtrace -s pid$1:libsoftcrypto:yf*:entry, pid$target:libsoftcrypto:rsa*:entry, pid$1:libmd:yf*:entry { @[probefunc] = count(); } tick-1sec { printa(@ops); trunc(@ops); }" Note that I have slightly modified the D Script to have RSA "libsoftcrypto:rsa*:entry" as well as per recommendations from Chi-Chang Lin. 8. References http://www.oracle.com/us/corporate/features/sparc-t4-announcement-494846.html http://www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/t-series/sparc-t4-1-ds-487858.pdf https://blogs.oracle.com/DanX/entry/sparc_t4_openssl_engine https://blogs.oracle.com/DanX/entry/where_s_the_crypto_libraries https://blogs.oracle.com/darren/entry/howto_turn_off_sparc_t4 http://docs.oracle.com/cd/E23823_01/html/816-5165/ld.so.1-1.html https://blogs.oracle.com/hardware/entry/unleash_the_power_of_cryptography https://blogs.oracle.com/cmt/entry/t4_crypto_cheat_sheet https://blogs.oracle.com/martinm/entry/t4_performance_counters_explained https://blogs.oracle.com/jsavit/entry/no_mau_required_on_a http://www.oracle.com/us/products/servers-storage/servers/sparc-enterprise/t-series/sparc-t4-business-wp-524472.pdf

Read the article
Monitor your Hard Drive’s Health with Acronis Drive Monitor

- by Matthew Guay

Are you worried that your computer’s hard drive could die without any warning? Here’s how you can keep tabs on it and get the first warning signs of potential problems before you actually lose your critical data. Hard drive failures are one of the most common ways people lose important data from their computers. As more of our memories and important documents are stored digitally, a hard drive failure can mean the loss of years of work. Acronis Drive Monitor helps you avert these disasters by warning you at the first signs your hard drive may be having trouble. It monitors many indicators, including heat, read/write errors, total lifespan, and more. It then notifies you via a taskbar popup or email that problems have been detected. This early warning lets you know ahead of time that you may need to purchase a new hard drive and migrate your data before it’s too late. Getting Started Head over to the Acronis site to download Drive Monitor (link below). You’ll need to enter your name and email, and then you can download this free tool. Also, note that the download page may ask if you want to include a trial of their for-pay backup program. If you wish to simply install the Drive Monitor utility, click Continue without adding. Run the installer when the download is finished. Follow the prompts and install as normal. Once it’s installed, you can quickly get an overview of your hard drives’ health. Note that it shows 3 categories: Disk problems, Acronis backup, and Critical Events. On our computer, we had Seagate DiskWizard, an image backup utility based on Acronis Backup, installed, and Acronis detected it. Drive Monitor stays running in your tray even when the application window is closed. It will keep monitoring your hard drives, and will alert you if there’s a problem. Find Detailed Information About Your Hard Drives Acronis’ simple interface lets you quickly see an overview of how the drives on your computer are performing. If you’d like more information, click the link under the description. Here we see that one of our drives have overheated, so click Show disks to get more information. Now you can select each of your drives and see more information about them. From the Disk overview tab that opens by default, we see that our drive is being monitored, has been running for a total of 368 days, and that it’s health is good. However, it is running at 113F, which is over the recommended max of 107F. The S.M.A.R.T. parameters tab gives us more detailed information about our drive. Most users wouldn’t know what an accepted value would be, so it also shows the status. If the value is within the accepted parameters, it will report OK; otherwise, it will show that has a problem in this area. One very interesting piece of information we can see is the total number of Power-On Hours, Start/Stop Count, and Power Cycle Count. These could be useful indicators to check if you’re considering purchasing a second hand computer. Simply load this program, and you’ll get a better view of how long it’s been in use. Finally, the Events tab shows each time the program gave a warning. We can see that our drive, which had been acting flaky already, is routinely overheating even when our other hard drive was running in normal temperature ranges. Monitor Acronis Backups And Critical Errors In addition to monitoring critical stats of your hard drives, Acronis Drive Monitor also keeps up with the status of your backup software and critical events reported by Windows. You can access these from the front page, or via the links on the left hand sidebar. If you have any edition of any Acronis Backup product installed, it will show that it was detected. Note that it can only monitor the backup status of the newest versions of Acronis Backup and True Image. If no Acronis backup software was installed, it will show a warning that the drive may be unprotected and will give you a link to download Acronis backup software. If you have another backup utility installed that you wish to monitor yourself, click Configure backup monitoring, and then disable monitoring on the drives you’re monitoring yourself. Finally, you can view any detected Critical events from the Critical events tab on the left. Get Emailed When There’s a Problem One of Drive Monitor’s best features is the ability to send you an email whenever there’s a problem. Since this program can run on any version of Windows, including the Server and Home Server editions, you can use this feature to stay on top of your hard drives’ health even when you’re not nearby. To set this up, click Options in the top left corner. Select Alerts on the left, and then click the Change settings link to setup your email account. Enter the email address which you wish to receive alerts, and a name for the program. Then, enter the outgoing mail server settings for your email. If you have a Gmail account, enter the following information: Outgoing mail server (SMTP): smtp.gmail.com Port: 587 Username and Password: Your gmail address and password Check the Use encryption box, and then select TLS from the encryption options. It will now send a test message to your email account, so check and make sure it sent ok. Now you can choose to have the program automatically email you when warnings and critical alerts appear, and also to have it send regular disk status reports. Conclusion Whether you’ve got a brand new hard drive or one that’s seen better days, knowing the real health of your it is one of the best ways to be prepared before disaster strikes. It’s no substitute for regular backups, but can help you avert problems. Acronis Drive Monitor is a nice tool for this, and although we wish it wasn’t so centered around their backup offerings, we still found it a nice tool. Link Download Acronis Drive Monitor (registration required) Similar Articles Productive Geek Tips Quick Tip: Change Monitor Timeout From Command LineAnalyze and Manage Hard Drive Space with WinDirStatMonitor CPU, Memory, and Disk IO In Windows 7 with Taskbar MetersDefrag Multiple Hard Drives At Once In WindowsFind Your Missing USB Drive on Windows XP TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips HippoRemote Pro 2.2 Xobni Plus for Outlook All My Movies 5.9 CloudBerry Online Backup 1.5 for Windows Home Server Windows 7’s WordPad is Actually Good Greate Image Viewing and Management with Zoner Photo Studio Free Windows Media Player Plus! – Cool WMP Enhancer Get Your Team’s World Cup Schedule In Google Calendar Backup Drivers With Driver Magician TubeSort: YouTube Playlist Organizer

Read the article
The Proper Use of the VM Role in Windows Azure

- by BuckWoody

At the Professional Developer’s Conference (PDC) in 2010 we announced an addition to the Computational Roles in Windows Azure, called the VM Role. This new feature allows a great deal of control over the applications you write, but some have confused it with our full infrastructure offering in Windows Hyper-V. There is a proper architecture pattern for both of them. Virtualization Virtualization is the process of taking all of the hardware of a physical computer and replicating it in software alone. This means that a single computer can “host” or run several “virtual” computers. These virtual computers can run anywhere - including at a vendor’s location. Some companies refer to this as Cloud Computing since the hardware is operated and maintained elsewhere. IaaS The more detailed definition of this type of computing is called Infrastructure as a Service (Iaas) since it removes the need for you to maintain hardware at your organization. The operating system, drivers, and all the other software required to run an application are still under your control and your responsibility to license, patch, and scale. Microsoft has an offering in this space called Hyper-V, that runs on the Windows operating system. Combined with a hardware hosting vendor and the System Center software to create and deploy Virtual Machines (a process referred to as provisioning), you can create a Cloud environment with full control over all aspects of the machine, including multiple operating systems if you like. Hosting machines and provisioning them at your own buildings is sometimes called a Private Cloud, and hosting them somewhere else is often called a Public Cloud. State-ful and Stateless Programming This paradigm does not create a new, scalable way of computing. It simply moves the hardware away. The reason is that when you limit the Cloud efforts to a Virtual Machine, you are in effect limiting the computing resources to what that single system can provide. This is because much of the software developed in this environment maintains “state” - and that requires a little explanation. “State-ful programming” means that all parts of the computing environment stay connected to each other throughout a compute cycle. The system expects the memory, CPU, storage and network to remain in the same state from the beginning of the process to the end. You can think of this as a telephone conversation - you expect that the other person picks up the phone, listens to you, and talks back all in a single unit of time. In “Stateless” computing the system is designed to allow the different parts of the code to run independently of each other. You can think of this like an e-mail exchange. You compose an e-mail from your system (it has the state when you’re doing that) and then you walk away for a bit to make some coffee. A few minutes later you click the “send” button (the network has the state) and you go to a meeting. The server receives the message and stores it on a mail program’s database (the mail server has the state now) and continues working on other mail. Finally, the other party logs on to their mail client and reads the mail (the other user has the state) and responds to it and so on. These events might be separated by milliseconds or even days, but the system continues to operate. The entire process doesn’t maintain the state, each component does. This is the exact concept behind coding for Windows Azure. The stateless programming model allows amazing rates of scale, since the message (think of the e-mail) can be broken apart by multiple programs and worked on in parallel (like when the e-mail goes to hundreds of users), and only the order of re-assembling the work is important to consider. For the exact same reason, if the system makes copies of those running programs as Windows Azure does, you have built-in redundancy and recovery. It’s just built into the design. The Difference Between Infrastructure Designs and Platform Designs When you simply take a physical server running software and virtualize it either privately or publicly, you haven’t done anything to allow the code to scale or have recovery. That all has to be handled by adding more code and more Virtual Machines that have a slight lag in maintaining the running state of the system. Add more machines and you get more lag, so the scale is limited. This is the primary limitation with IaaS. It’s also not as easy to deploy these VM’s, and more importantly, you’re often charged on a longer basis to remove them. your agility in IaaS is more limited. Windows Azure is a Platform - meaning that you get objects you can code against. The code you write runs on multiple nodes with multiple copies, and it all works because of the magic of Stateless programming. you don’t worry, or even care, about what is running underneath. It could be Windows (and it is in fact a type of Windows Server), Linux, or anything else - but that' isn’t what you want to manage, monitor, maintain or license. You don’t want to deploy an operating system - you want to deploy an application. You want your code to run, and you don’t care how it does that. Another benefit to PaaS is that you can ask for hundreds or thousands of new nodes of computing power - there’s no provisioning, it just happens. And you can stop using them quicker - and the base code for your application does not have to change to make this happen. Windows Azure Roles and Their Use If you need your code to have a user interface, in Visual Studio you add a Web Role to your project, and if the code needs to do work that doesn’t involve a user interface you can add a Worker Role. They are just containers that act a certain way. I’ll provide more detail on those later. Note: That’s a general description, so it’s not entirely accurate, but it’s accurate enough for this discussion. So now we’re back to that VM Role. Because of the name, some have mistakenly thought that you can take a Virtual Machine running, say Linux, and deploy it to Windows Azure using this Role. But you can’t. That’s not what it is designed for at all. If you do need that kind of deployment, you should look into Hyper-V and System Center to create the Private or Public Infrastructure as a Service. What the VM Role is actually designed to do is to allow you to have a great deal of control over the system where your code will run. Let’s take an example. You’ve heard about Windows Azure, and Platform programming. You’re convinced it’s the right way to code. But you have a lot of things you’ve written in another way at your company. Re-writing all of your code to take advantage of Windows Azure will take a long time. Or perhaps you have a certain version of Apache Web Server that you need for your code to work. In both cases, you think you can (or already have) code the the software to be “Stateless”, you just need more control over the place where the code runs. That’s the place where a VM Role makes sense. Recap Virtualizing servers alone has limitations of scale, availability and recovery. Microsoft’s offering in this area is Hyper-V and System Center, not the VM Role. The VM Role is still used for running Stateless code, just like the Web and Worker Roles, with the exception that it allows you more control over the environment of where that code runs.

Read the article
Computer crashes on resume from standby almost every time

- by Los Frijoles

I am running Ubuntu 12.04 on a Core i5 2500K and ASRock Z68 Pro3-M motherboard (no graphics card, hd is a WD Green 1TB, and cd drive is some cheap lite-on drive). Since installing 12.04, my computer has been freezing after resume, but not every time. When I start to resume, it starts going normally with a blinking cursor on the screen and then sometimes it will continue on to the gnome 3 unlock screen. Most of the time, however, it will blink for a little bit and then the monitor will flip modes and shut off due to no signal. Pressing keys on the keyboard gets no response (num lock light doesn't respond, Ctrl-Alt-F1 fails to drop it into a terminal, Ctrl-Alt-Backspace doesn't work) and so I assume the computer is crashed. The worst part is, the logs look entirely normal. Here is my system log during one of these crashes and my subsequent hard poweroff and restart: Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-2, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-2, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-1, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[12419]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory Jun 6 22:09:01 kcuzner-desktop CRON[9061]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 22:17:01 kcuzner-desktop CRON[22142]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jun 6 22:39:01 kcuzner-desktop CRON[26909]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 22:54:21 kcuzner-desktop kernel: [57905.560822] show_signal_msg: 36 callbacks suppressed Jun 6 22:54:21 kcuzner-desktop kernel: [57905.560828] chromium-browse[9139]: segfault at 0 ip 00007f3a78efade0 sp 00007fff7e2d2c18 error 4 in chromium-browser[7f3a76604000+412b000] Jun 6 23:05:43 kcuzner-desktop kernel: [58586.415158] chromium-browse[21025]: segfault at 0 ip 00007f3a78efade0 sp 00007fff7e2d2c18 error 4 in chromium-browser[7f3a76604000+412b000] Jun 6 23:09:01 kcuzner-desktop CRON[13542]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 23:12:43 kcuzner-desktop kernel: [59006.317590] usb 2-1.7: USB disconnect, device number 8 Jun 6 23:12:43 kcuzner-desktop kernel: [59006.319672] sd 7:0:0:0: [sdg] Synchronizing SCSI cache Jun 6 23:12:43 kcuzner-desktop kernel: [59006.319737] sd 7:0:0:0: [sdg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Jun 6 23:17:01 kcuzner-desktop CRON[26580]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jun 6 23:19:04 kcuzner-desktop acpid: client connected from 29925[0:0] Jun 6 23:19:04 kcuzner-desktop acpid: 1 client rule loaded Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30131 of process 30131 (n/a) owned by '104' high priority at nice level -11. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 1 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30162 of process 30131 (n/a) owned by '104' RT at priority 5. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 2 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30163 of process 30131 (n/a) owned by '104' RT at priority 5. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 3 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/HFPAG Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/A2DPSource Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/A2DPSink Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30166 of process 30166 (n/a) owned by '104' high priority at nice level -11. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 4 threads of 2 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop pulseaudio[30166]: [pulseaudio] pid.c: Daemon already running. Jun 6 23:19:10 kcuzner-desktop acpid: client 2942[0:0] has disconnected Jun 6 23:19:10 kcuzner-desktop acpid: client 29925[0:0] has disconnected Jun 6 23:19:10 kcuzner-desktop acpid: client connected from 1286[0:0] Jun 6 23:19:10 kcuzner-desktop acpid: 1 client rule loaded Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/HFPAG Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/A2DPSource Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/A2DPSink Jun 6 23:28:12 kcuzner-desktop kernel: imklog 5.8.6, log source = /proc/kmsg started. Jun 6 23:28:12 kcuzner-desktop rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1053" x-info="http://www.rsyslog.com"] start Jun 6 23:28:12 kcuzner-desktop rsyslogd: rsyslogd's groupid changed to 103 Jun 6 23:28:12 kcuzner-desktop rsyslogd: rsyslogd's userid changed to 101 Jun 6 23:28:12 kcuzner-desktop rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ] Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Ericsson MBM Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Sierra Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Generic Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Huawei Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Linktop Jun 6 23:28:12 kcuzner-desktop bluetoothd[1072]: Failed to init gatt_example plugin Jun 6 23:28:12 kcuzner-desktop bluetoothd[1072]: Listening for HCI events on hci0 Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> NetworkManager (version 0.9.4.0) is starting... Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> Read config file /etc/NetworkManager/NetworkManager.conf Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> VPN: loaded org.freedesktop.NetworkManager.pptp Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> DNS: loaded plugin dnsmasq Jun 6 23:28:12 kcuzner-desktop kernel: [ 0.000000] Initializing cgroup subsys cpuset Jun 6 23:28:12 kcuzner-desktop kernel: [ 0.000000] Initializing cgroup subsys cpu Sorry it's so huge; the restart happens at 23:28:12 I believe and all I see is that chromium segfaulted a few times. I wouldn't think a segfault from an individual program on the computer would crash it, but could that be the issue?

Read the article
Windows Azure VMs - New "Stopped" VM Options Provide Cost-effective Flexibility for On-Demand Workloads

- by KeithMayer

Originally posted on: http://geekswithblogs.net/KeithMayer/archive/2013/06/22/windows-azure-vms---new-stopped-vm-options-provide-cost-effective.aspxDidn’t make it to TechEd this year? Don’t worry! This month, we’ll be releasing a new article series that highlights the Best of TechEd announcements and technical information for IT Pros. Today’s article focuses on a new, much-heralded enhancement to Windows Azure Infrastructure Services to make it more cost-effective for spinning VMs up and down on-demand on the Windows Azure cloud platform. NEW! VMs that are shutdown from the Windows Azure Management Portal will no longer continue to accumulate compute charges while stopped! Previous to this enhancement being available, the Azure platform maintained fabric resource reservations for VMs, even in a shutdown state, to ensure consistent resource availability when starting those VMs in the future. And, this meant that VMs had to be exported and completely deprovisioned when not in use to avoid compute charges. In this article, I'll provide more details on the scenarios that this enhancement best fits, and I'll also review the new options and considerations that we now have for performing safe shutdowns of Windows Azure VMs. Which scenarios does the new enhancement best fit? Being able to easily shutdown VMs from the Windows Azure Management Portal without continued compute charges is a great enhancement for certain cloud use cases, such as: On-demand dev/test/lab environments - Freely start and stop lab VMs so that they are only accumulating compute charges when being actively used. "Bursting" load-balanced web applications - Provision a number of load-balanced VMs, but keep the minimum number of VMs running to support "normal" loads. Easily start-up the remaining VMs only when needed to support peak loads. Disaster Recovery - Start-up "cold" VMs when needed to recover from disaster scenarios. BUT ... there is a consideration to keep in mind when using the Windows Azure Management Portal to shutdown VMs: although performing a VM shutdown via the Windows Azure Management Portal causes that VM to no longer accumulate compute charges, it also deallocates the VM from fabric resources to which it was previously assigned. These fabric resources include compute resources such as virtual CPU cores and memory, as well as network resources, such as IP addresses. This means that when the VM is later started after being shutdown from the portal, the VM could be assigned a different IP address or placed on a different compute node within the fabric. In some cases, you may want to shutdown VMs using the old approach, where fabric resource assignments are maintained while the VM is in a shutdown state. Specifically, you may wish to do this when temporarily shutting down or restarting a "7x24" VM as part of a maintenance activity. Good news - you can still revert back to the old VM shutdown behavior when necessary by using the alternate VM shutdown approaches listed below. Let's walk through each approach for performing a VM Shutdown action on Windows Azure so that we can understand the benefits and considerations of each... How many ways can I shutdown a VM? In Windows Azure Infrastructure Services, there's three general ways that can be used to safely shutdown VMs: Shutdown VM via Windows Azure Management Portal Shutdown Guest Operating System inside the VM Stop VM via Windows PowerShell using Windows Azure PowerShell Module Although each of these options performs a safe shutdown of the guest operation system and the VM itself, each option handles the VM shutdown end state differently. Shutdown VM via Windows Azure Management Portal When clicking the Shutdown button at the bottom of the Virtual Machines page in the Windows Azure Management Portal, the VM is safely shutdown and "deallocated" from fabric resources. Shutdown button on Virtual Machines page in Windows Azure Management Portal When the shutdown process completes, the VM will be shown on the Virtual Machines page with a "Stopped ( Deallocated )" status as shown in the figure below. Virtual Machine in a "Stopped (Deallocated)" Status "Deallocated" means that the VM configuration is no longer being actively associated with fabric resources, such as virtual CPUs, memory and networks. In this state, the VM will not continue to allocate compute charges, but since fabric resources are deallocated, the VM could receive a different internal IP address ( called "Dynamic IPs" or "DIPs" in Windows Azure ) the next time it is started. TIP: If you are leveraging this shutdown option and consistency of DIPs is important to applications running inside your VMs, you should consider using virtual networks with your VMs. Virtual networks permit you to assign a specific IP Address Space for use with VMs that are assigned to that virtual network. As long as you start VMs in the same order in which they were originally provisioned, each VM should be reassigned the same DIP that it was previously using. What about consistency of External IP Addresses? Great question! External IP addresses ( called "Virtual IPs" or "VIPs" in Windows Azure ) are associated with the cloud service in which one or more Windows Azure VMs are running. As long as at least 1 VM inside a cloud service remains in a "Running" state, the VIP assigned to a cloud service will be preserved. If all VMs inside a cloud service are in a "Stopped ( Deallocated )" status, then the cloud service may receive a different VIP when VMs are next restarted. TIP: If consistency of VIPs is important for the cloud services in which you are running VMs, consider keeping one VM inside each cloud service in the alternate VM shutdown state listed below to preserve the VIP associated with the cloud service. Shutdown Guest Operating System inside the VM When performing a Guest OS shutdown or restart ( ie., a shutdown or restart operation initiated from the Guest OS running inside the VM ), the VM configuration will not be deallocated from fabric resources. In the figure below, the VM has been shutdown from within the Guest OS and is shown with a "Stopped" VM status rather than the "Stopped ( Deallocated )" VM status that was shown in the previous figure. Note that it may require a few minutes for the Windows Azure Management Portal to reflect that the VM is in a "Stopped" state in this scenario, because we are performing an OS shutdown inside the VM rather than through an Azure management endpoint. Virtual Machine in a "Stopped" Status VMs shown in a "Stopped" status will continue to accumulate compute charges, because fabric resources are still being reserved for these VMs. However, this also means that DIPs and VIPs are preserved for VMs in this state, so you don't have to worry about VMs and cloud services getting different IP addresses when they are started in the future. Stop VM via Windows PowerShell In the latest version of the Windows Azure PowerShell Module, a new -StayProvisioned parameter has been added to the Stop-AzureVM cmdlet. This new parameter provides the flexibility to choose the VM configuration end result when stopping VMs using PowerShell: When running the Stop-AzureVM cmdlet without the -StayProvisioned parameter specified, the VM will be safely stopped and deallocated; that is, the VM will be left in a "Stopped ( Deallocated )" status just like the end result when a VM Shutdown operation is performed via the Windows Azure Management Portal. When running the Stop-AzureVM cmdlet with the -StayProvisioned parameter specified, the VM will be safely stopped but fabric resource reservations will be preserved; that is the VM will be left in a "Stopped" status just like the end result when performing a Guest OS shutdown operation. So, with PowerShell, you can choose how Windows Azure should handle VM configuration and fabric resource reservations when stopping VMs on a case-by-case basis. TIP: It's important to note that the -StayProvisioned parameter is only available in the latest version of the Windows Azure PowerShell Module. So, if you've previously downloaded this module, be sure to download and install the latest version to get this new functionality. Want to Learn More about Windows Azure Infrastructure Services? To learn more about Windows Azure Infrastructure Services, be sure to check-out these additional FREE resources: Become our next "Early Expert"! Complete the Early Experts "Cloud Quest" and build a multi-VM lab network in the cloud for FREE! Build some cool scenarios! Check out our list of over 20+ Step-by-Step Lab Guides based on key scenarios that IT Pros are implementing on Windows Azure Infrastructure Services TODAY! Looking forward to seeing you in the Cloud! - Keith Build Your Lab! Download Windows Server 2012 Don’t Have a Lab? Build Your Lab in the Cloud with Windows Azure Virtual Machines Want to Get Certified? Join our Windows Server 2012 "Early Experts" Study Group

Read the article
CodePlex Daily Summary for Saturday, April 03, 2010

CodePlex Daily Summary for Saturday, April 03, 2010New ProjectsASP.NET MVC Demo: aspnetmvcdemoClasslessInterDomainRouting: ClasslessInterDomainRouting provides a class that is designed to detail with CIDR requests and ranges, it is developed within the C# Langauge and f...ClientSideRefactor: Plugin for Visual Studio.ColinTest: ColinTestePMS: An educational project to learn ASP.Net MVC, entity framework using vs 2010Extensible ASP.NET: Extensible Framework on top of ASP.NET - infrastructure level. Uses MEF for extensibility.Franchise Computing Model: Franchise Computing is a client-centric, contract-oriented, consumption-based computing model. Its framework allows service providers and consumers...GameEngine ReactorFX: Set of tools and code snippets for creation DirectX based games. Also provides a number of ideas, algorythms and problem-solutions.It's All Just Ones And Zeros: Utility code libraries for Vault API developers.Live Writer Picasa Plugin: Live Writer Picasa Plugin is a plugin for Windows Live Writer that allows you to embed photos from your Picasa Web Albums into your blog posts. Liv...Managed SDK for Meizu Cell Phone: The goal of this project is to deliver an open source managed SDK for Meizu cell phones, currently for M8. Media Player Field Type: Display a media player in a column of you document library. The library can contain movie files of diferent formats. The player will appear in the ...praca magisterska: This is my thesis: Algebraical aspects of modern cryptography,Pyx: An experimental programming language for statistics.SharpHydroLiDAR: A C# version of Lidar Hydrographic ExtractionSql Server Mds Destination: SSIS destination transform component for SQL Server Master Data ServicesStackOverflow.Net: A C# library for the StackOverflow API (currently in beta). Provides methods for every call currently in the StackOverflow API.TRX Merger Utility: People working on test projects that involve test management and execution from Visual Studio Team System 2008 and who do not have a TFS server for...UniPlanner: The UniPlanner project goal is to develop a web application able to visualize and schedule a university timetable.WikiNETParser: Wiki .NET Parser, Open Source project powered by ANTLR. Syntax defined in 3(4) files Lexer, Grammar, AST Parser.New ReleasesaaronERP builder - a framework to create customized ERP solutions: aaronERP_0.4.0.0: Changes (compared to version 0.3.0.0) : Businesslayer : - Caching of data-tables - ITranslatable Interface for mutli-language DAOs Web-Frontend: ...BatterySaver: Version 0.5: Add support for executing a power state event manually (Issue) Add support for battery percentage thresholds (Issue)ColinTest: asdfzxcv: asdfasdfComposer: V1.0.402.2001 Beta: Minor bug fixes Minor changes in interfaces Added documentation to the setup packageDynamic Configuration: Dynamic Configuration Release 2: Added ConfigurationChanged event fired whenever changes in .config file detected. Improved file watching filtering.Facebook Developer Toolkit: Version 3.1 BETA: Lots of bug fixes. Issues addressed: http://facebooktoolkit.codeplex.com/WorkItem/View.aspx?WorkItemId=14808 http://facebooktoolkit.codeplex.com/W...iExporter - iTunes playlist exporting: iExporter gui v2.5.0.0 - console v1.2.1.0: Paypal donate! New features and redesign for iExporter Gui You can now select/deselect all visible items with one click in the overview When yo...Line Counter: 1.5.5: The Line Counter is a tool to calculate lines of your code files. The tool was written in .NET 2.0. Line Counter 1.5.5 Fixed bugs in C# counter an...Live Writer Picasa Plugin: Live Writer Picasa Plugin 1.0.0: Changelog Since this is the first version there are no changes.Media Player Field Type: Media Player Field Type v1.0: Display a media player in a column of you document library. The library can contain movie files of diferent formats. The player will appear in the ...Numina Application/Security Framework: Numina.Framework Core 49601: Added .LESS library for CSS Updated default style and logo Added a few methods and method overloads to the .NET libraryOver Store: OverStore 1.16.0.0: Version 1.16.0.0 Runtime components uses PersistingRuntimeException instead of many exception types. PersistingRuntimeException message includes...patterns & practices Web Client Developer Guidance: Web Client Software Factory 2010 beta source code: The Web Client Software Factory 2010 provides an integrated set of guidance that assists architects and developers in creating web client applicati...SCSI Interface for Multimedia and Block Devices: Release 12 - View CD-DVD Drive Features: Changes in this version: - Added the ability to view the features of a CD/DVD device (e.g.: what discs it supports, whether it supports Mount Raini...SharePoint Labs: SPLab5006A-FRA-Level100: SPLab5006A-FRA-Level100 This SharePoint Lab will teach you how to create a Feature within Visual Studio, how to brand it, how to incorporate ressou...SharePoint Labs: SPLab5007A-FRA-Level300: SPLab5007A-FRA-Level300 This SharePoint Lab will teach you how to create a reusable and distributable project model for developping Features within...SharePoint Labs: SPLab5008A-FRA-Level100: SPLab5008A-FRA-Level100 This SharePoint Lab will teach you how to add an option in the ECB menu (Edit Control Block) only for specific file types w...SharePoint Labs: SPLab5009A-FRA-Level100: SPLab5009A-FRA-Level100 This SharePoint Lab will teach you the "Site Pages" model and the differences between customized/uncustomized pages (ghoste...SharePoint Labs: SPLab5010A-FRA-Level100: SPLab5010A-FRA-Level100 This SharePoint Lab will teach you the "Application Pages" model and the differences between "Site Pages" and "Application ...SharePoint Labs: SPLab5011A-FRA-Level100: SPLab5011A-FRA-Level100 This SharePoint Lab will teach you how to create a basic Application Page in the 12\TEMPLATE\LAYOUTS. Lab Language : French...sPATCH: sPatch v0.9b: + Fixed: an issue most webservers need leading slash to return filestreamsTASKedit: sTASKedit (pre-Alpha Release): This release is only for playing around, currently not useful Supported Files:Open 1.3.6 client tasks.data Export to 1.3.6 client tasks.data E...TRX Merger Utility: TRX Merger v1.0: First versionttgLib: ttgLib-0.01-beta1: In beta-version we've implemented basic functionality of ttgLib - now it can solve various problems using CPU+GPU bundle. Most important things: ...WikiNETParser: Wiki .NET Parser 2.5: Wiki .NET Parser 2.5 The documentation, binaries and source code could be downloaded from http://catarsa.com portal The latest release to downloa...WPF Zen Garden: Release 1.0: This is the first release.XNA 3D World Studio Content Pipeline: XNA 3DWS Content Pipeline - R2: This version adds terrains and brush based modelsMost Popular ProjectsRawrWBFS ManagerMicrosoft SQL Server Product Samples: DatabaseASP.NET Ajax LibrarySilverlight ToolkitAJAX Control ToolkitWindows Presentation Foundation (WPF)ASP.NETMicrosoft SQL Server Community & SamplesDotNetNuke® Community EditionMost Active ProjectsGraffiti CMSRawrjQuery Library for SharePoint Web ServicesFacebook Developer ToolkitBlogEngine.NETN2 CMSBase Class LibrariesFarseer Physics EngineLINQ to TwitterMicrosoft Biology Foundation

Read the article
Announcing Windows Azure Mobile Services

- by ScottGu

I’m excited to announce a new capability we are adding to Windows Azure today: Windows Azure Mobile Services Windows Azure Mobile Services makes it incredibly easy to connect a scalable cloud backend to your client and mobile applications. It allows you to easily store structured data in the cloud that can span both devices and users, integrate it with user authentication, as well as send out updates to clients via push notifications. Today’s release enables you to add these capabilities to any Windows 8 app in literally minutes, and provides a super productive way for you to quickly build out your app ideas. We’ll also be adding support to enable these same scenarios for Windows Phone, iOS, and Android devices soon. Read this getting started tutorial to walkthrough how you can build (in less than 5 minutes) a simple Windows 8 “Todo List” app that is cloud enabled using Windows Azure Mobile Services. Or watch this video of me showing how to do it step by step. Getting Started If you don’t already have a Windows Azure account, you can sign up for a no-obligation Free Trial. Once you are signed-up, click the “preview features” section under the “account” tab of the www.windowsazure.com website and enable your account to support the “Mobile Services” preview. Instructions on how to enable this can be found here. Once you have the mobile services preview enabled, log into the Windows Azure Portal, click the “New” button and choose the new “Mobile Services” icon to create your first mobile backend. Once created, you’ll see a quick-start page like below with instructions on how to connect your mobile service to an existing Windows 8 client app you have already started working on, or how to create and connect a brand-new Windows 8 client app with it: Read this getting started tutorial to walkthrough how you can build (in less than 5 minutes) a simple Windows 8 “Todo List” app that stores data in Windows Azure. Storing Data in the Cloud Storing data in the cloud with Windows Azure Mobile Services is incredibly easy. When you create a Windows Azure Mobile Service, we automatically associate it with a SQL Database inside Windows Azure. The Windows Azure Mobile Service backend then provides built-in support for enabling remote apps to securely store and retrieve data from it (using secure REST end-points utilizing a JSON-based ODATA format) – without you having to write or deploy any custom server code. Built-in management support is provided within the Windows Azure portal for creating new tables, browsing data, setting indexes, and controlling access permissions. This makes it incredibly easy to connect client applications to the cloud, and enables client developers who don’t have a server-code background to be productive from the very beginning. They can instead focus on building the client app experience, and leverage Windows Azure Mobile Services to provide the cloud backend services they require. Below is an example of client-side Windows 8 C#/XAML code that could be used to query data from a Windows Azure Mobile Service. Client-side C# developers can write queries like this using LINQ and strongly typed POCO objects, which are then translated into HTTP REST queries that run against a Windows Azure Mobile Service. Developers don’t have to write or deploy any custom server-side code in order to enable client-side code below to execute and asynchronously populate their client UI: Because Mobile Services is part of Windows Azure, developers can later choose to augment or extend their initial solution and add custom server functionality and more advanced logic if they want. This provides maximum flexibility, and enables developers to grow and extend their solutions to meet any needs. User Authentication and Push Notifications Windows Azure Mobile Services also make it incredibly easy to integrate user authentication/authorization and push notifications within your applications. You can use these capabilities to enable authentication and fine grain access control permissions to the data you store in the cloud, as well as to trigger push notifications to users/devices when the data changes. Windows Azure Mobile Services supports the concept of “server scripts” (small chunks of server-side script that executes in response to actions) that make it really easy to enable these scenarios. Below are some tutorials that walkthrough common authentication/authorization/push scenarios you can do with Windows Azure Mobile Services and Windows 8 apps: Enabling User Authentication Authorizing Users Get Started with Push Notifications Push Notifications to multiple Users Manage and Monitor your Mobile Service Just like with every other service in Windows Azure, you can monitor usage and metrics of your mobile service backend using the “Dashboard” tab within the Windows Azure Portal. The dashboard tab provides a built-in monitoring view of the API calls, Bandwidth, and server CPU cycles of your Windows Azure Mobile Service. You can also use the “Logs” tab within the portal to review error messages. This makes it easy to monitor and track how your application is doing. Scale Up as Your Business Grows Windows Azure Mobile Services now allows every Windows Azure customer to create and run up to 10 Mobile Services in a free, shared/multi-tenant hosting environment (where your mobile backend will be one of multiple apps running on a shared set of server resources). This provides an easy way to get started on projects at no cost beyond the database you connect your Windows Azure Mobile Service to (note: each Windows Azure free trial account also includes a 1GB SQL Database that you can use with any number of apps or Windows Azure Mobile Services). If your client application becomes popular, you can click the “Scale” tab of your Mobile Service and switch from “Shared” to “Reserved” mode. Doing so allows you to isolate your apps so that you are the only customer within a virtual machine. This allows you to elastically scale the amount of resources your apps use – allowing you to scale-up (or scale-down) your capacity as your traffic grows: With Windows Azure you pay for compute capacity on a per-hour basis – which allows you to scale up and down your resources to match only what you need. This enables a super flexible model that is ideal for new mobile app scenarios, as well as startups who are just getting going. Summary I’ve only scratched the surface of what you can do with Windows Azure Mobile Services – there are a lot more features to explore. With Windows Azure Mobile Services you’ll be able to build mobile app experiences faster than ever, and enable even better user experiences – by connecting your client apps to the cloud. Visit the Windows Azure Mobile Services development center to learn more, and build your first Windows 8 app connected with Windows Azure today. And read this getting started tutorial to walkthrough how you can build (in less than 5 minutes) a simple Windows 8 “Todo List” app that is cloud enabled using Windows Azure Mobile Services. Hope this helps, Scott P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

Read the article
General monitoring for SQL Server Analysis Services using Performance Monitor

- by Testas

A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs. File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor, a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used…… Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above 25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file

Read the article
SPARC T3-1 Record Results Running JD Edwards EnterpriseOne Day in the Life Benchmark with Added Batch Component

- by Brian

Using Oracle's SPARC T3-1 server for the application tier and Oracle's SPARC Enterprise M3000 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne applications Day in the Life benchmark run concurrently with a batch workload. The SPARC T3-1 server based result has 25% better performance than the IBM Power 750 POWER7 server even though the IBM result did not include running a batch component. The SPARC T3-1 server based result has 25% better space/performance than the IBM Power 750 POWER7 server as measured by the online component. The SPARC T3-1 server based result is 5x faster than the x86-based IBM x3650 M2 server system when executing the online component of the JD Edwards EnterpriseOne 9.0.1 Day in the Life benchmark. The IBM result did not include a batch component. The SPARC T3-1 server based result has 2.5x better space/performance than the x86-based IBM x3650 M2 server as measured by the online component. The combination of SPARC T3-1 and SPARC Enterprise M3000 servers delivered a Day in the Life benchmark result of 5000 online users with 0.875 seconds of average transaction response time running concurrently with 19 Universal Batch Engine (UBE) processes at 10 UBEs/minute. The solution exercises various JD Edwards EnterpriseOne applications while running Oracle WebLogic Server 11g Release 1 and Oracle Web Tier Utilities 11g HTTP server in Oracle Solaris Containers, together with the Oracle Database 11g Release 2. The SPARC T3-1 server showed that it could handle the additional workload of batch processing while maintaining the same number of online users for the JD Edwards EnterpriseOne Day in the Life benchmark. This was accomplished with minimal loss in response time. JD Edwards EnterpriseOne 9.0.1 takes advantage of the large number of compute threads available in the SPARC T3-1 server at the application tier and achieves excellent response times. The SPARC T3-1 server consolidates the application/web tier of the JD Edwards EnterpriseOne 9.0.1 application using Oracle Solaris Containers. Containers provide flexibility, easier maintenance and better CPU utilization of the server leaving processing capacity for additional growth. A number of Oracle advanced technology and features were used to obtain this result: Oracle Solaris 10, Oracle Solaris Containers, Oracle Java Hotspot Server VM, Oracle WebLogic Server 11g Release 1, Oracle Web Tier Utilities 11g, Oracle Database 11g Release 2, the SPARC T3 and SPARC64 VII+ based servers. This is the first published result running both online and batch workload concurrently on the JD Enterprise Application server. No published results are available from IBM running the online component together with a batch workload. The 9.0.1 version of the benchmark saw some minor performance improvements relative to 9.0. When comparing between 9.0.1 and 9.0 results, the reader should take this into account when the difference between results is small. Performance Landscape JD Edwards EnterpriseOne Day in the Life Benchmark Online with Batch Workload This is the first publication on the Day in the Life benchmark run concurrently with batch jobs. The batch workload was provided by Oracle's Universal Batch Engine. System RackUnits Online Users Resp Time (sec) BatchConcur(# of UBEs) BatchRate(UBEs/m) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII+ (2.86 GHz), Solaris 10 4 5000 0.88 19 10 9.0.1 Resp Time (sec) — Response time of online jobs reported in seconds Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs Batch Rate (UBEs/m) — Batch transaction rate in UBEs/minute. JD Edwards EnterpriseOne Day in the Life Benchmark Online Workload Only These results are for the Day in the Life benchmark. They are run without any batch workload. System RackUnits Online Users ResponseTime (sec) Version SPARC T3-1, 1xSPARC T3 (1.65 GHz), Solaris 10 M3000, 1xSPARC64 VII (2.75 GHz), Solaris 10 4 5000 0.52 9.0.1 IBM Power 750, 1xPOWER7 (3.55 GHz), IBM i7.1 4 4000 0.61 9.0 IBM x3650M2, 2xIntel X5570 (2.93 GHz), OVM 2 1000 0.29 9.0 IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere Configuration Summary Hardware Configuration: 1 x SPARC T3-1 server 1 x 1.65 GHz SPARC T3 128 GB memory 16 x 300 GB 10000 RPM SAS 1 x Sun Flash Accelerator F20 PCIe Card, 92 GB 1 x 10 GbE NIC 1 x SPARC Enterprise M3000 server 1 x 2.86 SPARC64 VII+ 64 GB memory 1 x 10 GbE NIC 2 x StorageTek 2540 + 2501 Software Configuration: JD Edwards EnterpriseOne 9.0.1 with Tools 8.98.3.3 Oracle Database 11g Release 2 Oracle 11g WebLogic server 11g Release 1 version 10.3.2 Oracle Web Tier Utilities 11g Oracle Solaris 10 9/10 Mercury LoadRunner 9.10 with Oracle Day in the Life kit for JD Edwards EnterpriseOne 9.0.1 Oracle’s Universal Batch Engine - Short UBEs and Long UBEs Benchmark Description JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations. Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and other manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company. The workload consists of online transactions and the UBE workload of 15 short and 4 long UBEs. LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time. The UBE processes workload runs from the JD Enterprise Application server. Oracle's UBE processes come as three flavors: Short UBEs < 1 minute engage in Business Report and Summary Analysis, Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address, Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs. The UBE workload generates large numbers of PDF files reports and log files. The UBE Queues are categorized as the QBATCHD, a single threaded queue for large UBEs, and the QPROCESS queue for short UBEs run concurrently. One of the Oracle Solaris Containers ran 4 Long UBEs, while another Container ran 15 short UBEs concurrently. The mixed size UBEs ran concurrently from the SPARC T3-1 server with the 5000 online users driven by the LoadRunner. Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute. Key Points and Best Practices Two JD Edwards EnterpriseOne Application Servers and two Oracle Fusion Middleware WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T3-1 server were hosted in four separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers. See Also SPARC T3-1 oracle.com SPARC Enterprise M3000 oracle.com Oracle Solaris oracle.com JD Edwards EnterpriseOne oracle.com Oracle Database 11g Release 2 Enterprise Edition oracle.com Disclosure Statement Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 6/27/2011.

Read the article
Running TeamCity from Amazon EC2 - Cloud based scalable build and continuous Integration

- by RoyOsherove

I’ve been having fun playing with the amazon EC2 cloud service. I set up a server running TeamCity, and an image of a server that just runs a TeamCity agent. I also setup TeamCity to automatically instantiate agents on EC2 and shut them down based upon availability of free agents. Here’s how I did it: The first step was setting up the teamcity server. Create an account on amazon EC2 (BTW, amazon’s sites works better in IE than it does in chrome.. who knew!?) Open the EC2 dashboard, and click “Launch Instance” . From the “Quick Start” tab I selected from the list: “Getting Started on Microsoft Windows Server 2008 (AMI Id: ami-c5e40dac)” . it’s good enough to just run teamcity. In the instance details, I used the default (Small instance, 1.7 GB mem). You might want to choose a close availability zone based on where you are. We want to “Launch instances” so click continue. Select the default kernel, RAM disk and all. No need to enable monitoring for now (you can do that later). click continue. If you don’t have a key pair, you will be prompted to create one. Once you do, select it in the list. Now you’ll be prompted to create a security group. I named mine “TC” as in “TeamCity”. each group is a bunch of settings on which ports can be let through into and out of a hosted machine. keep it as the default settings. We will change them later. Click continue, review and then click “Launch”. Now you’ll be able to see the new instance in the running instances list on your site. Now, you need to install stuff on that instance (TeamCity!) . To do that, you’ll need to Remote desktop into that instance. To do that, we’ll get the admin password for that instance: Check it on the list, and click “Instance Actions” - “Get Windows Admin Password”. You might have to wait about 10 minutes or so for the password to be generated for you. Once you have the password, you will remote desktop (start-run-‘mstsc’) into the instance. It’s address is a dns address shown below the list under “Public DNS”. it looks something like: ec2-256-226-194-91.compute-1.amazonaws.com Once you’re inside the instance – you’ll need to open IE (it is in hardened mode so you’ll have to relax its security settings to download stuff). I first downloaded chrome and using chrome I downloaded TeamCity. Note that the download speed is FAST. several MBs per second. To be able to see TeamCity from the outside, you will need to open the advanced firewall settings inside the remote machine, and add incoming and outgoing rules for port 80 (HTTP). Once you do that, you should be able to see the machine from the outside. If you still can’t, see the next step. I also enabled ports 9090 since I will use this machine to create an agent image later as well. Now configure the security group (TC) to enable talking to agents: IN the EC2 dashboard click on “Security Groups” and select your group. To add a rule, click on the empty list under the ‘protocol’ header. select TCP. from and ‘to’ ports are 9090. source ip is 0.0.0.0/0 (every ip is allowed). click “Save. Also make sure you can see “HTTP” tcp 80 in that list. if you can’t see it, add it or you won’t be able to browse to the machine’s teamcity server home page. I also set an elastic IP for the machine: so I always have the same IP for the machine instance. Allocate and set one through the”Elastic IP” link on the EC2 dashboard. you should now have a working instance of teamcity. Now let’s create an agent image. Repeat steps 1-9, but this time, make sure you select a machine that fits what an agent might do. I selected Instance type – Hihg-CPU medium machine, that is much faster. On that machine, I installed what I needed (VS 2010, PostSharp etc..). downloading VS 2010 from MSDN (2 GB took less than 10 min!) Now, instead of installing teamcity, browse using the browser to the teamcity homepage (from within the remote machine). go to the Administration page, and click the upper right link “Install agents”. Install the agent on he local machine – set it to the IP or DNS of the running TeamCity server. That way you’ll be able to check their connectivity live before making this machine your official agent image to reuse. Once the agent is installed, see that the TC server can see it and use it. see steps 13-14 above if they can’t. Once it works, you can take steps to make this image your agent image to be reused. next, here is a copy-paste of several steps to take from http://confluence.jetbrains.net/display/TCD5/Setting+Up+TeamCity+for+Amazon+EC2 Configure system so that agent it is started on machine boot (and make sure TeamCity server is accessible on machine boot). Test the setup by rebooting machine and checking that the agent connects normally to the server. Prepare the Image for bundling: Remove any temporary/history information in the system. Stop the agent (under Windows stop the service but leave it in Automatic startup type) Delete content agent logs and temp directories (not necessary) Delete "<Agent Home>/conf/amazon-*" file (not necessary) Change config/buildAgent.properties to remove properties: name, serverAddress, authToken (not necessary) Now, we need to: Make AMI from the running instance. Configure TeamCity EC2 support on TeamCity server. Making an AMI: Check the instance of the agent in the EC2 dashboard instance list, and select instance actions->Create Image (EBS AMI) you’ll see the image pending in the APIs list in the EC2 dashboard. this could take 30 minutes or more. meanwhile we can configure the could support in the teamcity server. COPY THE AMI ID to the clipboard (looks like ami-a88aa4ce) Configuring TeamCity for Cloud: In TeamCity, click on “Agents” and then on “Cloud” tab. this is where you will control your cloud agents. to configure new cloud agents based on APIs, click on the right link to the “configuration page” Create a new profile and select AMazon EC2 as cloud type. Use your AMI ID that you copied to the clipboard into the “Images” field. Select an availability zone that is the same as the one your instance is running on for best communication perf between them make sure you select the ‘TC’ security group hopefully, that should be it, and teamcity will try to instantiate new instances on demand. Note that it may take around 10 minutes for an agent to become available to teamcity from the time it’s started.

Read the article
MySQL Cluster 7.2: Over 8x Higher Performance than Cluster 7.1

- by Mat Keep

0 0 1 893 5092 Homework 42 11 5974 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary The scalability enhancements delivered by extensions to multi-threaded data nodes enables MySQL Cluster 7.2 to deliver over 8x higher performance than the previous MySQL Cluster 7.1 release on a recent benchmark What’s New in MySQL Cluster 7.2 MySQL Cluster 7.2 was released as GA (Generally Available) in February 2012, delivering many enhancements to performance on complex queries, new NoSQL Key / Value API, cross-data center replication and ease-of-use. These enhancements are summarized in the Figure below, and detailed in the MySQL Cluster New Features whitepaper Figure 1: Next Generation Web Services, Cross Data Center Replication and Ease-of-Use Once of the key enhancements delivered in MySQL Cluster 7.2 is extensions made to the multi-threading processes of the data nodes. Multi-Threaded Data Node Extensions The MySQL Cluster 7.2 data node is now functionally divided into seven thread types: 1) Local Data Manager threads (ldm). Note – these are sometimes also called LQH threads. 2) Transaction Coordinator threads (tc) 3) Asynchronous Replication threads (rep) 4) Schema Management threads (main) 5) Network receiver threads (recv) 6) Network send threads (send) 7) IO threads Each of these thread types are discussed in more detail below. MySQL Cluster 7.2 increases the maximum number of LDM threads from 4 to 16. The LDM contains the actual data, which means that when using 16 threads the data is more heavily partitioned (this is automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and REDO log. The number of LDM partitions per data node is not dynamically configurable, but it is possible, however, to map more than one partition onto each LDM thread, providing flexibility in modifying the number of LDM threads. The TC domain stores the state of in-flight transactions. This means that every new transaction can easily be assigned to a new TC thread. Testing has shown that in most cases 1 TC thread per 2 LDM threads is sufficient, and in many cases even 1 TC thread per 4 LDM threads is also acceptable. Testing also demonstrated that in some instances where the workload needed to sustain very high update loads it is necessary to configure 3 to 4 TC threads per 4 LDM threads. In the previous MySQL Cluster 7.1 release, only one TC thread was available. This limit has been increased to 16 TC threads in MySQL Cluster 7.2. The TC domain also manages the Adaptive Query Localization functionality introduced in MySQL Cluster 7.2 that significantly enhanced complex query performance by pushing JOIN operations down to the data nodes. Asynchronous Replication was separated into its own thread with the release of MySQL Cluster 7.1, and has not been modified in the latest 7.2 release. To scale the number of TC threads, it was necessary to separate the Schema Management domain from the TC domain. The schema management thread has little load, so is implemented with a single thread. The Network receiver domain was bound to 1 thread in MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to increase the number of recv threads to 8. This enables each receive thread to service one or more sockets used to communicate with other nodes the Cluster. The Network send thread is a new thread type introduced in MySQL Cluster 7.2. Previously other threads handled the sending operations themselves, which can provide for lower latency. To achieve highest throughput however, it has been necessary to create dedicated send threads, of which 8 can be configured. It is still possible to configure MySQL Cluster 7.2 to a legacy mode that does not use any of the send threads – useful for those workloads that are most sensitive to latency. The IO Thread is the final thread type and there have been no changes to this domain in MySQL Cluster 7.2. Multiple IO threads were already available, which could be configured to either one thread per open file, or to a fixed number of IO threads that handle the IO traffic. Except when using compression on disk, the IO threads typically have a very light load. Benchmarking the Scalability Enhancements The scalability enhancements discussed above have made it possible to scale CPU usage of each data node to more than 5x of that possible in MySQL Cluster 7.1. In addition, a number of bottlenecks have been removed, making it possible to scale data node performance by even more than 5x. Figure 2: MySQL Cluster 7.2 Delivers 8.4x Higher Performance than 7.1 The flexAsynch benchmark was used to compare MySQL Cluster 7.2 performance to 7.1 across an 8-node Intel Xeon x5670-based cluster of dual socket commodity servers (6 cores each). As the results demonstrate, MySQL Cluster 7.2 delivers over 8x higher performance per data nodes than MySQL Cluster 7.1. More details of this and other benchmarks will be published in a new whitepaper – coming soon, so stay tuned! In a following blog post, I’ll provide recommendations on optimum thread configurations for different types of server processor. You can also learn more from the Best Practices Guide to Optimizing Performance of MySQL Cluster Conclusion MySQL Cluster has achieved a range of impressive benchmark results, and set in context with the previous 7.1 release, is able to deliver over 8x higher performance per node. As a result, the multi-threaded data node extensions not only serve to increase performance of MySQL Cluster, they also enable users to achieve significantly improved levels of utilization from current and future generations of massively multi-core, multi-thread processor designs.

Read the article
Thread placement policies on NUMA systems - update

- by Dave

In a prior blog entry I noted that Solaris used a "maximum dispersal" placement policy to assign nascent threads to their initial processors. The general idea is that threads should be placed as far away from each other as possible in the resource topology in order to reduce resource contention between concurrently running threads. This policy assumes that resource contention -- pipelines, memory channel contention, destructive interference in the shared caches, etc -- will likely outweigh (a) any potential communication benefits we might achieve by packing our threads more densely onto a subset of the NUMA nodes, and (b) benefits of NUMA affinity between memory allocated by one thread and accessed by other threads. We want our threads spread widely over the system and not packed together. Conceptually, when placing a new thread, the kernel picks the least loaded node NUMA node (the node with lowest aggregate load average), and then the least loaded core on that node, etc. Furthermore, the kernel places threads onto resources -- sockets, cores, pipelines, etc -- without regard to the thread's process membership. That is, initial placement is process-agnostic. Keep reading, though. This description is incorrect. On Solaris 10 on a SPARC T5440 with 4 x T2+ NUMA nodes, if the system is otherwise unloaded and we launch a process that creates 20 compute-bound concurrent threads, then typically we'll see a perfect balance with 5 threads on each node. We see similar behavior on an 8-node x86 x4800 system, where each node has 8 cores and each core is 2-way hyperthreaded. So far so good; this behavior seems in agreement with the policy I described in the 1st paragraph. I recently tried the same experiment on a 4-node T4-4 running Solaris 11. Both the T5440 and T4-4 are 4-node systems that expose 256 logical thread contexts. To my surprise, all 20 threads were placed onto just one NUMA node while the other 3 nodes remained completely idle. I checked the usual suspects such as processor sets inadvertently left around by colleagues, processors left offline, and power management policies, but the system was configured normally. I then launched multiple concurrent instances of the process, and, interestingly, all the threads from the 1st process landed on one node, all the threads from the 2nd process landed on another node, and so on. This happened even if I interleaved thread creating between the processes, so I was relatively sure the effect didn't related to thread creation time, but rather that placement was a function of process membership. I this point I consulted the Solaris sources and talked with folks in the Solaris group. The new Solaris 11 behavior is intentional. The kernel is no longer using a simple maximum dispersal policy, and thread placement is process membership-aware. Now, even if other nodes are completely unloaded, the kernel will still try to pack new threads onto the home lgroup (socket) of the primordial thread until the load average of that node reaches 50%, after which it will pick the next least loaded node as the process's new favorite node for placement. On the T4-4 we have 64 logical thread contexts (strands) per socket (lgroup), so if we launch 48 concurrent threads we will find 32 placed on one node and 16 on some other node. If we launch 64 threads we'll find 32 and 32. That means we can end up with our threads clustered on a small subset of the nodes in a way that's quite different that what we've seen on Solaris 10. So we have a policy that allows process-aware packing but reverts to spreading threads onto other nodes if a node becomes too saturated. It turns out this policy was enabled in Solaris 10, but certain bugs suppressed the mixed packing/spreading behavior. There are configuration variables in /etc/system that allow us to dial the affinity between nascent threads and their primordial thread up and down: see lgrp_expand_proc_thresh, specifically. In the OpenSolaris source code the key routine is mpo_update_tunables(). This method reads the /etc/system variables and sets up some global variables that will subsequently be used by the dispatcher, which calls lgrp_choose() in lgrp.c to place nascent threads. Lgrp_expand_proc_thresh controls how loaded an lgroup must be before we'll consider homing a process's threads to another lgroup. Tune this value lower to have it spread your process's threads out more. To recap, the 'new' policy is as follows. Threads from the same process are packed onto a subset of the strands of a socket (50% for T-series). Once that socket reaches the 50% threshold the kernel then picks another preferred socket for that process. Threads from unrelated processes are spread across sockets. More precisely, different processes may have different preferred sockets (lgroups). Beware that I've simplified and elided details for the purposes of explication. The truth is in the code. Remarks: It's worth noting that initial thread placement is just that. If there's a gross imbalance between the load on different nodes then the kernel will migrate threads to achieve a better and more even distribution over the set of available nodes. Once a thread runs and gains some affinity for a node, however, it becomes "stickier" under the assumption that the thread has residual cache residency on that node, and that memory allocated by that thread resides on that node given the default "first-touch" page-level NUMA allocation policy. Exactly how the various policies interact and which have precedence under what circumstances could the topic of a future blog entry. The scheduler is work-conserving. The x4800 mentioned above is an interesting system. Each of the 8 sockets houses an Intel 7500-series processor. Each processor has 3 coherent QPI links and the system is arranged as a glueless 8-socket twisted ladder "mobius" topology. Nodes are either 1 or 2 hops distant over the QPI links. As an aside the mapping of logical CPUIDs to physical resources is rather interesting on Solaris/x4800. On SPARC/Solaris the CPUID layout is strictly geographic, with the highest order bits identifying the socket, the next lower bits identifying the core within that socket, following by the pipeline (if present) and finally the logical thread context ("strand") on the core. But on Solaris on the x4800 the CPUID layout is as follows. [6:6] identifies the hyperthread on a core; bits [5:3] identify the socket, or package in Intel terminology; bits [2:0] identify the core within a socket. Such low-level details should be of interest only if you're binding threads -- a bad idea, the kernel typically handles placement best -- or if you're writing NUMA-aware code that's aware of the ambient placement and makes decisions accordingly. Solaris introduced the so-called critical-threads mechanism, which is expressed by putting a thread into the FX scheduling class at priority 60. The critical-threads mechanism applies to placement on cores, not on sockets, however. That is, it's an intra-socket policy, not an inter-socket policy. Solaris 11 introduces the Power Aware Dispatcher (PAD) which packs threads instead of spreading them out in an attempt to be able to keep sockets or cores at lower power levels. Maximum dispersal may be good for performance but is anathema to power management. PAD is off by default, but power management polices constitute yet another confounding factor with respect to scheduling and dispatching. If your threads communicate heavily -- one thread reads cache lines last written by some other thread -- then the new dense packing policy may improve performance by reducing traffic on the coherent interconnect. On the other hand if your threads in your process communicate rarely, then it's possible the new packing policy might result on contention on shared computing resources. Unfortunately there's no simple litmus test that says whether packing or spreading is optimal in a given situation. The answer varies by system load, application, number of threads, and platform hardware characteristics. Currently we don't have the necessary tools and sensoria to decide at runtime, so we're reduced to an empirical approach where we run trials and try to decide on a placement policy. The situation is quite frustrating. Relatedly, it's often hard to determine just the right level of concurrency to optimize throughput. (Understanding constructive vs destructive interference in the shared caches would be a good start. We could augment the lines with a small tag field indicating which strand last installed or accessed a line. Given that, we could augment the CPU with performance counters for misses where a thread evicts a line it installed vs misses where a thread displaces a line installed by some other thread.)

Read the article
SQL SERVER – Weekly Series – Memory Lane – #035

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Row Overflow Data Explanation In SQL Server 2005 one table row can contain more than one varchar(8000) fields. One more thing, the exclusions has exclusions also the limit of each individual column max width of 8000 bytes does not apply to varchar(max), nvarchar(max), varbinary(max), text, image or xml data type columns. Comparison Index Fragmentation, Index De-Fragmentation, Index Rebuild – SQL SERVER 2000 and SQL SERVER 2005 An old but like a gold article. Talks about lots of concepts related to Index and the difference from earlier version to the newer version. I strongly suggest that everyone should read this article just to understand how SQL Server has moved forward with the technology. Improvements in TempDB SQL Server 2005 had come up with quite a lots of improvements and this blog post describes them and explains the same. If you ask me what is my the most favorite article from early career. I must point out to this article as when I wrote this one I personally have learned a lot of new things. Recompile All The Stored Procedure on Specific TableI prefer to recompile all the stored procedure on the table, which has faced mass insert or update. sp_recompiles marks stored procedures to recompile when they execute next time. This blog post explains the same with the help of a script. 2008 SQLAuthority Download – SQL Server Cheatsheet You can download and print this cheat sheet and use it for your personal reference. If you have any suggestions, please let me know and I will see if I can update this SQL Server cheat sheet. Difference Between DBMS and RDBMS What is the difference between DBMS and RDBMS? DBMS – Data Base Management System RDBMS – Relational Data Base Management System or Relational DBMS High Availability – Hot Add Memory Hot Add CPU and Hot Add Memory are extremely interesting features of the SQL Server, however, personally I have not witness them heavily used. These features also have few restriction as well. I blogged about them in detail. 2009 Delete Duplicate Rows I have demonstrated in this blog post how one can identify and delete duplicate rows. Interesting Observation of Logon Trigger On All Servers – Solution The question I put forth in my previous article was – In single login why the trigger fires multiple times; it should be fired only once. I received numerous answers in thread as well as in my MVP private news group. Now, let us discuss the answer for the same. The answer is – It happens because multiple SQL Server services are running as well as intellisense is turned on. Blog post demonstrates how we can do the same with the help of SQL scripts. Management Studio New Features I have selected my favorite 5 features and blogged about it. IntelliSense for Query Editing Multi Server Query Query Editor Regions Object Explorer Enhancements Activity Monitors Maximum Number of Index per Table One of the questions I asked in my user group was – What is the maximum number of Index per table? I received lots of answers to this question but only two answers are correct. Let us now take a look at them in this blog post. 2010 Default Statistics on Column – Automatic Statistics on Column The truth is, Statistics can be in a table even though there is no Index in it. If you have the auto- create and/or auto-update Statistics feature turned on for SQL Server database, Statistics will be automatically created on the Column based on a few conditions. Please read my previously posted article, SQL SERVER – When are Statistics Updated – What triggers Statistics to Update, for the specific conditions when Statistics is updated. 2011 T-SQL Scripts to Find Maximum between Two Numbers In this blog post there are two different scripts listed which demonstrates way to find the maximum number between two numbers. I need your help, which one of the script do you think is the most accurate way to find maximum number? Find Details for Statistics of Whole Database – DMV – T-SQL Script I was recently asked is there a single script which can provide all the necessary details about statistics for any database. This question made me write following script. I was initially planning to use sp_helpstats command but I remembered that this is marked to be deprecated in future. 2012 Introduction to Function SIGN SIGN Function is very fundamental function. It will return the value 1, -1 or 0. If your value is negative it will return you negative -1 and if it is positive it will return you positive +1. Let us start with a simple small example. Template Browser – A Very Important and Useful Feature of SSMS Templates are like a quick cheat sheet or quick reference. Templates are available to create objects like databases, tables, views, indexes, stored procedures, triggers, statistics, and functions. Templates are also available for Analysis Services as well. The template scripts contain parameters to help you customize the code. You can Replace Template Parameters dialog box to insert values into the script. An invalid floating point operation occurred If you run any of the above functions they will give you an error related to invalid floating point. Honestly there is no workaround except passing the function appropriate values. SQRT of a negative number will give you result in real numbers which is not supported at this point of time as well LOG of a negative number is not possible (because logarithm is the inverse function of an exponential function and the exponential function is NEVER negative). Validating Spatial Object with IsValidDetailed Function SQL Server 2012 has introduced the new function IsValidDetailed(). This function has made my life very easy. In simple words, this function will check if the spatial object passed is valid or not. If it is valid it will give information that it is valid. If the spatial object is not valid it will return the answer that it is not valid and the reason for the same. This makes it very easy to debug the issue and make the necessary correction. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Oracle Expands Sun Blade Portfolio for Cloud and Highly Virtualized Environments

- by Ferhat Hatay

Oracle announced the expansion of Sun Blade Portfolio for cloud and highly virtualized environments that deliver powerful performance and simplified management as tightly integrated systems. Along with the SPARC T3-1B blade server, Oracle VM blade cluster reference configuration and Oracle's optimized solution for Oracle WebLogic Suite, Oracle introduced the dual-node Sun Blade X6275 M2 server module with some impressive benchmark results. Benchmarks on the Sun Blade X6275 M2 server module demonstrate the outstanding performance characteristics critical for running varied commercial applications used in cloud and highly virtualized environments. These include best-in-class SPEC CPU2006 results with the Intel Xeon processor 5600 series, six Fluent world records and 1.8 times the price-performance of the IBM Power 755 running NAMD, a prominent bio-informatics workload. Benchmarks for Sun Blade X6275 M2 server module SPEC CPU2006 The Sun Blade X6275 M2 server module demonstrated best in class SPECint_rate2006 results for all published results using the Intel Xeon processor 5600 series, with a result of 679. This result is 97% better than the HP BL460c G7 blade, 80% better than the IBM HS22V blade, and 79% better than the Dell M710 blade. This result demonstrates the density advantage of the new Oracle's server module for space-constrained data centers. Sun Blade X6275M2 (2 Nodes, Intel Xeon X5670 2.93GHz) - 679 SPECint_rate2006; HP ProLiant BL460c G7 (2.93 GHz, Intel Xeon X5670) - 347 SPECint_rate2006; IBM BladeCenter HS22V (Intel Xeon X5680) - 377 SPECint_rate2006; Dell PowerEdge M710 (Intel Xeon X5680, 3.33 GHz) - 380 SPECint_rate2006. SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/24/2010 and this report. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf Fluent The Sun Fire X6275 M2 server module produced world-record results on each of the six standard cases in the current "FLUENT 12" benchmark test suite at 8-, 12-, 24-, 32-, 64- and 96-core configurations. These results beat the most recent QLogic score with IBM DX 360 M series platforms and QLogic "Truescale" interconnects. Results on sedan_4m test case on the Sun Blade X6275 M2 server module are 23% better than the HP C7000 system, and 20% better than the IBM DX 360 M2; Dell has not posted a result for this test case. Results can be found at the FLUENT website. ANSYS's FLUENT software solves fluid flow problems, and is based on a numerical technique called computational fluid dynamics (CFD), which is used in the automotive, aerospace, and consumer products industries. The FLUENT 12 benchmark test suite consists of seven models that are well suited for multi-node clustered environments and representative of modern engineering CFD clusters. Vendors benchmark their systems with the principal objective of providing comparative performance information for FLUENT software that, among other things, depends on compilers, optimization, interconnect, and the performance characteristics of the hardware. FLUENT application performance is representative of other commercial applications that require memory and CPU resources to be available in a scalable cluster-ready format. FLUENT benchmark has six conventional test cases (eddy_417k, turbo_500k, aircraft_2m, sedan_4m, truck_14m, truck_poly_14m) at various core counts. All information on the FLUENT website (http://www.fluent.com) is Copyrighted1995-2010 by ANSYS Inc. Results as of November 24, 2010. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf NAMD Results on the Sun Blade X6275 M2 server module running NAMD (a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems) show up to a 1.8X better price/performance than IBM's Power 7-based system. For space-constrained environments, the ultra-dense Sun Blade X6275 M2 server module provides a 1.7X better price/performance per rack unit than IBM's system. IBM Power 755 4-way Cluster (16U). Total price for cluster: $324,212. See IBM United States Hardware Announcement 110-008, dated February 9, 2010, pp. 4, 21 and 39-46. Sun Blade X6275 M2 8-Blade Cluster (10U). Total price for cluster: $193,939. Price/performance and performance/RU comparisons based on f1ATPase molecule test results. Sun Blade X6275 M2 cluster: $3,568/step/sec, 5.435 step/sec/RU. IBM Power 755 cluster: $6,355/step/sec, 3.189 step/sec/U. See http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html. See http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 11/24/10. For more specifics about these results, please go to see http://blogs.sun.com/BestPerf Reverse Time Migration The Reverse Time Migration is heavily used in geophysical imaging and modeling for Oil & Gas Exploration. The Sun Blade X6275 M2 server module showed up to a 40% performance improvement over the previous generation server module with super-linear scalability to 16 nodes for the 9-Point Stencil used in this Reverse Time Migration computational kernel. The balanced combination of Oracle's Sun Storage 7410 system with the Sun Blade X6275 M2 server module cluster showed linear scalability for the total application throughput, including the I/O and MPI communication, to produce a final 3-D seismic depth imaged cube for interpretation. The final image write time from the Sun Blade X6275 M2 server module nodes to Oracle's Sun Storage 7410 system achieved 10GbE line speed of 1.25 GBytes/second or better performance. Between subsequent runs, the effects of I/O buffer caching on the Sun Blade X6275 M2 server module nodes and write optimized caching on the Sun Storage 7410 system gave up to 1.8 GBytes/second effective write performance. The performance results and characterization of this Reverse Time Migration benchmark could serve as a useful measure for many other I/O intensive commercial applications. 3D VTI Reverse Time Migration Seismic Depth Imaging, see http://blogs.sun.com/BestPerf/entry/3d_vti_reverse_time_migration for more information, results as of 11/14/2010.

Read the article
nginx problem accessing virtual hosts

- by Sc0rian

I am setting up nginx as a reverse proxy. The server runs on directadmin and lamp stack. I have nginx running on port 81. I can access all my sites (including virtual ips) on the port 81. However when I forward the traffic from port 80 to 81, the virtual ips have a message saying "Apache is running normally". Server IPs are fine, and I can still access virtual IP's on 81. [root@~]# netstat -an | grep LISTEN | egrep ":80|:81" tcp 0 0 <virtual ip>:81 0.0.0.0:* LISTEN tcp 0 0 <virtual ip>:81 0.0.0.0:* LISTEN tcp 0 0 <serverip>:81 0.0.0.0:* LISTEN tcp 0 0 :::80 :::* LISTEN apache 24090 0.6 1.3 29252 13612 ? S 18:34 0:00 /usr/sbin/httpd -k start -DSSL apache 24092 0.9 2.1 39584 22056 ? S 18:34 0:00 /usr/sbin/httpd -k start -DSSL apache 24096 0.2 1.9 35892 20256 ? S 18:34 0:00 /usr/sbin/httpd -k start -DSSL apache 24120 0.3 1.7 35752 17840 ? S 18:34 0:00 /usr/sbin/httpd -k start -DSSL apache 24495 0.0 1.4 30892 14756 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24496 1.0 2.1 39892 22164 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24516 1.5 3.6 55496 38040 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24519 0.1 1.2 28996 13224 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24521 2.7 4.0 58244 41984 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24522 0.0 1.2 29124 12672 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24524 0.0 1.1 28740 12364 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24535 1.1 1.7 36008 17876 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24536 0.0 1.1 28592 12084 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24537 0.0 1.1 28592 12112 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24539 0.0 0.0 0 0 ? Z 18:35 0:00 [httpd] <defunct> apache 24540 0.0 1.1 28592 11540 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL apache 24541 0.0 1.1 28592 11548 ? S 18:35 0:00 /usr/sbin/httpd -k start -DSSL root 24548 0.0 0.0 4132 752 pts/0 R+ 18:35 0:00 egrep apache|nginx root 28238 0.0 0.0 19576 284 ? Ss May29 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf apache 28239 0.0 0.0 19888 804 ? S May29 0:00 nginx: worker process apache 28240 0.0 0.0 19888 548 ? S May29 0:00 nginx: worker process apache 28241 0.0 0.0 19736 484 ? S May29 0:00 nginx: cache manager process here is my nginx conf: cat /usr/local/nginx/conf/nginx.conf user apache apache; worker_processes 2; # Set it according to what your CPU have. 4 Cores = 4 worker_rlimit_nofile 8192; pid /var/run/nginx.pid; events { worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; server_tokens off; access_log /var/log/nginx_access.log main; error_log /var/log/nginx_error.log debug; server_names_hash_bucket_size 64; sendfile on; tcp_nopush on; tcp_nodelay off; keepalive_timeout 30; gzip on; gzip_comp_level 9; gzip_proxied any; proxy_buffering on; proxy_cache_path /usr/local/nginx/proxy_temp levels=1:2 keys_zone=one:15m inactive=7d max_size=1000m; proxy_buffer_size 16k; proxy_buffers 100 8k; proxy_connect_timeout 60; proxy_send_timeout 60; proxy_read_timeout 60; server { listen <server ip>:81 default rcvbuf=8192 sndbuf=16384 backlog=32000; # Real IP here server_name <server host name> _; # "_" is for handle all hosts that are not described by server_name charset off; access_log /var/log/nginx_host_general.access.log main; location / { proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_pass http://<server ip>; # Real IP here client_max_body_size 16m; client_body_buffer_size 128k; proxy_buffering on; proxy_connect_timeout 90; proxy_send_timeout 90; proxy_read_timeout 120; proxy_buffer_size 16k; proxy_buffers 32 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } location /nginx_status { stub_status on; access_log off; allow 127.0.0.1; deny all; } } include /usr/local/nginx/vhosts/*.conf; } here is my vhost conf: # cat /usr/local/nginx/vhosts/1.conf server { listen <virt ip>:81 default rcvbuf=8192 sndbuf=16384 backlog=32000; # Real IP here server_name <virt domain name>.com ; # "_" is for handle all hosts that are not described by server_name charset off; access_log /var/log/nginx_host_general.access.log main; location / { proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_pass http://<virt ip>; # Real IP here client_max_body_size 16m; client_body_buffer_size 128k; proxy_buffering on; proxy_connect_timeout 90; proxy_send_timeout 90; proxy_read_timeout 120; proxy_buffer_size 16k; proxy_buffers 32 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } }

Read the article
Optimizing AES modes on Solaris for Intel Westmere

- by danx

Optimizing AES modes on Solaris for Intel Westmere Review AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text and allows one to more-easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one in theory can keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks is still possible. Here's a diagram of encrypting input data using AES ECB mode: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ IV >----->(XOR) +------------->(XOR) +---> . . . . | | | | | | | | \/ | \/ | AESKey-->(AES Encryption) | AESKey-->(AES Encryption) | | | | | | | | | \/ | \/ | CipherTextOutput ------+ CipherTextOutput -------+ Block 1 Block 2 The steps for CBC encryption are: Start with a 16-byte Initialization Vector (IV), choosen randomly. XOR the IV with the first block of input plaintext Encrypt the result with AES using a user-provided key. The result is the first 16-bytes of output cryptotext. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, the IV ix XORed with the AES encryption result (not the plain text input). Here's an illustration: Block 1 Block 2 PlainTextInput PlainTextInput | | | | \/ \/ AESKey-->(AES Encryption) AESKey-->(AES Encryption) | | | | \/ \/ IV >----->(XOR) IV + 1 >---->(XOR) IV + 2 ---> . . . . | | | | \/ \/ CipherTextOutput CipherTextOutput Block 1 Block 2 Optimization Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known--it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption. How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occurs, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory. Other Optimizations Besides interleaving, other optimizations performed are Loading the entire key schedule into the 128-bit %xmm registers. This is done once for per 4-block of data (since 4 blocks of data is processed, when present). The following is loaded: the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively The input data is loaded into another %xmm register The same register contains the output result after encrypting/decrypting Using SSSE 4 instructions (AESNI). Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AESNI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor exclusive or (very useful), movdqu load/store a %xmm register from/to memory, pshufb shuffle bytes for byte swapping, pclmulqdq carry-less multiply for GCM mode Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes (CTR or CBC, for example) processing, the input data is loaded once as both AES and modes operations occur at in the same function Performance Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale which is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AESNI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at-a-time. « For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slighty faster than AES-256, as expected. The second table shows more detail timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. » The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AESNI assembly! The keysize (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data. Availability This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AESNI (for example, Intel Westmere and Sandy Bridge, microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example, $ isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this software. Solaris libraries and kernel automatically determine if it's running on AESNI-capable machines and execute the correctly-tuned software for the current microprocessor. Summary Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions. References "Block cipher modes of operation", Wikipedia Good overview of AES modes (ECB, CBC, CTR, etc.) "Advanced Encryption Standard", Wikipedia "Current Modes" describes NIST-approved block cipher modes (ECB,CBC, CFB, OFB, CCM, GCM)

Read the article
SQL SERVER – Weekly Series – Memory Lane – #032

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Complete Series of Database Coding Standards and Guidelines SQL SERVER Database Coding Standards and Guidelines – Introduction SQL SERVER – Database Coding Standards and Guidelines – Part 1 SQL SERVER – Database Coding Standards and Guidelines – Part 2 SQL SERVER Database Coding Standards and Guidelines Complete List Download Explanation and Example – SELF JOIN When all of the data you require is contained within a single table, but data needed to extract is related to each other in the table itself. Examples of this type of data relate to Employee information, where the table may have both an Employee’s ID number for each record and also a field that displays the ID number of an Employee’s supervisor or manager. To retrieve the data tables are required to relate/join to itself. Insert Multiple Records Using One Insert Statement – Use of UNION ALL This is very interesting question I have received from new developer. How can I insert multiple values in table using only one insert? Now this is interesting question. When there are multiple records are to be inserted in the table following is the common way using T-SQL. Function to Display Current Week Date and Day – Weekly Calendar Straight blog post with script to find current week date and day based on the parameters passed in the function. 2008 In my beginning years, I have almost same confusion as many of the developer had in their earlier years. Here are two of the interesting question which I have attempted to answer in my early year. Even if you are experienced developer may be you will still like to read following two questions: Order Of Column In Index Order of Conditions in WHERE Clauses Example of DISTINCT in Aggregate Functions Have you ever used DISTINCT with the Aggregation Function? Here is a simple example about how users can do it. Create a Comma Delimited List Using SELECT Clause From Table Column Straight to script example where I explained how to do something easy and quickly. Compound Assignment Operators SQL SERVER 2008 has introduced new concept of Compound Assignment Operators. Compound Assignment Operators are available in many other programming languages for quite some time. Compound Assignment Operators is operator where variables are operated upon and assigned on the same line. PIVOT and UNPIVOT Table Examples Here is a very interesting question – the answer to the question can be YES or NO both. “If we PIVOT any table and UNPIVOT that table do we get our original table?” Read the blog post to get the explanation of the question above. 2009 What is Interim Table – Simple Definition of Interim Table The interim table is a table that is generated by joining two tables and not the final result table. In other words, when two tables are joined they create an interim table as resultset but the resultset is not final yet. It may be possible that more tables are about to join on the interim table, and more operations are still to be applied on that table (e.g. Order By, Having etc). Besides, it may be possible that there is no interim table; sometimes final table is what is generated when the query is run. 2010 Stored Procedure and Transactions If Stored Procedure is transactional then, it should roll back complete transactions when it encounters any errors. Well, that does not happen in this case, which proves that Stored Procedure does not only provide just the transactional feature to a batch of T-SQL. Generate Database Script for SQL Azure When talking about SQL Azure the most common complaint I hear is that the script generated from stand-along SQL Server database is not compatible with SQL Azure. This was true for some time for sure but not any more. If you have SQL Server 2008 R2 installed you can follow the guideline below to generate a script which is compatible with SQL Azure. Convert IN to EXISTS – Performance Talk It is NOT necessary that every time when IN is replaced by EXISTS it gives better performance. However, in our case listed above it does for sure give better performance. You can read about this subject in the associated blog post. Subquery or Join – Various Options – SQL Server Engine Knows the Best Every single time whenever there is a performance tuning exercise, I hear the conversation from developer where some prefer subquery and some prefer join. In this two part blog post, I explain the same in the detail with examples. Part 1 | Part 2 Merge Operations – Insert, Update, Delete in Single Execution MERGE is a new feature that provides an efficient way to do multiple DML operations. In earlier versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions; however, at present, by using the MERGE statement, we can include the logic of such data changes in one statement that even checks when the data is matched and then just update it, and similarly, when the data is unmatched, it is inserted. 2011 Puzzle – Statistics are not updated but are Created Once Here is the quick scenario about my setup. Create Table Insert 1000 Records Check the Statistics Now insert 10 times more 10,000 indexes Check the Statistics – it will be NOT updated – WHY? Question to You – When to use Function and When to use Stored Procedure Personally, I believe that they are both different things - they cannot be compared. I can say, it will be like comparing apples and oranges. Each has its own unique use. However, they can be used interchangeably at many times and in real life (i.e., production environment). I have personally seen both of these being used interchangeably many times. This is the precise reason for asking this question. 2012 In year 2012 I had two interesting series ran on the blog. If there is no fun in learning, the learning becomes a burden. For the same reason, I had decided to build a three part quiz around SEQUENCE. The quiz was to identify the next value of the sequence. I encourage all of you to take part in this fun quiz. Guess the Next Value – Puzzle 1 Guess the Next Value – Puzzle 2 Guess the Next Value – Puzzle 3 Guess the Next Value – Puzzle 4 Simple Example to Configure Resource Governor – Introduction to Resource Governor Resource Governor is a feature which can manage SQL Server Workload and System Resource Consumption. We can limit the amount of CPU and memory consumption by limiting /governing /throttling on the SQL Server. If there are different workloads running on SQL Server and each of the workload needs different resources or when workloads are competing for resources with each other and affecting the performance of the whole server resource governor is a very important task. Tricks to Replace SELECT * with Column Names – SQL in Sixty Seconds #017 – Video Retrieves unnecessary columns and increases network traffic When a new columns are added views needs to be refreshed manually Leads to usage of sub-optimal execution plan Uses clustered index in most of the cases instead of using optimal index It is difficult to debug SQL SERVER – Load Generator – Free Tool From CodePlex The best part of this SQL Server Load Generator is that users can run multiple simultaneous queries again SQL Server using different login account and different application name. The interface of the tool is extremely easy to use and very intuitive as well. A Puzzle – Swap Value of Column Without Case Statement Let us assume there is a single column in the table called Gender. The challenge is to write a single update statement which will flip or swap the value in the column. For example if the value in the gender column is ‘male’ swap it with ‘female’ and if the value is ‘female’ swap it with ‘male’. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
NVIDIA x server - "sudo nvidia config" does not generate a working 'xorg.config'

- by Mike

I am over 18 hours deep on this challenge. I got to this point and am stuck. very stuck. Maybe you can figure it out? Ubuntu Version 12.04 LTS with all the updates installed. Problem: The default settings in "etc/X11/xorg.conf" that are generated by the "nvidia-xconfig" tool, do not allow the NVIDIA x server to connect to the driver in my "System Settings Additional Driver window". (that's how I understand it. Lots of information below). Symptoms of Problem "System Settings Additional Driver" window has drivers, but the nvidia x server cannot connect/utilize any of the 4 drivers. the drivers are activated, but not in use. When I go to "System Tools Administration NVIDIA x server settings" I get an error that basically tells me to create a default file to initialize the NVIDIA X server (screen shot below). This is the messages the terminal gives after running a "sudo nvidia-xconfig" command for the first time. It seems that the generated file by the tool i just ran is generating a bad/unusable file: If I run the "sudo nvidia-xconfig" command again, I wont get an error the second time. However when I reboot, the default file that is generated (etc/X11/xorg.conf) simply puts the screen resolution at 800 x 600 (or something big like that). When I try to go to NVIDIA x server settings I am greeted with the same screen as the screen shot as in symptom 2 (no option to change the resolution). If I try to go to "system settings display" there are no other resolutions to choose from. At this point I must delete the newly minted "xorg.conf" and reinstate the original in its place. Here are the contents of the "xorg.conf" that is generated first (the one missing required information): # nvidia-xconfig: X configuration file generated by nvidia-xconfig # nvidia-xconfig: version 304.88 (buildmeister@swio-display-x86-rhel47-06) Wed Mar 27 15:32:58 PDT 2013 Section "ServerLayout" Identifier "Layout0" Screen 0 "Screen0" InputDevice "Keyboard0" "CoreKeyboard" InputDevice "Mouse0" "CorePointer" EndSection Section "Files" EndSection Section "InputDevice" # generated from default Identifier "Mouse0" Driver "mouse" Option "Protocol" "auto" Option "Device" "/dev/psaux" Option "Emulate3Buttons" "no" Option "ZAxisMapping" "4 5" EndSection Section "InputDevice" # generated from default Identifier "Keyboard0" Driver "kbd" EndSection Section "Monitor" Identifier "Monitor0" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS" EndSection Section "Device" Identifier "Device0" Driver "nvidia" VendorName "NVIDIA Corporation" EndSection Section "Screen" Identifier "Screen0" Device "Device0" Monitor "Monitor0" DefaultDepth 24 SubSection "Display" Depth 24 EndSubSection EndSection Hardware: I ran the "lspci|grep VGA". There results are: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) 01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [Quadro 1000M] (rev a1) More Hardware info: Ram: 16GB CPU: Intel Core i7-2720QM @2.2GHz * 8 Other: 64 bit. This is a triple boot computer and not a VM. Attempts With Not Success on My End: 1) Tried to append the "xorg.conf" with what I perceive is missing information and obviously it didn't fly. 2) All the other stuff I tried got me to this point. 3) See if this link is helpful to you (I barely get it, but i get enough knowing that a smarter person might find this useful): http://manpages.ubuntu.com/manpages/lucid/man1/nvidia-xconfig.1.html 4) I am completely new to Linux (40 hours over past week), but not to programming. However I am very serious about changing over to Linux. When you respond (I hope someone responds...) please respond in a way that a person new to Linux can understand. 5) By the way, the reason I am in this mess is because I MUST have a second monitor running from my laptop, and "System Settings Display" doesn't recognize my second display. I know it is possible to make the second display work in my system, because when I boot from the install CD, I perform work on the native laptop monitor, but the second monitor shows a purple screen with Ubuntu in the middle, so I know the VGA port is sending a signal out. If this is too much for you to tackle please suggest an alternative method to get a second display. I don't want to go to windows but I cannot have a single display. I am really fudged here. I hope some smart person can help. Thanks in advance. Mike. **********************EDIT #1********************** More Details About Graphics Card I was asked "which brand of nvidia-card do you have exactly?" Here is what I did to provide more info (maybe relevant, maybe not, but here is everything): 1) Took my Lenovo W520 right apart to see if there is an identifier on the actual card. However I realized that if I get deep enough to take a look, the laptop "won't like it". so I put it back together. Figuring out the card this way is not an option for me right now. 2) (My computer is triple boot) I logged into Win7 and ran 'dxdiag' command. here is the screen shot: 3) I tried to look on the lenovo website for more details... but no luck. I took a look at my receipts and here is info form receipt: System Unit: W520 NVIDIA Quadro 1000M 2GB 4) In win7 I went to the NVIDIA website and used the option to have my card 'scanned' by a Java applet to determine the latest update for my card. I tried the same with Ubuntu but I can't get the applet to run. Here is the recommended driver from from the NVIDIA Applet for my card for Win7 (I hope this shines some light on the specifics of the card): Quadro/NVS/Tesla/GRID Desktop Driver Release R319 Version: 320.00 WHQL Release Date: 3.5.2013 5) Also I went on the NVIDIA driver search and looked through every possible combination of product type + product series + product to find all the combinations that yield a 1000M card. My card is: Product Type: Quadro Product Series: Quadro Series (Notebooks) Product: 1000M ***********************EDIT #2******************* Additional Symptoms Another question that generated more symptoms I previously didn't mention was: "After generating xorg.conf by nvidia-xconfig, go to additional drivers, do you see nvidia-304?" 1) I took a screen shot of the "additional drivers" right after generating xorg.conf by nvidia-xconfig. Here it is: 2) Then I did a reboot. Now Ubuntu is 600 x 800 resolution. When I logged in after the computer came up I got an error (which I always get after generating xorg.conf by nvidia-xconfig and rebooting) 3) To finally answer the question - No. There is no "NVIDIA-304" driver. Screen shot of additional drivers after generating xorg.conf by nvidia-xconfig and rebooting : At this point I revert to the original xorg.conf and delete the xorg.conf generated by Nvidia.

Read the article
C# Performance Pitfall – Interop Scenarios Change the Rules

- by Reed

C# and .NET, overall, really do have fantastic performance in my opinion. That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar. However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++. Often, they’re exactly backwards. For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages. If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler. Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph. For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data. As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack. However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem. In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating. At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit. In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes. This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code. For good or bad, this is typically not a good idea. The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea. In this scenario, for example, I may have been much better off leaving the original code alone. The reason for this is the garbage collector. The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages. First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code. This is something that should be avoided unless there is a significant reason. Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast. Finally, and most critically, there is a large advantage to having short lived objects. If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2). This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops. This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1. It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean. It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true. That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary. In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects. In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph. Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply. After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration. This required going back to a “native programming” mindset for optimization. Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

Read the article
tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

- by Daniel Moth

We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2. tile_static memory You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array // each tile of threads has its own copy of locA, // shared among the threads of the tile tile_static float locA[16][16]; Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors. tile_barrier In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it, from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait. 15: // Given a tiled_index object named t_idx 16: t_idx.barrier.wait(); 17: // more code …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal. Tiled Matrix Multiplication Example – part 2 So now that we added to our understanding the concepts of tile_static and tile_barrier, let me obfuscate rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code. 01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: static const int TS = 16; 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M,N,vC); 07: parallel_for_each(c.grid.tile< TS, TS >(), 08: [=] (tiled_index< TS, TS> t_idx) restrict(direct3d) 09: { 10: int row = t_idx.local[0]; int col = t_idx.local[1]; 11: float sum = 0.0f; 12: for (int i = 0; i < W; i += TS) { 13: tile_static float locA[TS][TS], locB[TS][TS]; 14: locA[row][col] = a(t_idx.global[0], col + i); 15: locB[row][col] = b(row + i, t_idx.global[1]); 16: t_idx.barrier.wait(); 17: for (int k = 0; k < TS; k++) 18: sum += locA[row][k] * locB[k][col]; 19: t_idx.barrier.wait(); 20: } 21: c[t_idx.global] = sum; 22: }); 23: } Notice that all the code up to line 9 is the same as per the changes we made in part 1 of tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference being that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b), to the tile_static vanilla arrays (locA and locB) – we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool. Why would I introduce this tiling complexity into my code? Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule are correctly divisible with the tile size, had to use a tiled_index instead of a normal index, and had to understand tile_barrier and to figure out where we need to use it, and double the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory. Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts. Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial-version you had, and you do not need to invest to make it 250 times faster. Also algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled. Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way to early to commit to anything like that, and we certainly don't do any of that today. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
6 Facts About GlassFish Announcement

- by Bruno.Borges

Since Oracle announced the end of commercial support for future Oracle GlassFish Server versions, the Java EE world has started wondering what will happen to GlassFish Server Open Source Edition. Unfortunately, there's a lot of misleading information going around. So let me clarify some things with facts, not FUD. Fact #1 - GlassFish Open Source Edition is not dead GlassFish Server Open Source Edition will remain the reference implementation of Java EE. The current trunk is where an implementation for Java EE 8 will flourish, and this will become the future GlassFish 5.0. Calling "GlassFish is dead" does no good to the Java EE ecosystem. The GlassFish Community will remain strong towards the future of Java EE. Without revenue-focused mind, this might actually help the GlassFish community to shape the next version, and set free from any ties with commercial decisions. Fact #2 - OGS support is not over As I said before, GlassFish Server Open Source Edition will continue. Main change is that there will be no more future commercial releases of Oracle GlassFish Server. New and existing OGS 2.1.x and 3.1.x commercial customers will continue to be supported according to the Oracle Lifetime Support Policy. In parallel, I believe there's no other company in the Java EE business that offers commercial support to more than one build of a Java EE application server. This new direction can actually help customers and partners, simplifying decision through commercial negotiations. Fact #3 - WebLogic is not always more expensive than OGS Oracle GlassFish Server ("OGS") is a build of GlassFish Server Open Source Edition bundled with a set of commercial features called GlassFish Server Control and license bundles such as Java SE Support. OGS has at the moment of this writing the pricelist of U$ 5,000 / processor. One information that some bloggers are mentioning is that WebLogic is more expensive than this. Fact 3.1: it is not necessarily the case. The initial edition of WebLogic is called "Standard Edition" and falls into a policy where some “Standard Edition” products are licensed on a per socket basis. As of current pricelist, US$ 10,000 / socket. If you do the math, you will realize that WebLogic SE can actually be significantly more cost effective than OGS, and a customer can save money if running on a CPU with 4 cores or more for example. Quote from the price list: “When licensing Oracle programs with Standard Edition One or Standard Edition in the product name (with the exception of Java SE Support, Java SE Advanced, and Java SE Suite), a processor is counted equivalent to an occupied socket; however, in the case of multi-chip modules, each chip in the multi-chip module is counted as one occupied socket.” For more details speak to your Oracle sales representative - this is clearly at list price and every customer typically has a relationship with Oracle (like they do with other vendors) and different contractual details may apply. And although OGS has always been production-ready for Java EE applications, it is no secret that WebLogic has always been more enterprise, mission critical application server than OGS since BEA. Different editions of WLS provide features and upgrade irons like the WebLogic Diagnostic Framework, Work Managers, Side by Side Deployment, ADF and TopLink bundled license, Web Tier (Oracle HTTP Server) bundled licensed, Fusion Middleware stack support, Oracle DB integration features, Oracle RAC features (such as GridLink), Coherence Management capabilities, Advanced HA (Whole Service Migration and Server Migration), Java Mission Control, Flight Recorder, Oracle JDK support, etc. Fact #4 - There’s no major vendor supporting community builds of Java EE app servers There are no major vendors providing support for community builds of any Open Source application server. For example, IBM used to provide community support for builds of Apache Geronimo, not anymore. Red Hat does not commercially support builds of WildFly and if I remember correctly, never supported community builds of former JBoss AS. Oracle has never commercially supported GlassFish Server Open Source Edition builds. Tomitribe appears to be the exception to the rule, offering commercial support for Apache TomEE. Fact #5 - WebLogic and GlassFish share several Java EE implementations It has been no secret that although GlassFish and WebLogic share some JSR implementations (as stated in the The Aquarium announcement: JPA, JSF, WebSockets, CDI, Bean Validation, JAX-WS, JAXB, and WS-AT) and WebLogic understands GlassFish deployment descriptors, they are not from the same codebase. Fact #6 - WebLogic is not for GlassFish what JBoss EAP is for WildFly WebLogic is closed-source offering. It is commercialized through a license-based plus support fee model. OGS although from an Open Source code, has had the same commercial model as WebLogic. Still, one cannot compare GlassFish/WebLogic to WildFly/JBoss EAP. It is simply not the same case, since Oracle has had two different products from different codebases. The comparison should be limited to GlassFish Open Source / Oracle GlassFish Server versus WildFly / JBoss EAP. But the message now is much clear: Oracle will commercially support only the proprietary product WebLogic, and invest on GlassFish Server Open Source Edition as the reference implementation for the Java EE platform and future Java EE 8, as a developer-friendly community distribution, and encourages community participation through Adopt a JSR and contributions to GlassFish. In comparison Oracle's decision has pretty much the same goal as to when IBM killed support for Websphere Community Edition; and to when Red Hat decided to change the name of JBoss Community Edition to WildFly, simplifying and clarifying marketing message and leaving the commercial field wide open to JBoss EAP only. Oracle can now, as any other vendor has already been doing, focus on only one commercial offer. Some users are saying they will now move to WildFly, but it is important to note that Red Hat does not offer commercial support for WildFly builds. Although the future JBoss EAP versions will come from the same codebase as WildFly, the builds will definitely not be the same, nor sharing 100% of their functionalities and bug fixes. This means there will be no company running a WildFly build in production with support from Red Hat. This discussion has also raised an important and interesting information: Oracle offers a free for developers OTN License for WebLogic. For other environments this is different, but please note this is the same policy Red Hat applies to JBoss EAP, as stated in their download page and terms. Oracle had the same policy for OGS. TL;DR; GlassFish Server Open Source Edition isn’t dead. Current and new OGS 2.x/3.x customers will continue to have support (respecting LSP). WebLogic is not necessarily more expensive than OGS. Oracle will focus on one commercially supported Java EE application server, like other vendors also limit themselves to support one build/product only. Community builds are hardly supported. Commercially supported builds of Open Source products are not exactly from the same codebase as community builds. What's next for GlassFish and the Java EE community? There are conversations in place to tackle some of the community desires, most of them stated by Markus Eisele in his blog post. We will keep you posted.

Read the article
Azure Diagnostics: The Bad, The Ugly, and a Better Way

- by jasont

If you’re a .Net web developer today, no doubt you’ve enjoyed watching Windows Azure grow up over the past couple of years. The platform has scaled, stabilized (mostly), and added on a slew of great (and sometimes overdue) features. What was once just an endpoint to host a solution, developers today have tremendous flexibility and options in the platform. Organizations are building new solutions and offerings on the platform, and others have, or are in the process of, migrating existing applications out of their own data centers into the Azure cloud. Whether new application development or migrating legacy, every development shop and IT organization needs to monitor their applications in the cloud, the same as they do on premises. Azure Diagnostics has some capabilities, but what I constantly hear from users is that it’s either (a) not enough, or (b) too cumbersome to set up. Today, Stackify is happy to announce that we fully support Azure deployments, just the same as your on-premises deployments. Let’s take a look below and compare and contrast the options. Azure Diagnostics Let’s crack open the Windows Azure documentation on Azure Diagnostics and see just how easy it is to use. The high level steps are: Step 1: Import the Diagnostics Oh, I’ve already deployed my app without the diagnostics module. Guess I can’t do anything until I do this and re-deploy. Step 2: Configure the Diagnostics (and multiple sub-steps) Do I want it all? Or just pieces of it? Whoops, forgot to include a specific performance counter, I guess I’ll have to deploy again. Wait a minute… I have to specifically code these performance counters into my role’s OnStart() method, compile and deploy again? And query and consume it myself? Step 3: (Optional) Permanently store diagnostic data Lucky for me, Azure storage has gotten pretty cheap. But how often should I move the data into storage? I want to see real-time data, so I guess that’s out now as well. Step 4: (Optional) View stored diagnostic data Optional? Of course I want to see it. Conveniently, Microsoft recommends 3 tools to do this with. Un-conveniently, none of these are web based and they all just give you access to raw data, and very little charting or real-time intelligence. Just….. data. Nevermind that one product seems to have gotten stale since a recent acquisition, and doesn’t even have screenshots! So, let’s summarize: lots of diagnostics data is available, but think realistically. Think Dev Ops. What happens when you are in the middle of a major production performance issue and you don’t have the diagnostics you need? You are redeploying an application (and thankfully you have a great branching strategy, so you feel perfectly safe just willy-nilly launching code into prod, don’t you?) to get data, then shipping it to storage, and then digging through that data to find a needle in a haystack. Would you like to be able to troubleshoot a performance issue in the middle of the night, or on a weekend, from your iPad or home computer’s web browser? Forget it: the best you get is this spark line in the Azure portal. If it’s real pointy, you probably have an issue; but since there is no alert based on a threshold your customers have likely already let you know. And high CPU, Memory, I/O, or Network doesn’t tell you anything about where the problem is. The Better Way – Stackify Stackify supports application and server monitoring in real time, all through a great web interface. All of the things that Azure Diagnostics provides, Stackify provides for your on-premises deployments, and you don’t need to know ahead of time that you’ll need it. It’s always there, it’s always on. Azure deployments are essentially no different than on-premises. It’s a Windows Server (or Linux) in the cloud. It’s behind a different firewall than your corporate servers. That’s it. Stackify can provide the same powerful tools to your Azure deployments in two simple steps. Step 1 Add a startup task to your web or worker role and deploy. If you can’t deploy and need it right now, no worries! Remote Desktop to the Azure instance and you can execute a Powershell script to download / install Stackify. Step 2 Log in to your account at www.stackify.com and begin monitoring as much as you want, as often as you want and see the results instantly. WMI? It’s there Event Viewer? You’ve got it. File System Access? Yes, please! Would love to make sure my web.config is correct. IIS / App Pool Info? Yep. You can even restart it. Running Services? All of them. Start and Stop them to your heart’s content. SQL Database access? You bet’cha. Alerts and Notification? Of course! You should know before your customers let you know. … and so much more. Conclusion Microsoft has shown, consistently, that they love developers, developers, developers. What every developer needs to realize from this is that they’ve given you a canvas, which is exactly what Azure is. It’s great infrastructure that is readily available, easy to manage, and fairly cost effective. However, the tooling is your responsibility. What you get, at best, is bare bones. App and server diagnostics should be available when you need them. While we, as developers, try to plan for and think of everything ahead of time, there will come times where we need to get data that just isn’t available. And having to go through a lot of cumbersome steps to get that data, and then have to find a friendlier way to consume it…. well, that just doesn’t make a lot of sense to me. I’d rather spend my time writing and developing features and completing bug fixes for my applications, than to be writing code to monitor and diagnose.

Read the article
Benchmarking MySQL Replication with Multi-Threaded Slaves

- by Mat Keep

0 0 1 1145 6530 Homework 54 15 7660 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} The objective of this benchmark is to measure the performance improvement achieved when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6. As the results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance based on a configuration with 10 databases/schemas. For real-world deployments, higher replication performance directly translates to: · Improved consistency of reads from slaves (i.e. reduced risk of reading "stale" data) · Reduced risk of data loss should the master fail before replicating all events in its binary log (binlog) The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments. Multi-Threaded Slaves are just one of many enhancements to replication previewed as part of the MySQL 5.6 Development Release, which include: · Global Transaction Identifiers coupled with MySQL utilities for automatic failover / switchover and slave promotion · Crash Safe Slaves and Binlog · Optimized Row Based Replication · Replication Event Checksums · Time Delayed Replication These and many more are discussed in the “MySQL 5.6 Replication: Enabling the Next Generation of Web & Cloud Services” Developer Zone article Back to the benchmark - details are as follows. Environment The test environment consisted of two Linux servers: · one running the replication master · one running the replication slave. Only the slave was involved in the actual measurements, and was based on the following configuration: - Hardware: Oracle Sun Fire X4170 M2 Server - CPU: 2 sockets, 6 cores with hyper-threading, 2930 MHz. - OS: 64-bit Oracle Enterprise Linux 6.1 - Memory: 48 GB Test Procedure Initial Setup: Two MySQL servers were started on two different hosts, configured as replication master and slave. 10 sysbench schemas were created, each with a single table: CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k` (`k`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 10,000 rows were inserted in each of the 10 tables, for a total of 100,000 rows. When the inserts had replicated to the slave, the slave threads were stopped. The slave data directory was copied to a backup location and the slave threads position in the master binlog noted. 10 sysbench clients, each configured with 10 threads, were spawned at the same time to generate a random schema load against each of the 10 schemas on the master. Each sysbench client executed 10,000 "update key" statements: UPDATE sbtest set k=k+1 WHERE id = <random row> In total, this generated 100,000 update statements to later replicate during the test itself. Test Methodology: The number of slave workers to test with was configured using: SET GLOBAL slave_parallel_workers=<workers> Then the slave IO thread was started and the test waited for all the update queries to be copied over to the relay log on the slave. The benchmark clock was started and then the slave SQL thread was started. The test waited for the slave SQL thread to finish executing the 100k update queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark clock was stopped and the duration calculated. The calculated duration from the benchmark clock should be close to the time it took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave the benchmark metric, reported as Queries Per Second (QPS). Test Reset: The test-reset cycle was implemented as follows: · the slave was stopped · the slave data directory replaced with the previous backup · the slave restarted with the slave threads replication pointer repositioned to the point before the update queries in the binlog. The test could then be repeated with identical set of queries but a different number of slave worker threads, enabling a fair comparison. The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the QPS metric calculated and averaged for each worker count. MySQL Configuration The relevant configuration settings used for MySQL are as follows: binlog-format=STATEMENT relay-log-info-repository=TABLE master-info-repository=TABLE As described in the test procedure, the slave_parallel_workers setting was modified as part of the test logic. The consequence of changing this setting is: 0 worker threads: - current (i.e. single threaded) sequential mode - 1 x IO thread and 1 x SQL thread - SQL thread both reads and executes the events 1 worker thread: - sequential mode - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread - coordinator reads the event and hands it to the worker who executes 2+ worker threads: - parallel execution - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads - coordinator reads events and hands them to the workers who execute them Results Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication performance when configured with 10 worker threads, with the load evenly distributed across our 10 x schemas. This result is compared to the current replication implementation which is based on a single SQL thread only (i.e. zero worker threads). Figure 1: 5x Higher Performance with Multi-Threaded Slaves The following figure shows more detailed results, with QPS sampled and reported as the worker threads are incremented. The raw numbers behind this graph are reported in the Appendix section of this post. Figure 2: Detailed Results As the results above show, the configuration does not scale noticably from 5 to 9 worker threads. When configured with 10 worker threads however, scalability increases significantly. The conclusion therefore is that it is desirable to configure the same number of worker threads as schemas. Other conclusions from the results: · Running with 1 worker compared to zero workers just introduces overhead without the benefit of parallel execution. · As expected, having more workers than schemas adds no visible benefit. Aside from what is shown in the results above, testing also demonstrated that the following settings had a very positive effect on slave performance: relay-log-info-repository=TABLE master-info-repository=TABLE For 5+ workers, it was up to 2.3 times as fast to run with TABLE compared to FILE. Conclusion As the results demonstrate, Multi-Threaded Slaves deliver significant performance increases to MySQL replication when handling multiple schemas. This, and the other replication enhancements introduced in MySQL 5.6 are fully available for you to download and evaluate now from the MySQL Developer site (select Development Release tab). You can learn more about MySQL 5.6 from the documentation Please don’t hesitate to comment on this or other replication blogs with feedback and questions. Appendix – Detailed Results

Read the article
When is a Seek not a Seek?

- by Paul White

The following script creates a single-column clustered table containing the integers from 1 to 1,000 inclusive. IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test ; GO CREATE TABLE #Test ( id INTEGER PRIMARY KEY CLUSTERED ); ; INSERT #Test (id) SELECT V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 1000 ; Let’s say we need to find the rows with values from 100 to 170, excluding any values that divide exactly by 10. One way to write that query would be: SELECT T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; That query produces a pretty efficient-looking query plan: Knowing that the source column is defined as an INTEGER, we could also express the query this way: SELECT T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; We get a similar-looking plan: If you look closely, you might notice that the line connecting the two icons is a little thinner than before. The first query is estimated to produce 61.9167 rows – very close to the 63 rows we know the query will return. The second query presents a tougher challenge for SQL Server because it doesn’t know how to predict the selectivity of the modulo expression (T.id % 10 > 0). Without that last line, the second query is estimated to produce 68.1667 rows – a slight overestimate. Adding the opaque modulo expression results in SQL Server guessing at the selectivity. As you may know, the selectivity guess for a greater-than operation is 30%, so the final estimate is 30% of 68.1667, which comes to 20.45 rows. The second difference is that the Clustered Index Seek is costed at 99% of the estimated total for the statement. For some reason, the final SELECT operator is assigned a small cost of 0.0000484 units; I have absolutely no idea why this is so, or what it models. Nevertheless, we can compare the total cost for both queries: the first one comes in at 0.0033501 units, and the second at 0.0034054. The important point is that the second query is costed very slightly higher than the first, even though it is expected to produce many fewer rows (20.45 versus 61.9167). If you run the two queries, they produce exactly the same results, and both complete so quickly that it is impossible to measure CPU usage for a single execution. We can, however, compare the I/O statistics for a single run by running the queries with STATISTICS IO ON: Table '#Test'. Scan count 63, logical reads 126, physical reads 0. Table '#Test'. Scan count 01, logical reads 002, physical reads 0. The query with the IN list uses 126 logical reads (and has a ‘scan count’ of 63), while the second query form completes with just 2 logical reads (and a ‘scan count’ of 1). It is no coincidence that 126 = 63 * 2, by the way. It is almost as if the first query is doing 63 seeks, compared to one for the second query. In fact, that is exactly what it is doing. There is no indication of this in the graphical plan, or the tool-tip that appears when you hover your mouse over the Clustered Index Seek icon. To see the 63 seek operations, you have click on the Seek icon and look in the Properties window (press F4, or right-click and choose from the menu): The Seek Predicates list shows a total of 63 seek operations – one for each of the values from the IN list contained in the first query. I have expanded the first seek node to show the details; it is seeking down the clustered index to find the entry with the value 101. Each of the other 62 nodes expands similarly, and the same information is contained (even more verbosely) in the XML form of the plan. Each of the 63 seek operations starts at the root of the clustered index B-tree and navigates down to the leaf page that contains the sought key value. Our table is just large enough to need a separate root page, so each seek incurs 2 logical reads (one for the root, and one for the leaf). We can see the index depth using the INDEXPROPERTY function, or by using the a DMV: SELECT S.index_type_desc, S.index_depth FROM sys.dm_db_index_physical_stats ( DB_ID(N'tempdb'), OBJECT_ID(N'tempdb..#Test', N'U'), 1, 1, DEFAULT ) AS S ; Let’s look now at the Properties window when the Clustered Index Seek from the second query is selected: There is just one seek operation, which starts at the root of the index and navigates the B-tree looking for the first key that matches the Start range condition (id >= 101). It then continues to read records at the leaf level of the index (following links between leaf-level pages if necessary) until it finds a row that does not meet the End range condition (id <= 169). Every row that meets the seek range condition is also tested against the Residual Predicate highlighted above (id % 10 > 0), and is only returned if it matches that as well. You will not be surprised that the single seek (with a range scan and residual predicate) is much more efficient than 63 singleton seeks. It is not 63 times more efficient (as the logical reads comparison would suggest), but it is around three times faster. Let’s run both query forms 10,000 times and measure the elapsed time: DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON; SET STATISTICS XML OFF; ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; GO DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; On my laptop, running SQL Server 2008 build 4272 (SP2 CU2), the IN form of the query takes around 830ms and the range query about 300ms. The main point of this post is not performance, however – it is meant as an introduction to the next few parts in this mini-series that will continue to explore scans and seeks in detail. When is a seek not a seek? When it is 63 seeks © Paul White 2011 email: [email protected] twitter: @SQL_kiwi

Read the article
HTG Explains: Why Does Rebooting a Computer Fix So Many Problems?

- by Chris Hoffman

Ask a geek how to fix a problem you’ve having with your Windows computer and they’ll likely ask “Have you tried rebooting it?” This seems like a flippant response, but rebooting a computer can actually solve many problems. So what’s going on here? Why does resetting a device or restarting a program fix so many problems? And why don’t geeks try to identify and fix problems rather than use the blunt hammer of “reset it”? This Isn’t Just About Windows Bear in mind that this soltion isn’t just limited to Windows computers, but applies to all types of computing devices. You’ll find the advice “try resetting it” applied to wireless routers, iPads, Android phones, and more. This same advice even applies to software — is Firefox acting slow and consuming a lot of memory? Try closing it and reopening it! Some Problems Require a Restart To illustrate why rebooting can fix so many problems, let’s take a look at the ultimate software problem a Windows computer can face: Windows halts, showing a blue screen of death. The blue screen was caused by a low-level error, likely a problem with a hardware driver or a hardware malfunction. Windows reaches a state where it doesn’t know how to recover, so it halts, shows a blue-screen of death, gathers information about the problem, and automatically restarts the computer for you . This restart fixes the blue screen of death. Windows has gotten better at dealing with errors — for example, if your graphics driver crashes, Windows XP would have frozen. In Windows Vista and newer versions of Windows, the Windows desktop will lose its fancy graphical effects for a few moments before regaining them. Behind the scenes, Windows is restarting the malfunctioning graphics driver. But why doesn’t Windows simply fix the problem rather than restarting the driver or the computer itself? Well, because it can’t — the code has encountered a problem and stopped working completely, so there’s no way for it to continue. By restarting, the code can start from square one and hopefully it won’t encounter the same problem again. Examples of Restarting Fixing Problems While certain problems require a complete restart because the operating system or a hardware driver has stopped working, not every problem does. Some problems may be fixable without a restart, though a restart may be the easiest option. Windows is Slow: Let’s say Windows is running very slowly. It’s possible that a misbehaving program is using 99% CPU and draining the computer’s resources. A geek could head to the task manager and look around, hoping to locate the misbehaving process an end it. If an average user encountered this same problem, they could simply reboot their computer to fix it rather than dig through their running processes. Firefox or Another Program is Using Too Much Memory: In the past, Firefox has been the poster child for memory leaks on average PCs. Over time, Firefox would often consume more and more memory, getting larger and larger and slowing down. Closing Firefox will cause it to relinquish all of its memory. When it starts again, it will start from a clean state without any leaked memory. This doesn’t just apply to Firefox, but applies to any software with memory leaks. Internet or Wi-Fi Network Problems: If you have a problem with your Wi-Fi or Internet connection, the software on your router or modem may have encountered a problem. Resetting the router — just by unplugging it from its power socket and then plugging it back in — is a common solution for connection problems. In all cases, a restart wipes away the current state of the software . Any code that’s stuck in a misbehaving state will be swept away, too. When you restart, the computer or device will bring the system up from scratch, restarting all the software from square one so it will work just as well as it was working before. “Soft Resets” vs. “Hard Resets” In the mobile device world, there are two types of “resets” you can perform. A “soft reset” is simply restarting a device normally — turning it off and then on again. A “hard reset” is resetting its software state back to its factory default state. When you think about it, both types of resets fix problems for a similar reason. For example, let’s say your Windows computer refuses to boot or becomes completely infected with malware. Simply restarting the computer won’t fix the problem, as the problem is with the files on the computer’s hard drive — it has corrupted files or malware that loads at startup on its hard drive. However, reinstalling Windows (performing a “Refresh or Reset your PC” operation in Windows 8 terms) will wipe away everything on the computer’s hard drive, restoring it to its formerly clean state. This is simpler than looking through the computer’s hard drive, trying to identify the exact reason for the problems or trying to ensure you’ve obliterated every last trace of malware. It’s much faster to simply start over from a known-good, clean state instead of trying to locate every possible problem and fix it. Ultimately, the answer is that “resetting a computer wipes away the current state of the software, including any problems that have developed, and allows it to start over from square one.” It’s easier and faster to start from a clean state than identify and fix any problems that may be occurring — in fact, in some cases, it may be impossible to fix problems without beginning from that clean state. Image Credit: Arria Belli on Flickr, DeclanTM on Flickr

Read the article

Search Results

Search found 5572 results on 223 pages for 'cpu'.

Page 215/223 | < Previous Page | 211 212 213 214 215 216 217 218 219 220 221 222 | Next Page >

- by mv

- by Matthew Guay

- by BuckWoody

- by Los Frijoles

- by KeithMayer

- by ScottGu

- by Testas

- by Brian

- by RoyOsherove

- by Mat Keep

- by Dave

- by Pinal Dave

- by Ferhat Hatay

- by Sc0rian

- by danx

- by Pinal Dave

- by Mike

- by Reed

- by Daniel Moth

- by Bruno.Borges

- by jasont

- by Mat Keep

- by Paul White

- by Chris Hoffman

< Previous Page | 211 212 213 214 215 216 217 218 219 220 221 222 | Next Page >