Search Results

Search found 7538 results on 302 pages for 'parallel processing'.

Page 114/302 | < Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >

Which of these studies would benefit a CS student the most? [closed]

- by user1265125

Which of these extra-curricular studies would benefit a CS student the most? Algorithms Advanced OS programming Image processing Computer graphics Open source development Practicing on TopCoder or Codechef Something else? I realize the decision can be influenced by a number of factors, such as personal preference, what's currently hot in the jobs market, and what is likely to be in demand more in the future, however I would like to ask more experienced programmers which one(s) of these would be most beneficial to learn alongside all the required CS academics.

Read the article
I have data that sends in "bursts" of 100 records with a significant delay. How do I structure my classes for multithreading?

- by makerofthings7

My datasource sends information in 100 batches of 100 records with a delay of 1 to 3 seconds between batches. I would like to start processing data as soon as it's received, but I'm not sure how to best approach this. Some ideas I've been playing with include: yield Concurrent Dictionary ConcurrentDictionary with INotifyProperyChanged Events etc. As you can see I'm all over the place, and would appreciate some tested guidance on how to approach this

Read the article
Defining the CMMS Need

Does your business rise or fall on the way the floors are mopped and the time of day that it happens? Probably not. But if you are in an industry such as food and beverage processing, maintenance is ... [Author: Ashley Combs - Computers and Internet - April 26, 2010]

Read the article
Review: AbiWord: The Underappreciated Word Processor

OpenOffice.org isn't the only game in town for open source word processing. One of the best, and underexposed, open word processors is AbiWord.

Read the article
AbiWord: The Underappreciated Word Processor

<b>Linux Planet:</b> "OpenOffice.org isn't the only game in town for open source word processing. One of the best, and underexposed, open word processors is AbiWord."

Read the article
Setting up a Complete Django E-commerce store in 30 minutes

<b>Packt:</b> "In order to demonstrate Django's rapid development potential, we will begin by constructing a simple, but fully-featured, e-commerce store. The goal is to be up and running with a product catalog and products for sale, including a simple payment processing interface, in about half-an-hour."

Read the article
SEO Company?

SEO is an area where tactics change rapidly, most of which are rooted in the way of processing searches by major search engines. Surpassing the opponents is difficult and necessitates more than just normal optimization skills.

Read the article
dbus dependency with yum

- by Hengjie

Whenever, I try and run yum update I get the following error: [root@server ~]# yum update Loaded plugins: dellsysid, fastestmirror Loading mirror speeds from cached hostfile * base: mirror01.idc.hinet.net * extras: mirror01.idc.hinet.net * rpmforge: fr2.rpmfind.net * updates: mirror01.idc.hinet.net Excluding Packages in global exclude list Finished Setting up Update Process Resolving Dependencies --> Running transaction check ---> Package NetworkManager.x86_64 1:0.7.0-13.el5 set to be updated ---> Package NetworkManager-glib.x86_64 1:0.7.0-13.el5 set to be updated ---> Package SysVinit.x86_64 0:2.86-17.el5 set to be updated ---> Package acl.x86_64 0:2.2.39-8.el5 set to be updated ---> Package acpid.x86_64 0:1.0.4-12.el5 set to be updated ---> Package apr.x86_64 0:1.2.7-11.el5_6.5 set to be updated ---> Package aspell.x86_64 12:0.60.3-12 set to be updated ---> Package audit.x86_64 0:1.8-2.el5 set to be updated ---> Package audit-libs.x86_64 0:1.8-2.el5 set to be updated ---> Package audit-libs-python.x86_64 0:1.8-2.el5 set to be updated ---> Package authconfig.x86_64 0:5.3.21-7.el5 set to be updated ---> Package autofs.x86_64 1:5.0.1-0.rc2.163.el5 set to be updated ---> Package bash.x86_64 0:3.2-32.el5 set to be updated ---> Package bind.x86_64 30:9.3.6-20.P1.el5 set to be updated ---> Package bind-libs.x86_64 30:9.3.6-20.P1.el5 set to be updated ---> Package bind-utils.x86_64 30:9.3.6-20.P1.el5 set to be updated ---> Package binutils.x86_64 0:2.17.50.0.6-20.el5 set to be updated ---> Package centos-release.x86_64 10:5-8.el5.centos set to be updated ---> Package centos-release-notes.x86_64 0:5.8-0 set to be updated ---> Package coreutils.x86_64 0:5.97-34.el5_8.1 set to be updated ---> Package cpp.x86_64 0:4.1.2-52.el5 set to be updated ---> Package cpuspeed.x86_64 1:1.2.1-10.el5 set to be updated ---> Package crash.x86_64 0:5.1.8-1.el5.centos set to be updated ---> Package cryptsetup-luks.x86_64 0:1.0.3-8.el5 set to be updated ---> Package cups.x86_64 1:1.3.7-30.el5 set to be updated ---> Package cups-libs.x86_64 1:1.3.7-30.el5 set to be updated ---> Package curl.x86_64 0:7.15.5-15.el5 set to be updated --> Processing Dependency: dbus = 1.1.2-15.el5_6 for package: dbus-libs ---> Package dbus.x86_64 0:1.1.2-16.el5_7 set to be updated ---> Package dbus-libs.x86_64 0:1.1.2-16.el5_7 set to be updated ---> Package device-mapper.x86_64 0:1.02.67-2.el5 set to be updated ---> Package device-mapper-event.x86_64 0:1.02.67-2.el5 set to be updated ---> Package device-mapper-multipath.x86_64 0:0.4.7-48.el5_8.1 set to be updated ---> Package dhclient.x86_64 12:3.0.5-31.el5 set to be updated ---> Package dmidecode.x86_64 1:2.11-1.el5 set to be updated ---> Package dmraid.x86_64 0:1.0.0.rc13-65.el5 set to be updated ---> Package dmraid-events.x86_64 0:1.0.0.rc13-65.el5 set to be updated ---> Package dump.x86_64 0:0.4b41-6.el5 set to be updated ---> Package e2fsprogs.x86_64 0:1.39-33.el5 set to be updated ---> Package e2fsprogs-devel.x86_64 0:1.39-33.el5 set to be updated ---> Package e2fsprogs-libs.x86_64 0:1.39-33.el5 set to be updated ---> Package ecryptfs-utils.x86_64 0:75-8.el5 set to be updated ---> Package file.x86_64 0:4.17-21 set to be updated ---> Package finger.x86_64 0:0.17-33 set to be updated ---> Package firstboot-tui.x86_64 0:1.4.27.9-1.el5.centos set to be updated ---> Package freetype.x86_64 0:2.2.1-28.el5_7.2 set to be updated ---> Package freetype-devel.x86_64 0:2.2.1-28.el5_7.2 set to be updated ---> Package ftp.x86_64 0:0.17-37.el5 set to be updated ---> Package gamin.x86_64 0:0.1.7-10.el5 set to be updated ---> Package gamin-python.x86_64 0:0.1.7-10.el5 set to be updated ---> Package gawk.x86_64 0:3.1.5-15.el5 set to be updated ---> Package gcc.x86_64 0:4.1.2-52.el5 set to be updated ---> Package gcc-c++.x86_64 0:4.1.2-52.el5 set to be updated ---> Package glibc.i686 0:2.5-81.el5_8.1 set to be updated ---> Package glibc.x86_64 0:2.5-81.el5_8.1 set to be updated ---> Package glibc-common.x86_64 0:2.5-81.el5_8.1 set to be updated ---> Package glibc-devel.x86_64 0:2.5-81.el5_8.1 set to be updated ---> Package glibc-headers.x86_64 0:2.5-81.el5_8.1 set to be updated ---> Package gnutls.x86_64 0:1.4.1-7.el5_8.2 set to be updated ---> Package groff.x86_64 0:1.18.1.1-13.el5 set to be updated ---> Package gtk2.x86_64 0:2.10.4-21.el5_7.7 set to be updated ---> Package gzip.x86_64 0:1.3.5-13.el5.centos set to be updated ---> Package hmaccalc.x86_64 0:0.9.6-4.el5 set to be updated ---> Package htop.x86_64 0:1.0.1-2.el5.rf set to be updated ---> Package hwdata.noarch 0:0.213.26-1.el5 set to be updated ---> Package ifd-egate.x86_64 0:0.05-17.el5 set to be updated ---> Package initscripts.x86_64 0:8.45.42-1.el5.centos set to be updated ---> Package iproute.x86_64 0:2.6.18-13.el5 set to be updated ---> Package iptables.x86_64 0:1.3.5-9.1.el5 set to be updated ---> Package iptables-ipv6.x86_64 0:1.3.5-9.1.el5 set to be updated ---> Package iscsi-initiator-utils.x86_64 0:6.2.0.872-13.el5 set to be updated ---> Package kernel.x86_64 0:2.6.18-308.1.1.el5 set to be installed ---> Package kernel-headers.x86_64 0:2.6.18-308.1.1.el5 set to be updated ---> Package kpartx.x86_64 0:0.4.7-48.el5_8.1 set to be updated ---> Package krb5-devel.x86_64 0:1.6.1-70.el5 set to be updated ---> Package krb5-libs.x86_64 0:1.6.1-70.el5 set to be updated ---> Package krb5-workstation.x86_64 0:1.6.1-70.el5 set to be updated ---> Package ksh.x86_64 0:20100621-5.el5_8.1 set to be updated ---> Package kudzu.x86_64 0:1.2.57.1.26-3.el5.centos set to be updated ---> Package less.x86_64 0:436-9.el5 set to be updated ---> Package lftp.x86_64 0:3.7.11-7.el5 set to be updated ---> Package libX11.x86_64 0:1.0.3-11.el5_7.1 set to be updated ---> Package libX11-devel.x86_64 0:1.0.3-11.el5_7.1 set to be updated ---> Package libXcursor.x86_64 0:1.1.7-1.2 set to be updated ---> Package libacl.x86_64 0:2.2.39-8.el5 set to be updated ---> Package libgcc.x86_64 0:4.1.2-52.el5 set to be updated ---> Package libgomp.x86_64 0:4.4.6-3.el5.1 set to be updated ---> Package libpng.x86_64 2:1.2.10-16.el5_8 set to be updated ---> Package libpng-devel.x86_64 2:1.2.10-16.el5_8 set to be updated ---> Package libsmbios.x86_64 0:2.2.27-3.2.el5 set to be updated ---> Package libstdc++.x86_64 0:4.1.2-52.el5 set to be updated ---> Package libstdc++-devel.x86_64 0:4.1.2-52.el5 set to be updated ---> Package libsysfs.x86_64 0:2.1.0-1.el5 set to be updated ---> Package libusb.x86_64 0:0.1.12-6.el5 set to be updated ---> Package libvolume_id.x86_64 0:095-14.27.el5_7.1 set to be updated ---> Package libxml2.x86_64 0:2.6.26-2.1.15.el5_8.2 set to be updated ---> Package libxml2-python.x86_64 0:2.6.26-2.1.15.el5_8.2 set to be updated ---> Package logrotate.x86_64 0:3.7.4-12 set to be updated ---> Package lsof.x86_64 0:4.78-6 set to be updated ---> Package lvm2.x86_64 0:2.02.88-7.el5 set to be updated ---> Package m2crypto.x86_64 0:0.16-8.el5 set to be updated ---> Package man.x86_64 0:1.6d-2.el5 set to be updated ---> Package man-pages.noarch 0:2.39-20.el5 set to be updated ---> Package mcelog.x86_64 1:0.9pre-1.32.el5 set to be updated ---> Package mesa-libGL.x86_64 0:6.5.1-7.10.el5 set to be updated ---> Package mesa-libGL-devel.x86_64 0:6.5.1-7.10.el5 set to be updated ---> Package microcode_ctl.x86_64 2:1.17-1.56.el5 set to be updated ---> Package mkinitrd.x86_64 0:5.1.19.6-75.el5 set to be updated ---> Package mktemp.x86_64 3:1.5-24.el5 set to be updated --> Processing Dependency: nash = 5.1.19.6-68.el5_6.1 for package: mkinitrd ---> Package nash.x86_64 0:5.1.19.6-75.el5 set to be updated ---> Package net-snmp.x86_64 1:5.3.2.2-17.el5 set to be updated ---> Package net-snmp-devel.x86_64 1:5.3.2.2-17.el5 set to be updated ---> Package net-snmp-libs.x86_64 1:5.3.2.2-17.el5 set to be updated ---> Package net-snmp-utils.x86_64 1:5.3.2.2-17.el5 set to be updated ---> Package net-tools.x86_64 0:1.60-82.el5 set to be updated ---> Package nfs-utils.x86_64 1:1.0.9-60.el5 set to be updated ---> Package nfs-utils-lib.x86_64 0:1.0.8-7.9.el5 set to be updated ---> Package nscd.x86_64 0:2.5-81.el5_8.1 set to be updated ---> Package nspr.x86_64 0:4.8.9-1.el5_8 set to be updated ---> Package nspr-devel.x86_64 0:4.8.9-1.el5_8 set to be updated ---> Package nss.x86_64 0:3.13.1-5.el5_8 set to be updated ---> Package nss-devel.x86_64 0:3.13.1-5.el5_8 set to be updated ---> Package nss-tools.x86_64 0:3.13.1-5.el5_8 set to be updated ---> Package nss_ldap.x86_64 0:253-49.el5 set to be updated ---> Package ntp.x86_64 0:4.2.2p1-15.el5.centos.1 set to be updated ---> Package numactl.x86_64 0:0.9.8-12.el5_6 set to be updated ---> Package oddjob.x86_64 0:0.27-12.el5 set to be updated ---> Package oddjob-libs.x86_64 0:0.27-12.el5 set to be updated ---> Package openldap.x86_64 0:2.3.43-25.el5 set to be updated ---> Package openssh.x86_64 0:4.3p2-82.el5 set to be updated ---> Package openssh-clients.x86_64 0:4.3p2-82.el5 set to be updated ---> Package openssh-server.x86_64 0:4.3p2-82.el5 set to be updated ---> Package openssl.i686 0:0.9.8e-22.el5_8.1 set to be updated ---> Package openssl.x86_64 0:0.9.8e-22.el5_8.1 set to be updated ---> Package openssl-devel.x86_64 0:0.9.8e-22.el5_8.1 set to be updated ---> Package pam_krb5.x86_64 0:2.2.14-22.el5 set to be updated ---> Package pam_pkcs11.x86_64 0:0.5.3-26.el5 set to be updated ---> Package pango.x86_64 0:1.14.9-8.el5.centos.3 set to be updated ---> Package parted.x86_64 0:1.8.1-29.el5 set to be updated ---> Package pciutils.x86_64 0:3.1.7-5.el5 set to be updated ---> Package perl.x86_64 4:5.8.8-38.el5 set to be updated ---> Package perl-Compress-Raw-Bzip2.x86_64 0:2.037-1.el5.rf set to be updated ---> Package perl-Compress-Raw-Zlib.x86_64 0:2.037-1.el5.rf set to be updated ---> Package perl-rrdtool.x86_64 0:1.4.7-1.el5.rf set to be updated ---> Package poppler.x86_64 0:0.5.4-19.el5 set to be updated ---> Package poppler-utils.x86_64 0:0.5.4-19.el5 set to be updated ---> Package popt.x86_64 0:1.10.2.3-28.el5_8 set to be updated ---> Package postgresql-libs.x86_64 0:8.1.23-1.el5_7.3 set to be updated ---> Package procps.x86_64 0:3.2.7-18.el5 set to be updated ---> Package proftpd.x86_64 0:1.3.4a-1.el5.rf set to be updated --> Processing Dependency: perl(Mail::Sendmail) for package: proftpd ---> Package python.x86_64 0:2.4.3-46.el5 set to be updated ---> Package python-ctypes.x86_64 0:1.0.2-3.el5 set to be updated ---> Package python-libs.x86_64 0:2.4.3-46.el5 set to be updated ---> Package python-smbios.x86_64 0:2.2.27-3.2.el5 set to be updated ---> Package rhpl.x86_64 0:0.194.1-2 set to be updated ---> Package rmt.x86_64 0:0.4b41-6.el5 set to be updated ---> Package rng-utils.x86_64 1:2.0-5.el5 set to be updated ---> Package rpm.x86_64 0:4.4.2.3-28.el5_8 set to be updated ---> Package rpm-build.x86_64 0:4.4.2.3-28.el5_8 set to be updated ---> Package rpm-devel.x86_64 0:4.4.2.3-28.el5_8 set to be updated ---> Package rpm-libs.x86_64 0:4.4.2.3-28.el5_8 set to be updated ---> Package rpm-python.x86_64 0:4.4.2.3-28.el5_8 set to be updated ---> Package rrdtool.x86_64 0:1.4.7-1.el5.rf set to be updated ---> Package rsh.x86_64 0:0.17-40.el5_7.1 set to be updated ---> Package rsync.x86_64 0:3.0.6-4.el5_7.1 set to be updated ---> Package ruby.x86_64 0:1.8.5-24.el5 set to be updated ---> Package ruby-libs.x86_64 0:1.8.5-24.el5 set to be updated ---> Package sblim-sfcb.x86_64 0:1.3.11-49.el5 set to be updated ---> Package sblim-sfcc.x86_64 0:2.2.2-49.el5 set to be updated ---> Package selinux-policy.noarch 0:2.4.6-327.el5 set to be updated ---> Package selinux-policy-targeted.noarch 0:2.4.6-327.el5 set to be updated ---> Package setup.noarch 0:2.5.58-9.el5 set to be updated ---> Package shadow-utils.x86_64 2:4.0.17-20.el5 set to be updated ---> Package smartmontools.x86_64 1:5.38-3.el5 set to be updated ---> Package smbios-utils-bin.x86_64 0:2.2.27-3.2.el5 set to be updated ---> Package smbios-utils-python.x86_64 0:2.2.27-3.2.el5 set to be updated ---> Package sos.noarch 0:1.7-9.62.el5 set to be updated ---> Package srvadmin-omilcore.x86_64 0:6.5.0-1.452.1.el5 set to be updated ---> Package strace.x86_64 0:4.5.18-11.el5_8 set to be updated ---> Package subversion.x86_64 0:1.6.11-7.el5_6.4 set to be updated ---> Package sudo.x86_64 0:1.7.2p1-13.el5 set to be updated ---> Package sysfsutils.x86_64 0:2.1.0-1.el5 set to be updated ---> Package syslinux.x86_64 0:3.11-7 set to be updated ---> Package system-config-network-tui.noarch 0:1.3.99.21-1.el5 set to be updated ---> Package talk.x86_64 0:0.17-31.el5 set to be updated ---> Package tar.x86_64 2:1.15.1-31.el5 set to be updated ---> Package traceroute.x86_64 3:2.0.1-6.el5 set to be updated ---> Package tzdata.x86_64 0:2012b-3.el5 set to be updated ---> Package udev.x86_64 0:095-14.27.el5_7.1 set to be updated ---> Package util-linux.x86_64 0:2.13-0.59.el5 set to be updated ---> Package vixie-cron.x86_64 4:4.1-81.el5 set to be updated ---> Package wget.x86_64 0:1.11.4-3.el5_8.1 set to be updated ---> Package xinetd.x86_64 2:2.3.14-16.el5 set to be updated ---> Package yp-tools.x86_64 0:2.9-2.el5 set to be updated ---> Package ypbind.x86_64 3:1.19-12.el5_6.1 set to be updated ---> Package yum.noarch 0:3.2.22-39.el5.centos set to be updated ---> Package yum-dellsysid.x86_64 0:2.2.27-3.2.el5 set to be updated ---> Package yum-fastestmirror.noarch 0:1.1.16-21.el5.centos set to be updated ---> Package zlib.x86_64 0:1.2.3-4.el5 set to be updated ---> Package zlib-devel.x86_64 0:1.2.3-4.el5 set to be updated --> Running transaction check --> Processing Dependency: dbus = 1.1.2-15.el5_6 for package: dbus-libs --> Processing Dependency: nash = 5.1.19.6-68.el5_6.1 for package: mkinitrd ---> Package perl-Mail-Sendmail.noarch 0:0.79-1.2.el5.rf set to be updated base/filelists | 3.5 MB 00:00 dell-omsa-indep/filelists | 195 kB 00:01 dell-omsa-specific/filelists | 1.0 kB 00:00 extras/filelists_db | 224 kB 00:00 rpmforge/filelists | 4.8 MB 00:06 updates/filelists_db | 715 kB 00:00 --> Finished Dependency Resolution dbus-libs-1.1.2-15.el5_6.i386 from installed has depsolving problems --> Missing Dependency: dbus = 1.1.2-15.el5_6 is needed by package dbus-libs-1.1.2-15.el5_6.i386 (installed) mkinitrd-5.1.19.6-68.el5_6.1.i386 from installed has depsolving problems --> Missing Dependency: nash = 5.1.19.6-68.el5_6.1 is needed by package mkinitrd-5.1.19.6-68.el5_6.1.i386 (installed) Error: Missing Dependency: nash = 5.1.19.6-68.el5_6.1 is needed by package mkinitrd-5.1.19.6-68.el5_6.1.i386 (installed) Error: Missing Dependency: dbus = 1.1.2-15.el5_6 is needed by package dbus-libs-1.1.2-15.el5_6.i386 (installed) You could try using --skip-broken to work around the problem You could try running: package-cleanup --problems package-cleanup --dupes rpm -Va --nofiles --nodigest The program package-cleanup is found in the yum-utils package. I have tried running package-cleanup --dupes and package-cleanup --problems but to no avail.

Read the article
GPU Debugging with VS 11

- by Daniel Moth

With VS 11 Developer Preview we have invested tremendously in parallel debugging for both CPU (managed and native) and GPU debugging. I'll be doing a whole bunch of blog posts on those topics, and in this post I just wanted to get people started with GPU debugging, i.e. with debugging C++ AMP code. First I invite you to watch 6 minutes of a glimpse of the C++ AMP debugging experience though this video (ffw to minute 51:54, up until minute 59:16). Don't read the rest of this post, just go watch that video, ideally download the High Quality WMV. Summary GPU debugging essentially means debugging the lambda that you pass to the parallel_for_each call (plus any functions you call from the lambda, of course). CPU debugging means debugging all the code above and below the parallel_for_each call, i.e. all the code except the restrict(direct3d) lambda and the functions that it calls. With VS 11 you have to choose what debugger you want to use for a particular debugging session, CPU or GPU. So you can place breakpoints all over your code, then choose what debugger you want (CPU or GPU), and you'll only be able to hit breakpoints for the code type that the debugger engine understands – the remaining breakpoints will appear as unbound. If you want to hit the unbound breakpoints, you'd have to stop debugging, and start again with the other debugger. Sorry. We suck. We know. But once you are past that limitation, I think you'll find the experience truly rewarding – seriously! Switching debugger engines With the Developer Preview bits, one way to switch the debugger engine is through the project properties – see the screenshots that follow. This one is showing the CPU option selected, which is basically the default that you are all familiar with: This screenshot is showing the GPU option selected, by changing the debugger launcher (notice that this applies for both the local and remote case): You actually do not have to open the project properties just for switching the debugger engine, you can switch the selection from the toolbar in VS 11 Developer Preview too – see following screenshot (the effect is the same as if you opened the project properties and switched there) Breakpoint behavior Here are two screenshots, one showing a debugging session for CPU and the other a debugging session for GPU (notice the unbound breakpoints in each case) …and here is the GPU case (where we cannot bind the CPU breakpoints but can the GPU breakpoint, which is actually hit) Give C++ AMP debugging a try So to debug your C++ AMP code, pull down the drop down under the 'play' button to select the 'GPU C++ Direct3D Compute Debugger' menu option, then hit F5 (or the 'play' button itself). Then you can explore debugging by exploring the menus under the Debug and under the Debug->Windows menus. One way to do that exploration is through the C++ AMP debugging walkthrough on MSDN. Another way to explore the C++ AMP debugging experience, you can use the moth.cpp code file, which is what I used in my BUILD session debugger demo. Note that for my demo I was using the latest internal VS11 bits, so your experience with the Developer Preview bits won't be identical to what you saw me demonstrate, but it shouldn't be far off. Stay tuned for a lot more content on the parallel debugger in VS 11, both CPU and GPU, both managed and native. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
When to implement: Together with or after the source product?

- by Jeremy Oosthuizen

Somebody recently relayed a prospect's question to me: How hard would it be to implement OUBI after the source product (CC&B, WAM or NMS) has already been implemented? Fact is that MOST non-OUBI Data Warehouse / Business Intelligence implementations take place after the source application(s) are in place and hopefully stable. If an organization decides that they need better reporting and management information, then the logical path (see The Data Warehouse Institute's Data Warehouse Maturity Model) is to a Data Warehouse -- no matter when their last applications were implemented. If there is a pre-built Data Warehouse for their specific application, or even for the desired business process in their industry, they're in luck. Else they have to design and build from scratch, using a toolset. The implementation of a toolset is unlike the implementation of OUBI which, like OBI Apps, contain pre-built ETL routines and user content. Much has been written before about the advantages of that. So, because OUBI is designed specifically for Oracle Utilities transactional products, we often implement them in parallel -- with OUBI lagging a little behind by necessity, like Reporting. Customers know from the start they're going to need the solution, and therefore purchase the products at the same time. My biggest argument FOR a parallel installation/implementation of OUBI with the source product is two-fold: - There could be things (which is the technical term for data elements) that customers figure out they need when implementing OUBI, which are often easier added to the source product's implementation project, than to add later; - OUBI's ETL often points out errors (severe or not) with converted data, which are easier to fix during the source product's implementation project, or it may even be impossible to fix afterwards. The Conversion routines sometimes miss these errors, because the source system can live with the not-quite-perfect converted data. If the data can't be properly extracted, i.e. the proper Dimensions linked to the Facts, then it can't get into OUBI. That means it can't be analyzed effectively along with the rest of the organization's data. Then there is also the throw-away-work argument, which may be significant. The operational / transactional system cannot go live without reports on Day 1. A lot of those reports would be taken care of by the implementation of OUBI. If OUBI is implemented after go-live, those reports STILL have to be built during the source product's implementation project, but they become throw-away after the OUBI implementation. I have sometimes been told that it is better to implement OUBI after the source product, because it cuts down on scope and risk for the source product's implementation project. All I can say to that, is bah humbug. No, seriously, given the arguments above, planning has to include the OUBI implementation and it has to be managed properly -- just like any other implementation. If so, it should not add any risk and it should be included in the scope from the start. The answer to the prospect's question is therefore that it is not that much more difficult; after all, most DW/BI implemenations are done like that. They just have to consider the points above.

Read the article
Triangle Line-Segment Intersection - detecting near misses

- by Will

A ray is a very poor approximation of a player! I think approximating a player with a sphere traveling a straight line each game tick will solve my problems of the player intersecting edges of scenery because their line segment missed it yet their own model is not infinitely thin... I have a 3D triangle and a line segment. I have the normal triangle-line-segment intersection code which I admit I have only a woolly grasp of. To model movement and compute collisions of the player I have to determine if a line passes within sphere-radius of a triangle. But I can find no convenient line near-miss intersection code! Here's the classic triangle intersection ### commented ### code with my starting assumptions: function triangle_ray_intersection(a,b,c,ray_origin,ray_dir,ray_radius) { // http://softsurfer.com/Archive/algorithm_0105/algorithm_0105.htm#intersect_RayTriangle%28%29 // get triangle edge vectors and plane normal var u = vec3_sub(b,a); var v = vec3_sub(c,a); var n = vec3_cross(u,v); if(n[0]==0 && n[1]==0 && n[2]==0) return null; // triangle is degenerate var w0 = vec3_sub(ray_origin,a); var j = vec3_dot(n,ray_dir); if(Math.abs(j) < 0.00000001) { //### if parallel, might still pass within ray_radius of it return null; // parallel, disjoint or on plane } var i = -vec3_dot(n,w0); // get intersect point of ray with triangle plane var k = i / j; if(k < 0.0) return null; // ray goes away from triangle //### as its a line segment, k > 1+ray_radius means no intersect var hit = vec3_add(ray_origin,vec3_scale(ray_dir,k)); // intersect point of ray and plane // is I inside T? //### here I'm a bit lost; this is presumably computing barycentric coordinates? var uu = vec3_dot(u,u); var uv = vec3_dot(u,v); var vv = vec3_dot(v,v); var w = vec3_sub(hit,a); var wu = vec3_dot(w,u); var wv = vec3_dot(w,v); var D = uv * uv - uu * vv; var s = (uv * wv - vv * wu) / D; //### therefore, compute if its within ray_radius scaled to the 0..1 of barycentric coordinates? if(s<0.0 || s>1.0) return null; // I is outside T var t = (uv * wu - uu * wv) / D; if(t<0.0 || (s+t)>1.0) return null; // I is outside T //### finally, if it passses a barycentric test it might still be too far //### to a point; must check that its distance from a corner is within ray_radius too if more than one barycentric coord is >1 //### so we have rounded corners... return [hit,n]; // I is in T } Given the distance between the point of plane intersection and each corner, I ought to be able to determine distance at world scale of how far beyond the edge - beyond 1.0 in barycentric coordinates for each axis - that point is... At this point my head explodes! Is this the right track? What's the actual code? UPDATE: you can earn 100 pts on SO if you answer this question there...! How can you determine if a line segment passes within some distance of a triangle?

Read the article
ODI 11g – Oracle Multi Table Insert

- by David Allan

With the IKM Oracle Multi Table Insert you can generate Oracle specific DML for inserting into multiple target tables from a single query result – without reprocessing the query or staging its result. When designing this to exploit the IKM you must split the problem into the reusable parts – the select part goes in one interface (I named SELECT_PART), then each target goes in a separate interface (INSERT_SPECIAL and INSERT_REGULAR). So for my statement below… /*INSERT_SPECIAL interface */ insert all when 1=1 And (INCOME_LEVEL > 250000) then into SCOTT.CUSTOMERS_NEW (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) values (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) /* INSERT_REGULAR interface */ when 1=1 then into SCOTT.CUSTOMERS_SPECIAL (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) values (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) /*SELECT*PART interface */ select CUSTOMERS.EMAIL EMAIL, CUSTOMERS.CREDIT_LIMIT CREDIT_LIMIT, UPPER(CUSTOMERS.NAME) NAME, CUSTOMERS.USER_MODIFIED USER_MODIFIED, CUSTOMERS.DATE_MODIFIED DATE_MODIFIED, CUSTOMERS.BIRTH_DATE BIRTH_DATE, CUSTOMERS.MARITAL_STATUS MARITAL_STATUS, CUSTOMERS.ID ID, CUSTOMERS.USER_CREATED USER_CREATED, CUSTOMERS.GENDER GENDER, CUSTOMERS.DATE_CREATED DATE_CREATED, CUSTOMERS.INCOME_LEVEL INCOME_LEVEL from SCOTT.CUSTOMERS CUSTOMERS where (1=1) Firstly I create a SELECT_PART temporary interface for the query to be reused and in the IKM assignment I state that it is defining the query, it is not a target and it should not be executed. Then in my INSERT_SPECIAL interface loading a target with a filter, I set define query to false, then set true for the target table and execute to false. This interface uses the SELECT_PART query definition interface as a source. Finally in my final interface loading another target I set define query to false again, set target table to true and execute to true – this is the go run it indicator! To coordinate the statement construction you will need to create a package with the select and insert statements. With 11g you can now execute the package in simulation mode and preview the generated code including the SQL statements. Hopefully this helps shed some light on how you can leverage the Oracle MTI statement. A similar IKM exists for Teradata. The ODI IKM Teradata Multi Statement supports this multi statement request in 11g, here is an extract from the paper at www.teradata.com/white-papers/born-to-be-parallel-eb3053/ Teradata Database offers an SQL extension called a Multi-Statement Request that allows several distinct SQL statements to be bundled together and sent to the optimizer as if they were one. Teradata Database will attempt to execute these SQL statements in parallel. When this feature is used, any sub-expressions that the different SQL statements have in common will be executed once, and the results shared among them. It works in the same way as the ODI MTI IKM, multiple interfaces orchestrated in a package, each interface contributes some SQL, the last interface in the chain executes the multi statement.

Read the article
F# and the rose-tinted reflection

- by CliveT

We're already seeing increasing use of many cores on client desktops. It is a change that has been long predicted. It is not just a change in architecture, but our notions of efficiency in a program. No longer can we focus on the asymptotic complexity of an algorithm by counting the steps that a single core processor would take to execute it. Instead we'll soon be more concerned about the scalability of the algorithm and how well we can increase the performance as we increase the number of cores. This may even lead us to throw away our most efficient algorithms, and switch to less efficient algorithms that scale better. We might even be willing to waste cycles in order to speculatively execute at the algorithm rather than the hardware level. State is the big headache in this parallel world. At the hardware level, main memory doesn't necessarily contain the definitive value corresponding to a particular address. An update to a location might still be held in a CPU's local cache and it might be some time before the value gets propagated. To get the latest value, and the notion of "latest" takes a lot of defining in this world of rapidly mutating state, the CPUs may well need to communicate to decide who has the definitive value of a particular address in order to avoid lost updates. At the user program level, this means programmers will need to lock objects before modifying them, or attempt to avoid the overhead of locking by understanding the memory models at a very deep level. I think it's this need to avoid statefulness that has led to the recent resurgence of interest in functional languages. In the 1980s, functional languages started getting traction when research was carried out into how programs in such languages could be auto-parallelised. Sadly, the impracticality of some of the languages, the overheads of communication during this parallel execution, and rapid improvements in compiler technology on stock hardware meant that the functional languages fell by the wayside. The one thing that these languages were good at was getting rid of implicit state, and this single idea seems like a solution to the problems we are going to face in the coming years. Whether these languages will catch on is hard to predict. The mindset for writing a program in a functional language is really very different from the way that object-oriented problem decomposition happens - one has to focus on the verbs instead of the nouns, which takes some getting used to. There are a number of hybrid functional/object languages that have been becoming more popular in recent times. These half-way houses make it easy to use functional ideas for some parts of the program while still allowing access to the underlying object-focused platform without a great deal of impedance mismatch. One example is F# running on the CLR which, in Visual Studio 2010, has because a first class member of the pack. Inside Visual Studio 2010, the tooling for F# has improved to the point where it is easy to set breakpoints and watch values change while debugging at the source level. In my opinion, it is the tooling support that will enable the widespread adoption of functional languages - without this support, people will put off any transition into the functional world for as long as they possibly can. Without tool support it will make it hard to learn these languages. One tool that doesn't currently support F# is Reflector. The idea of decompiling IL to a functional language is daunting, but F# is potentially so important I couldn't dismiss the idea. As I'm currently developing Reflector 6.5, I thought it wise to take four days just to see how far I could get in doing so, even if it achieved little more than to be clearer on how much was possible, and how long it might take. You can read what happened here, and of the insights it gave us on ways to improve the tool.

Read the article
Give a session on C++ AMP – here is how

- by Daniel Moth

Ever since presenting on C++ AMP at the AMD Fusion conference in June, then the Gamefest conference in August, and the BUILD conference in September, I've had numerous requests about my material from folks that want to re-deliver the same session. The C++ AMP session I put together has evolved over the 3 presentations to its final form that I used at BUILD, so that is the one I recommend you base yours on. Please get the slides and the recording from channel9 (I'll refer to slide numbers below). This is how I've been presenting the C++ AMP session: Context (slide 3, 04:18-08:18) Start with a demo, on my dual-GPU machine. I've been using the N-Body sample (for VS 11 Developer Preview). (slide 4) Use an nvidia slide that has additional examples of performance improvements that customers enjoy with heterogeneous computing. (slide 5) Talk a bit about the differences today between CPU and GPU hardware, leading to the fact that these will continue to co-exist and that GPUs are great for data parallel algorithms, but not much else today. One is a jack of all trades and the other is a number cruncher. (slide 6) Use the APU example from amd, as one indication that the hardware space is still in motion, emphasizing that the C++ AMP solution is a data parallel API, not a GPU API. It has a future proof design for hardware we have yet to see. (slide 7) Provide more meta-data, as blogged about when I first introduced C++ AMP. Code (slide 9-11) Introduce C++ AMP coding with a simplistic array-addition algorithm – the slides speak for themselves. (slide 12-13) index<N>, extent<N>, and grid<N>. (Slide 14-16) array<T,N>, array_view<T,N> and comparison between them. (Slide 17) parallel_for_each. (slide 18, 21) restrict. (slide 19-20) actual restrictions of restrict(direct3d) – the slides speak for themselves. (slide 22) bring it altogether with a matrix multiplication example. (slide 23-24) accelerator, and accelerator_view. (slide 26-29) Introduce tiling incl. tiled matrix multiplication [tiling probably deserves a whole session instead of 6 minutes!]. IDE (slide 34,37) Briefly touch on the concurrency visualizer. It supports GPU profiling, but enhancements specific to C++ AMP we hope will come at the Beta timeframe, which is when I'll be spending more time talking about it. (slide 35-36, 51:54-59:16) Demonstrate the GPU debugging experience in VS 11. Summary (slide 39) Re-iterate some of the points of slide 7, and add the point that the C++ AMP spec will be open for other compiler vendors to implement, even on other platforms (in fact, Microsoft is actively working on that). (slide 40) Links to content – see slide – including where all your questions should go: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads. "But I don't have time for a full blown session, I only need 2 (or just 1, or 3) C++ AMP slides to use in my session on related topic X" If all you want is a small number of slides, you can take some from the session above and customize them. But because I am so nice, I have created some slides for you, including talking points in the notes section. Download them here. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
SQL Azure: Notes on Building a Shard Technology

- by Herve Roggero

In Chapter 10 of the book on SQL Azure (http://www.apress.com/book/view/9781430229612) I am co-authoring, I am digging deeper in what it takes to write a Shard. It's actually a pretty cool exercise, and I wanted to share some thoughts on how I am designing the technology. A Shard is a technology that spreads the load of database requests over multiple databases, as transparently as possible. The type of shard I am building is called a Vertical Partition Shard (VPS). A VPS is a mechanism by which the data is stored in one or more databases behind the scenes, but your code has no idea at design time which data is in which database. It's like having a mini cloud for records instead of services. Imagine you have three SQL Azure databases that have the same schema (DB1, DB2 and DB3), you would like to issue a SELECT * FROM Users on all three databases, concatenate the results into a single resultset, and order by last name. Imagine you want to ensure your code doesn't need to change if you add a new database to the shard (DB4). Now imagine that you want to make sure all three databases are queried at the same time, in a multi-threaded manner so your code doesn't have to wait for three database calls sequentially. Then, imagine you would like to obtain a breadcrumb (in the form of a new, virtual column) that gives you a hint as to which database a record came from, so that you could update it if needed. Now imagine all that is done through the standard SqlClient library... and you have the Shard I am currently building. Here are some lessons learned and techniques I am using with this shard: Parellel Processing: Querying databases in parallel is not too hard using the Task Parallel Library; all you need is to lock your resources when needed Deleting/Updating Data: That's not too bad either as long as you have a breadcrumb. However it becomes more difficult if you need to update a single record and you don't know in which database it is. Inserting Data: I am using a round-robin approach in which each new insert request is directed to the next database in the shard. Not sure how to deal with Bulk Loads just yet... Shard Databases: I use a static collection of SqlConnection objects which needs to be loaded once; from there on all the Shard commands use this collection Extension Methods: In order to make it look like the Shard commands are part of the SqlClient class I use extension methods. For example I added ExecuteShardQuery and ExecuteShardNonQuery methods to SqlClient. Exceptions: Capturing exceptions in a multi-threaded code is interesting... but I kept it simple for now. I am using the ConcurrentQueue to store my exceptions. Database GUID: Every database in the shard is given a GUID, which is calculated based on the connection string's values. DataTable. The Shard methods return a DataTable object which can be bound to objects. I will be sharing the code soon as an open-source project in CodePlex. Please stay tuned on twitter to know when it will be available (@hroggero). Or check www.bluesyntax.net for updates on the shard. Thanks!

Read the article
Give a session on C++ AMP – here is how

- by Daniel Moth

Ever since presenting on C++ AMP at the AMD Fusion conference in June, then the Gamefest conference in August, and the BUILD conference in September, I've had numerous requests about my material from folks that want to re-deliver the same session. The C++ AMP session I put together has evolved over the 3 presentations to its final form that I used at BUILD, so that is the one I recommend you base yours on. Please get the slides and the recording from channel9 (I'll refer to slide numbers below). This is how I've been presenting the C++ AMP session: Context (slide 3, 04:18-08:18) Start with a demo, on my dual-GPU machine. I've been using the N-Body sample (for VS 11 Developer Preview). (slide 4) Use an nvidia slide that has additional examples of performance improvements that customers enjoy with heterogeneous computing. (slide 5) Talk a bit about the differences today between CPU and GPU hardware, leading to the fact that these will continue to co-exist and that GPUs are great for data parallel algorithms, but not much else today. One is a jack of all trades and the other is a number cruncher. (slide 6) Use the APU example from amd, as one indication that the hardware space is still in motion, emphasizing that the C++ AMP solution is a data parallel API, not a GPU API. It has a future proof design for hardware we have yet to see. (slide 7) Provide more meta-data, as blogged about when I first introduced C++ AMP. Code (slide 9-11) Introduce C++ AMP coding with a simplistic array-addition algorithm – the slides speak for themselves. (slide 12-13) index<N>, extent<N>, and grid<N>. (Slide 14-16) array<T,N>, array_view<T,N> and comparison between them. (Slide 17) parallel_for_each. (slide 18, 21) restrict. (slide 19-20) actual restrictions of restrict(direct3d) – the slides speak for themselves. (slide 22) bring it altogether with a matrix multiplication example. (slide 23-24) accelerator, and accelerator_view. (slide 26-29) Introduce tiling incl. tiled matrix multiplication [tiling probably deserves a whole session instead of 6 minutes!]. IDE (slide 34,37) Briefly touch on the concurrency visualizer. It supports GPU profiling, but enhancements specific to C++ AMP we hope will come at the Beta timeframe, which is when I'll be spending more time talking about it. (slide 35-36, 51:54-59:16) Demonstrate the GPU debugging experience in VS 11. Summary (slide 39) Re-iterate some of the points of slide 7, and add the point that the C++ AMP spec will be open for other compiler vendors to implement, even on other platforms (in fact, Microsoft is actively working on that). (slide 40) Links to content – see slide – including where all your questions should go: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads. "But I don't have time for a full blown session, I only need 2 (or just 1, or 3) C++ AMP slides to use in my session on related topic X" If all you want is a small number of slides, you can take some from the session above and customize them. But because I am so nice, I have created some slides for you, including talking points in the notes section. Download them here. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
SQL Server Optimizer Malfunction?

- by Tony Davis

There was a sharp intake of breath from the audience when Adam Machanic declared the SQL Server optimizer to be essentially "stuck in 1997". It was during his fascinating "Query Tuning Mastery: Manhandling Parallelism" session at the recent PASS SQL Summit. Paraphrasing somewhat, Adam (blog | @AdamMachanic) offered a convincing argument that the optimizer often delivers flawed plans based on assumptions that are no longer valid with today’s hardware. In 1997, when Microsoft engineers re-designed the database engine for SQL Server 7.0, SQL Server got its initial implementation of a cost-based optimizer. Up to SQL Server 2000, the developer often had to deploy a steady stream of hints in SQL statements to combat the occasionally wilful plan choices made by the optimizer. However, with each successive release, the optimizer has evolved and improved in its decision-making. It is still prone to the occasional stumble when we tackle difficult problems, join large numbers of tables, perform complex aggregations, and so on, but for most of us, most of the time, the optimizer purrs along efficiently in the background. Adam, however, challenged further any assumption that the current optimizer is competent at providing the most efficient plans for our more complex analytical queries, and in particular of offering up correctly parallelized plans. He painted a picture of a present where complex analytical queries have become ever more prevalent; where disk IO is ever faster so that reads from disk come into buffer cache faster than ever; where the improving RAM-to-data ratio means that we have a better chance of finding our data in cache. Most importantly, we have more CPUs at our disposal than ever before. To get these queries to perform, we not only need to have the right indexes, but also to be able to split the data up into subsets and spread its processing evenly across all these available CPUs. Improvements such as support for ColumnStore indexes are taking things in the right direction, but, unfortunately, deficiencies in the current Optimizer mean that SQL Server is yet to be able to exploit properly all those extra CPUs. Adam’s contention was that the current optimizer uses essentially the same costing model for many of its core operations as it did back in the days of SQL Server 7, based on assumptions that are no longer valid. One example he gave was a "slow disk" bias that may have been valid back in 1997 but certainly is not on modern disk systems. Essentially, the optimizer assesses the relative cost of serial versus parallel plans based on the assumption that there is no IO cost benefit from parallelization, only CPU. It assumes that a single request will saturate the IO channel, and so a query would not run any faster if we parallelized IO because the disk system simply wouldn’t be able to handle the extra pressure. As such, the optimizer often decides that a serial plan is lower cost, often in cases where a parallel plan would improve performance dramatically. It was challenging and thought provoking stuff, as were his techniques for driving parallelism through query logic based on subsets of rows that define the "grain" of the query. I highly recommend you catch the session if you missed it. I’m interested to hear though, when and how often people feel the force of the optimizer’s shortcomings. Barring mistakes, such as stale statistics, how often do you feel the Optimizer fails to find the plan you think it should, and what are the most common causes? Is it fighting to induce it toward parallelism? Combating unexpected plans, arising from table partitioning? Something altogether more prosaic? Cheers, Tony.

Read the article
About the K computer

- by nospam(at)example.com (Joerg Moellenkamp)

Okay ? after getting yet another mail because of the new #1 on the Top500 list, I want to add some comments from my side: Yes, the system is using SPARC processor. And that is great news for a SPARC fan like me. It is using the SPARC VIIIfx processor from Fujitsu clocked at 2 GHz. No, it isn't the only one. Most people are saying there are two in the Top500 list using SPARC (#77 JAXA and #1 K) but in fact there are three. The Tianhe-1 (#2 on the Top500 list) super computer contains 2048 Galaxy "FT-1000" 1 GHz 8-core processors. Don't know it? The FeiTeng-1000 ? this proc is a 8 core, 8 threads per core, 1 ghz processor made in China. And it's SPARC based. By the way ? this sounds really familiar to me ? perhaps the people just took the opensourced UltraSPARC-T2 design, because some of the parameters sound just to similar. However it looks like that Tianhe-1 is using the SPARCs as input nodes and not as compute notes. No, I don't see it as the next M-series processor. Simple reason: You can't create SMP systems out of them ? it simply hasn't the functionality to do so. Even when there are multiple CPUs on a single board, they are not connected like an SMP/NUMA machine to a shared memory machine ? they are connected with the cluster interconnect (in this case the Tofu interconnect) and work like a large cluster. Yes, it has a lot of oomph in Linpack ? however I assume a lot came from the extensions to the SPARCv9 standard. No, Linpack has no relevance for any commercial workload ? Linpack is such a special load, that even some HPC people are arguing that it isn't really a good benchmark for HPC. It's embarrassingly parallel, it can work with relatively small interconnects compared to the interconnects in SMP systems (however we get in spheres SMP interconnects where a few years ago). Amdahl isn't hitting that hard when running Linpack. Yes, it's a good move to use SPARC. At some time in the last 10 years, there was an interesting twist in perception: SPARC was considered as proprietary architecture and x86 was the open architecture. However it's vice versa ? try to create a x86 clone and you have a lot of intellectual property problems, create a SPARC clone and you have to spend 100 bucks or so to get the specification from the SPARC Foundation and develop your own SPARC processor. Fujitsu is doing this for a long time now. So they had their own processor, their own know-how. So why was SPARC a good choice? Well ? essentially Fujitsu can do what they want with their core as it is their core, for example adding the extensions to the SPARCv9 chipset ? getting Intel to create extensions to x86 to help you with your product is a little bit harder. So Fujitsu could do they needed to do with their processor in order to create such a supercomputer. No, the K is really using no FPGA or GPU as accelerators. The K is really using the CPU at doing this job. Yes, it has a significantly enhanced FPU capable to execute 8 instructions in parallel. No, it doesn't run Solaris. Yes, it uses Linux. No, it doesn't hurt me ... as my colleague Roland Rambau (he knows a lot about HPC) said once to me ... it doesn't matter which OS is staying out of the way of the workload in HPC.

Read the article
Faster Memory Allocation Using vmtasks

- by Steve Sistare

You may have noticed a new system process called "vmtasks" on Solaris 11 systems: % pgrep vmtasks 8 % prstat -p 8 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 8 root 0K 0K sleep 99 -20 9:10:59 0.0% vmtasks/32 What is vmtasks, and why should you care? In a nutshell, vmtasks accelerates creation, locking, and destruction of pages in shared memory segments. This is particularly helpful for locked memory, as creating a page of physical memory is much more expensive than creating a page of virtual memory. For example, an ISM segment (shmflag & SHM_SHARE_MMU) is locked in memory on the first shmat() call, and a DISM segment (shmflg & SHM_PAGEABLE) is locked using mlock() or memcntl(). Segment operations such as creation and locking are typically single threaded, performed by the thread making the system call. In many applications, the size of a shared memory segment is a large fraction of total physical memory, and the single-threaded initialization is a scalability bottleneck which increases application startup time. To break the bottleneck, we apply parallel processing, harnessing the power of the additional CPUs that are always present on modern platforms. For sufficiently large segments, as many of 16 threads of vmtasks are employed to assist an application thread during creation, locking, and destruction operations. The segment is implicitly divided at page boundaries, and each thread is given a chunk of pages to process. The per-page processing time can vary, so for dynamic load balancing, the number of chunks is greater than the number of threads, and threads grab chunks dynamically as they finish their work. Because the threads modify a single application address space in compressed time interval, contention on locks protecting VM data structures locks was a problem, and we had to re-scale a number of VM locks to get good parallel efficiency. The vmtasks process has 1 thread per CPU and may accelerate multiple segment operations simultaneously, but each operation gets at most 16 helper threads to avoid monopolizing CPU resources. We may reconsider this limit in the future. Acceleration using vmtasks is enabled out of the box, with no tuning required, and works for all Solaris platform architectures (SPARC sun4u, SPARC sun4v, x86). The following tables show the time to create + lock + destroy a large segment, normalized as milliseconds per gigabyte, before and after the introduction of vmtasks: ISM system ncpu before after speedup ------ ---- ------ ----- ------- x4600 32 1386 245 6X X7560 64 1016 153 7X M9000 512 1196 206 6X T5240 128 2506 234 11X T4-2 128 1197 107 11x DISM system ncpu before after speedup ------ ---- ------ ----- ------- x4600 32 1582 265 6X X7560 64 1116 158 7X M9000 512 1165 152 8X T5240 128 2796 198 14X (I am missing the data for T4 DISM, for no good reason; it works fine). The following table separates the creation and destruction times: ISM, T4-2 before after ------ ----- create 702 64 destroy 495 43 To put this in perspective, consider creating a 512 GB ISM segment on T4-2. Creating the segment would take 6 minutes with the old code, and only 33 seconds with the new. If this is your Oracle SGA, you save over 5 minutes when starting the database, and you also save when shutting it down prior to a restart. Those minutes go directly to your bottom line for service availability.

Read the article
Is Rails Metal (& Rack) a good way to implement a high traffic web service api?

- by Greg

I am working on a very typical web application. The main component of the user experience is a widget that a site owner would install on their front page. Every time their front page loads, the widget talks to our server and displays some of the data that returns. So there are two components to this web application: the front end UI that the site owner uses to configure their widget the back end component that responds to the widget's web api call Previously we had all of this running in PHP. Now we are experimenting with Rails, which is fantastic for #1 (the front end UI). The question is how to do #2, the back serving of widget information, efficiently. Obviously this is much higher load than the front end, since it is called every time the front page loads on one of our clients' websites. I can see two obvious approaches: A. Parallel Stack: Set up a parallel stack that uses something other than rails (e.g. our old PHP-based approach) but accesses the same database as the front end B. Rails Metal: Use Rails Metal/Rack to bypass the Rails routing mechanism, but keep the api call responder within the Rails app My main question: Is Rails/Metal a reasonable approach for something like this? But also... Will the overhead of loading the Rails environment still be too heavy? Is there a way to get even closer to the metal with Rails, bypassing most of the environment? Will Rails/Metal performance approach the perf of a similar task on straight PHP (just looking for ballpark here)? And... Is there a 'C' option that would be much better than both A and B? That is, something before going to the lengths of C code compiled to binary and installed as an nginx or apache module? Thanks in advance for any insights.

Read the article
solve a classic map-reduce problem with opencl?

- by liuliu

I am trying to parallel a classic map-reduce problem (which can parallel well with MPI) with OpenCL, namely, the AMD implementation. But the result bothers me. Let me brief about the problem first. There are two type of data that flow into the system: the feature set (30 parameters for each) and the sample set (9000+ dimensions for each). It is a classic map-reduce problem in the sense that I need to calculate the score of every feature on every sample (Map). And then, sum up the overall score for every feature (Reduce). There are around 10k features and 30k samples. I tried different ways to solve the problem. First, I tried to decompose the problem by features. The problem is that the score calculation consists of random memory access (pick some of the 9000+ dimensions and do plus/subtraction calculations). Since I cannot coalesce memory access, it costs. Then, I tried to decompose the problem by samples. The problem is that to sum up overall score, all threads are competing for few score variables. It keeps overwriting the score which turns out to be incorrect. (I cannot carry out individual score first and sum up later because it requires 10k * 30k * 4 bytes). The first method I tried gives me the same performance on i7 860 CPU with 8 threads. However, I don't think the problem is unsolvable: it is remarkably similar to ray tracing problem (for which you carry out calculation that millions of rays against millions of triangles). Any ideas?

Read the article
animation extender in datalist control in asp.net 2008

- by BibiBuBu

Good Day! i have a question that how can i use animation extender in datalist control in asp.net with c#. i want the animation when i click the delete button (delete button will be in repeater). so that when i remove one record then it shows animation to bring the next record. it is in update panel. <cc1:AnimationExtender ID="AnimationExtender1" runat="server" Enabled="True" TargetControlID="btnDeleteId"> <Animations> <OnClick> <Sequence> <EnableAction Enabled="false" /> <Parallel Duration=".2"> <Resize Height="0" Width="0" Unit="px" /> <FadeOut /> </Parallel> <HideAction /> </Sequence> </OnClick> </Animations> </cc1:AnimationExtender> now if i put my button id in the Target control id then it gives error that it should not be in same update panel etc... but over all nothing working for animation. i am binding my datalist in itemDataBound....e.g. ImageButton imgbtn = (ImageButton)e.Item.FindControl("imgBtnPic"); Label lblAvatar = (Label)e.Item.FindControl("lblAvatar"); LinkButton lbName = (LinkButton)e.Item.FindControl("lbtnName"); Can somebody please suggest me something. thanks

Read the article
Spatial Rotation in Gmod Expression2.

- by Fascia

I'm using expression2 to program behavior in Garry's mod (http://wiki.garrysmod.com/?title=Wire_Expression2) Okay so, to set the precedent. In Gmod I have a block and I am at a complete loss of how to get it to rotate around the 3 up, down and right vectors (Which are local. ie; if I pitch it 45 degrees the forward vector is 0.707, 0.707, 0). Essentially, From the 3 vectors I'd like to be able to get local Pitch/Roll/Yaw. By Local Pitch Roll Yaw I mean that they are completely independent of one another allowing true 3d rotation. So for example; if I place my craft so its nose is parallel to the floor the X,Y,Z would be 0,0,0. If I turn it parallel to the floor (World and Local Yaw) 90 degrees it's now 0, 0, 90. If I then pitch it (World Roll, Local Pitch) it 180 degrees it's now 180, 0, 90. I've already explored quaternions however I don't believe I should post my code here as I think I was re-inventing the wheel. I know I didn't explain that well but I believe the problem is pretty generic. Any help anyone could offer is greatly appreciated. Oh, I'd like to avoid gimblelock too. Essentially calculating the rotation around each of the crafts up/forward/right vectors using the up/forward/right vectors. To simply the question a generic implementation rather than one specific to Gmod is absolutely fine.

Read the article
Getting browser to make an AJAX call ASAP, while page is still loading

- by Chris

I'm looking for tips on how to get the browser to kick off an AJAX call as soon as possible into processing a web page, ideally before the page is fully downloaded. Here's my approximate motivation: I have a web page that takes, say, 5 seconds to load. It calls a web service that takes, say, 10 seconds to load. If loading the page and calling the web service happened sequentially, the user would have to wait 15 seconds to see all the information. However, if I can get the web service call started before the 5 second page load is complete, then at least some of the work can happened in parallel. Ideally I'd like as much of the work to happen in parallel as possible. My initial theory was that I should place the AJAX-calling javascript as high up as possible in the web page HTML source (being mindful of putting it after the jquery.js include, because I'm making the call using jquery ajax). I'm also being mindful not to wrap the AJAX call in a jquery ready event handler. (I mention this because ready events are popular in a lot of jquery example code.) However, the AJAX call still doesn't seem to get kicked off as early as I'm hoping (at least as judged by the Google Chrome "Timeline" feature), so I'm wondering what other considerations apply. One thing that might potentially be detrimental is that the AJAX call is back to the same web server that's serving the original web page, so I might be in danger of hitting a browser limit on the # of HTTP connections back to that one machine? (The HTML page loads a number of images, css files, etc..)

Read the article
P6 Architecture - Register renaming aside, does the limited user registers result in more ops spent

- by mrjoltcola

I'm studying JIT design with regard to dynamic languages VM implementation. I haven't done much Assembly since the 8086/8088 days, just a little here or there, so be nice if I'm out of sorts. As I understand it, the x86 (IA-32) architecture still has the same basic limited register set today that it always did, but the internal register count has grown tremendously, but these internal registers are not generally available and are used with register renaming to achieve parallel pipelining of code that otherwise could not be parallelizable. I understand this optimization pretty well, but my feeling is, while these optimizations help in overall throughput and for parallel algorithms, the limited register set we are still stuck with results in more register spilling overhead such that if x86 had double, or quadruple the registers available to us, there may be significantly less push/pop opcodes in a typical instruction stream? Or are there other processor optmizations that also optimize this away that I am unaware of? Basically if I've a unit of code that has 4 registers to work with for integer work, but my unit has a dozen variables, I've got potentially a push/pop for every 2 or so instructions. Any references to studies, or better yet, personal experiences?

Read the article

Search Results

Search found 7538 results on 302 pages for 'parallel processing'.

Page 114/302 | < Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >

- by user1265125

- by makerofthings7

- by Hengjie

- by Daniel Moth

- by Jeremy Oosthuizen

- by Will

- by David Allan

- by CliveT

- by Daniel Moth

- by Herve Roggero

- by Daniel Moth

- by Tony Davis

- by nospam(at)example.com (Joerg Moellenkamp)

- by Steve Sistare

- by Greg

- by liuliu

- by BibiBuBu

- by Fascia

- by Chris

- by mrjoltcola

< Previous Page | 110 111 112 113 114 115 116 117 118 119 120 121 | Next Page >