Search Results

Search found 13619 results on 545 pages for 'memory mapped'.

Page 173/545 | < Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >

  • Slow draw on some apps and dynamic clocks not working properly with ATI/AMD proprietary drivers

    - by Rakeka
    I've recently purchased a new computer (around July 2010) and I've been having some problems with proprietary video drivers on Linux. The hardware is: Video: ATI/AMD Radeon HD 5870 (XFX HD-587X-ZNFC); Motherboard: Asus P7P55D-E Deluxe; Processor: Intel i5 750; Memory: Kingston Hyperx KHX1600C8D3K2/4GX (2x - 8GB Total); Power Supply: XFX P1-750B-CAG9; There are no overclocks, not even the memories (they are at 1333mhz due processor memory controller limitation). The operational system is a homebrew Linux distribution with the following software: Architecture: x86_64 (multilib) Kernel: 2.6.35.10 Xorg: 7.5 Window Manager: wmii-3.9.2 Video Driver: ATI/AMD Catalyst 10.12 There are no desktop effects programs like compiz fusion or beryl. The problems: With ATI/AMD proprietary driver, some applications are with slow draw/redraw, and, the same applications make the driver to increase the card clocks to maximum (0% gpu activity, only the clocks are increased). I dunno exactly how to describe the slow draw but I'll list some applications and symptoms. xterm Flickers a lot when drawing continuous output; When I'm in a workspace with fullscreen xterm, The gpu load stays at 12% in idle, and, with smaller xterm, smaller GPU load. "aticonfig --odgc" output: Default Adapter - ATI Radeon HD 5800 Series Core (MHz) Memory (MHz) Current Clocks : 157 300 Current Peak : 850 1200 Configurable Peak Range : [600-900] [900-1300] GPU load : 12% "aticonfig --pplib-cmd 'get activity'" output: Current Activity is Core Clock: 157MHZ Memory Clock: 300MHZ VDDC: 950 Activity: 12 percent Performance Level: 0 Bus Speed: 5000 Bus Lanes: 16 Maximum Bus Lanes: 16 More examples: mplayer time info flickers on terminal; "find /" flickers a lot (It takes some time to stop with control-c. But, If I change the workspace or put some window upon it, just after the control-c, it stops instantly); "cat somefile" if the file is big (Xorg.0.log for example) it takes some time to display; vim and less (ex: find / | less) don't have much problems, just a little flicker when scrolling; mplayer (no gui) Slow reproduction and seek with -vo x11; Tearing with -vo xv; Time info flickers on terminal (xterm consequence); gvim A little slow draw when scrolling with page up/page down; Firefox Slow draw/redraw on some pages like www.boadica.com.br and sometimes on www.youtube.com with flash enable (never noticed on many pages); Corruptions when informative yellow boxes are showing and scroll the page (an gray box appears at the same place of the informative box); "Wallpaper" After minimizing a fullscreen window or changing to an empty workspace it takes some time to redraw wallpaper. "Video Card" The core and memory clocks are increased with the events described above and on other situations like change workspace (even without wallpaper), minimize, maximize or move a window; Idle clocks: Core: 157mhz, Memory: 300mhz Full clocks: Core: 850mhz, Memory: 1200mhz xpdf Painful slow scrolling; display (from ImageMagick) Slow menus and sometimes slow image redraw; Programs that I use and are apparently without problems: gimp; pidgin; mplayer (-vo gl, gl2); blender; unigine heaven (better fps than on Windows); doom3; tibia; penumbra overture; amnesia the dark descent (wine); diablo 2 (wine); No problems on Windows (Windows 7 Ultimate 64bit). And special note to this: Full desktop effects from Debian and Ubuntu gnome appearance cpanel don't cause ANY problems, even the core and memory clocks don't increase when change workspace, minimize, maximize or move a window. What I've tested: Unsuccessful tests: Tested all drivers versions since 10.6 (released approximately when I've installed the first slackware in this PC); Tested other video card - ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Tested some options on xorg.conf and that I've found googling (some of these options are commented on my xorg.conf. I'll send the links at the end of post); Tested some patches like 107_fedora_dont_fill_bg_none.patch and xserver-xorg-backclear.patch from Arch Linux Catalyst page (https://wiki.archlinux.org/index.php/ATI_Catalyst); Tested other distros and software versions: Tested XORG-7.6 on my own distribution; Tested Debian Squeeze (testing - from 2010-12-20); Tested Ubuntu Marverick (10.10); Tested Slackware 13.1; Distros info: Architecture: i386 Debian and Ubuntu with all default software (kernel, gnome, xorg, drivers); Slackware with Catalyst from AMD page and default window managers like: fvwm, xfce, and my own build of wmii; Successful tests: Tested other video card (only on my homebrew distro) - NVIDIA Geforce 7300GS with driver 260.19.29; That didn't shown the slow draw problems, but that card is a bit obsolete, so, dunno if that lacks features like the dynamic clocks. I don't dispose of other video cards like nvidia g/gt/gts/gtx 200~400~500 or Radeon HD 3000/4000/6000 to make more tests. Tested other hardware: Video: ATI/AMD Radeon HD 5570 (XFX HD-557X-ZHF2); Motherboard: Intel DG31PR; Processor: Core 2 Duo E6750; Software for that hardware: Fresh install of same distros (except for the mine) with same program versions; That video card (HD 5570) were full time at the maximum clocks (something like 500/750, don't remember) in all the operational systems (Windows XP and Windows 7 too), but it didn't shown the same problems that I have here. I've googled a lot about common problems with ATI/AMD proprietary drivers for Linux and didn't find similar problems, except by the Firefox corruptions, that the solutions were to disable ATI Direct2DAccel and use XAA. With XAA the problems persists and the other applications like pidgin and rest of Firefox showed the same problems of slow draw/redraw. Open source Drivers: With open source drivers (xf86-video-ati-6.13.2) I hadn't the same slow draw problems, but, had other problems, that, for now, make it no viable solution. I'll not discuss it here because this is another line of problems and will confuse everything. If it happens to be the only solution, I'll make another thread to discuss it. Logs and Configs: kernel .config dmesg xorg package list xorg.conf Xorg.0.log

    Read the article

  • Best Practices - Core allocation

    - by jsavit
    This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (also called Logical Domains) Introduction SPARC T-series servers currently have up to 4 CPU sockets, each of which has up to 8 or (on SPARC T3) 16 CPU cores, while each CPU core has 8 threads, for a maximum of 512 dispatchable CPUs. The defining feature of Oracle VM Server for SPARC is that each domain is assigned CPU threads or cores for its exclusive use. This avoids the overhead of software-based time-slicing and emulation (or binary rewriting) of system state-changing privileged instructions used in traditional hypervisors. To create a domain, administrators specify either the number of CPU threads or cores that the domain will own, as well as its memory and I/O resources. When CPU resources are assigned at the individual thread level, the logical domains constraint manager attempts to assign threads from the same cores to a domain, and avoid "split core" situations where the same CPU core is used by multiple domains. Sometimes this is unavoidable, especially when domains are allocated and deallocated CPUs in small increments. Why split cores can matter Split core allocations can silenty reduce performance because multiple domains with different address spaces and memory contents are sharing the core's Level 1 cache (L1$). This is called false cache sharing since even identical memory addresses from different domains must point to different locations in RAM. The effect of this is increased contention for the cache, and higher memory latency for each domain using that core. The degree of performance impact can be widely variable. For applications with very small memory working sets, and with I/O bound or low-CPU utilization workloads, it may not matter at all: all machines wait for work at the same speed. If the domains have substantial workloads, or are critical to performance then this can have an important impact: This blog entry was inspired by a customer issue in which one CPU core was split among 3 domains, one of which was the control and service domain. The reported problem was increased I/O latency in guest domains, but the root cause might be higher latency servicing the I/O requests due to the control domain being slowed down. What to do about it Split core situations are easily avoided. In most cases the logical domain constraint manager will avoid it without any administrative action, but it can be entirely prevented by doing one of the several actions: Assign virtual CPUs in multiples of 8 - the number of threads per core. For example: ldm set-vcpu 8 mydomain or ldm add-vcpu 24 mydomain. Each domain will then be allocated on a core boundary. Use the whole core constraint when assigning CPU resources. This allocates CPUs in increments of entire cores instead of virtual CPU threads. The equivalent of the above commands would be ldm set-core 1 mydomain or ldm add-core 3 mydomain. Older syntax does the same thing by adding the -c flag to the add-vcpu, rm-vcpu and set-vcpu commands, but the new syntax is recommended. When whole core allocation is used an attempt to add cores to a domain fails if there aren't enough completely empty cores to satisfy the request. See https://blogs.oracle.com/sharakan/entry/oracle_vm_server_for_sparc4 for an excellent article on this topic by Eric Sharakan. Don't obsess: - if the workloads have minimal CPU requirements and don't need anywhere near a full CPU core, then don't worry about it. If you have low utilization workloads being consolidated from older machines onto a current T-series, then there's no need to worry about this or to assign an entire core to domains that will never use that much capacity. In any case, make sure the most important domains have their own CPU cores, in particular the control domain and any I/O or service domain, and of course any important guests. Summary Split core CPU allocation to domains can potentially have an impact on performance, but the logical domains manager tends to prevent this situation, and it can be completely and simply avoided by allocating virtual CPUs on core boundaries.

    Read the article

  • Organization &amp; Architecture UNISA Studies &ndash; Chap 6

    - by MarkPearl
    Learning Outcomes Discuss the physical characteristics of magnetic disks Describe how data is organized and accessed on a magnetic disk Discuss the parameters that play a role in the performance of magnetic disks Describe different optical memory devices Magnetic Disk The way data is stored on and retried from magnetic disks Data is recorded on and later retrieved form the disk via a conducting coil named the head (in many systems there are two heads) The writ mechanism exploits the fact that electricity flowing through a coil produces a magnetic field. Electric pulses are sent to the write head, and the resulting magnetic patterns are recorded on the surface below with different patterns for positive and negative currents The physical characteristics of a magnetic disk   Summarize from book   The factors that play a role in the performance of a disk Seek time – the time it takes to position the head at the track Rotational delay / latency – the time it takes for the beginning of the sector to reach the head Access time – the sum of the seek time and rotational delay Transfer time – the time it takes to transfer data RAID The rate of improvement in secondary storage performance has been considerably less than the rate for processors and main memory. Thus secondary storage has become a bit of a bottleneck. RAID works on the concept that if one disk can be pushed so far, additional gains in performance are to be had by using multiple parallel components. Points to note about RAID… RAID is a set of physical disk drives viewed by the operating system as a single logical drive Data is distributed across the physical drives of an array in a scheme known as striping Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure (not supported by RAID 0 or RAID 1) Interesting to note that the increase in the number of drives, increases the probability of failure. To compensate for this decreased reliability RAID makes use of stored parity information that enables the recovery of data lost due to a disk failure.   The RAID scheme consists of 7 levels…   Category Level Description Disks Required Data Availability Large I/O Data Transfer Capacity Small I/O Request Rate Striping 0 Non Redundant N Lower than single disk Very high Very high for both read and write Mirroring 1 Mirrored 2N Higher than RAID 2 – 5 but lower than RAID 6 Higher than single disk Up to twice that of a signle disk for read Parallel Access 2 Redundant via Hamming Code N + m Much higher than single disk Highest of all listed alternatives Approximately twice that of a single disk Parallel Access 3 Bit interleaved parity N + 1 Much higher than single disk Highest of all listed alternatives Approximately twice that of a single disk Independent Access 4 Block interleaved parity N + 1 Much higher than single disk Similar to RAID 0 for read, significantly lower than single disk for write Similar to RAID 0 for read, significantly lower than single disk for write Independent Access 5 Block interleaved parity N + 1 Much higher than single disk Similar to RAID 0 for read, lower than single disk for write Similar to RAID 0 for read, generally  lower than single disk for write Independent Access 6 Block interleaved parity N + 2 Highest of all listed alternatives Similar to RAID 0 for read; lower than RAID 5 for write Similar to RAID 0 for read, significantly lower than RAID 5  for write   Read page 215 – 221 for detailed explanation on RAID levels Optical Memory There are a variety of optical-disk systems available. Read through the table on page 222 – 223 Some of the devices include… CD CD-ROM CD-R CD-RW DVD DVD-R DVD-RW Blue-Ray DVD Magnetic Tape Most modern systems use serial recording – data is lade out as a sequence of bits along each track. The typical recording used in serial is referred to as serpentine recording. In this technique when data is being recorded, the first set of bits is recorded along the whole length of the tape. When the end of the tape is reached the heads are repostioned to record a new track, and the tape is again recorded on its whole length, this time in the opposite direction. That process continued back and forth until the tape is full. To increase speed, the read-write head is capable of reading and writing a number of adjacent tracks simultaneously. Data is still recorded serially along individual tracks, but blocks in sequence are stored on adjacent tracks as suggested. A tape drive is a sequential access device. Magnetic tape was the first kind of secondary memory. It is still widely used as the lowest-cost, slowest speed member of the memory hierarchy.

    Read the article

  • Mastering snow and Java development at jDays in Gothenburg

    - by JavaCecilia
    Last weekend, I took the train from Stockholm to Gothenburg to attend and present at the new Java developer conference jDays. It was professionally arranged in the Swedish exhibition hall close to the amusement park Liseberg and we got a great deal out of the top-level presenters and hallway discussions. Understanding and Improving Your Java Process Our main purpose was to spread information on JVM and our monitoring tools for Java processes, so I held a crash course in the most important terms and concepts if you want to affect the performance of your Java process. From the beginning - the JVM specification to interpretation of heap usage graphs. For correct analysis, you also need to understand something about process memory - you need space for the Java heap (-Xms for initial size and -Xmx for max heap size), but the process memory also contain the thread stacks (to a size of -Xss), JVM internal data structures used for keeping track of Java objects on the heap, method compilation/optimization, native libraries, etc. If you get long pause times, make sure to monitor your application, see the allocation rate and frequency of pause times.My colleague Klara Ward then held a presentation on the Java Mission Control product, the profiling and diagnostics tools suite for HotSpot, coming soon. The room was packed and very appreciated, Klara demonstrated four different scenarios, e.g. how to diagnose and fix latencies due to lock contention for logging.My German colleague, OpenJDK ambassador Dalibor Topic travelled to Sweden to do the second keynote on "Make the Future Java". He let us in on the coming features and roadmaps of Java, now delivering major versions on a two-year schedule (Java 7 2011, Java 8 2013, etc). Also letting us in on where to download early versions of 8, to report problems early on. Software Development in teams Being a scout leader, I'm drilled in different team building and workshop techniques, creating strong groups - of course, I had to attend Henrik Berglund's session on building successful teams. He spoke about the importance of clear goals, autonomy and agreed processes. Thomas Sundberg ended the conference by doing live remote pair programming with Alex in Rumania and a concrete tips for people wanting to try it out (for local collaboration, remember to wash and change clothes). Memory Master Keynote The conference keynote was delivered by the Swedish memory master Mattias Ribbing, showing off by remembering the order of a deck of cards he'd seen once. He made it interactive by forcing the audience to learn a memory mastering technique of remembering ten ordered things by heart, asking us to shout out the order backwards and we made it! I desperately need this - bought the book, will get back on the subject. Continuous Delivery The most impressive presenter was Axel Fontaine on Continuous Delivery. Very well prepared slides with key images of his message and moved about the stage like a rock star. The topic is of course highly interesting, how to create an infrastructure enabling immediate feedback to developers and ability to release your product several times per day. Tomek Kaczanowski delivered a funny and useful presentation on good and bad tests, providing comic relief with poorly written tests and the useful rules of thumb how to rewrite them. To conclude, we had a great time and hope to see you at jDays next year :)

    Read the article

  • trying to setup wireless

    - by JohnMerlino
    I'm trying to set up wireless on vostro 1520 dell laptop, with latest Ubuntu install. Here's the output of some of the commands that I was told to run: lshw -C network viggy@ubuntu:~$ lshw -C network WARNING: you should run this program as super-user. *-network description: Ethernet interface product: RTL8111/8168B PCI Express Gigabit Ethernet controller vendor: Realtek Semiconductor Co., Ltd. physical id: 0 bus info: pci@0000:08:00.0 logical name: eth0 version: 03 serial: 00:24:e8:da:84:25 size: 100Mbit/s capacity: 1Gbit/s width: 64 bits clock: 33MHz capabilities: bus_master cap_list rom ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl_nic/rtl8168d-1.fw ip=192.168.2.6 latency=0 multicast=yes port=MII speed=100Mbit/s resources: irq:47 ioport:3000(size=256) memory:f6004000-f6004fff memory:f6000000-f6003fff memory:f6020000-f603ffff *-network description: Network controller product: BCM4312 802.11b/g LP-PHY vendor: Broadcom Corporation physical id: 0 bus info: pci@0000:0e:00.0 version: 01 width: 64 bits clock: 33MHz capabilities: bus_master cap_list configuration: driver=b43-pci-bridge latency=0 resources: irq:18 memory:fa000000-fa003fff *-network DISABLED description: Wireless interface physical id: 1 logical name: wlan0 serial: 0c:60:76:05:ee:74 capabilities: ethernet physical wireless configuration: broadcast=yes driver=b43 driverversion=3.2.0-29-generic firmware=N/A multicast=yes wireless=IEEE 802.11bg lspci 00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07) 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) 00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) 00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03) 00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03) 00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03) 00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03) 00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03) 00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03) 00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03) 00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03) 00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03) 00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03) 00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03) 00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03) 00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03) 00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03) 00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03) 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93) 00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03) 00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03) 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03) 08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03) 0e:00.0 Network controller: Broadcom Corporation BCM4312 802.11b/g LP-PHY (rev 01) 1a:00.0 FireWire (IEEE 1394): O2 Micro, Inc. Device 10f7 (rev 01) 1a:00.1 SD Host controller: O2 Micro, Inc. Device 8120 (rev 01) 1a:00.2 Mass storage controller: O2 Micro, Inc. Device 8130 (rev 01) iwconfig lo no wireless extensions. wlan0 IEEE 802.11bg ESSID:off/any Mode:Managed Access Point: Not-Associated Tx-Power=0 dBm Retry long limit:7 RTS thr:off Fragment thr:off Power Management:on eth0 no wireless extensions. At this point in time, I don't have wireless.

    Read the article

  • Data breakpoints to find points where data gets broken

    - by raccoon_tim
    When working with a large code base, finding reasons for bizarre bugs can often be like finding a needle in a hay stack. Finding out why an object gets corrupted without no apparent reason can be quite daunting, especially when it seems to happen randomly and totally out of context. Scenario Take the following scenario as an example. You have defined the a class that contains an array of characters that is 256 characters long. You now implement a method for filling this buffer with a string passed as an argument. At this point you mistakenly expect the buffer to be 256 characters long. At some point you notice that you require another character buffer and you add that after the previous one in the class definition. You now figure that you don’t need the 256 characters that the first member can hold and you shorten that to 128 to conserve space. At this point you should start thinking that you also have to modify the method defined above to safeguard against buffer overflow. It so happens, however, that in this not so perfect world this does not cross your mind. Buffer overflow is one of the most frequent sources for errors in a piece of software and often one of the most difficult ones to detect, especially when data is read from an outside source. Many mass copy functions provided by the C run-time provide versions that have boundary checking (defined with the _s suffix) but they can not guard against hard coded buffer lengths that at some point get changed. Finding the bug Getting back to the scenario, you’re now wondering why does the second string get modified with data that makes no sense at all. Luckily, Visual Studio provides you with a tool to help you with finding just these kinds of errors. It’s called data breakpoints. To add a data breakpoint, you first run your application in debug mode or attach to it in the usual way, and then go to Debug, select New Breakpoint and New Data Breakpoint. In the popup that opens, you can type in the memory address and the amount of bytes you wish to monitor. You can also use an expression here, but it’s often difficult to come up with an expression for data in an object allocated on the heap when not in the context of a certain stack frame. There are a couple of things to note about data breakpoints, however. First of all, Visual Studio supports a maximum of four data breakpoints at any given time. Another important thing to notice is that some C run-time functions modify memory in kernel space which does not trigger the data breakpoint. For instance, calling ReadFile on a buffer that is monitored by a data breakpoint will not trigger the breakpoint. The application will now break at the address you specified it to. Often you might immediately spot the issue but the very least this feature can do is point you in the right direction in search for the real reason why the memory gets inadvertently modified. Conclusions Data breakpoints are a great feature, especially when doing a lot of low level operations where multiple locations modify the same data. With the exception of some special cases, like kernel memory modification, you can use it whenever you need to check when memory at a certain location gets changed on purpose or inadvertently.

    Read the article

  • Concurrent Affairs

    - by Tony Davis
    I once wrote an editorial, multi-core mania, on the conundrum of ever-increasing numbers of processor cores, but without the concurrent programming techniques to get anywhere near exploiting their performance potential. I came to the.controversial.conclusion that, while the problem loomed for all procedural languages, it was not a big issue for the vast majority of programmers. Two years later, I still think most programmers don't concern themselves overly with this issue, but I do think that's a bigger problem than I originally implied. Firstly, is the performance boost from writing code that can fully exploit all available cores worth the cost of the additional programming complexity? Right now, with quad-core processors that, at best, can make our programs four times faster, the answer is still no for many applications. But what happens in a few years, as the number of cores grows to 100 or even 1000? At this point, it becomes very hard to ignore the potential gains from exploiting concurrency. Possibly, I was optimistic to assume that, by the time we have 100-core processors, and most applications really needed to exploit them, some technology would be around to allow us to do so with relative ease. The ideal solution would be one that allows programmers to forget about the problem, in much the same way that garbage collection removed the need to worry too much about memory allocation. From all I can find on the topic, though, there is only a remote likelihood that we'll ever have a compiler that takes a program written in a single-threaded style and "auto-magically" converts it into an efficient, correct, multi-threaded program. At the same time, it seems clear that what is currently the most common solution, multi-threaded programming with shared memory, is unsustainable. As soon as a piece of state can be changed by a different thread of execution, the potential number of execution paths through your program grows exponentially with the number of threads. If you have two threads, each executing n instructions, then there are 2^n possible "interleavings" of those instructions. Of course, many of those interleavings will have identical behavior, but several won't. Not only does this make understanding how a program works an order of magnitude harder, but it will also result in irreproducible, non-deterministic, bugs. And of course, the problem will be many times worse when you have a hundred or a thousand threads. So what is the answer? All of the possible alternatives require a change in the way we write programs and, currently, seem to be plagued by performance issues. Software transactional memory (STM) applies the ideas of database transactions, and optimistic concurrency control, to memory. However, working out how to break down your program into sufficiently small transactions, so as to avoid contention issues, isn't easy. Another approach is concurrency with actors, where instead of having threads share memory, each thread runs in complete isolation, and communicates with others by passing messages. It simplifies concurrent programs but still has performance issues, if the threads need to operate on the same large piece of data. There are doubtless other possible solutions that I haven't mentioned, and I would love to know to what extent you, as a developer, are considering the problem of multi-core concurrency, what solution you currently favor, and why. Cheers, Tony.

    Read the article

  • Unity.ResolutionFailedException - Resolution of the dependency failed

    - by Anibas
    I have the following code: public static IEngine CreateEngine() { UnityContainer container = Unity.LoadUnityContainer(DefaultStrategiesContainerName); IEnumerable<IStrategy> strategies = container.ResolveAll<IStrategy>(); ITraderProvider provider = container.Resolve<ITraderProvider>(); return new Engine(provider, new List<IStrategy>(strategies)); } and the config: <unity> <typeAliases> <typeAlias alias="singleton" type="Microsoft.Practices.Unity.ContainerControlledLifetimeManager, Microsoft.Practices.Unity" /> <typeAlias alias="weakRef" type="Microsoft.Practices.Unity.ExternallyControlledLifetimeManager, Microsoft.Practices.Unity" /> <typeAlias alias="Strategy" type="ADTrader.Core.Contracts.IStrategy, ADTrader.Core" /> <typeAlias alias="Trader" type="ADTrader.Core.Contracts.ITraderProvider, ADTrader.Core" /> </typeAliases> <containers> <container name="strategies"> <types> <type type="Strategy" mapTo="ADTrader.Strategies.ThreeTurningStrategy, ADTrader.Strategies" name="1" /> <type type="Trader" mapTo="ADTrader.MbTradingProvider.MBTradingProvider, ADTrader.MbTradingProvider" /> </types> </container> </containers></unity> I am getting the following exception: Microsoft.Practices.Unity.ResolutionFailedException: Resolution of the dependency failed, type = "ADTrader.Core.Contracts.ITraderProvider", name = "". Exception message is: The current build operation (build key Build Key[ADTrader.MbTradingProvider.MBTradingProvider, null]) failed: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. (Strategy type BuildPlanStrategy, index 3) --- Microsoft.Practices.ObjectBuilder2.BuildFailedException: The current build operation (build key Build Key[ADTrader.MbTradingProvider.MBTradingProvider, null]) failed: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. (Strategy type BuildPlanStrategy, index 3) --- System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. at MBTCOMLib.MbtComMgrClass.EnableSplash(Boolean bEnable) at ADTrader.MbTradingProvider.MBTradingProvider..ctor() at BuildUp_ADTrader.MbTradingProvider.MBTradingProvider(IBuilderContext ) at Microsoft.Practices.ObjectBuilder2.DynamicMethodBuildPlan.BuildUp(IBuilderContext context) at Microsoft.Practices.ObjectBuilder2.BuildPlanStrategy.PreBuildUp(IBuilderContext context) at Microsoft.Practices.ObjectBuilder2.StrategyChain.ExecuteBuildUp(IBuilderContext context) --- End of inner exception stack trace --- at Microsoft.Practices.ObjectBuilder2.StrategyChain.ExecuteBuildUp(IBuilderContext context) at Microsoft.Practices.ObjectBuilder2.Builder.BuildUp(IReadWriteLocator locator, ILifetimeContainer lifetime, IPolicyList policies, IStrategyChain strategies, Object buildKey, Object existing) at Microsoft.Practices.Unity.UnityContainer.DoBuildUp(Type t, Object existing, String name) --- End of inner exception stack trace --- at Microsoft.Practices.Unity.UnityContainer.DoBuildUp(Type t, Object existing, String name) at Microsoft.Practices.Unity.UnityContainer.Resolve(Type t, String name) at Microsoft.Practices.Unity.UnityContainerBase.ResolveT at ADTrader.Engine.EngineFactory.CreateEngine() Any idea?

    Read the article

  • Using Image Source with big images in WPF

    - by xyzzer
    I am working on an application that allows users to manipulate multiple images by using ItemsControl. I started running some tests and found that the app has problems displaying some big images - ie. it did not work with the high resolution (21600x10800), 20MB images from http://earthobservatory.nasa.gov/Features/BlueMarble/BlueMarble_monthlies.php, though it displays the 6200x6200, 60MB Hubble telescope image from http://zebu.uoregon.edu/hudf/hudf.jpg just fine. The original solution just specified an Image control with a Source property pointing at a file on a disk (through a binding). With the Blue Marble file - the image would just not show up. Now this could be just a bug hidden somewhere deep in the funky MVVM + XAML implementation - the visual tree displayed by Snoop goes like: Window/Border/AdornerDecorator/ContentPresenter/Grid/Canvas/UserControl/Border/ContentPresenter/Grid/Grid/Grid/Grid/Border/Grid/ContentPresenter/UserControl/UserControl/Border/ContentPresenter/Grid/Grid/Grid/Grid/Viewbox/ContainerVisual/UserControl/Border/ContentPresenter/Grid/Grid/ItemsControl/Border/ItemsPresenter/Canvas/ContentPresenter/Grid/Grid/ContentPresenter/Image... Now debug this! WPF can be crazy like that... Anyway, it turned out that if I create a simple WPF application - the images load just fine. I tried finding out the root cause, but I don't want to spend weeks on it. I figured the right thing to do might be to use a converter to scale the images down - this is what I have done: ImagePath = @"F:\Astronomical\world.200402.3x21600x10800.jpg"; TargetWidth = 2800; TargetHeight = 1866; and <Image> <Image.Source> <MultiBinding Converter="{StaticResource imageResizingConverter}"> <MultiBinding.Bindings> <Binding Path="ImagePath"/> <Binding RelativeSource="{RelativeSource Self}" /> <Binding Path="TargetWidth"/> <Binding Path="TargetHeight"/> </MultiBinding.Bindings> </MultiBinding> </Image.Source> </Image> and public class ImageResizingConverter : MarkupExtension, IMultiValueConverter { public Image TargetImage { get; set; } public string SourcePath { get; set; } public int DecodeWidth { get; set; } public int DecodeHeight { get; set; } public object Convert(object[] values, Type targetType, object parameter, CultureInfo culture) { this.SourcePath = values[0].ToString(); this.TargetImage = (Image)values[1]; this.DecodeWidth = (int)values[2]; this.DecodeHeight = (int)values[3]; return DecodeImage(); } private BitmapImage DecodeImage() { BitmapImage bi = new BitmapImage(); bi.BeginInit(); bi.DecodePixelWidth = (int)DecodeWidth; bi.DecodePixelHeight = (int)DecodeHeight; bi.UriSource = new Uri(SourcePath); bi.EndInit(); return bi; } public object[] ConvertBack(object value, Type[] targetTypes, object parameter, CultureInfo culture) { throw new Exception("The method or operation is not implemented."); } public override object ProvideValue(IServiceProvider serviceProvider) { return this; } } Now this works fine, except for one "little" problem. When you just specify a file path in Image.Source - the application actually uses less memory and works faster than if you use BitmapImage.DecodePixelWidth. Plus with Image.Source if you have multiple Image controls that point to the same image - they only use as much memory as if only one image was loaded. With the BitmapImage.DecodePixelWidth solution - each additional Image control uses more memory and each of them uses more than when just specifying Image.Source. Perhaps WPF somehow caches these images in compressed form while if you specify the decoded dimensions - it feels like you get an uncompressed image in memory, plus it takes 6 times the time (perhaps without it the scaling is done on the GPU?), plus it feels like the original high resolution image also gets loaded and takes up space. If I just scale the image down, save it to a temporary file and then use Image.Source to point at the file - it will probably work, but it will be pretty slow and it will require handling cleanup of the temporary file. If I could detect an image that does not get loaded properly - maybe I could only scale it down if I need to, but Image.ImageFailed never gets triggered. Maybe it has something to do with the video memory and this app just using more of it with the deep visual tree, opacity masks etc. Actual question: How can I load big images as quickly as Image.Source option does it, without using more memory for additional copies and additional memory for the scaled down image if I only need them at a certain resolution lower than original? Also, I don't want to keep them in memory if no Image control is using them anymore.

    Read the article

  • JPA Annotations in Android

    - by dasmaze
    Hello, We have a project that uses JPA/Hibernate on the server side, the mapped entity classes are in their own Library-Project and use Annotations to be mapped to the database. I want to use these classes in an Android-Project - is there any way to ignore the annotations within Android, to use these classes as standard POJOs?

    Read the article

  • "Chunked" MemoryStream

    - by Karol Kolenda
    I'm looking for the implementation of MemoryStream which does not allocate memory as one big block, but rather a collection of chunks. I want to store a few GB of data in memory (64 bit) and avoid limitation of memory fragmentation.

    Read the article

  • using securestring for a sql connection

    - by Rick
    Hi, I want to use a SecureString to hold a connection string for a database. But as soon as I set the SqlConnection object's ConnectionString property to the value of the securestring surely it will become visible to any other application that is able to read my application's memory? I have made the following assumptions: a) I am not able to instantiate a SqlConnection object outside of managed memory b) any string within managed memory can be read by an application such as Hawkeye

    Read the article

  • Measure heap used by each object in Java

    - by Fazal
    Can some suggest a good a free memory profiling tool which will show memory being used by each object in the heap separately. We are trying to profile our application and I used jconsole but its gives me total memory usage only. I am using Eclipse and OC4J

    Read the article

  • SSRS 2005 - Usability analysis - Is SSRS a good option for this scenario?

    - by Sach
    How practical is it to consider SSRS 2005 or SSRS 2008 as a reporting solution for a report that has to show reports with millions of records (records vary from 3 to 10 million)? Is there any threshold on the size of report in SSRS? How do I know that for a huge report, wheather SSRS will consume the whole memory and start paging the operations to disk or it will give a memory leak error? Even if I keep on increasing the memory how can I be sure that certain memory will be sufficient for such huge reports for the report server? All the above questions are haunting me because I have a dedicated report server with a decent hardware and OS configuration (8 processors, 8GB RAM, 64 bit OS and 64 bit SQL Server 2005). Still my report with around 2 million records is taking more than 6 minutes and going from one page to another takes 3 minutes!!! My datasource is on separate server and when I execute only the stored proc there, it returns the results in less than 2 minutes.

    Read the article

  • android compile error: could not reserve enough space for object heap

    - by moonlightcheese
    I'm getting this error during compilation: Error occurred during initialization of VM Could not create the Java virtual machine. Could not reserve enough space for object heap What's worse, the error occurs intermittently. Sometimes it happens, sometimes it doesn't. It seems to be dependent on the amount of code in the application. If I get rid of some variables or drop some imported libraries, it will compile. Then when I add more to it, I get the error again. I've included the following sources into the application in the [project_root]/src/ directory: org.apache.httpclient (I've stripped all references to log4j from the sources, so don't need it) org.apache.codec (as a dependency) org.apache.httpcore (dependency of httpclient) and my own activity code consisting of nothing more than an instance of HttpClient. I know this has something to do with the amount of memory necessary during compile time or some compiler options, and I'm not really stressing my system while i'm coding. I've got 2GB of memory on this Core Duo laptop and windows reports only 860MB page file usage (haven't used any other memory tools. I should have plenty of memory and processing power for this... and I'm only compiling some common http libs... total of 406 source files. What gives? edit (4/30/2010-18:24): Just compiled some code where I got the above stated error. I closed some web browser windows and recompiled the same exact code with no edits and it compiled with no issue. this is definitely a compiler issue related to memory usage. Any help would be great.... because I have no idea where to go from here. Android API Level: 5 Android SDK rel 5 JDK version: 1.6.0_12 Sorry I had to repost this question because regardless of whether I use the native HttpClient class in the Android SDK or my custom version downloaded from apache, the error still occurs.

    Read the article

  • Managed language for scientific computing software

    - by heisen
    Scientific computing is algorithm intensive and can also be data intensive. It often needs to use a lot of memory to run analysis and release it before continuing with the next. Sometime it also uses memory pool to recycle memory for each analysis. Managed language is interesting here because it can allow the developer to concentrate on the application logic. Since it might need to deal with huge dataset, performance is important too. But how can we control memory and performance with managed language?

    Read the article

  • Delphi: Alternative to using Assign/ReadLn for text file reading

    - by Ian Boyd
    i want to process a text file line by line. In the olden days i loaded the file into a StringList: slFile := TStringList.Create(); slFile.LoadFromFile(filename); for i := 0 to slFile.Count-1 do begin oneLine := slFile.Strings[i]; //process the line end; Problem with that is once the file gets to be a few hundred megabytes, i have to allocate a huge chunk of memory; when really i only need enough memory to hold one line at a time. (Plus, you can't really indicate progress when you the system is locked up loading the file in step 1). The i tried using the native, and recommended, file I/O routines provided by Delphi: var f: TextFile; begin Assign(filename, f); while ReadLn(f, oneLine) do begin //process the line end; Problem withAssign is that there is no option to read the file without locking (i.e. fmShareDenyNone). The former stringlist example doesn't support no-lock either, unless you change it to LoadFromStream: slFile := TStringList.Create; stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone); slFile.LoadFromStream(stream); stream.Free; for i := 0 to slFile.Count-1 do begin oneLine := slFile.Strings[i]; //process the line end; So now even though i've gained no locks being held, i'm back to loading the entire file into memory. Is there some alternative to Assign/ReadLn, where i can read a file line-by-line, without taking a sharing lock? i'd rather not get directly into Win32 CreateFile/ReadFile, and having to deal with allocating buffers and detecting CR, LF, CRLF's. i thought about memory mapped files, but there's the difficulty if the entire file doesn't fit (map) into virtual memory, and having to maps views (pieces) of the file at a time. Starts to get ugly. i just want Assign with fmShareDenyNone!

    Read the article

  • Help with Perl persistent data storage using Data::Dumper

    - by stephenmm
    I have been trying to figure this out for way to long tonight. I have googled it to death and none of the examples or my hacks of the examples are getting it done. It seems like this should be pretty easy but I just cannot get it. Here is the code: #!/usr/bin/perl -w use strict; use Data::Dumper; my $complex_variable = {}; my $MEMORY = "$ENV{HOME}/data/memory-file"; $complex_variable->{ 'key' } = 'value'; $complex_variable->{ 'key1' } = 'value1'; $complex_variable->{ 'key2' } = 'value2'; $complex_variable->{ 'key3' } = 'value3'; print Dumper($complex_variable)."TEST001\n"; open M, ">$MEMORY" or die; print M Data::Dumper->Dump([$complex_variable], ['$complex_variable']); close M; $complex_variable = {}; print Dumper($complex_variable)."TEST002\n"; # Then later to restore the value, it's simply: do $MEMORY; #eval $MEMORY; print Dumper($complex_variable)."TEST003\n"; And here is my output: $VAR1 = { 'key2' => 'value2', 'key1' => 'value1', 'key3' => 'value3', 'key' => 'value' }; TEST001 $VAR1 = {}; TEST002 $VAR1 = {}; TEST003 Everything that I read says that the TEST003 output should look identical to the TEST001 output which is exactly what I am trying to achieve. What am I missing here? Should I be "do"ing differently or should I be "eval"ing instead and if so how? Thanks for any help...

    Read the article

  • What happens after a packet is captured?

    - by Rayne
    Hi all, I've been reading about what happens after packets are captured by NICs, and the more I read, the more I'm confused. Firstly, I've read that traditionally, after a packet is captured by the NIC, it gets copied to a block of memory in the kernel space, then to the user space for whatever application that then works on the packet data. Then I read about DMA, where the NIC directly copies the packet into memory, bypassing the CPU. So is the NIC - kernel memory - User space memory flow still valid? Also, do most NIC (e.g. Myricom) use DMA to improve packet capture rates? Secondly, does RSS (Receive Side Scaling) work similarly in both Windows and Linux systems? I can only find detailed explanations on how RSS works in MSDN articles, where they talk about how RSS (and MSI-X) works on Windows Server 2008. But the same concept of RSS and MSI-X should still apply for linux systems, right? Thank you. Regards, Rayne

    Read the article

  • MBR Booting from DOS

    - by eflukx
    For a project I would like to invoke the MBR on the first harddisk directly from DOS. I've written a small assembler program that loads the MBR in memory at 0:7c00h an does a far jump to it. I've put my util on a bootable floppy. The disk (HD0, 0x80) i'm trying to boot has a TrueCrypt boot loader on it. It shows up the TrueCrypt screen, but after typing in the password it crashes the system. When I run my little utlility (w00t.com) on a normal WinXP machine it seams to crash immedealty. Apparently I'm forgetting some crucial stuff the BIOS normally does, my guess is it's something trivial. Can someone with better bare-metal DOS and BIOS experience help me out? Heres my code: .MODEL tiny .386 _TEXT SEGMENT USE16 INCLUDE BootDefs.i ORG 100h start: ; http://vxheavens.com/lib/vbw05.html ; Before DOS has booted the BIOS stores the amount of usable lower memory ; in a word located at 0:413h in memory. We going to erase this value because ; we have booted dos before loading the bootsector, and dos is fat (and ugly). ; fake free memory ;push ds ;push 0 ;pop ds ;mov ax, TC_BOOT_LOADER_SEGMENT / 1024 * 16 + TC_BOOT_MEMORY_REQUIRED ;mov word ptr ds:[413h], ax ;ax = memory in K ;pop ds ;lea si, memory_patched_msg ;call print ;mov ax, cs mov ax, 0 mov es, ax ; read first sector to es:7c00h (== cs:7c00) mov dl, 80h mov cl, 1 mov al, 1 mov bx, 7c00h ;load sector to es:bx call read_sectors lea si, mbr_loaded_msg call print lea si, jmp_to_mbr_msg call print ;Set BIOS default values in environment cli mov dl, 80h ;(drive C) xor ax, ax mov ds, ax mov es, ax mov ss, ax mov sp, 0ffffh sti push es push 7c00h retf ;Jump to MBR code at 0:7c00h ; Print string print: xor bx, bx mov ah, 0eh cld @@: lodsb test al, al jz print_end int 10h jmp @B print_end: ret ; Read sectors of the first cylinder read_sectors: mov ch, 0 ; Cylinder mov dh, 0 ; Head ; DL = drive number passed from BIOS mov ah, 2 int 13h jnc read_ok lea si, disk_error_msg call print read_ok: ret memory_patched_msg db 'Memory patched', 13, 10, 7, 0 mbr_loaded_msg db 'MBR loaded', 13, 10, 7, 0 jmp_to_mbr_msg db 'Jumping to MBR code', 13, 10, 7, 0 disk_error_msg db 'Disk error', 13, 10, 7, 0 _TEXT ENDS END start

    Read the article

  • Singleton pattern and broken double checked locking in real world java application

    - by saugata
    I was reading the article Double-checked locking and the Singleton pattern, on how double checked locking is broken, and some related questions here on stackoverflow. I have used this pattern/idiom several times without any issues. Since I have been using Java 5, my first thought was that this has been rectified in Java 5 memory model. However the article says This article refers to the Java Memory Model before it was revised for Java 5.0; statements about memory ordering may no longer be correct. However, the double-checked locking idiom is still broken under the new memory model. I'm wondering if anyone has actually run into this problem in any application and under what conditions.

    Read the article

  • Performance required to improve Windows Experience Index?

    - by Ian Boyd
    Is there a guide on the metrics required to obtain a certain Windows Experience Index? A Microsoft guy said in January 2009: On the matter of transparency, it is indeed our plan to disclose in great detail how the scores are calculated, what the tests attempt to measure, why, and how they map to realistic scenarios and usage patterns. Has that amount of transparency happened? Is there a technet article somewhere? If my score was limited by my Memory subscore of 5.9. A nieve person would suggest: Buy a faster RAM Which is wrong of course. From the Windows help: If your computer has a 64-bit central processing unit (CPU) and 4 gigabytes (GB) or less random access memory (RAM), then the Memory (RAM) subscore for your computer will have a maximum of 5.9. You can buy the fastest, overclocked, liquid-cooled, DDR5 RAM on the planet; you'll still have a maximum Memory subscore of 5.9. So in general the knee-jerk advice "buy better stuff" is not helpful. What i am looking for is attributes required to achieve a certain score, or move beyond a current limitation. The information i've been able to compile so far, chiefly from 3 Windows blog entries, and an article: Memory subscore Score Conditions ======= ================================ 1.0 < 256 MB 2.0 < 500 MB 2.9 <= 512 MB 3.5 < 704 MB 3.9 < 944 MB 4.5 <= 1.5 GB 5.9 < 4.0GB-64MB on a 64-bit OS Windows Vista highest score 7.9 Windows 7 highest score Graphics Subscore Score Conditions ======= ====================== 1.0 doesn't support DX9 1.9 doesn't support WDDM 4.9 does not support Pixel Shader 3.0 5.9 doesn't support DX10 or WDDM1.1 Windows Vista highest score 7.9 Windows 7 highest score Gaming graphics subscore Score Result ======= ============================= 1.0 doesn't support D3D 2.0 supports D3D9, DX9 and WDDM 5.9 doesn't support DX10 or WDDM1.1 Windows Vista highest score 6.0-6.9 good framerates (e.g. 40-50fps) at normal resoltuions (e.g. 1280x1024) 7.0-7.9 even higher framerates at even higher resolutions 7.9 Windows 7 highest score Processor subscore Score Conditions ======= ========================================================================== 5.9 Windows Vista highest score 6.0-6.9 many quad core processors will be able to score in the high 6 low 7 ranges 7.0+ many quad core processors will be able to score in the high 6 low 7 ranges 7.9 8-core systems will be able to approach 8.9 Windows 7 highest score Primary hard disk subscore (note) Score Conditions ======= ======================================== 1.9 Limit for pathological drives that stop responding when pending writes 2.0 Limit for pathological drives that stop responding when pending writes 2.9 Limit for pathological drives that stop responding when pending writes 3.0 Limit for pathological drives that stop responding when pending writes 5.9 highest you're likely to see without SSD Windows Vista highest score 7.9 Windows 7 highest score Bonus Chatter You can find your WEI detailed test results in: C:\Windows\Performance\WinSAT\DataStore e.g. 2011-11-06 01.00.19.482 Disk.Assessment (Recent).WinSAT.xml <WinSAT> <WinSPR> <DiskScore>5.9</DiskScore> </WinSPR> <Metrics> <DiskMetrics> <AvgThroughput units="MB/s" score="6.4" ioSize="65536" kind="Sequential Read">89.95188</AvgThroughput> <AvgThroughput units="MB/s" score="4.0" ioSize="16384" kind="Random Read">1.58000</AvgThroughput> <Responsiveness Reason="UnableToAssess" Kind="Cap">TRUE</Responsiveness> </DiskMetrics> </Metrics> </WinSAT> Pre-emptive snarky comment: "WEI is useless, it has no relation to reality" Fine, how do i increase my hard-drive's random I/O throughput? Update - Amount of memory limits rating Some people don't believe Microsoft's statement that having less than 4GB of RAM on a 64-bit edition of Windows doesn't limit the rating to 5.9: And from xxx.Formal.Assessment (Recent).WinSAT.xml: <WinSPR> <LimitsApplied> <MemoryScore> <LimitApplied Friendly="Physical memory available to the OS is less than 4.0GB-64MB on a 64-bit OS : limit mem score to 5.9" Relation="LT">4227858432</LimitApplied> </MemoryScore> </LimitsApplied> </WinSPR> References Windows Vista Team Blog: Windows Experience Index: An In-Depth Look Understand and improve your computer's performance in Windows Vista Engineering Windows 7 Blog: Engineering the Windows 7 “Windows Experience Index”

    Read the article

  • Solving problems involving more complex data structures with CUDA

    - by Nils
    So I read a bit about CUDA and GPU programming. I noticed a few things such that access to global memory is slow (therefore shared memory should be used) and that the execution path of threads in a warp should not diverge. I also looked at the (dense) matrix multiplication example, described in the programmers manual and the nbody problem. And the trick with the implementation seems to be the same: Arrange the calculation in a grid (which it already is in case of the matrix mul); then subdivide the grid into smaller tiles; fetch the tiles into shared memory and let the threads calculate as long as possible, until it needs to reload data from the global memory into shared memory. In case of the nbody problem the calculation for each body-body interaction is exactly the same (page 682): bodyBodyInteraction(float4 bi, float4 bj, float3 ai) It takes two bodies and an acceleration vectors. The body vector has four components it's position and the weight. When reading the paper, the calculation is understood easily. But what is if we have a more complex object, with a dynamic data structure? For now just assume that we have an object (similar to the body object presented in the paper) which has a list of other objects attached and the number of objects attached is different in each thread. How could I implement that without having the execution paths of the threads to diverge? I'm also looking for literature which explains how different algorithms involving more complex data structures can be effectively implemented in CUDA.

    Read the article

  • how JVM arguments can be passed to apps started through java webstart in macOS.

    - by siva
    Hi, We have a application which is triggered from browser. This application consumes around 800 mb of memory. This works perfectly when invoked from any browsers in windows OS. The same application when triggered from MacOS throws an out of memory exception which occurs when the application is short of memory. Is there any way to increase the memory allocated for apps running in mac os environment. Also please let me know how JVM arguments can be passed to apps started through java webstart in macOS. Thanks in advance Regards, Siva

    Read the article

  • Cache consistency & spawning a thread

    - by Dave Keck
    Background I've been reading through various books and articles to learn about processor caches, cache consistency, and memory barriers in the context of concurrent execution. So far though, I have been unable to determine whether a common coding practice of mine is safe in the strictest sense. Assumptions The following pseudo-code is executed on a two-processor machine: int sharedVar = 0; myThread() { print(sharedVar); } main() { sharedVar = 1; spawnThread(myThread); sleep(-1); } main() executes on processor 1 (P1), while myThread() executes on P2. Initially, sharedVar exists in the caches of both P1 and P2 with the initial value of 0 (due to some "warm-up code" that isn't shown above.) Question Strictly speaking – preferably without assuming any particular CPU – is myThread() guaranteed to print 1? With my newfound knowledge of processor caches, it seems entirely possible that at the time of the print() statement, P2 may not have received the invalidation request for sharedVar caused by P1's assignment in main(). Therefore, it seems possible that myThread() could print 0. References These are the related articles and books I've been reading. (It wouldn't allow me to format these as links because I'm a new user - sorry.) Shared Memory Consistency Models: A Tutorial hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf Memory Barriers: a Hardware View for Software Hackers rdrop.com/users/paulmck/scalability/paper/whymb.2009.04.05a.pdf Linux Kernel Memory Barriers kernel.org/doc/Documentation/memory-barriers.txt Computer Architecture: A Quantitative Approach amazon.com/Computer-Architecture-Quantitative-Approach-4th/dp/0123704901/ref=dp_ob_title_bk

    Read the article

< Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >