Search Results

Search found 955 results on 39 pages for 'gpu accelration'.


ATI Catalyst driver 12.8 is not using hardware acceleration on Precise

- by Jack Wright

I've been using Ubuntu and ATI Catalyst for years. On my clean install of Ubuntu 12.04 I've noticed that Catalyst 12.6 and then 12.8 are not actually using my HD5750 GPU for hardware acceleration: high CPU usage, zero GPU load. Everything installed correctly with no hassles, and the fglrxinfo and vainfo output is correct as per this HowTo for Precise.

I have an Ubuntu 10.04 installation with Catalyst 12.6 on the same hardware which does use the GPU: low CPU usage, high GPU load when transcoding video files or playing video content. The VA-API drivers are not installed on the 10.04 build. They are not mentioned in this HowTo for Lucid.

fgl_glxgears frame rates on Precise are a fifth of the rates on Lucid.

LUCID

    jw@Kworld:~$ fgl_glxgears
    Using GLX_SGIX_pbuffer
    16867 frames in 5.0 seconds = 3373.400 FPS
    12523 frames in 5.0 seconds = 2504.600 FPS
    13763 frames in 5.0 seconds = 2752.600 FPS

PRECISE

    jw@NewWorld12:~$ fgl_glxgears
    Using GLX_SGIX_pbuffer
    12905 frames in 5.0 seconds = 2581.000 FPS
    3230 frames in 5.0 seconds = 646.000 FPS
    517 frames in 5.0 seconds = 103.400 FPS
    518 frames in 5.0 seconds = 103.600 FPS
    6489 frames in 5.0 seconds = 1297.800 FPS

This is glxgears running in fullscreen. In Lucid (10.04) I can't see the gears because they are spinning so fast, but in Precise (12.04) they are really sluggish. Has anyone else noticed a problem like this? Cheers, Jack.

Get started with C++ AMP

- by Daniel Moth

With the imminent release of Visual Studio 2012, even if you do not classify yourself as a C++ developer, C++ AMP is something you should learn so you can understand how to speed up your loops by offloading to the GPU the computation performed in the loop (assuming you have a large number of iterations/data). We have many C# customers who are using C++ AMP through pinvoke, and of course many more directly from C++. So regardless of your programming language, I hope you'll find these short videos helpful for getting started with C++ AMP:

- C++ AMP core API introduction... from scratch
- Tiling Introduction - C++ AMP
- Matrix Multiplication with C++ AMP
- GPU debugging in Visual Studio 2012

In particular the work we have done for parallel and GPU debugging in Visual Studio 2012 is market leading, so check it out! Comments about this post by Daniel Moth welcome at the original blog.

DirectCompute Lectures

- by Daniel Moth

Previously I shared resources to get you started with DirectCompute, for taking advantage of GPGPUs in a way that doesn't tie you to a hardware vendor (e.g. nvidia, amd). I just stumbled upon and had to share a lecture series on channel9 on DirectCompute! Here are direct links to the episodes that are up there now:

1. DirectCompute Expert Roundtable Discussion
2. DirectCompute Lecture Series 101- Introduction to DirectCompute
3. DirectCompute Lecture Series 110- Memory Patterns
4. DirectCompute Lecture Series 120- Basics of DirectCompute Application Development
5. DirectCompute Lecture Series 210- GPU Optimizations and Performance
6. DirectCompute Lecture Series 230- GPU Accelerated Physics
7. DirectCompute Lecture Series 250- Integration with the Graphics Pipeline

Having watched these I recommend them all, but if you only want to watch a few, I suggest #2, #3, #4 and #5. Also, you should download the "WMV (High)" so you can see the code clearly and be able to Ctrl+Shift+G for fast playback…

TIP: To subscribe to channel9 GPU content, use this RSS feed. Comments about this post welcome at the original blog.

How can a computer render a CLI/console along with a GUI?

- by Nathaniel Bennett

I'm confused when looking into graphics, specifically with operating systems. I mean, how can a computer render a CLI/console along with a GUI? GUIs are completely different from text. And how can we have GUI windows that display text interfaces, i.e. how can we have a CLI in a modern graphical operating system? That's what I'm mainly trying to get a grip on. How do graphics get rendered to the display? Is there some sort of memory address that the GPU accesses which holds all the pixel data, and are there systems within the OS that gather the pixel positions of windows and widgets, along with the Z index, and rasterize them to that memory address, which the GPU then loads to the screen? How about CLIs integrated with graphics? How does the OS tell the GPU that a certain part of the screen wants to display text while the rest wants to display pixel data?
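The model the question proposes (windows with a position and a Z index, rasterized into one block of pixel memory) can be sketched in a few lines. This is only an illustration of that idea, not how any particular OS or GPU driver is implemented; `Window`, `Framebuffer` and `composite` are hypothetical names, and a `char` stands in for a colour value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical window description: a rectangle, a Z index, and the "pixel" it is filled with.
struct Window {
    int x, y, width, height;
    int z;       // a larger z is drawn later, so it ends up on top
    char pixel;  // stand-in for a colour value
};

// A framebuffer is just a block of memory with one entry per pixel.
struct Framebuffer {
    int width, height;
    std::vector<char> pixels;
    Framebuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, '.') {
        assert(w > 0 && h > 0);
    }
    char at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Painter's algorithm: sort back to front, then overwrite pixels window by window.
inline Framebuffer composite(std::vector<Window> windows, int width, int height) {
    Framebuffer fb(width, height);
    std::stable_sort(windows.begin(), windows.end(),
                     [](const Window& a, const Window& b) { return a.z < b.z; });
    for (const Window& w : windows) {
        // Clip each window to the framebuffer bounds before writing.
        for (int y = std::max(0, w.y); y < std::min(height, w.y + w.height); ++y) {
            for (int x = std::max(0, w.x); x < std::min(width, w.x + w.width); ++x) {
                fb.pixels[static_cast<std::size_t>(y) * width + x] = w.pixel;
            }
        }
    }
    return fb;
}
```

In this picture a terminal window is not a special kind of screen region: it is one more `Window` whose pixels happen to be drawn from glyph bitmaps, so by the time anything reaches the framebuffer there is only pixel data.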

How Did we get from CLI to Graphics?

- by Nathaniel Bennett

I'm confused when looking into graphics, specifically with operating systems. I mean, how can a computer render a CLI/console along with a GUI? GUIs are completely different from text. And how can we have GUI windows that display text interfaces, i.e. how can we have a CLI in a modern graphical operating system? That's what I'm mainly trying to get a grip on. How do graphics get rendered to the display? Is there some sort of memory address that the GPU accesses which holds all the pixel data, and are there systems within the OS that gather the pixel positions of windows and widgets, along with the Z index, and rasterize them to that memory address, which the GPU then loads to the screen? How about CLIs integrated with graphics? How does the OS tell the GPU that a certain part of the screen wants to display text while the rest wants to display pixel data? It's all very confusing. Shed some light on it, will ya?

Can I automatically make my Nvidia card's fan quieter?

- by Salim Fadhley

I have a machine with an Nvidia graphics card. Unfortunately the GPU fan is very loud. It's very annoying at times. We never use this machine for intense 3d work - that GPU is probably not working very hard at all. I'm pretty sure I can run it at a much lower fan-speed without causing any problems. The nvclock utility can be used to manually adjust the fan-speed of my Nvidia graphics card. I'd like to call this utility automatically when the machine boots up. Is there some kind of system service which I can use to automatically apply this kind of system-wide configuration? Even better, is there a system monitoring service which can poll the GPU temperature and adjust the various system fan-speeds accordingly? Thanks!
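The "poll the temperature and adjust the fan speed" idea at the end of the question boils down to a fan curve. The sketch below shows only that mapping, under stated assumptions: `FanCurve` and `fan_percent` are hypothetical names, the thresholds are illustrative rather than recommended values, and reading the temperature and applying the result (for example through nvclock) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical fan curve: all four values are illustrative, not vendor-recommended.
struct FanCurve {
    int min_temp_c = 40;    // at or below this, run at min_percent
    int max_temp_c = 80;    // at or above this, run at max_percent
    int min_percent = 30;
    int max_percent = 100;
};

// Map a temperature reading to a fan duty cycle by linear interpolation.
inline int fan_percent(const FanCurve& c, int temp_c) {
    assert(c.max_temp_c > c.min_temp_c);
    // Clamp the reading into the range covered by the curve.
    int t = std::max(c.min_temp_c, std::min(temp_c, c.max_temp_c));
    int span_t = c.max_temp_c - c.min_temp_c;
    int span_p = c.max_percent - c.min_percent;
    return c.min_percent + (t - c.min_temp_c) * span_p / span_t;
}
```

A service built on this would call `fan_percent` once per polling interval and only issue a fan command when the result changes, which avoids constantly re-applying the same speed.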

Do I lose/gain performance for discarding pixels even if I don't use depth testing?

- by Gajoo

When I first searched for the discard instruction, I found experts saying that using discard will result in a performance drain. They said discarding pixels will break the GPU's ability to use the Z-buffer properly, because the GPU has to first run the fragment shader for both objects to check whether the one nearer to the camera is discarded or not. For a 2D game I'm currently working on, I've disabled both depth-test and depth-write. I'm drawing all objects sorted by their depth and that's all, so there is no need for the GPU to do fancy things. Now I'm wondering: is it still bad if I discard pixels in my fragment shader?

How to install nvidia optimus driver on ubuntu 12.10?

- by Adam

I followed this guide, http://ubuntuportal.com/2012/01/bumblebee-3-0-tumblewed-nvidia-optimus-gpu-switching-for-linux-has-been-released-how-to-install-bumblebee-3-0-on-ubuntu.html, to install the nvidia driver on my Dell Inspiron N5110 notebook (Intel HD Graphics 3000 + NVIDIA GeForce GT525M), but I always get an error when I try to start any program with the optirun command. The terminal says:

    adam@Adam-LT:~$ optirun firefox
    [ 1482.559417] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
    [ 1482.559517] [ERROR]Aborting because fallback start is disabled.

My laptop cooler is always running, which means that the nvidia card is consuming power in the background. (The terminal sometimes says that some daemon-server is not running.) Can you give me a solution for this?

Any help please, not recognizing my hard drive

- by Imperial0007

If anyone can help, it would be much appreciated. I recently built my own PC, which I would like to use for gaming etc. (with Ubuntu as my OS, of course). I installed Ubuntu via a flash drive, and everything is connected. I purchased a graphics card (XFX Double D R9 270 925MHz Boost 2GB DDR5 DP HDMI 2XDVI). Now my problem is that when I put in the CD to install the GPU drivers, it does not recognize the HDD. So why is the hard drive not being recognized? HDD info: ADATA USA Premier pro SP600 32GB SATA. I am able to enter the BIOS menu (if that helps). Any help would be much appreciated, and thanks in advance.

Give a session on C++ AMP – here is how

- by Daniel Moth

Ever since presenting on C++ AMP at the AMD Fusion conference in June, then the Gamefest conference in August, and the BUILD conference in September, I've had numerous requests about my material from folks that want to re-deliver the same session. The C++ AMP session I put together has evolved over the 3 presentations to its final form that I used at BUILD, so that is the one I recommend you base yours on. Please get the slides and the recording from channel9 (I'll refer to slide numbers below).

This is how I've been presenting the C++ AMP session:

Context

- (slide 3, 04:18-08:18) Start with a demo, on my dual-GPU machine. I've been using the N-Body sample (for VS 11 Developer Preview).
- (slide 4) Use an nvidia slide that has additional examples of performance improvements that customers enjoy with heterogeneous computing.
- (slide 5) Talk a bit about the differences today between CPU and GPU hardware, leading to the fact that these will continue to co-exist and that GPUs are great for data parallel algorithms, but not much else today. One is a jack of all trades and the other is a number cruncher.
- (slide 6) Use the APU example from amd, as one indication that the hardware space is still in motion, emphasizing that the C++ AMP solution is a data parallel API, not a GPU API. It has a future proof design for hardware we have yet to see.
- (slide 7) Provide more meta-data, as blogged about when I first introduced C++ AMP.

Code

- (slide 9-11) Introduce C++ AMP coding with a simplistic array-addition algorithm – the slides speak for themselves.
- (slide 12-13) index<N>, extent<N>, and grid<N>.
- (slide 14-16) array<T,N>, array_view<T,N> and comparison between them.
- (slide 17) parallel_for_each.
- (slide 18, 21) restrict.
- (slide 19-20) actual restrictions of restrict(direct3d) – the slides speak for themselves.
- (slide 22) bring it all together with a matrix multiplication example.
- (slide 23-24) accelerator, and accelerator_view.
- (slide 26-29) Introduce tiling incl. tiled matrix multiplication [tiling probably deserves a whole session instead of 6 minutes!].

IDE

- (slide 34,37) Briefly touch on the concurrency visualizer. It supports GPU profiling, but enhancements specific to C++ AMP we hope will come at the Beta timeframe, which is when I'll be spending more time talking about it.
- (slide 35-36, 51:54-59:16) Demonstrate the GPU debugging experience in VS 11.

Summary

- (slide 39) Re-iterate some of the points of slide 7, and add the point that the C++ AMP spec will be open for other compiler vendors to implement, even on other platforms (in fact, Microsoft is actively working on that).
- (slide 40) Links to content – see slide – including where all your questions should go: http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads.

"But I don't have time for a full blown session, I only need 2 (or just 1, or 3) C++ AMP slides to use in my session on related topic X"

If all you want is a small number of slides, you can take some from the session above and customize them. But because I am so nice, I have created some slides for you, including talking points in the notes section. Download them here. Comments about this post by Daniel Moth welcome at the original blog.

CUDA not working in 64 bit windows 7

- by Programmer

I have cuda toolkit 4.0 installed in a 64 bit windows 7. I try building my cuda code,

    #include<iostream>
    #include"cuda_runtime.h"
    #include"cuda.h"

    __global__ void kernel(){
    }

    int main(){
        kernel<<<1,1>>>();
        int c = 0;
        cudaGetDeviceCount(&c);
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        std::cout<<"the name is"<<prop.name;
        std::cout<<"Hello World!"<<c<<std::endl;
        system("pause");
        return 0;
    }

but operation fails. Below is the build log:

Build Log

    Rebuild started: Project: god, Configuration: Debug|Win32

Command Lines

    Creating temporary file "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BAT0000482007500.bat" with contents
    [
    @echo off
    echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" sample.cu
    "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\sample.cu"
    if errorlevel 1 goto VCReportError
    goto VCEnd
    :VCReportError
    echo Project : error PRJ0019: A tool returned an error code from "Compiling with CUDA Build Rule..."
    exit 1
    :VCEnd
    ]
    Creating command line """c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BAT0000482007500.bat"""
    Creating temporary file "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\RSP0000492007500.rsp" with contents
    [
    /OUT:"C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.exe" /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\lib\x64" /MANIFEST /MANIFESTFILE:"Debug\god.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.pdb" /DYNAMICBASE /NXCOMPAT /MACHINE:X86 cudart.lib cuda.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib ".\Debug\sample.cu.obj"
    ]
    Creating command line "link.exe @"c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\RSP0000492007500.rsp" /NOLOGO /ERRORREPORT:PROMPT"

Output Window

    Compiling with CUDA Build Rule...
    "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" sample.cu
    sample.cu
    sample.cu.obj : error LNK2019: unresolved external symbol _cudaLaunch@4 referenced in function "enum cudaError cdecl cudaLaunch(char *)" (??$cudaLaunch@D@@YA?AW4cudaError@@PAD@Z)
    sample.cu.obj : error LNK2019: unresolved external symbol ___cudaRegisterFunction@40 referenced in function "void __cdecl _sti_cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1(void)" (?sti__cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1@@YAXXZ)
    sample.cu.obj : error LNK2019: unresolved external symbol _cudaRegisterFatBinary@4 referenced in function "void __cdecl _sti_cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1(void)" (?sti__cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1@@YAXXZ)
    sample.cu.obj : error LNK2019: unresolved external symbol _cudaGetDeviceProperties@8 referenced in function _main
    sample.cu.obj : error LNK2019: unresolved external symbol _cudaGetDeviceCount@4 referenced in function _main
    sample.cu.obj : error LNK2019: unresolved external symbol _cudaConfigureCall@32 referenced in function _main
    C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.exe : fatal error LNK1120: 7 unresolved externals

Results

    Build log was saved at "file://c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BuildLog.htm"
    god - 8 error(s), 0 warning(s)

I will be highly obliged if someone could help me. Thanks

What strategies are efficient to handle concurrent reads on heterogeneous multi-core architectures?

- by fabrizioM

I am tackling the challenge of using both the capabilities of an 8-core machine and a high-end GPU (Tesla 10). I have one big input file, one thread for each core, and one for the GPU handling. The GPU thread, to be efficient, needs a large number of lines from the input, while a CPU thread needs only one line to proceed (storing multiple lines in a temp buffer was slower). The file doesn't need to be read sequentially. I am using boost. My strategy is to have a mutex on the input stream, which each thread locks and unlocks. This is not optimal because the GPU thread should have a higher precedence when locking the mutex, being the fastest and the most demanding one. I can come up with different solutions, but before rushing into implementation I would like to have some guidelines. What approach do you use / recommend?
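The strategy described in the question (one mutex guarding the input stream) can be sketched so that each caller states how many lines it wants and receives them under a single lock acquisition. This is a minimal illustration, not a recommendation: `SharedLineReader` is a hypothetical name, it uses `std::mutex` rather than the boost primitives the question mentions, and it does not give the GPU thread any priority; it only reduces how often that thread has to compete for the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <mutex>
#include <sstream>  // only needed for the std::istringstream usage described below
#include <string>
#include <vector>

// Hypothetical wrapper: one input stream shared by all worker threads.
class SharedLineReader {
public:
    explicit SharedLineReader(std::istream& in) : in_(in) {}

    // Take up to max_lines lines while holding the lock once.
    // An empty result means the input is exhausted.
    std::vector<std::string> take(std::size_t max_lines) {
        std::vector<std::string> batch;
        batch.reserve(max_lines);
        std::lock_guard<std::mutex> lock(mutex_);
        std::string line;
        while (batch.size() < max_lines && std::getline(in_, line)) {
            batch.push_back(line);
        }
        return batch;
    }

private:
    std::istream& in_;
    std::mutex mutex_;
};
```

With a `std::istringstream` holding five lines, a GPU-style call `take(3)` returns the first three and a following CPU-style call `take(1)` returns the fourth. The GPU thread would pass a large batch size and the CPU threads would pass 1, so the stream is locked once per batch instead of once per line.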

Which one has a faster runtime performance: WPF or Winforms?

- by Joan Venge

I know WPF is more complex and flexible, so it could be thought to do more calculations. But since the rendering is done on the GPU, wouldn't it be faster than Winforms for the same application (functionally and visually)? I mean when you are not running any games or heavy 3D rendering, the GPU isn't doing heavy work, right? Whereas the CPU is always busy. Is this a valid assumption or is the GPU utilization of WPF a very minor operation in its pipeline?

Setting up two screens in Xorg

- by viraptor

I've got two Nvidia cards, but Xorg activates only one of them. The following config is based on the nvidia configurator output:

    Section "ServerLayout"
        Identifier "Layout0"
        Screen 0 "Screen0" 0 0
        Screen 1 "Screen1" LeftOf "Screen0"
        InputDevice "Keyboard0" "CoreKeyboard"
        InputDevice "Mouse0" "CorePointer"
        Option "Xinerama" "0"
    EndSection

    Section "Module"
        Load "dbe"
        Load "extmod"
        Load "type1"
        Load "freetype"
        Load "glx"
    EndSection

    Section "InputDevice"
        Identifier "Mouse0"
        Driver "mouse"
        Option "Protocol" "auto"
        Option "Device" "/dev/psaux"
        Option "Emulate3Buttons" "no"
        Option "ZAxisMapping" "4 5"
    EndSection

    Section "InputDevice"
        Identifier "Keyboard0"
        Driver "keyboard"
    EndSection

    Section "Monitor"
        Identifier "Monitor0"
        VendorName "Unknown"
        ModelName "HP LE2201w"
        HorizSync 24.0 - 83.0
        VertRefresh 50.0 - 76.0
        Option "DPMS"
    EndSection

    Section "Monitor"
        Identifier "Monitor1"
        VendorName "Unknown"
        ModelName "Acer AL2017"
        HorizSync 30.0 - 82.0
        VertRefresh 56.0 - 76.0
        Option "DPMS"
    EndSection

    Section "Device"
        Identifier "Card0"
        Driver "nvidia"
        VendorName "nVidia Corporation"
        BoardName "GeForce 6100 nForce 405"
        BusID "PCI:0:13:0"
    EndSection

    Section "Device"
        Identifier "Card1"
        Driver "nvidia"
        VendorName "nVidia Corporation"
        BoardName "GeForce 8400 GS"
        BusID "PCI:2:0:0"
    EndSection

    Section "Screen"
        Identifier "Screen0"
        Device "Device0"
        Monitor "Monitor0"
        DefaultDepth 24
        Option "TwinView" "0"
        Option "metamodes" "nvidia-auto-select +0+0"
        SubSection "Display"
            Depth 24
        EndSubSection
    EndSection

    Section "Screen"
        Identifier "Screen1"
        Device "Device1"
        Monitor "Monitor1"
        DefaultDepth 24
        Option "TwinView" "0"
        Option "metamodes" "nvidia-auto-select +0+0"
        SubSection "Display"
            Depth 24
        EndSubSection
    EndSection

What I see in the log file is:

    (==) Log file: "/var/log/Xorg.0.log", Time: Fri Mar 19 11:08:08 2010
    (==) Using config file: "/etc/X11/xorg.conf"
    (==) ServerLayout "Layout0"
    (**) |-->Screen "Screen0" (0)
    (**) | |-->Monitor "Monitor0"
    (==) No device specified for screen "Screen0". Using the first device section listed.
    (**) | |-->Device "Card0"
    (**) |-->Screen "Screen1" (1)
    (**) | |-->Monitor "Monitor1"
    (==) No device specified for screen "Screen1". Using the first device section listed.
    (**) | |-->Device "Card0"
    (**) |-->Input Device "Keyboard0"
    (**) |-->Input Device "Mouse0"
    (**) Option "Xinerama" "0"
    (==) Automatically adding devices
    (==) Automatically enabling devices

even though later on both cards are detected:

    (--) PCI:*(0:0:13:0) 10de:03d1:1019:2601 nVidia Corporation C61 [GeForce 6100 nForce 405] rev 162, Mem @ 0xfb000000/16777216, 0xd0000000/268435456, 0xfc000000/16777216, BIOS @ 0x????????/131072
    (--) PCI: (0:2:0:0) 10de:0422:0000:0000 nVidia Corporation G86 [GeForce 8400 GS] rev 161, Mem @ 0xf8000000/16777216, 0xe0000000/268435456, 0xf6000000/33554432, I/O @ 0x0000bc00/128, BIOS @ 0x????????/131072
    [ --- some more logs --- ]
    (II) Mar 19 11:08:10 NVIDIA(0): NVIDIA GPU GeForce 6100 nForce 405 (C61) at PCI:0:13:0
    (II) Mar 19 11:08:10 NVIDIA(0): (GPU-0)
    [ --- some more logs --- ]
    (II) Mar 19 11:08:12 NVIDIA(GPU-1): NVIDIA GPU GeForce 8400 GS (G86) at PCI:2:0:0 (GPU-1)

Unfortunately, later on only one card is initialised and one screen is active. Xrandr shows only one screen too. Any ideas on how to fix it?

parallel_for_each from amp.h – part 1

- by Daniel Moth

This post assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation).

Basic structure and parameters

Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters:

- a grid<N> (think of it as an alias to extent)
- a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid

So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space:

    // some_code_A
    parallel_for_each(
      g, // g is of type grid<2>
      [ ](index<2> idx) restrict(direct3d)
      {
        // kernel code
      }
    );
    // some_code_B

The parallel_for_each will execute the body of the lambda (which must have the restrict modifier) on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next.

You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was set up by some_code_A as follows:

    extent<2> e(2,3);
    grid<2> g(e);

...then given that: e.size()==6, e[0]==2, and e[1]==3

...the six index<2> objects it generates (and hence the values that your lambda would receive) are:

    (0,0) (1,0) (0,1) (1,1) (0,2) (1,2)

So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6.

If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?"

Passing data in and out

It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables.

In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets mean that no variables were captured.

If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be captured by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data).

If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container.

Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how.

So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts…

(a)synchronous

Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.

That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.
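To make the "an extent generates index objects" idea concrete, here is a plain C++ emulation that runs sequentially on the CPU. It is not C++ AMP code and uses none of its types: `Index2`, `indices_for_extent` and `for_each_index` are hypothetical stand-ins, and the fixed order they produce is only a convenience, since the post states that the real order is indeterminate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Index2 = std::array<int, 2>;  // stand-in for index<2>, not the C++ AMP type

// Enumerate every index that a 2-dimensional extent (e0, e1) would generate.
inline std::vector<Index2> indices_for_extent(int e0, int e1) {
    assert(e0 >= 0 && e1 >= 0);
    std::vector<Index2> out;
    out.reserve(static_cast<std::size_t>(e0) * e1);
    for (int j = 0; j < e1; ++j) {
        for (int i = 0; i < e0; ++i) {
            out.push_back(Index2{{i, j}});
        }
    }
    return out;
}

// Sequential stand-in for parallel_for_each: call the kernel once per index.
template <typename Kernel>
void for_each_index(int e0, int e1, Kernel kernel) {
    for (const Index2& idx : indices_for_extent(e0, e1)) {
        kernel(idx);
    }
}
```

Calling `indices_for_extent(2, 3)` yields the six values listed in the post, and `for_each_index(2, 3, kernel)` invokes `kernel` six times, once per value. What the emulation cannot show is the part that matters on real hardware: those six calls would run concurrently rather than one after another.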

When to unload graphics object from main memory?

- by piotrek

I'm writing my resource manager, and I'm considering how it should work for graphics objects (like textures and meshes). This is what I have in mind. I want to load a texture (in pseudocode):

    Texture t = resMgr.GetTex("image.png");

and GetTex does something like this:

1. load the texture from disk into main memory
2. create the texture object (load it into GPU memory)
3. unload the texture from main memory

I'm unsure about step 3: do the game engines that you know unload meshes/textures after loading them into GPU memory?
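The three steps in the question can be written down as a small runnable sketch. Everything here is an assumption made for illustration: `load_from_disk` and `upload_to_gpu` are stand-ins that fake disk I/O and the graphics API so the code runs without a GPU, and the `Texture` struct holds only a made-up handle. It shows one way the flow could look, not what any particular engine does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical GPU-side handle; a real engine would hold an API object id here.
struct Texture {
    unsigned gpu_handle;
    std::size_t byte_size;
};

// Stand-in for disk I/O: returns fake pixel data sized from the path length.
inline std::vector<unsigned char> load_from_disk(const std::string& path) {
    return std::vector<unsigned char>(path.size() * 4, 0xFF);
}

// Stand-in for the graphics API: hands out increasing handle numbers.
inline unsigned upload_to_gpu(const std::vector<unsigned char>&) {
    static unsigned next_handle = 1;
    return next_handle++;
}

class ResourceManager {
public:
    std::shared_ptr<Texture> GetTex(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) return it->second;  // already resident on the GPU

        std::shared_ptr<Texture> tex;
        {
            std::vector<unsigned char> pixels = load_from_disk(path);  // step 1
            tex = std::make_shared<Texture>(
                Texture{upload_to_gpu(pixels), pixels.size()});        // step 2
        }  // step 3: leaving this scope frees the main-memory copy
        cache_[path] = tex;
        return tex;
    }

private:
    std::map<std::string, std::shared_ptr<Texture>> cache_;
};
```

Whether step 3 is appropriate depends on what else needs the data. If the CPU still has to read it later (for example mesh data used for collision, or a texture that must be re-uploaded after the GPU copy is lost), the main-memory copy has to be kept or reloaded from disk.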

Intel z77 vs h77 for intensive compiling, gaming [closed]

- by Bilal Akhtar

I'm in the market for a desktop motherboard (preferably ATX) that works well with the Intel i7-3770 Ivy Bridge processor at 3.4 GHz with an LGA1155 socket. That processor is very fast, and it should handle all my tasks. My question is about the type of motherboard chipset I should choose to accompany it. I plan to use my rig for compiling and developing Debian packages and other OS components, web development, occasional Android apps, chroots, VMs, FlightGear, other gaming but nothing serious, and heavy multitasking, all on Ubuntu. I do NOT plan to overclock, and I never will, so that's not a concern for me. That said, I'm down to three chipset choices:

- Intel H77
- Intel Z68
- Intel Z77

I'm planning to go for H77 since I don't need any of the new features in Z77. I don't plan to use a second GPU and I will never overclock my CPU/GPU. My question is, will H77-based motherboards handle all my tasks well? Intel advertises that chipset for "everyday computing", but other sites say its base functionality is the same as Z77. Intel instead advertises Z77 for "serious multitaskers, hardcore gamers and overclocking enthusiasts". But the problem with all the Z77 motherboards I've seen is that they're way too expensive and their main feature seems to be overclocking, which won't be useful to me. Will I lose any raw CPU/GPU performance or HDD read/write speed with the H77 compared to a Z77? Will heat be an issue too? From what I've seen, Z77 motherboards have larger heat sinks than H77 ones. Will that be an issue if I go with an H77 motherboard with no heat sinks for the chipset? The CPU will have a fan in both cases, of course.

tl;dr When it comes to CPU/GPU performance and HDD read/write speed, is the Intel H77 chipset slower than the Z77? I don't care about overclocking or multiple GPUs, and for the processor, I'm set on the Ivy Bridge i7-3770.

Read the article
ACER ASPIRE V3-571G-9435 Fan not kicking in leading to overclocking

- by brythespy

This laptop has always had this problem. The temperatures kick up to the thermal ceiling of 99C for the CPU (i7-3610QM) and 94C for the GPU (GT 640M). Problem is, the FAN doesn't give a damn. It's actually QUIETER when the temperatures are that high than when it's at 60C or so. I figured it was a problem with the BIOS, so I updated that: no change. So maybe it was a problem with Windows? Nope, same result when gaming on Ubuntu.

The major problem with this is that after gaming for ten minutes the CPU throttles itself to 1197MHz (as opposed to 3193MHz), and the GPU goes down to 135MHz (as opposed to 843MHz). The fan won't kick in like I know it can, because when the laptop is in POST, like at BIOS setup, the fan is so loud it's like a vacuum cleaner! I don't really care about noise, so I'd love to have the fan like that all the time as long as the temperatures don't fly through the roof...

So, here are the things I've tried so far, to avoid possible duplicate answers.

Checked for dust: It's been this way since the laptop was new, and I've since taken it apart. No dust buildup.
Background stuff running?: No, the problem persists across OSes, and it happens while gaming anyways.
Manually underclocking both CPU/GPU: Using Windows, I can force the CPU to stay at 1.1GHz, but the temperature STILL easily hits 99C after 5 min of gaming.
Contacted Acer support?: No help at all. They told me to update and reset the BIOS, which I have done multiple times. There are only about 6 changeable settings anyway, none of which should affect the FAN control.
Third party fan control program?: None detect the fan.

So, I'm screwed until I can afford to replace this laptop, but I am very satisfied with performance in games... whenever the CPU/GPU aren't being throttled. Any advice that helps solve this problem would be greatly appreciated. Hell, if you solved my problem I'd send you some money through PayPal.

Read the article
SDL2 sprite batching and texture atlases

- by jms

I have been programming a 2D game in C++, using the SDL2 graphics API for rendering. My game concept currently features effects that could result in tens of thousands of sprites being drawn to the screen simultaneously. I'd like to know what can be done to increase rendering efficiency if the need arises, preferably using the SDL2 API only. I have previously taken a quick look at OpenGL-based 2D rendering, and noticed that SDL2 lacks a command like

int SDL_RenderCopyMulti(SDL_Renderer* renderer, SDL_Texture* texture, const SDL_Rect* srcrects, SDL_Rect* dstrects, int count)

which would permit SDL to benefit from two common techniques used for efficient 2D graphics:

Texture batching: Sorting sprites by the texture used, and then rendering as many sprites that use the same texture as possible at once, changing only the source area on the texture and the destination area on the render target between sprites. This allows the whole operation to be encapsulated in a single GPU command, drastically reducing the overhead compared to multiple distinct calls.

Texture atlases: Instead of creating one texture for each frame of each animation of each sprite, combining multiple animations and even multiple sprites into a single large texture. This lessens the impact of changing the current texture when switching between sprites, as the correct texture is often already bound from the previous draw call. Furthermore, the GPU is optimized for handling large textures, in contrast to the many tiny textures typically used for sprites.

My question: Would SDL2 still get somewhat faster from any rudimentary sprite sorting, or from combining multiple images into one texture, thanks to automatic video driver optimizations? If I encounter performance issues related to 2D rendering in the future, will I be forced to switch to OpenGL for lower-level control over the GPU?

Edit: Are there any plans to include such functionality in the near future?
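The sorting step behind texture batching can be sketched without any SDL calls. The following is a minimal illustration, not SDL2 code: Sprite, Rect, texture_id, and both helper functions are hypothetical names, and the "texture switch" count is only a rough proxy for the per-draw overhead described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect {
    int x, y, w, h;
};

// Hypothetical sprite record: texture_id stands in for a texture pointer.
struct Sprite {
    int texture_id;
    Rect src;  // source area on the texture (or atlas)
    Rect dst;  // destination area on the render target
};

// Counts how often the active texture would change if the sprites were drawn
// in the given order; the first sprite counts as one switch.
std::size_t count_texture_switches(const std::vector<Sprite>& sprites) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        if (i == 0 || sprites[i].texture_id != sprites[i - 1].texture_id) {
            ++switches;
        }
    }
    return switches;
}

// Groups sprites that share a texture. stable_sort keeps the original relative
// order of sprites within each texture group.
void sort_by_texture(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) {
                         return a.texture_id < b.texture_id;
                     });
}
```

Sorting by texture changes the draw order, so it is only safe where sprites from different textures do not overlap or where their layering does not matter. Packing sprites into one atlas sidesteps this, because every sprite then shares the same texture and the original order can be kept.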

Read the article
Attend my Fusion sessions

- by Daniel Moth

The inaugural Fusion conference was 1 year ago in June 2011 and I was there doing a demo in the keynote, and also presenting a breakout session. If you look at the abstract and title for that session you won't see the term "C++ AMP" in there because the technology wasn't announced and we didn't want to spill the beans ahead of the keynote, where the technology was announced. It was only an announcement, we did not give any bits out, and in fact the first bits came three months later in September 2011 with the Beta following in February 2012. So it really feels great 1 year later, to be back at Fusion presenting two sessions on C++ AMP, demonstrating our progress from that announcement, to the Visual Studio 2012 Release Candidate that came out last week. If you are attending Fusion (in person or virtually later), be sure to watch my two-part session. Part 1 is PT-3601 on Tuesday 4pm and part 2 is PT-3602 on Wednesday 4pm. Here is the shared abstract for both parts: Harnessing GPU Compute with C++ AMP C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 2012. After a quick overview of the technology understanding its goals and its differentiation compared with other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release. See you there! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
Attend my GTC sessions

- by Daniel Moth

The last GTC conference in the US was 2 years ago and I was there as an attendee. You may recall from that blog post that we were running UX studies at the time. It really feels great 2 years later, to be back at GTC presenting two sessions on C++ AMP, demonstrating our progress that includes input from those early studies. If you are attending GTC (in person or virtually later), be sure to watch my two-part session. Part 1 is S0242 on Wednesday 5pm and part 2 is S0244 on Thursday 10am. Here is the shared abstract for both parts: Harnessing GPU Compute with C++ AMP C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11 Beta. After a quick overview of the technology understanding its goals and its differentiation compared with other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release. See you there! Comments about this post by Daniel Moth welcome at the original blog.

Read the article
high performance with xen, vmware or virtualbox

- by Marchosius

I was wondering which is the best method to go with if I want to play Windows-based games. I do not want to use dual boot, as it costs me time to restart, log in and run another OS whenever I want to do my work or pass the time, and some of my apps rely on Windows and my graphics card to run, for example Daz3d, Photoshop, Flash etc. Now, I have read about HVM (hardware virtual machines), and I know about the 3D virtualisation of VMware and VirtualBox. However, the latter two virtualise 3D without using the full power of the GPU, so this option won't perform well for the latest games like D3. I was wondering if anyone has experience with HVM (like Xen, if I am not mistaken) and has tried something similar to access the full power of the GPU and successfully run newer games and other products relying on the GPU. This will be my first time setting up an HVM; I have no experience with this, so I don't know what to expect. Any advice would help a lot, as I do not want to revert to Windows or, as mentioned, dual boot.

Read the article
Can't use nvidia card/driver on optimus notebook

- by Mr. Pixel

I installed (once again) the latest official nvidia driver for my GT540m on Ubuntu 11.10. Everything seems OK with my xorg.conf file (I've manually added BusID "PCI:1:0:0", since lspci shows 01:00.0 for my GPU), but I still can't use the card. The problem is, when I use the xorg.conf file generated by Xorg -configure, Xorg automatically loads the Intel GPU. So I removed everything that was not related to my nvidia card, basically leaving my xorg.conf with one screen and one device (with the nvidia driver and the above-mentioned BusID), and Xorg fails to start. The log says something like "Devices on GT540m [newline] none" and, a few lines later, something like "NVIDIA(0) found a screen, but have no device for it". When I don't set the BusID, it doesn't seem to detect my card either. Thank you for any suggestions.

PS: If possible, I'd like to avoid bumblebee or any similar "hybrid graphics" solution; the last time I tried, I ended up reinstalling Ubuntu.

Edit: Allow me to clarify the problem. I have a notebook with a GT540m graphics card and an integrated Intel GPU. I want to use the graphics card with full hardware acceleration and its official driver, as I do under Windows.

Read the article
Simple OpenGL program major slow down at high resolution

- by Grieverheart

I have created a small OpenGL 3.3 (Core) program using freeglut. The whole geometry is two boxes and one plane with some textures. I can move around like in an FPS and that's it. The problem is that I face a big drop in frame rate when I make my window large (i.e. above 1920x1080). I have monitored GPU usage when in full-screen, and it shows a GPU load of nearly 100% and a Memory Controller load of ~85%. At 600x600, these numbers are at about 45%; my CPU is also at full load. I use deferred rendering at the moment, but even with forward rendering the slowdown was nearly as severe. I can't imagine my GPU is not powerful enough for something this simple when I play many games at 1080p (I have a GeForce GT 120M, btw). Below are my shaders.

First Pass

#VS
    #version 330 core

    uniform mat4 ModelViewMatrix;
    uniform mat3 NormalMatrix;
    uniform mat4 MVPMatrix;
    uniform float scale;

    layout(location = 0) in vec3 in_Position;
    layout(location = 1) in vec3 in_Normal;
    layout(location = 2) in vec2 in_TexCoord;

    smooth out vec3 pass_Normal;
    smooth out vec3 pass_Position;
    smooth out vec2 TexCoord;

    void main(void){
        // View-space position and normal are passed on to fill the G-buffer.
        pass_Position = (ModelViewMatrix * vec4(scale * in_Position, 1.0)).xyz;
        pass_Normal = NormalMatrix * in_Normal;
        TexCoord = in_TexCoord;
        gl_Position = MVPMatrix * vec4(scale * in_Position, 1.0);
    }

#FS
    #version 330 core

    uniform sampler2D inSampler;

    smooth in vec3 pass_Normal;
    smooth in vec3 pass_Position;
    smooth in vec2 TexCoord;

    layout(location = 0) out vec3 outPosition;
    layout(location = 1) out vec3 outDiffuse;
    layout(location = 2) out vec3 outNormal;

    void main(void){
        outPosition = pass_Position;
        outDiffuse = texture(inSampler, TexCoord).xyz;
        outNormal = pass_Normal;
    }

Second Pass

#VS
    #version 330 core

    uniform float scale;

    layout(location = 0) in vec3 in_Position;

    void main(void){
        gl_Position = mat4(1.0) * vec4(scale * in_Position, 1.0);
    }

#FS
    #version 330 core

    struct Light{
        vec3 direction;
    };

    uniform ivec2 ScreenSize;
    uniform Light light;
    uniform sampler2D PositionMap;
    uniform sampler2D ColorMap;
    uniform sampler2D NormalMap;

    out vec4 out_Color;

    // Maps the fragment's window coordinates to G-buffer texture coordinates.
    vec2 CalcTexCoord(void){
        return gl_FragCoord.xy / ScreenSize;
    }

    vec4 CalcLight(vec3 position, vec3 normal){
        vec4 DiffuseColor = vec4(0.0);
        vec4 SpecularColor = vec4(0.0);

        vec3 light_Direction = -normalize(light.direction);
        float diffuse = max(0.0, dot(normal, light_Direction));

        if(diffuse > 0.0){
            DiffuseColor = diffuse * vec4(1.0);

            // Specular term computed from the half vector between camera and light.
            vec3 camera_Direction = normalize(-position);
            vec3 half_vector = normalize(camera_Direction + light_Direction);
            float specular = max(0.0, dot(normal, half_vector));
            float fspecular = pow(specular, 128.0);
            SpecularColor = fspecular * vec4(1.0);
        }

        // vec4(0.1) acts as a constant ambient term.
        return DiffuseColor + SpecularColor + vec4(0.1);
    }

    void main(void){
        vec2 TexCoord = CalcTexCoord();
        vec3 Position = texture(PositionMap, TexCoord).xyz;
        vec3 Color = texture(ColorMap, TexCoord).xyz;
        vec3 Normal = normalize(texture(NormalMap, TexCoord).xyz);
        out_Color = vec4(Color, 1.0) * CalcLight(Position, Normal);
    }

Is it normal for the GPU to be used that much under the described circumstances? Is it due to poor performance of freeglut? I understand that the problem could be specific to my code, but I can't paste the whole code here. If you need more info, please tell me.
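A rough back-of-the-envelope estimate shows how the G-buffer traffic of the two passes above grows with window size. The sketch below is arithmetic only and rests on assumptions the post does not state: each of the three render targets is taken to store 12 bytes per texel (as a format like GL_RGB32F would), and every pixel is assumed to be written once in the first pass and read once in the second, ignoring overdraw, caching, and driver padding. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one full set of G-buffer targets at a given resolution.
constexpr std::uint64_t gbuffer_bytes(std::uint64_t width, std::uint64_t height,
                                      std::uint64_t targets,
                                      std::uint64_t bytes_per_texel) {
    return width * height * targets * bytes_per_texel;
}

// Assumed per-frame traffic: one full write (first pass) plus one full read
// (second pass) of the G-buffer.
constexpr std::uint64_t deferred_traffic_bytes(std::uint64_t width,
                                               std::uint64_t height,
                                               std::uint64_t targets,
                                               std::uint64_t bytes_per_texel) {
    return 2 * gbuffer_bytes(width, height, targets, bytes_per_texel);
}
```

Under these assumptions, 600x600 needs about 13 MB of G-buffer per frame and 1920x1080 about 75 MB, a factor of 5.76, which is simply the ratio of pixel counts. Per-pixel work in the second-pass fragment shader scales by the same factor, so a steep drop in frame rate at high resolution is plausible on a low-end mobile GPU even with trivial geometry.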

Read the article
