Search Results

Search found 18 results on 1 pages for 'nvcc'.

Page 1/1 | 1 

  • nvcc not found, but only when using sudo

    - by dsp_099
    I can't get ANYTHING working on linux. I'm trying to compile CudaMiner. sudo make: ypt-jane.o `test -f 'scrypt-jane.cpp' || echo './'`scrypt-jane.cpp mv -f .deps/cudaminer-scrypt-jane.Tpo .deps/cudaminer-scrypt-jane.Po nvcc -g -O2 -Xptxas "-abi=no -v" -arch=compute_10 --maxrregcount=64 --ptxas-options=-v -I./compat/jansson -o salsa_kernel.o -c salsa_kernel.cu /bin/bash: nvcc: command not found make[2]: *** [salsa_kernel.o] Error 127 make[2]: Leaving directory `/var/progs/CudaMiner' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/var/progs/CudaMiner' make: *** [all] Error 2 So, kind of interesting: nvcc: nvcc fatal : No input files specified; use option --help for more information Whereas sudo nvcc: sudo: nvcc: command not found Huh?? I have identical exports listed in ~/.bashrc AND /etc/bash.bashrc. (Nvcc is located in: /usr/local/cuda-5.0/bin/nvcc) I also tried changing the current path, to no avail: $ sudo bash -c 'echo $PATH' /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin $ PATH=$PATH:/usr/local/cuda-5.0/bin/nvcc $ sudo bash -c 'echo $PATH' /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin Thanks in advance!

    Read the article

  • Why can't nvcc find my Visual C++ installation?

    - by Jack Lloyd
    I'm running Windows 7 Pro x64 on a Core i5 with a NVIDIA 3100m, which is CUDA compatible. I've tried installing both the 32-bit and 64-bit CUDA toolkits from NVIDIA, unfortunately from with either of them I cannot compile anything; nvcc says "cannot find a supported cl version. Only MSVC 8.0 and MSVC 9.0 are supported". I have the x86 and x86-64 compilers installed via the Windows 7 SDK (compiler version 15.00.30729.01 for both arches). Both compilers are operating correctly; I've built and tested C and C++ code using them. I've tried running nvcc from command shells set up for both 32 bit and 64 bit compilation, and using the -ccbin command line option to nvcc to point it at the Visual C++ install directory. What is the right way of handling this setup? Is there some way I make nvcc be more verbose about what is going on? The -v flag isn't terrible helpful. Ideally some way to make it show what it is finding versus what it's expecting to find. Will this work better if I install Visual C++ Express instead? Or is only a commercial version of VC++ supported for use with CUDA?

    Read the article

  • What are the default values for arch and code options when using nvcc?

    - by Auron
    When compiling your CUDA code, you have to select for which architecture your code is being generated. nvcc provides two parameters to specify this architecture, basically: arch specifies the virtual arquictecture, which can be compute_10, compute_11, etc. code specifies the real architecture, which can be sm_10, sm_11, etc. So a command like this: nvcc x.cu -arch=compute_13 -code=sm_13 Will generate 'cubin' code for devices with 1.3 compute capability. Please correct me if I'm wrong. Which I would like to know is which are the default values for these two parameters? Which is the default architecture that nvcc uses when no value for arch or code is specified?

    Read the article

  • Why won't OpenCV compile in NVCC?

    - by zenna
    Hi there I am trying to integrate CUDA and openCV in a project. Problem is openCV won't compile when NVCC is used, while a normal c++ project compiles just fine. This seems odd to me, as I thought NVCC passed all host code to the c/c++ compiler, in this case the visual studio compiler. The errors I get are? c:\opencv2.0\include\opencv\cxoperations.hpp(1137): error: no operator "=" matches these operands operand types are: const cv::Range = cv::Range c:\opencv2.0\include\opencv\cxoperations.hpp(2469): error: more than one instance of overloaded function "std::abs" matches the argument list: function "abs(long double)" function "abs(float)" function "abs(double)" function "abs(long)" function "abs(int)" argument types are: (ptrdiff_t) So my question is why the difference considering the same compiler (should be) is being used and secondly how I could remedy this.

    Read the article

  • CUDA linking error - Visual Express 2008 - nvcc fatal due to (null) configuration file

    - by Josh
    Hi, I've been searching extensively for a possible solution to my error for the past 2 weeks. I have successfully installed the Cuda 64-bit compiler (tools) and SDK as well as the 64-bit version of Visual Studio Express 2008 and Windows 7 SDK with Framework 3.5. I'm using windows XP 64-bit. I have confirmed that VSE is able to compile in 64-bit as I have all of the 64-bit options available to me using the steps on the following website: (since Visual Express does not inherently include the 64-bit packages) http://jenshuebel.wordpress.com/2009/02/12/visual-c-2008-express-edition-and-64-bit-targets/ I have confirmed the 64-bit compile ability since the "x64" is available from the pull-down menu under "Tools-Options-VC++ Directories" and compiling in 64-bit does not result in the entire project being "skipped". I have included all the needed directories for 64-bit cuda tools, 64 SDK and Visual Express (\VC\bin\amd64). Here's the error message I receive when trying to compile in 64-bit: 1>------ Build started: Project: New, Configuration: Release x64 ------ 1>Compiling with CUDA Build Rule... 1>"C:\CUDA\bin64\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o "x64\Release\template.cu.obj" "c:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\CUDA_Walkthrough_DeviceKernels\template.cu" 1>nvcc fatal : Visual Studio configuration file '(null)' could not be found for installation at 'C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..' 1>Linking... 1>LINK : fatal error LNK1181: cannot open input file '.\x64\Release\template.cu.obj' 1>Build log was saved at "file://c:\Documents and Settings\Administrator\My Documents\Visual Studio 2008\Projects\New\New\x64\Release\BuildLog.htm" 1>New - 1 error(s), 0 warning(s) ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ========== Here's the simple code I'm trying to compile/run in 64-bit: #include <stdlib.h> #include <stdio.h> #include <string.h> #include <math.h> #include <cuda.h> void mypause () { printf ( "Press [Enter] to continue . . ." ); fflush ( stdout ); getchar(); } __global__ void VecAdd1_Kernel(float* A, float* B, float* C, int N) { int i = blockDim.x*blockIdx.x+threadIdx.x; if (i<N) C[i] = A[i] + B[i]; //result should be a 16x1 array of 250s } __global__ void VecAdd2_Kernel(float* B, float* C, int N) { int i = blockDim.x*blockIdx.x+threadIdx.x; if (i<N) C[i] = C[i] + B[i]; //result should be a 16x1 array of 400s } int main() { int N = 16; float A[16];float B[16]; size_t size = N*sizeof(float); for(int i=0; i<N; i++) { A[i] = 100.0; B[i] = 150.0; } // Allocate input vectors h_A and h_B in host memory float* h_A = (float*)malloc(size); float* h_B = (float*)malloc(size); float* h_C = (float*)malloc(size); //Initialize Input Vectors memset(h_A,0,size);memset(h_B,0,size); h_A = A;h_B = B; printf("SUM = %f\n",A[1]+B[1]); //simple check for initialization //Allocate vectors in device memory float* d_A; cudaMalloc((void**)&d_A,size); float* d_B; cudaMalloc((void**)&d_B,size); float* d_C; cudaMalloc((void**)&d_C,size); //Copy vectors from host memory to device memory cudaMemcpy(d_A,h_A,size,cudaMemcpyHostToDevice); cudaMemcpy(d_B,h_B,size,cudaMemcpyHostToDevice); //Invoke kernel int threadsPerBlock = 256; int blocksPerGrid = (N+threadsPerBlock-1)/threadsPerBlock; VecAdd1(blocksPerGrid, threadsPerBlock,d_A,d_B,d_C,N); VecAdd2(blocksPerGrid, threadsPerBlock,d_B,d_C,N); //Copy results from device memory to host memory //h_C contains the result in host memory cudaMemcpy(h_C,d_C,size,cudaMemcpyDeviceToHost); for(int i=0; i<N; i++) //output result from the kernel "VecAdd" { printf("%f ", h_C[i] ); printf("\n"); } printf("\n"); cudaFree(d_A); cudaFree(d_B); cudaFree(d_C); free(h_A); free(h_B); free(h_C); mypause(); return 0; }

    Read the article

  • identifier ... is undefined when trying to run pure C code in Cuda using nvcc

    - by Lostsoul
    I'm new and learning Cuda. A approach that I'm trying to use to learn is to write code in C and once I know its working start converting it to Cuda since I read that nvcc compiles Cuda code but complies everything else using plain old c. My code works in c(using gcc) but when I try to compile it using nvcc(after changing the file name from main.c to main.cu) I get main.cu(155): error: identifier "num_of_rows" is undefined main.cu(155): error: identifier "num_items_in_row" is undefined 2 errors detected in the compilation of "/tmp/tmpxft_00002898_00000000-4_main.cpp1.ii". Basically in my main method I send data to a function like this: process_list(count, countListItem, list); the first two items are ints and the last item(list) is a matrix. Then I create my function like this: void process_list(int num_of_rows, int num_items_in_row, int current_list[num_of_rows][num_items_in_row]) { This line is where I get my errors when using nvcc(line 155). I need to convert this code to cuda anyway so no need to troubleshoot this specific issue(plus code is quite large) but I'm wondering if I'm wrong about nvcc treating the C part of your code like plain C. In the book cuda by example I just used nvcc to compile but do I need any extra flags when just using pure c?

    Read the article

  • CUDA: cudaMemcpy only works in emulation mode.

    - by Jason
    I am just starting to learn how to use CUDA. I am trying to run some simple example code: float *ah, *bh, *ad, *bd; ah = (float *)malloc(sizeof(float)*4); bh = (float *)malloc(sizeof(float)*4); cudaMalloc((void **) &ad, sizeof(float)*4); cudaMalloc((void **) &bd, sizeof(float)*4); ... initialize ah ... /* copy array on device */ cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostToDevice); cudaMemcpy(bd,ad,sizeof(float)*N,cudaMemcpyDeviceToDevice); cudaMemcpy(bh,bd,sizeof(float)*N,cudaMemcpyDeviceToHost); When I run in emulation mode (nvcc -deviceemu) it runs fine (and actually copies the array). But when I run it in regular mode, it runs w/o error, but never copies the data. It's as if the cudaMemcpy lines are just ignored. What am I doing wrong? Thank you very much, Jason

    Read the article

  • C++ exceptions binary compatibility

    - by aaa
    hi. my project uses 2 different C++ compilers, g++ and nvcc (cuda compiler). I have noticed exception thrown from nvcc object files are not caught in g++ object files. are C++ exceptions supposed to be binary compatible in the same machine? what can cause such behavior?

    Read the article

  • How can I use GLUT with CUDA on MACOSX?

    - by omegatai
    Hi, I'm having problems compiling a CUDA program that uses GLUT on MacOsX. Here is the command line I use to compile the source: nvcc main.c -o main -Xlinker "-L/System/Library/Frameworks/OpenGL.framework/Libraries -lGL -lGLU" "-L/System/Library/Frameworks/GLUT.framework" And here is the errors I get: Undefined symbols: "_glutInitWindowSize", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutInitWindowPosition", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutDisplayFunc", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutInitDisplayMode", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutCreateWindow", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutMainLoop", referenced from: _main in tmpxft_00001612_00000000-1_main.o "_glutInit", referenced from: _main in tmpxft_00001612_00000000-1_main.o ld: symbol(s) not found collect2: ld returned 1 exit status I am aware that I haven't specified any lib for GLUT but I just can't find it! Does anybody know where it is? By the way, there doesn't seem to be a way to use the GLUT.framework when compiling with nvcc. Thanks a lot, omegatai

    Read the article

  • CUDA not working in 64 bit windows 7

    - by Programmer
    I have cuda toolkit 4.0 installed in a 64 bit windows 7. I try building my cuda code, #include<iostream> #include"cuda_runtime.h" #include"cuda.h" __global__ void kernel(){ } int main(){ kernel<<<1,1>>>(); int c = 0; cudaGetDeviceCount(&c); cudaDeviceProp prop; cudaGetDeviceProperties(&prop, 0); std::cout<<"the name is"<<prop.name; std::cout<<"Hello World!"<<c<<std::endl; system("pause"); return 0; } but operation fails. Below is the build log: Build Log Rebuild started: Project: god, Configuration: Debug|Win32 Command Lines Creating temporary file "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BAT0000482007500.bat" with contents [ @echo off echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" sample.cu "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\sample.cu" if errorlevel 1 goto VCReportError goto VCEnd :VCReportError echo Project : error PRJ0019: A tool returned an error code from "Compiling with CUDA Build Rule..." exit 1 :VCEnd ] Creating command line """c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BAT0000482007500.bat""" Creating temporary file "c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\RSP0000492007500.rsp" with contents [ /OUT:"C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.exe" /LIBPATH:"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\lib\x64" /MANIFEST /MANIFESTFILE:"Debug\god.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.pdb" /DYNAMICBASE /NXCOMPAT /MACHINE:X86 cudart.lib cuda.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib ".\Debug\sample.cu.obj" ] Creating command line "link.exe @"c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\RSP0000492007500.rsp" /NOLOGO /ERRORREPORT:PROMPT" Output Window Compiling with CUDA Build Rule... "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" -maxrregcount=0 --compile -o "Debug/sample.cu.obj" sample.cu sample.cu sample.cu.obj : error LNK2019: unresolved external symbol _cudaLaunch@4 referenced in function "enum cudaError cdecl cudaLaunch(char *)" (??$cudaLaunch@D@@YA?AW4cudaError@@PAD@Z) sample.cu.obj : error LNK2019: unresolved external symbol ___cudaRegisterFunction@40 referenced in function "void __cdecl _sti_cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1(void)" (?sti__cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1@@YAXXZ) sample.cu.obj : error LNK2019: unresolved external symbol _cudaRegisterFatBinary@4 referenced in function "void __cdecl _sti_cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1(void)" (?sti__cudaRegisterAll_52_tmpxft_00001c68_00000000_8_sample_compute_10_cpp1_ii_b81a68a1@@YAXXZ) sample.cu.obj : error LNK2019: unresolved external symbol _cudaGetDeviceProperties@8 referenced in function _main sample.cu.obj : error LNK2019: unresolved external symbol _cudaGetDeviceCount@4 referenced in function _main sample.cu.obj : error LNK2019: unresolved external symbol _cudaConfigureCall@32 referenced in function _main C:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\Debug\god.exe : fatal error LNK1120: 7 unresolved externals Results Build log was saved at "file://c:\Users\t-sudhk\Documents\Visual Studio 2008\Projects\god\god\Debug\BuildLog.htm" god - 8 error(s), 0 warning(s) I will be highly obliged if someone could help me. Thanks

    Read the article

  • NVIDIA CUDA SDK Examples Compilation Unsupported Architecture 'computer_20'

    - by Andrew Bolster
    On compilation of the CUDA SDK, I'm getting a nvcc fatal : Unsupported gpu architecture 'compute_20' My toolkit is 2.3 and on a shared system (i.e cant really upgrade) and the driver version is also 2.3, running on 4 Tesla C1060s If it helps, the problem is being called in radixsort. It appears that a few people online have had this problem but i havent found anywhere that actually gives a solution.

    Read the article

  • Creating .lib files in CUDA Toolkit 5

    - by user1683586
    I am taking my first faltering steps with CUDA Toolkit 5.0 RC using VS2010. Separate compilation has me confused. I tried to set up a project as a Static Library (.lib), but when I try to build it, it does not create a device-link.obj and I don't understand why. For instance, there are 2 files: A caller function that uses a function f #include "thrust\host_vector.h" #include "thrust\device_vector.h" using namespace thrust::placeholders; extern __device__ double f(double x); struct f_func { __device__ double operator()(const double& x) const { return f(x); } }; void test(const int len, double * data, double * res) { thrust::device_vector<double> d_data(data, data + len); thrust::transform(d_data.begin(), d_data.end(), d_data.begin(), f_func()); thrust::copy(d_data.begin(),d_data.end(), res); } And a library file that defines f __device__ double f(double x) { return x+2.0; } If I set the option generate relocatable device code to No, the first file will not compile due to unresolved extern function f. If I set it to -rdc, it will compile, but does not produce a device-link.obj file and so the linker fails. If I put the definition of f into the first file and delete the second it builds successfully, but now it isn't separate compilation anymore. How can I build a static library like this with separate source files? [Updated here] I called the first caller file "caller.cu" and the second "libfn.cu". The compiler lines that VS2010 outputs (which I don't fully understand) are (for caller): nvcc.exe -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\caller.cu.obj" "G:\Test_Linking\caller.cu" -clean and the same for libfn, then: nvcc.exe -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\caller.cu.obj" "G:\Test_Linking\caller.cu" and again for libfn.

    Read the article

  • cmake, gcc, cuda and -m32 wtf

    - by Nils
    Hi all I figured out that CUDA does not work in 64bit mode on my mac (or couldn't get it running so far). Therefore I decided to compile everything for 32bit. I use cmake 2.8 and added the following options add_definitions(-Wall -m32) set(CUDA_64_BIT_DEVICE_CODE OFF) set(CMAKE_MODULE_LINKER_FLAGS -m32) However when it tries to link it it does something like this: /usr/bin/c++ -mmacosx-version-min=10.6 -Wl,-search_paths_first -headerpad_max_install_names CMakeFiles/SimpleTestsCUDA.dir/BlockMatrix.cpp.o CMakeFiles/SimpleTestsCUDA.dir/Matrix.cpp.o ./SimpleTestsCUDA_generated_SimpleTests.cu.o ./SimpleTestsCUDA_generated_BlockMatrix.cu.o -o SimpleTestsCUDA /usr/local/cuda/lib/libcudart.dylib /usr/local/cuda/lib/libcuda.dylib Which fails with a lot of "file is not of required architecture" warnings from ld. Now if I add manually -m32 to the command above it works. However I have no idea how to teach cmake to add -m32 to every gcc (or ld) invocation. So far it does it for nvcc and gcc, but not for linking..

    Read the article

  • cmake: Target-specific preprocessor definitions for CUDA targets seems not to work

    - by Nils
    I'm using cmake 2.8.1 on Mac OSX 10.6 with CUDA 3.0. So I added a CUDA target which needs BLOCK_SIZE set to some number in order to compile. cuda_add_executable(SimpleTestsCUDA SimpleTests.cu BlockMatrix.cpp Matrix.cpp ) set_target_properties(SimpleTestsCUDA PROPERTIES COMPILE_FLAGS -DBLOCK_SIZE=3) When running make VERBOSE=1 I noticed that nvcc is invoked w/o -DBLOCK_SIZE=3, which results in an error, because BLOCK_SIZE is used in the code, but defined nowhere. Now I used the same definition for a CPU target (using add_executable(...)) and there it worked. So now the questions: How do I figure out what cmake does with the set_target_properties line if it points to a CUDA target? Googling around didn't help so far and a workaround would be cool..

    Read the article

  • Debuging CUDA kernels called from .NET code in VS2008, emulation mode

    - by Danny Varod
    CUDA has an option to compile code in emulation mode, which is supported in the .rules file they provide. I have C# .NET 3.5 SP1 code that calls a native dll, using DllImport, the native dll is compiled via VS2008 using nvcc and its function is to transfer memory from and to CUDA and to invoke CUDA kernels. When the CUDA kernels are correct, everything runs fine, but when there is a bug, I can only step in to the code until the title of the kernels and see the parameters they receive. (I enabled debugging native code in the startup-project's debug options.) I have tried compiling with emulation mode, however I get the CUDA error "mixed device execution" when calling the CUDA memcopy host--device. I tried switching the alloc+dealloc+memcopy with their equivalent non-CUDA versions, but then the same error occurs when invoking the kernels. What did I do wrong in my attempt to using the debug-emulation mode? P.S. I tried this on Vista x64 SP1 + VS2008, with the same solution complied in both x86 and x64, neither worked in emulation mode, both worked in non-emulation mode.

    Read the article

  • malloc works, cudaHostAlloc segfaults?

    - by Mikhail
    I am new to CUDA and I want to use cudaHostAlloc. I was able to isolate my problem to this following code. Using malloc for host allocation works, using cudaHostAlloc results in a segfault, possibly because the area allocated is invalid? When I dump the pointer in both cases it is not null, so cudaHostAlloc returns something... works in_h = (int*) malloc(length*sizeof(int)); //works for (int i = 0;i<length;i++) in_h[i]=2; doesn't work cudaHostAlloc((void**)&in_h,length*sizeof(int),cudaHostAllocDefault); for (int i = 0;i<length;i++) in_h[i]=2; //segfaults Standalone Code #include <stdio.h> void checkDevice() { cudaDeviceProp info; int deviceName; cudaGetDevice(&deviceName); cudaGetDeviceProperties(&info,deviceName); if (!info.deviceOverlap) { printf("Compute device can't use streams and should be discared."); exit(EXIT_FAILURE); } } int main() { checkDevice(); int *in_h; const int length = 10000; cudaHostAlloc((void**)&in_h,length*sizeof(int),cudaHostAllocDefault); printf("segfault comming %d\n",in_h); for (int i = 0;i<length;i++) { in_h[i]=2; } free(in_h); return EXIT_SUCCESS; } ~ Invocation [id129]$ nvcc fun.cu [id129]$ ./a.out segfault comming 327641824 Segmentation fault (core dumped) Details Program is run in interactive mode on a cluster. I was told that an invocation of the program from the compute node pushes it to the cluster. Have not had any trouble with other home made toy cuda codes.

    Read the article

  • Error while compiling Hello world program for CUDA

    - by footy
    I am using Ubuntu 12.10 and have sucessfully installed CUDA 5.0 and its sample kits too. I have also run sudo apt-get install nvidia-cuda-toolkit Below is my hello world program for CUDA: #include <stdio.h> /* Core input/output operations */ #include <stdlib.h> /* Conversions, random numbers, memory allocation, etc. */ #include <math.h> /* Common mathematical functions */ #include <time.h> /* Converting between various date/time formats */ #include <cuda.h> /* CUDA related stuff */ __global__ void kernel(void) { } /* MAIN PROGRAM BEGINS */ int main(void) { /* Dg = 1; Db = 1; Ns = 0; S = 0 */ kernel<<<1,1>>>(); /* PRINT 'HELLO, WORLD!' TO THE SCREEN */ printf("\n Hello, World!\n\n"); /* INDICATE THE TERMINATION OF THE PROGRAM */ return 0; } /* MAIN PROGRAM ENDS */ The following error occurs when I compile it with nvcc -g hello_world_cuda.cu -o hello_world_cuda.x /tmp/tmpxft_000033f1_00000000-13_hello_world_cuda.o: In function `main': /home/adarshakb/Documents/hello_world_cuda.cu:16: undefined reference to `cudaConfigureCall' /tmp/tmpxft_000033f1_00000000-13_hello_world_cuda.o: In function `__cudaUnregisterBinaryUtil': /usr/include/crt/host_runtime.h:172: undefined reference to `__cudaUnregisterFatBinary' /tmp/tmpxft_000033f1_00000000-13_hello_world_cuda.o: In function `__sti____cudaRegisterAll_51_tmpxft_000033f1_00000000_4_hello_world_cuda_cpp1_ii_b81a68a1': /tmp/tmpxft_000033f1_00000000-1_hello_world_cuda.cudafe1.stub.c:1: undefined reference to `__cudaRegisterFatBinary' /tmp/tmpxft_000033f1_00000000-1_hello_world_cuda.cudafe1.stub.c:1: undefined reference to `__cudaRegisterFunction' /tmp/tmpxft_000033f1_00000000-13_hello_world_cuda.o: In function `cudaError cudaLaunch<char>(char*)': /usr/lib/nvidia-cuda-toolkit/include/cuda_runtime.h:958: undefined reference to `cudaLaunch' collect2: ld returned 1 exit status I am also making sure that I use gcc and g++ version 4.4 ( As 4.7 there is some problem with CUDA)

    Read the article

  • C++ ambiguous template instantiation

    - by aaa
    the following gives me ambiguous template instantiation with nvcc (combination of EDG front-end and g++). Is it really ambiguous, or is compiler wrong? I also post workaround à la boost::enable_if template<typename T> struct disable_if_serial { typedef void type; }; template<> struct disable_if_serial<serial_tag> { }; template<int M, int N, typename T> __device__ //static typename disable_if_serial<T>::type void add_evaluate_polynomial1(double *R, const double (&C)[M][N], double x, const T &thread) { // ... } template<size_t M, size_t N> __device__ static void add_evaluate_polynomial1(double *R, const double (&C)[M][N], double x, const serial_tag&) { for (size_t i = 0; i < M; ++i) add_evaluate_polynomial1(R, C, x, i); } // ambiguous template instantiation here. add_evaluate_polynomial1(R, C, x, serial_tag());

    Read the article

1