Search Results

Search found 16947 results on 678 pages for 'kernel programming'.

Page 594/678 | < Previous Page | 590 591 592 593 594 595 596 597 598 599 600 601  | Next Page >

  • Canceling a WSK I/O operation when driver is unloading

    - by eaducac
    I've been learning how to write drivers with the Windows DDK recently. After creating a few test drivers experimenting with system threads and synchronization, I decided to step it up a notch and write a driver that actually does something, albeit something useless. Currently, my driver connects to my other computer using Winsock Kernel and just loops and echoes back whatever I send to it until it gets the command "exit", which causes it to break out of the loop. In my loop, after I call WskReceive() to get some data from the other computer, I use KeWaitForMultipleObjects() to wait for either of two SynchronizationEvents. BlockEvent gets set by my IRP's CompletionRoutine() to let my thread know that it's received some data from the socket. EndEvent gets set by my DriverUnload() routine to tell the thread that it's being unloaded and it needs to terminate. When I send the "exit" command, the thread terminates with no problems, and I can safely unload the driver afterward. If I try to stop the driver while it's still waiting on data from the other computer, however, it blue screens with the error DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS. After I get the EndEvent but before I exit the loop, I've tried canceling the IRP with IoCancelIrp() and completing it with IoCompleteRequest(), but both of those give me DRIVER_IRQL_NOT_LESS_OR_EQUAL errors. I then tried calling WskDisconnect(), hoping that would cause the receive operation to complete, but that took me back to the CANCELLING_PENDING_OPERATIONS error. How do I cancel my pending I/O operation from my WSK socket at the IRQL I'm running at when the driver is unloaded?

    Read the article

  • How can I write a unit test to determine whether an object can be garbage collected?

    - by driis
    In relation to my previous question, I need to check whether a component that will be instantiated by Castle Windsor, can be garbage collected after my code has finished using it. I have tried the suggestion in the answers from the previous question, but it does not seem to work as expected, at least for my code. So I would like to write a unit test that tests whether a specific object instance can be garbage collected after some of my code has run. Is that possible to do in a reliable way ? EDIT I currently have the following test based on Paul Stovell's answer, which succeeds: [TestMethod] public void ReleaseTest() { WindsorContainer container = new WindsorContainer(); container.Kernel.ReleasePolicy = new NoTrackingReleasePolicy(); container.AddComponentWithLifestyle<ReleaseTester>(LifestyleType.Transient); Assert.AreEqual(0, ReleaseTester.refCount); var weakRef = new WeakReference(container.Resolve<ReleaseTester>()); Assert.AreEqual(1, ReleaseTester.refCount); GC.Collect(); GC.WaitForPendingFinalizers(); Assert.AreEqual(0, ReleaseTester.refCount, "Component not released"); } private class ReleaseTester { public static int refCount = 0; public ReleaseTester() { refCount++; } ~ReleaseTester() { refCount--; } } Am I right assuming that, based on the test above, I can conclude that Windsor will not leak memory when using the NoTrackingReleasePolicy ?

    Read the article

  • Linux C debugging library to detect memory corruptions

    - by calandoa
    When working sometimes ago on an embedded system with a simple MMU, I used to program dynamically this MMU to detect memory corruptions. For instance, at some moment at runtime, the foo variable was overwritten with some unexpected data (probably by a dangling pointer or whatever). So I added the additional debugging code : at init, the memory used by foo was indicated as a forbidden region to the MMU; each time foo was accessed on purpose, access to the region was allowed just before then forbidden just after; a MMU irq handler was added to dump the master and the address responsible of the violation. This was actually some kind of watchpoint, but directly self-handled by the code itself. Now, I would like to reuse the same trick, but on a x86 platform. The problem is that I am very far from understanding how is working the MMU on this platform, and how it is used by Linux, but I wonder if any library/tool/system call already exist to deal with this problem. Note that I am aware that various tools exist like Valgrind or GDB to manage memory problems, but as far as I know, none of these tools car be dynamically reconfigured by the debugged code. I am mainly interested for user space under Linux, but any info on kernel mode or under Windows is also welcome!

    Read the article

  • erlide, which eclipse/which packages?

    - by KevinDTimm
    I have downloaded eclipse 3.4 (java version) for MacOSX (carbon). I have tried to 'update' to the erlide, but see many (duplicated) options (many erlide, options that say 'only for erl SDK updates', etc.) Sometimes I get 403 errors when attempting to access http://erlide.org/update and http://erlide.sourceforge.net/update. Finally, when I get some set of options installed, I either get errors like : Loading of /Users/kevindtimm/Documents/eclipse-java-ganymede-SR2-macosx-carbon/eclipse/plugins/org.erlide.kernel.common_0.8.1.201005250801/ebin/erlide_kernel_common.beam failed: badfile (hello_world@ktmac)1> =ERROR REPORT==== 24-Nov-2010::19:17:32 === beam/beam_load.c(1768): Error loading function erlide_kernel_common:monitor/0: op put_string u u x: please re-compile this module with an R14B compiler or, when I've done different installations of erlide, I get no response in the console to : hello:hello(). Does anybody have a good reference for how to load this plug-in and which items I should install? -module(hello). -export([hello/0]). hello() -> io:write("Hello World\n"). [edit] I have installed eclipse 3.6 (c++) as requested below, and the following code still can't find hello:hello(). %%file_comment -module(hello). %% %% Include files %% %% %% Exported Functions %% -export([hello/0]). %% %% API Functions %% %% %% Local Functions %% hello() -> io:write("Hello World\n"). [/edit]

    Read the article

  • How to optimize Conway's game of life for CUDA?

    - by nlight
    I've written this CUDA kernel for Conway's game of life: global void gameOfLife(float* returnBuffer, int width, int height) { unsigned int x = blockIdx.x*blockDim.x + threadIdx.x; unsigned int y = blockIdx.y*blockDim.y + threadIdx.y; float p = tex2D(inputTex, x, y); float neighbors = 0; neighbors += tex2D(inputTex, x+1, y); neighbors += tex2D(inputTex, x-1, y); neighbors += tex2D(inputTex, x, y+1); neighbors += tex2D(inputTex, x, y-1); neighbors += tex2D(inputTex, x+1, y+1); neighbors += tex2D(inputTex, x-1, y-1); neighbors += tex2D(inputTex, x-1, y+1); neighbors += tex2D(inputTex, x+1, y-1); __syncthreads(); float final = 0; if(neighbors < 2) final = 0; else if(neighbors 3) final = 0; else if(p != 0) final = 1; else if(neighbors == 3) final = 1; __syncthreads(); returnBuffer[x + y*width] = final; } I am looking for errors/optimizations. Parallel programming is quite new to me and I am not sure if I get how to do it right. The rest of the app is: Memcpy input array to a 2d texture inputTex stored in a CUDA array. Output is memcpy-ed from global memory to host and then dealt with. As you can see a thread deals with a single pixel. I am unsure if that is the fastest way as some sources suggest doing a row or more per thread. If I understand correctly NVidia themselves say that the more threads, the better. I would love advice on this on someone with practical experience.

    Read the article

  • Why does my javascript file sometimes compressed while sometimes not?(IIS Gzip problem)

    - by Kevin Yang
    i enable gzip for javascript file in my iis settings, here 's the corresponding config section. <httpCompression directory="%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files"> <scheme name="gzip" dll="%Windir%\system32\inetsrv\gzip.dll" staticCompressionLevel="10" dynamicCompressionLevel="8" /> <dynamicTypes> <add mimeType="text/*" enabled="true" /> <add mimeType="message/*" enabled="true" /> <add mimeType="application/soap+msbin1" enabled="true" /> <add mimeType="*/*" enabled="false" /> </dynamicTypes> <staticTypes> <add mimeType="text/*" enabled="true" /> <add mimeType="message/*" enabled="true" /> <add mimeType="application/javascript" enabled="true" /> <add mimeType="application/x-javascript" enabled="true" /> <add mimeType="*/*" enabled="false" /> </staticTypes> </httpCompression> currently, when i download my js file, it seems that sometimes server return the gzip one, and sometimes not. i dont know why, and how to debug that. If a file is already gzipped, it should be cached in local disk, and next time someone visit that file again, iis kernel should return the cache gzip file directly without compressing it again. Is that right?

    Read the article

  • Can I prevent a Linux user space pthread yielding in critical code?

    - by KermitG
    I am working on an user space app for an embedded Linux project using the 2.6.24.3 kernel. My app passes data between two file nodes by creating 2 pthreads that each sleep until a asynchronous IO operation completes at which point it wakes and runs a completion handler. The completion handlers need to keep track of how many transfers are pending and maintain a handful of linked lists that one thread will add to and the other will remove. // sleep here until events arrive or time out expires for(;;) { no_of_events = io_getevents(ctx, 1, num_events, events, &timeout); // Process each aio event that has completed or thrown an error for (i=0; i<no_of_events; i++) { // Get pointer to completion handler io_complete = (io_callback_t) events[i].data; // Get pointer to data object iocb = (struct iocb *) events[i].obj; // Call completion handler and pass it the data object io_complete(ctx, iocb, events[i].res, events[i].res2); } } My question is this... Is there a simple way I can prevent the currently active thread from yielding whilst it runs the completion handler rather than going down the mutex/spin lock route? Or failing that can Linux be configured to prevent yielding a pthread when a mutex/spin lock is held?

    Read the article

  • Building a custom Linux Live CD

    - by Mike Heinz
    Can anyone point me to a good tutorial on creating a bootable Linux CD from scratch? I need help with a fairly specialized problem: my firm sells an expansion card that requires custom firmware. Currently we use an extremely old live CD image of RH7.2 that we update with current firmware. Manufacturing puts the cards in a machine, boots off the CD, the CD writes the firmware, they power off and pull the cards. Because of this cycle, it's essential that the CD boot and shut down as quickly as possible. The problem is that with the next generation of cards, I have to update the CD to a 2.6 kernel. It's easy enough to acquire a pre-existing live CD - but those all are designed for showing off Linux on the desktop - which means they take forever to boot. Can anyone fix me up with a current How-To? Update: So, just as a final update for anyone reading this later - the tool I ended up using was "livecd-creator". My reason for choosing this tool was that it is available for RedHat-based distributions like CentOs, Fedora and RHEL - which are all distributions that my company supports already. In addition, while the project is very poorly documented it is extremely customizable. I was able to create a minimal LiveCD and edit the boot sequence so that it booted directly into the firmware updater instead of a bash shell. The whole job would have only taken an hour or two if there had been a README explaining the configuration file!

    Read the article

  • What can a company possibly gain by making Android phones hard to root?

    - by Chinmay Kanchi
    As someone who recently got a HTC Hero, I had to jump through several hoops to get root access on the phone to install custom firmware. Now, Android is open-source and fairly easy to build and hack on an emulator. It seems to be against the spirit of open-source to lock down a phone so you can't hack the phone itself. Now, often, there are understandable (though not always justifiable) reasons for locking a device down. For example, it might have proprietary software on it or you might want to retain control of the platform. However, Android by its open-source nature makes such concerns moot. Everyone and their dog has access to the userland code, and HTC is forced by the GPL to release kernel sources for each of their devices. So, I fail to see any motivation for alienating the hackers, when there is no possible benefit (in my mind) to be had from doing this. Any idea why a company would want to do this? Is it just short-sightedness or am I missing possible commercial implications of this?

    Read the article

  • How are two-dimensional arrays formatted in memory?

    - by Chris Cooper
    In C, I know I can dynamically allocate a two-dimensional array on the heap using the following code: int** someNumbers = malloc(arrayRows*sizeof(int*)); for (i = 0; i < arrayRows; i++) { someNumbers[i] = malloc(arrayColumns*sizeof(int)); } Clearly, this actually creates a one-dimensional array of pointers to a bunch of separate one-dimensional arrays of integers, and "The System" can figure you what I mean when I ask for: someNumbers[4][2]; But when I statically declare a 2D array, as in the following line...: int someNumbers[ARRAY_ROWS][ARRAY_COLUMNS]; ...does a similar structure get created on the stack, or is it of another form completely? (i.e. is it a 1D array of pointers? If not, what is it, and how do references to it get figured out?) Also, when I said, "The System," what is actually responsible for figuring that out? The kernel? Or does the C compiler sort it out while compiling?

    Read the article

  • OpenCL Matrix Multiplication - Getting wrong answer

    - by Yash
    here's a simple OpenCL Matrix Multiplication kernel which is driving me crazy: __kernel void matrixMul( __global int* C, __global int* A, __global int* B, int wA, int wB){ int row = get_global_id(1); //2D Threas ID x int col = get_global_id(0); //2D Threas ID y //Perform dot-product accumulated into value int value; for ( int k = 0; k < wA; k++ ){ value += A[row*wA + k] * B[k*wB+col]; } C[row*wA+col] = value; //Write to the device memory } Where (inputs) A = [72 45 75 61] B = [26 53 46 76] Output I am getting: C = [3942 7236 3312 5472] But the output should be: C = [3943 7236 4756 8611] The problem I am facing here is that for any dimension array the elements of the first row of the resulting matrix is correct. The elements of all the other rows of the resulting matrix is wrong. By the way I am using pyopencl. I don't know what I mistake I am doing here. I have spent the entire day with no luck. Please help me with this

    Read the article

  • Strange results while measuring delta time on Linux

    - by pachanga
    Folks, could you please explain why I'm getting very strange results from time to time using the the following code: #include <unistd.h> #include <sys/time.h> #include <time.h> #include <stdio.h> int main() { struct timeval start, end; long mtime, seconds, useconds; while(1) { gettimeofday(&start, NULL); usleep(2000); gettimeofday(&end, NULL); seconds = end.tv_sec - start.tv_sec; useconds = end.tv_usec - start.tv_usec; mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5; if(mtime > 10) printf("WTF: %ld\n", mtime); } return 0; } (You can compile and run it with: gcc test.c -o out -lrt && ./out) What I'm experiencing is sporadic big values of mtime variable almost every second or even more often, e.g: $ gcc test.c -o out -lrt && ./out WTF: 14 WTF: 11 WTF: 11 WTF: 11 WTF: 14 WTF: 13 WTF: 13 WTF: 11 WTF: 16 How can this be possible? Is it OS to blame? Does it do too much context switching? But my box is idle( load average: 0.02, 0.02, 0.3). Here is my Linux kernel version: $ uname -a Linux kurluka 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:56 UTC 2010 i686 GNU/Linux

    Read the article

  • What can cause a spontaneous EPIPE error without either end calling close() or crashing?

    - by Hongli
    I have an application that consists of two processes (let's call them A and B), connected to each other through Unix domain sockets. Most of the time it works fine, but some users report the following behavior: A sends a request to B. This works. A now starts reading the reply from B. B sends a reply to A. The corresponding write() call returns an EPIPE error, and as a result B close() the socket. However, A did not close() the socket, nor did it crash. A's read() call returns 0, indicating end-of-file. A thinks that B prematurely closed the connection. Users have also reported variations of this behavior, e.g.: A sends a request to B. This works partially, but before the entire request is sent A's write() call returns EPIPE, and as a result A close() the socket. However B did not close() the socket, nor did it crash. B reads a partial request and then suddenly gets an EOF. The problem is I cannot reproduce this behavior locally at all. I've tried OS X and Linux. The users are on a variety of systems, mostly OS X and Linux. Things that I've already tried and considered: Double close() bugs (close() is called twice on the same file descriptor): probably not as that would result in EBADF errors, but I haven't seen them. Increasing the maximum file descriptor limit. One user reported that this worked for him, the rest reported that it did not. What else can possibly cause behavior like this? I know for certain that neither A nor B close() the socket prematurely, and I know for certain that neither of them have crashed because both A and B were able to report the error. It is as if the kernel suddenly decided to pull the plug from the socket for some reason.

    Read the article

  • Time gaps between host clEnqueue_xxx calls

    - by dialer
    Consider these OpenCL calls (3 memcpy DtoH, 4313 cl_float elements each): clEnqueueReadBuffer(CommandQueue, SpectrumAbsMem, CL_FALSE, 0, SpectrumMemSize, SpectrumAbs, 0, NULL, NULL); clEnqueueReadBuffer(CommandQueue, SpectrumReMem, CL_FALSE, 0, SpectrumMemSize, SpectrumRe, 0, NULL, NULL); clEnqueueReadBuffer(CommandQueue, SpectrumImMem, CL_FALSE, 0, SpectrumMemSize, SpectrumIm, 0, NULL, NULL); When I analyze these with the NVIDIA visual profiler, I see that the actual memcpy operation only takes 8 us, but there is a significant gap of around 130 us after each memcpy. I'm already using the supposedly asynchronous method (the CL_FALSE in the argument list). When I use only one operation, but with three times the size, the operation is way faster. Why is the time gap between the actual memcpy operations so huge, whereas the gap between the kernel execution (exactly before these three operations) and the first memcpy is only 7us? Can I get rid of it, or do I need to accumulate more data before starting a memcpy? If so, is there a convenient way how I could combine mutliple arrays into a single contiguous block of memory, but still have a cl_mem object as a separate device memory pointer to each section?

    Read the article

  • what can cause large discrepancy between minor GC time and total pause time?

    - by cxcg
    We have a latency-sensitive application, and are experiencing some GC-related pauses we don't fully understand. We occasionally have a minor GC that results in application pause times that are much longer than the reported GC time itself. Here is an example log snippet: 485377.257: [GC 485378.857: [ParNew: 105845K-621K(118016K), 0.0028070 secs] 136492K-31374K(1035520K), 0.0028720 secs] [Times: user=0.01 sys=0.00, real=1.61 secs] Total time for which application threads were stopped: 1.6032830 seconds The total pause time here is orders of magnitude longer than the reported GC time. These are isolated and occasional events: the immediately preceding and succeeding minor GC events do not show this large discrepancy. The process is running on a dedicated machine, with lots of free memory, 8 cores, running Red Hat Enterprise Linux ES Release 4 Update 8 with kernel 2.6.9-89.0.1EL-smp. We have observed this with (32 bit) JVM versions 1.6.0_13 and 1.6.0_18. We are running with these flags: -server -ea -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:-TraceClassUnloading Can anybody offer some explanation as to what might be going on here, and/or some avenues for further investigation?

    Read the article

  • Why is a non-blocking TCP connect() occasionally so slow on Linux?

    - by pts
    I was trying to measure the speed of a TCP server I'm writing, and I've noticed that there might be a fundamental problem of measuring the speed of the connect() calls: if I connect in a non-blocking way, connect() operations become very slow after a few seconds. Here is the example code in Python: #! /usr/bin/python2.4 import errno import os import select import socket import sys def NonBlockingConnect(sock, addr): while True: try: return sock.connect(addr) except socket.error, e: if e.args[0] not in (errno.EINPROGRESS, errno.EALREADY): raise os.write(2, '^') if not select.select((), (sock,), (), 0.5)[1]: os.write(2, 'P') def InfiniteClient(addr): while True: sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) sock.setblocking(0) sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) # sock.connect(addr) NonBlockingConnect(sock, addr) sock.close() os.write(2, '.') def InfiniteServer(server_socket): while True: sock, addr = server_socket.accept() sock.close() server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server_socket.bind(('127.0.0.1', 45454)) server_socket.listen(128) if os.fork(): # Parent. InfiniteServer(server_socket) else: addr = server_socket.getsockname() server_socket.close() InfiniteClient(addr) With NonBlockingConnect, most connect() operations are fast, but in every few seconds there happens to be one connect() operation which takes at least 2 seconds (as indicated by 5 consecutive P letters on the output). By using sock.connect instead of NonBlockingConnect all connect operations seem to be fast. How is it possible to get rid of these slow connect()s? I'm running Ubuntu Karmic desktop with the standard PAE kernel: Linux narancs 2.6.31-20-generic-pae #57-Ubuntu SMP Mon Feb 8 10:23:59 UTC 2010 i686 GNU/Linux

    Read the article

  • Freeing of allocated memory in Solaris/Linux

    - by user355159
    Hi, I have written a small program and compiled it under Solaris/Linux platform to measure the performance of applying this code to my application. The program is written in such a way, initially using sbrk(0) system call, i have taken base address of the heap region. After that i have allocated an 1.5GB of memory using malloc system call, Then i used memcpy system call to copy 1.5GB of content to the allocated memory area. Then, I freed the allocated memory. After freeing, i used again sbrk(0) system call to view the heap size. This is where i little confused. In solaris, eventhough, i freed the memory allocated (of nearly 1.5GB) the heap size of the process is huge. But i run the same application in linux, after freeing, i found that the heap size of the process is equal to the size of the heap memory before allocation of 1.5GB. I know Solaris does not frees memory immediately, but i don't know how to tune the solaris kernel to immediately free the memory after free() system call. Also, please explain why the same problem does not comes under Linux? Can anyone help me out of this? Thanks, Santhosh.

    Read the article

  • Rails: associate model with two of its kind

    - by Patrick Oscity
    hi all, im trying to do this: class Event < ActiveRecord::Base belongs_to :previous_event has_one :event, :as => :previous_event, :foreign_key => "previous_event_id" belongs_to :next_event has_one :event, :as => :next_event, :foreign_key => "next_event_id" end because i want to enable the user to repeat events and change multiple oncoming events at the same time. what am i doing wrong, or is there another way of doing this? somehow i need to know about the previous and the next event, don't i? when i'm testing this in the consolewith Event.all[1].previous_event, i get the following error: NameError: uninitialized constant Event::PreviousEvent from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:105:in `const_missing' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:2199:in `compute_type' from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.5/lib/active_support/core_ext/kernel/reporting.rb:11:in `silence_warnings' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/base.rb:2195:in `compute_type' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/reflection.rb:156:in `send' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/reflection.rb:156:in `klass' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/belongs_to_association.rb:49:in `find_target' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:239:in `load_target' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations/association_proxy.rb:112:in `reload' from /Library/Ruby/Gems/1.8/gems/activerecord-2.3.5/lib/active_record/associations.rb:1250:in `previous_event' from (irb):2 what is going wrong here? thanks a ton for your help.

    Read the article

  • .NET Best Way to move many files to and from various directories??

    - by Dan
    I've created a program that moves files to and from various directories. An issue I've come across is when you're trying to move a file and some other program is still using it. And you get an error. Leaving it there isn't an option, so I can only think of having to keep trying to move it over and over again. This though slows the entire program down, so I create a new thread and let it deal with the problem file and move on to the next. The bigger problem is when you have too many of these problem files and the program now has so many threads trying to move these files, that it just crashes with some kernel.dll error. Here's a sample of the code I use to move the files: Public Sub MoveIt() Try File.Move(_FileName, _CopyToFileName) Catch ex As Exception Threading.Thread.Sleep(5000) MoveIt() End Try End Sub As you can see.. I try to move the file, and if it errors, I wait and move it again.. over and over again.. I've tried using FileInfo as well, but that crashes WAY sooner than just using the File object. So has anyone found a fool proof way of moving files without it ever erroring? Note: it takes a lot of files to make it crash. It'll be fine on the weekend, but by the end of the day on monday, it's done.

    Read the article

  • How to properly cast a global memory array using the uint4 vector in CUDA to increase memory throughput?

    - by charis
    There are generally two techniques to increase the memory throughput of the global memory on a CUDA kernel; memory accesses coalescence and accessing words of at least 4 bytes. With the first technique accesses to the same memory segment by threads of the same half-warp are coalesced to fewer transactions while be accessing words of at least 4 bytes this memory segment is effectively increased from 32 bytes to 128. To access 16-byte instead of 1-byte words when there are unsigned chars stored in the global memory, the uint4 vector is commonly used by casting the memory array to uint4: uint4 *text4 = ( uint4 * ) d_text; var = text4[i]; In order to extract the 16 chars from var, i am currently using bitwise operations. For example: s_array[j * 16 + 0] = var.x & 0x000000FF; s_array[j * 16 + 1] = (var.x >> 8) & 0x000000FF; s_array[j * 16 + 2] = (var.x >> 16) & 0x000000FF; s_array[j * 16 + 3] = (var.x >> 24) & 0x000000FF; My question is, is it possible to recast var (or for that matter *text4) to unsigned char in order to avoid the additional overhead of the bitwise operations?

    Read the article

  • cuda 5.0 namespaces for contant memory variable usage

    - by Psypher
    In my program I want to use a structure containing constant variables and keep it on device all long as the program executes to completion. I have several header files containing the declaration of 'global' functions and their respective '.cu' files for their definitions. I kept this scheme because it helps me contain similar code in one place. e.g. all the 'device' functions required to complete 'KERNEL_1' are separated from those 'device' functions required to complete 'KERNEL_2' along with kernels definitions. I had no problems with this scheme during compilation and linking. Until I encountered constant variables. I want to use the same constant variable through all kernels and device functions but it doesn't seem to work. ########################################################################## CODE EXAMPLE ########################################################################### filename: 'common.h' -------------------------------------------------------------------------- typedef struct { double height; double weight; int age; } __CONSTANTS; __constant__ __CONSTANTS d_const; --------------------------------------------------------------------------- filename: main.cu --------------------------------------------------------------------------- #include "common.h" #include "gpukernels.h" int main(int argc, char **argv) { __CONSTANTS T; T.height = 1.79; T.weight = 73.2; T.age = 26; cudaMemcpyToSymbol(d_consts, &T, sizeof(__CONSTANTS)); test_kernel <<< 1, 16 >>>(); cudaDeviceSynchronize(); } --------------------------------------------------------------------------- filename: gpukernels.h --------------------------------------------------------------------------- __global__ void test_kernel(); --------------------------------------------------------------------------- filename: gpukernels.cu --------------------------------------------------------------------------- #include <stdio.h> #include "gpukernels.h" #include "common.h" __global__ void test_kernel() { printf("Id: %d, height: %f, weight: %f\n", threadIdx.x, d_const.height, d_const.weight); } When I execute this code, the kernel executes, displays the thread ids, but the constant values are displayed as zeros. How can I fix this?

    Read the article

  • Setting pixel values in Nvidia NPP ImageCPU objects?

    - by solvingPuzzles
    In the Nvidia Performance Primitives (NPP) image processing examples in the CUDA SDK distribution, images are typically stored on the CPU as ImageCPU objects, and images are stored on the GPU as ImageNPP objects. boxFilterNPP.cpp is an example from the CUDA SDK that uses these ImageCPU and ImageNPP objects. When using a filter (convolution) function like nppiFilter, it makes sense to define a filter as an ImageCPU object. However, I see no clear way setting the values of an ImageCPU object. npp::ImageCPU_32f_C1 hostKernel(3,3); //allocate space for 3x3 convolution kernel //want to set hostKernel to [-1 0 1; -1 0 1; -1 0 1] hostKernel[0][0] = -1; //this doesn't compile hostKernel(0,0) = -1; //this doesn't compile hostKernel.at(0,0) = -1; //this doesn't compile How can I manually put values into an ImageCPU object? Notes: I didn't actually use nppiFilter in the code snippet; I'm just mentioning nppiFilter as a motivating example for writing values into an ImageCPU object. The boxFilterNPP.cpp example doesn't involve writing directly to an ImageCPU object, because nppiFilterBox is a special case of nppiFilter that uses a built-in gaussian smoothing filter (probably something like [1 1 1; 1 1 1; 1 1 1]).

    Read the article

  • Slowing process creation under Java?

    - by oconnor0
    I have a single, large heap (up to 240GB, though in the 20-40GB range for most of this phase of execution) JVM [1] running under Linux [2] on a server with 24 cores. We have tens of thousands of objects that have to be processed by an external executable & then load the data created by those executables back into the JVM. Each executable produces about half a megabyte of data (on disk) that when read right in, after the process finishes, is, of course, larger. Our first implementation was to have each executable handle only a single object. This involved the spawning of twice as many executables as we had objects (since we called a shell script that called the executable). Our CPU utilization would start off high, but not necessarily 100%, and slowly worsen. As we began measuring to see what was happening we noticed that the process creation time [3] continually slows. While starting at sub-second times it would eventually grow to take a minute or more. The actual processing done by the executable usually takes less than 10 seconds. Next we changed the executable to take a list of objects to process in an attempt to reduce the number of processes created. With batch sizes of a few hundred (~1% of our current sample size), the process creation times start out around 2 seconds & grow to around 5-6 seconds. Basically, why is it taking so long to create these processes as execution continues? [1] Oracle JDK 1.6.0_22 [2] Red Hat Enterprise Linux Advanced Platform 5.3, Linux kernel 2.6.18-194.26.1.el5 #1 SMP [3] Creation of the ProcessBuilder object, redirecting the error stream, and starting it.

    Read the article

  • Internet Explorer 8 crashes on Citrix, Windows 2003

    - by Workshop Alex
    Difficult to decide where this Q fits best. It's server-related and programming-related. But it's a user-problem so I'll put it here, first... I work on a (Delphi) application that uses an Internet Explorer component to show information to the user. It's not a web application, just a desktop application which creates HTML pages to display them within a browser component. Some of the information on these webpages are retrieved from a web server, while other information is provided "live" by the application itself. It works quite well, but it adds an IE-process (child process) next to my application. And this IE process seems to eat a lot of system resources. For normal users, this is not a real problem, so it's not an issue that I want to fix in the code. But one customer of this application uses it with about 100 users on a Citrix/Windows 2003 environment and they complain about problems with the application. IE8 tends to crash, not show, hang or cause other mayhap. Then again, I've warned them that -officially- I won't support any Citrix environment. But I'm willing to help them to find a solution to fix this, and if need be I could make minor changes to my code to help fix this issie. (If possible.) But basically, I need a solution that any user/administrator on this Citrix environment can follow/use. Any ideas on how to resolve this resource problem?

    Read the article

  • SBS 2008 BPA Warnings After Migration From SBS 2003

    - by Nicholas Piasecki
    We just finished a we-know-just-enough-to-be-dangerous migration from SBS 2003 to SBS 2008, and things seem to have gone relatively smoothly. After running the SBS 2008 Best Practices Analyzer on the destination server, we've got three warning messages, and I can't tell if they're important or not. First, the easy one: SMTP Port (TCP 25 Status): The Edgetransport.exe process should listen on SMTP port 25, but that port is owned by the process. I don't think that this one is a big deal--e-mail is flowing through the SMTP connector. Since there are two spaces between "the" and "process," I'm assuming that for some reason BPA just couldn't figure out the owning process name and this is just some sloppy programming when displaying the message. (Indeed, on subsequent runs of the BPA this message goes away, and other times it comes back.) Now, two more scary sounding ones: No DNS name server records: There are no DNS name server (NS) resource records in the _msdcs sub-domain in the forward lookup zone for Windows SBS 2008. and, similarly, No DNS name server records: There are no DNS name server (NS) resource records in the _msdcs zone for Windows SBS 2008. Now for these two, everything appears to be functioning correctly--but I'm assuming this is a weird state as a result of the SBS 2003 to 2008 migration. Can anyone provide any pointers on how to fix it, or whether or not it can be safely ignored? Thanks!

    Read the article

< Previous Page | 590 591 592 593 594 595 596 597 598 599 600 601  | Next Page >