Search Results

Search found 48586 results on 1944 pages for 'page performance'.

Page 275/1944 | < Previous Page | 271 272 273 274 275 276 277 278 279 280 281 282  | Next Page >

  • Why aren't we programming on the GPU???

    - by Chris
    So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

    Read the article

  • Slow query with unexpected index scan

    - by zerkms
    Hello I have this query: SELECT * FROM sample INNER JOIN test ON sample.sample_number = test.sample_number INNER JOIN result ON test.test_number = result.test_number WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00' the biggest table here is RESULT, contains 11.1M records. The left 2 tables about 1M. this query works slowly (more than 10 minutes) and returns about 800 records. executing plan shows clustered index scan (over it's PRIMARY KEY (result.result_number, which actually doesn't take part in query)) over all 11M records. RESULT.TEST_NUMBER is a clustered primary key. if I change 2010-03-17 09:00 to 2010-03-17 10:00 - i get about 40 records. it executes for 300ms. and plan shows index seek (over result.test_number index) if i replace * in SELECT clause to result.test_number (covered with index) - then all become fast in first case too. this points to hdd IO issues, but doesn't clarifies changing plan. so, any ideas? UPDATE: sampled_date is in table sample and covered by index. other fields from this query: test.sample_number is covered by index and result.test_number too. UPDATE 2: obviously than sql server in any reasons don't want to use index. i did a small experiment: i remove INNER JOIN with result, select all test.test_number and after that do SELECT * FROM RESULT WHERE TEST_NUMBER IN (...) this, of course, works fast. but i cannot get what is the difference and why query optimizer choose such inappropriate way to select data in 1st case.

    Read the article

  • Was Joel right about XML being slow?

    - by Will
    A long time ago Joel explained how various every-day coding things were slow, and this led to XML as a data store being slow: http://www.joelonsoftware.com/articles/fog0000000319.html Are those every-day coding things - strcat and malloc - still slow in a std::string and dlmalloc world? What else has changed in modern processors and mainstream frameworks? And is XML still slow? You can't find an RDBMS that doesn't claim some kind of native XML support these days; haven't they got it faster - a single pass to index it for example - yet?

    Read the article

  • jQuery datepicker causes page overflow

    - by nc3b
    I am using the datepicker control from jQuery-ui 1.8. from-date is a text input. I am attaching a very simple datepicker: $('#from-date').datepicker(); This causes the page to overflow (vertical scrollbar), which I am trying to avoid. As soon as I click the from-date, the datepicker control appears, and the scrollbar dissapears. After dismissing the datepicker, the scrollbar doesn't appear anymore. The text field is inside a div that has overflow:auto and a fixed height and width. I suspect it's a z-index issue. What am I doing wrong ? How would I debug this ?

    Read the article

  • WPF slow to start on x64 in .NET Framework 4.0

    - by Robert Fraser
    I've noticed that if I build my WPF application for Any CPU/x64, it takes MUCH longer to start (on the order of about 20 seconds) or to load new controls than it does if started on x86 (in release & debug modes, inside or outside of VS). This occurs with even the simplest WPF apps. The problem is discussed in this MSDN thread, but no answer was provided there. This happens only with .NET 4.0 -- in 3.5 SP1, x64 was just as fast as x86. Interestingly, Microsoft seems to know about this problem since the default for a new WPF project in VS2010 is x86. Is this a real bug or am I just doing it wrong? EDIT: Possibly related to this: http://stackoverflow.com/questions/2788215/slow-databinding-setup-time-in-c-net-4-0. I'm using data binding heavily.

    Read the article

  • Problem with room/screen/menu controller in python game: old rooms are not removed from memory

    - by Jordan Magnuson
    I'm literally banging my head against a wall here (as in, yes, physically, at my current location, I am damaging my cranium). Basically, I've got a Python/Pygame game with some typical game "rooms", or "screens." EG title screen, high scores screen, and the actual game room. Something bad is happening when I switch between rooms: the old room (and its various items) are not removed from memory, or from my event listener. Not only that, but every time I go back to a certain room, my number of event listeners increases, as well as the RAM being consumed! (So if I go back and forth between the title screen and the "game room", for instance, the number of event listeners and the memory usage just keep going up and up. The main issue is that all the event listeners start to add up and really drain the CPU. I'm new to Python, and don't know if I'm doing something obviously wrong here, or what. I will love you so much if you can help me with this! Below is the relevant source code. Complete source code at http://www.necessarygames.com/my_games/betraveled/betraveled_src0328.zip MAIN.PY class RoomController(object): """Controls which room is currently active (eg Title Screen)""" def __init__(self, screen, ev_manager): self.room = None self.screen = screen self.ev_manager = ev_manager self.ev_manager.register_listener(self) self.room = self.set_room(config.room) def set_room(self, room_const): #Unregister old room from ev_manager if self.room: self.room.ev_manager.unregister_listener(self.room) self.room = None #Set new room based on const if room_const == config.TITLE_SCREEN: return rooms.TitleScreen(self.screen, self.ev_manager) elif room_const == config.GAME_MODE_ROOM: return rooms.GameModeRoom(self.screen, self.ev_manager) elif room_const == config.GAME_ROOM: return rooms.GameRoom(self.screen, self.ev_manager) elif room_const == config.HIGH_SCORES_ROOM: return rooms.HighScoresRoom(self.screen, self.ev_manager) def notify(self, event): if isinstance(event, ChangeRoomRequest): if event.game_mode: config.game_mode = event.game_mode self.room = self.set_room(event.new_room) #Run game def main(): pygame.init() screen = pygame.display.set_mode(config.screen_size) ev_manager = EventManager() spinner = CPUSpinnerController(ev_manager) room_controller = RoomController(screen, ev_manager) pygame_event_controller = PyGameEventController(ev_manager) spinner.run() EVENT_MANAGER.PY class EventManager: #This object is responsible for coordinating most communication #between the Model, View, and Controller. def __init__(self): from weakref import WeakKeyDictionary self.last_listeners = {} self.listeners = WeakKeyDictionary() self.eventQueue= [] self.gui_app = None #---------------------------------------------------------------------- def register_listener(self, listener): self.listeners[listener] = 1 #---------------------------------------------------------------------- def unregister_listener(self, listener): if listener in self.listeners: del self.listeners[listener] #---------------------------------------------------------------------- def clear(self): del self.listeners[:] #---------------------------------------------------------------------- def post(self, event): # if isinstance(event, MouseButtonLeftEvent): # debug(event.name) #NOTE: copying the list like this before iterating over it, EVERY tick, is highly inefficient, #but currently has to be done because of how new listeners are added to the queue while it is running #(eg when popping cards from a deck). Should be changed. See: http://dr0id.homepage.bluewin.ch/pygame_tutorial08.html #and search for "Watch the iteration" print 'Number of listeners: ' + str(len(self.listeners)) for listener in list(self.listeners): #NOTE: If the weakref has died, it will be #automatically removed, so we don't have #to worry about it. listener.notify(event) def notify(self, event): pass #------------------------------------------------------------------------------ class PyGameEventController: """...""" def __init__(self, ev_manager): self.ev_manager = ev_manager self.ev_manager.register_listener(self) self.input_freeze = False #---------------------------------------------------------------------- def notify(self, incoming_event): if isinstance(incoming_event, UserInputFreeze): self.input_freeze = True elif isinstance(incoming_event, UserInputUnFreeze): self.input_freeze = False elif isinstance(incoming_event, TickEvent) or isinstance(incoming_event, BoardCreationTick): #Share some time with other processes, so we don't hog the cpu pygame.time.wait(5) #Handle Pygame Events for event in pygame.event.get(): #If this event manager has an associated PGU GUI app, notify it of the event if self.ev_manager.gui_app: self.ev_manager.gui_app.event(event) #Standard event handling for everything else ev = None if event.type == QUIT: ev = QuitEvent() elif event.type == pygame.MOUSEBUTTONDOWN and not self.input_freeze: if event.button == 1: #Button 1 pos = pygame.mouse.get_pos() ev = MouseButtonLeftEvent(pos) elif event.type == pygame.MOUSEBUTTONDOWN and not self.input_freeze: if event.button == 2: #Button 2 pos = pygame.mouse.get_pos() ev = MouseButtonRightEvent(pos) elif event.type == pygame.MOUSEBUTTONUP and not self.input_freeze: if event.button == 2: #Button 2 Release pos = pygame.mouse.get_pos() ev = MouseButtonRightReleaseEvent(pos) elif event.type == pygame.MOUSEMOTION: pos = pygame.mouse.get_pos() ev = MouseMoveEvent(pos) #Post event to event manager if ev: self.ev_manager.post(ev) # elif isinstance(event, BoardCreationTick): # #Share some time with other processes, so we don't hog the cpu # pygame.time.wait(5) # # #If this event manager has an associated PGU GUI app, notify it of the event # if self.ev_manager.gui_app: # self.ev_manager.gui_app.event(event) #------------------------------------------------------------------------------ class CPUSpinnerController: def __init__(self, ev_manager): self.ev_manager = ev_manager self.ev_manager.register_listener(self) self.clock = pygame.time.Clock() self.cumu_time = 0 self.keep_going = True #---------------------------------------------------------------------- def run(self): if not self.keep_going: raise Exception('dead spinner') while self.keep_going: time_passed = self.clock.tick() fps = self.clock.get_fps() self.cumu_time += time_passed self.ev_manager.post(TickEvent(time_passed, fps)) if self.cumu_time >= 1000: self.cumu_time = 0 self.ev_manager.post(SecondEvent(fps=fps)) pygame.quit() #---------------------------------------------------------------------- def notify(self, event): if isinstance(event, QuitEvent): #this will stop the while loop from running self.keep_going = False EXAMPLE CLASS USING EVENT MANAGER class Timer(object): def __init__(self, ev_manager, time_left): self.ev_manager = ev_manager self.ev_manager.register_listener(self) self.time_left = time_left self.paused = False def __repr__(self): return str(self.time_left) def pause(self): self.paused = True def unpause(self): self.paused = False def notify(self, event): #Pause Event if isinstance(event, Pause): self.pause() #Unpause Event elif isinstance(event, Unpause): self.unpause() #Second Event elif isinstance(event, SecondEvent): if not self.paused: self.time_left -= 1

    Read the article

  • Displaying JFreeChart in a web page using Struts2

    - by Kingshuk
    I am using Struts2. I need to display JFreeChart in a web page. Can any body help me on that? Edit: it is getting displayed in binary format. public String execute() throws Exception { System.out.println("Refresh bar Chart"); response.setContentType("image/png"); OutputStream outstream = response.getOutputStream(); try { JFreeChart chart = getChartViewer(); ChartUtilities.writeChartAsPNG(outstream, chart, 500, 300); System.out.println("Created bar Chart"); return SUCCESS; } finally { outstream.close(); response.flushBuffer(); } }

    Read the article

  • Fancybox, getting Fancybox to bind using LIVE() to items being loaded onto the page after load

    - by nobosh
    I have a page that loads and after it loads, it pulls in a list of LIs to populate a news feed. quick view quick view quick view I'm trying to get fancy box to trigger when a user clicks on quick view but haven't had any luck. Any Ideas? $(document).ready(function() { $('.quickview').fancybox(); }); also tried: $(document).ready(function() { $('a.quickview').live('click', function() { $(this).fancybox(); }); }); http://fancybox.net/ Thanks for any ideas...

    Read the article

  • SQL Quey slow in .NET application but instantaneous in SQL Server Management Studio

    - by user203882
    Here is the SQL SELECT tal.TrustAccountValue FROM TrustAccountLog AS tal INNER JOIN TrustAccount ta ON ta.TrustAccountID = tal.TrustAccountID INNER JOIN Users usr ON usr.UserID = ta.UserID WHERE usr.UserID = 70402 AND ta.TrustAccountID = 117249 AND tal.trustaccountlogid = ( SELECT MAX (tal.trustaccountlogid) FROM TrustAccountLog AS tal INNER JOIN TrustAccount ta ON ta.TrustAccountID = tal.TrustAccountID INNER JOIN Users usr ON usr.UserID = ta.UserID WHERE usr.UserID = 70402 AND ta.TrustAccountID = 117249 AND tal.TrustAccountLogDate < '3/1/2010 12:00:00 AM' ) Basicaly there is a Users table a TrustAccount table and a TrustAccountLog table. Users: Contains users and their details TrustAccount: A User can have multiple TrustAccounts. TrustAccountLog: Contains an audit of all TrustAccount "movements". A TrustAccount is associated with multiple TrustAccountLog entries. Now this query executes in milliseconds inside SQL Server Management Studio, but for some strange reason it takes forever in my C# app and even timesout (120s) sometimes. Here is the code in a nutshell. It gets called multiple times in a loop and the statement gets prepared. cmd.CommandTimeout = Configuration.DBTimeout; cmd.CommandText = "SELECT tal.TrustAccountValue FROM TrustAccountLog AS tal INNER JOIN TrustAccount ta ON ta.TrustAccountID = tal.TrustAccountID INNER JOIN Users usr ON usr.UserID = ta.UserID WHERE usr.UserID = @UserID1 AND ta.TrustAccountID = @TrustAccountID1 AND tal.trustaccountlogid = (SELECT MAX (tal.trustaccountlogid) FROM TrustAccountLog AS tal INNER JOIN TrustAccount ta ON ta.TrustAccountID = tal.TrustAccountID INNER JOIN Users usr ON usr.UserID = ta.UserID WHERE usr.UserID = @UserID2 AND ta.TrustAccountID = @TrustAccountID2 AND tal.TrustAccountLogDate < @TrustAccountLogDate2 ))"; cmd.Parameters.Add("@TrustAccountID1", SqlDbType.Int).Value = trustAccountId; cmd.Parameters.Add("@UserID1", SqlDbType.Int).Value = userId; cmd.Parameters.Add("@TrustAccountID2", SqlDbType.Int).Value = trustAccountId; cmd.Parameters.Add("@UserID2", SqlDbType.Int).Value = userId; cmd.Parameters.Add("@TrustAccountLogDate2", SqlDbType.DateTime).Value =TrustAccountLogDate; // And then... reader = cmd.ExecuteReader(); if (reader.Read()) { double value = (double)reader.GetValue(0); if (System.Double.IsNaN(value)) return 0; else return value; } else return 0;

    Read the article

  • Code Trivia: optimize the code for multiple nested loops

    - by CodeToGlory
    I came across this code today and wondering what are some of the ways we can optimize it. Obviously the model is hard to change as it is legacy, but interested in getting opinions. Changed some names around and blurred out some core logic to protect. private static Payment FindPayment(Order order, Customer customer, int paymentId) { Payment payment = Order.Payments.FindById(paymentId); if (payment != null) { if (payment.RefundPayment == null) { return payment; } if (String.Compare(payment.RefundPayment, "refund", true) != 0 ) { return payment; } } Payment finalPayment = null; foreach (Payment testpayment in Order.payments) { if (testPayment.Customer.Name != customer.Name){continue;} if (testPayment.Cancelled) { continue; } if (testPayment.RefundPayment != null) { if (String.Compare(testPayment.RefundPayment, "refund", true) == 0 ) { continue; } } if (finalPayment == null) { finalPayment = testPayment; } else { if (testPayment.Value > finalPayment.Value) { finalPayment = testPayment; } } } if (finalPayment == null) { return payment; } return finalPayment; } Making this a wiki so code enthusiasts can answer without worrying about points.

    Read the article

  • Need to center image in web page via CSS

    - by Robot
    I'd like to center an image in a page both vertically and horizontally even when the browser is resized. Currently, I use this CSS: .centeredImage { position: fixed; top: 50%; left: 50%; margin-top: -50px; margin-left: -150px; } And this HTML: <img class="centeredImage" src="images/logo.png"> It centers in FF but not IE (image center is placed at upper left corner). Any ideas? -Robot

    Read the article

  • Fastest gap sequence for shell sort ?

    - by Tony
    According to Marcin Ciura's Optimal (best known) sequence of increments for shell sort algorithm. The best sequence for shellsort is 1, 4, 10, 23, 57, 132, 301, 701... But how can I generate such a sequence ? In Marcin Ciura's paper he said : Both Knuth’s and Hibbard’s sequences are relatively bad, because they are defined by simple linear recurrences but most algorithm books I searched , they all tend to use Knuth’s sequence : k = 3k + 1 ; because it's easy to generate , what's your way of generating shellsort sequence ?

    Read the article

  • Any recommended java profiling tutorial?

    - by Wing C. Chen
    Is there any recommended java application profiling tutorial? I am now using jProfiler and eclipse TPTP with my profiling. However, although equipped with wonderful weapons, as a newbie in java profiling, I am still missing the general theory and skill in pinpointing the bottleneck. So would you please provide me with some recommended tutorial for java profiling?

    Read the article

  • Working with a large data object between ruby processes

    - by Gdeglin
    I have a Ruby hash that reaches approximately 10 megabytes if written to a file using Marshal.dump. After gzip compression it is approximately 500 kilobytes. Iterating through and altering this hash is very fast in ruby (fractions of a millisecond). Even copying it is extremely fast. The problem is that I need to share the data in this hash between Ruby on Rails processes. In order to do this using the Rails cache (file_store or memcached) I need to Marshal.dump the file first, however this incurs a 1000 millisecond delay when serializing the file and a 400 millisecond delay when serializing it. Ideally I would want to be able to save and load this hash from each process in under 100 milliseconds. One idea is to spawn a new Ruby process to hold this hash that provides an API to the other processes to modify or process the data within it, but I want to avoid doing this unless I'm certain that there are no other ways to share this object quickly. Is there a way I can more directly share this hash between processes without needing to serialize or deserialize it? Here is the code I'm using to generate a hash similar to the one I'm working with: @a = [] 0.upto(500) do |r| @a[r] = [] 0.upto(10_000) do |c| if rand(10) == 0 @a[r][c] = 1 # 10% chance of being 1 else @a[r][c] = 0 end end end @c = Marshal.dump(@a) # 1000 milliseconds Marshal.load(@c) # 400 milliseconds Update: Since my original question did not receive many responses, I'm assuming there's no solution as easy as I would have hoped. Presently I'm considering two options: Create a Sinatra application to store this hash with an API to modify/access it. Create a C application to do the same as #1, but a lot faster. The scope of my problem has increased such that the hash may be larger than my original example. So #2 may be necessary. But I have no idea where to start in terms of writing a C application that exposes an appropriate API. A good walkthrough through how best to implement #1 or #2 may receive best answer credit.

    Read the article

  • Set username credential for a new channel without creating a new factory

    - by Ramon
    I have a backend service and front-end services. They communicate via the trusted subsystem pattern. I want to transfer a username from the frontend to the backend and do this via username credentials as found here: http://msdn.microsoft.com/en-us/library/ms730288.aspx This does not work in our scenerio where the front-end builds a backend service channel factory via: channelFactory = new ChannelFactory<IBackEndService>(.....); Creating a new channel is done via die channel factory. I can only set the credentials one time after that I get an exception that the username object is read-only. channelFactory.Credentials.Username.Username = "myCoolFrontendUser"; var channel = channelFactory.CreateChannel(); Is there a way to create the channel factory only one time as this is expensive to create and then specify username credential when creating a channel?

    Read the article

  • Building a directory tree from a list of file paths

    - by Abignale
    I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths. The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow. The input data is usually sorted alphabetically, so the list would be something like: C:\Users\Aaron\AppData\Amarok\Afile C:\Users\Aaron\AppData\Amarok\Afile2 C:\Users\Aaron\AppData\Amarok\Afile3 C:\Users\Aaron\AppData\Blender\alibrary.dll C:\Users\Aaron\AppData\Blender\and_so_on.txt From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas. Edit: It would be better if this tree were lazy loaded from the top down if possible.

    Read the article

  • VisualStudio 2010 Settings Page - Collection Settings

    - by Ed Eichman
    I have a list of Filemaker database names Each DB name has a list of field/attribute pairs associated with it I have a windows form application in C# 4.0 (vs2010) that wants to use the above data I would like to maintain the list either in the Visual Studio settings page, or in one of the standard visual studio settings files using the standard .NET settings calls I would like to avoid writing my own custom settings, xml, xds (to avoid the "Could not find schema information for the element/attribute " errors) I just have a slightly complicated INI file! I don't want to complicate my life! Do any easy solutions exist? Unless someone has a brighter idea, I am simply going to write string settings with names that indicate it's a FM DB (e.g. "fmdbAddresses"), and values that concat my field/attribute pairs (e.g. "gUserResult=skipField|gAddressID=convertToInt|gAddressID=uniqueIx")

    Read the article

  • Problem with WPF perfomance.

    - by Polaris
    I have form which has many tabs. Every tab has many controls textboxes, comboboxes, datagrids and e .t.c. I bind form to one data source in such way this.DataContext=MyClassInstance But with this way my form opening very slow. about one minute. When I comment above code, form opens very quickly. All My controls I bound to the class properties in XAML. Please tell me the way to bind every tab when it's activated, or bind controls in background thread or any other idea which can help me to speed up my form. Thanks in advance.

    Read the article

  • Perfmon File Analysis Tools

    - by Daniel Pollard
    I have a bunch of perfmon files that have captured information over a period of time. Whats the best tool to crunch this information? Idealy I'd like to be able to see avg stats per hour for the object counters that have been monitored.

    Read the article

  • Customizing Number of Page Links Rendered With will_paginate?

    - by Moe
    I'm using the will_paginate gem for my Rails project and it's working beautifully. Unfortunately, on paginated result sets with a larger number of pages, the link section is too wide. Instead of: « Previous 1 2 … 5 6 7 8 9 10 11 12 13 … 18 19 Next » I would like to show: « Previous 1 2 … 5 6 7 8 9 … 18 19 Next » How can I reduce the number of page links being rendered? I looked at the will_paginate docs on github but couldn't find a solution. Thanks! Moe

    Read the article

  • CUDA: Memory copy to GPU 1 is slower in multi-GPU

    - by zenna
    My company has a setup of two GTX 295, so a total of 4 GPUs in a server, and we have several servers. We GPU 1 specifically was slow, in comparison to GPU 0, 2 and 3 so I wrote a little speed test to help find the cause of the problem. //#include <stdio.h> //#include <stdlib.h> //#include <cuda_runtime.h> #include <iostream> #include <fstream> #include <sstream> #include <string> #include <cutil.h> __global__ void test_kernel(float *d_data) { int tid = blockDim.x*blockIdx.x + threadIdx.x; for (int i=0;i<10000;++i) { d_data[tid] = float(i*2.2); d_data[tid] += 3.3; } } int main(int argc, char* argv[]) { int deviceCount; cudaGetDeviceCount(&deviceCount); int device = 0; //SELECT GPU HERE cudaSetDevice(device); cudaEvent_t start, stop; unsigned int num_vals = 200000000; float *h_data = new float[num_vals]; for (int i=0;i<num_vals;++i) { h_data[i] = float(i); } float *d_data = NULL; float malloc_timer; cudaEventCreate(&start); cudaEventCreate(&stop); cudaEventRecord( start, 0 ); cudaMemcpy(d_data, h_data, sizeof(float)*num_vals,cudaMemcpyHostToDevice); cudaMalloc((void**)&d_data, sizeof(float)*num_vals); cudaEventRecord( stop, 0 ); cudaEventSynchronize( stop ); cudaEventElapsedTime( &malloc_timer, start, stop ); cudaEventDestroy( start ); cudaEventDestroy( stop ); float mem_timer; cudaEventCreate(&start); cudaEventCreate(&stop); cudaEventRecord( start, 0 ); cudaMemcpy(d_data, h_data, sizeof(float)*num_vals,cudaMemcpyHostToDevice); cudaEventRecord( stop, 0 ); cudaEventSynchronize( stop ); cudaEventElapsedTime( &mem_timer, start, stop ); cudaEventDestroy( start ); cudaEventDestroy( stop ); float kernel_timer; cudaEventCreate(&start); cudaEventCreate(&stop); cudaEventRecord( start, 0 ); test_kernel<<<1000,256>>>(d_data); cudaEventRecord( stop, 0 ); cudaEventSynchronize( stop ); cudaEventElapsedTime( &kernel_timer, start, stop ); cudaEventDestroy( start ); cudaEventDestroy( stop ); printf("cudaMalloc took %f ms\n",malloc_timer); printf("Copy to the GPU took %f ms\n",mem_timer); printf("Test Kernel took %f ms\n",kernel_timer); cudaMemcpy(h_data,d_data, sizeof(float)*num_vals,cudaMemcpyDeviceToHost); delete[] h_data; return 0; } The results are GPU0 cudaMalloc took 0.908640 ms Copy to the GPU took 296.058777 ms Test Kernel took 326.721283 ms GPU1 cudaMalloc took 0.913568 ms Copy to the GPU took[b] 663.182251 ms[/b] Test Kernel took 326.710785 ms GPU2 cudaMalloc took 0.925600 ms Copy to the GPU took 296.915039 ms Test Kernel took 327.127930 ms GPU3 cudaMalloc took 0.920416 ms Copy to the GPU took 296.968384 ms Test Kernel took 327.038696 ms As you can see, the cudaMemcpy to the GPU is well double the amount of time for GPU1. This is consistent between all our servers, it is always GPU1 that is slow. Any ideas why this may be? All servers are running windows XP.

    Read the article

< Previous Page | 271 272 273 274 275 276 277 278 279 280 281 282  | Next Page >