threads - Page 46 - Developer IT

Java: "implements Runnable" vs. "extends Thread"

- by user65374

From what time I've spent with threads in Java, I've found these two ways to write threads. public class ThreadA implements Runnable { public void run() { //Code } } //with a "new Thread(threadA).start()" call public class ThreadB extends Thread { public ThreadB() { super("ThreadB"); } public void run() { //Code } } //with a "threadB.start()" call Is there any significant difference in these two blocks of code?

Read the article

ThreadQueue problems in "Accelerated C# 2008"

- by Singlet

Example for threading queue book "Accelerated C# 2008" (CrudeThreadPool class) not work correctly. If I insert long job in WorkFunction() on 2-processor machine executing for next task don't run before first is over. How to solve this problem? I want to load the processor to 100 percent public class CrudeThreadPool { static readonly int MAX_WORK_THREADS = 4; static readonly int WAIT_TIMEOUT = 2000; public delegate void WorkDelegate(); public CrudeThreadPool() { stop = 0; workLock = new Object(); workQueue = new Queue(); threads = new Thread[ MAX_WORK_THREADS ]; for( int i = 0; i < MAX_WORK_THREADS; ++i ) { threads[i] = new Thread( new ThreadStart(this.ThreadFunc) ); threads[i].Start(); } } private void ThreadFunc() { lock( workLock ) { int shouldStop = 0; do { shouldStop = Interlocked.Exchange( ref stop, stop ); if( shouldStop == 0 ) { WorkDelegate workItem = null; if( Monitor.Wait(workLock, WAIT_TIMEOUT) ) { // Process the item on the front of the queue lock( workQueue ) { workItem =(WorkDelegate) workQueue.Dequeue(); } workItem(); } } } while( shouldStop == 0 ); } } public void SubmitWorkItem( WorkDelegate item ) { lock( workLock ) { lock( workQueue ) { workQueue.Enqueue( item ); } Monitor.Pulse( workLock ); } } public void Shutdown() { Interlocked.Exchange( ref stop, 1 ); } private Queue workQueue; private Object workLock; private Thread[] threads; private int stop; } public class EntryPoint { static void WorkFunction() { Console.WriteLine( "WorkFunction() called on Thread {0}",Thread.CurrentThread.GetHashCode() ); //some long job double s = 0; for (int i = 0; i < 100000000; i++) s += Math.Sin(i); } static void Main() { CrudeThreadPool pool = new CrudeThreadPool(); for( int i = 0; i < 10; ++i ) { pool.SubmitWorkItem( new CrudeThreadPool.WorkDelegate( EntryPoint.WorkFunction) ); } pool.Shutdown(); } }

Read the article

how can a global function RETURN a value or BREAK out like C/C++ does

- by user1684726

Recently i've been doing string comparing jobs on CUDA, and i wonder how can a global function return a value when it finds the exact string that i'm looking for. I mean, i need the global function which contains a great amout of threads to find a certain string among a big big string-pool simultaneously, and i hope that once the exact string is caught, the global funtion can stop all the threads and return back to the main funtion, and tells me "he did it"! B.T.W., I'm using CUDA C .How could i possibly achieve that, waiting for help.

Read the article

MySQL Memory usage

- by Rob Stevenson-Leggett

Our MySQL server seems to be using a lot of memory. I've tried looking for slow queries and queries with no index and have halved the peak CPU usage and Apache memory usage but the MySQL memory stays constantly at 2.2GB (~51% of available memory on the server). Here's the graph from Plesk. Running top in the SSH window shows the same figures. Does anyone have any ideas on why the memory usage is constant like this and not peaks and troughs with usage of the app? Here's the output of the MySQL Tuning Primer script: -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.0.77-log x86_64 Uptime = 1 days 14 hrs 4 min 21 sec Avg. qps = 22 Total Questions = 3059456 Threads Connected = 13 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1 sec. You have 6 out of 3059477 that take longer than 1 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is NOT enabled. You will not be able to do point in time recovery See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html WORKER THREADS Current thread_cache_size = 0 Current threads_cached = 0 Current threads_per_sec = 2 Historic threads_per_sec = 0 Threads created per/sec are overrunning threads cached You should raise thread_cache_size MAX CONNECTIONS Current max_connections = 100 Current threads_connected = 14 Historic max_used_connections = 20 The number of used connections is 20% of the configured maximum. Your max_connections variable seems to be fine. INNODB STATUS Current InnoDB index space = 6 M Current InnoDB data space = 18 M Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 8 M Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 2.07 G Configured Max Per-thread Buffers : 274 M Configured Max Global Buffers : 2.01 G Configured Max Memory Limit : 2.28 G Physical Memory : 3.84 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 4 M Current key_buffer_size = 7 M Key cache miss rate is 1 : 40 Key buffer free ratio = 81 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is supported but not enabled Perhaps you should set the query_cache_size SORT OPERATIONS Current sort_buffer_size = 2 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 132.00 K You have had 16 queries where a join could not use an index properly You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. If you are unable to optimize your queries you may want to increase your join_buffer_size to accommodate larger joins in one pass. Note! This script will still suggest raising the join_buffer_size when ANY joins not using indexes are found. OPEN FILES LIMIT Current open_files_limit = 1024 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_cache value = 64 tables You have a total of 426 tables You have 64 open tables. Current table_cache hit rate is 1% , while 100% of your table cache is in use You should probably increase your table_cache TEMP TABLES Current max_heap_table_size = 16 M Current tmp_table_size = 32 M Of 15134 temp tables, 9% were created on disk Effective in-memory tmp_table_size is limited to max_heap_table_size. Created disk tmp tables ratio seems fine TABLE SCANS Current read_buffer_size = 128 K Current table scan ratio = 2915 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 142213 Your table locking seems to be fine The app is a facebook game with about 50-100 concurrent users. Thanks, Rob

Read the article

Launching a WPF Window in a Separate Thread, Part 1

- by Reed

Typically, I strongly recommend keeping the user interface within an application’s main thread, and using multiple threads to move the actual “work” into background threads. However, there are rare times when creating a separate, dedicated thread for a Window can be beneficial. This is even acknowledged in the MSDN samples, such as the Multiple Windows, Multiple Threads sample. However, doing this correctly is difficult. Even the referenced MSDN sample has major flaws, and will fail horribly in certain scenarios. To ease this, I wrote a small class that alleviates some of the difficulties involved. The MSDN Multiple Windows, Multiple Threads Sample shows how to launch a new thread with a WPF Window, and will work in most cases. The sample code (commented and slightly modified) works out to the following: // Create a thread Thread newWindowThread = new Thread(new ThreadStart( () => { // Create and show the Window Window1 tempWindow = new Window1(); tempWindow.Show(); // Start the Dispatcher Processing System.Windows.Threading.Dispatcher.Run(); })); // Set the apartment state newWindowThread.SetApartmentState(ApartmentState.STA); // Make the thread a background thread newWindowThread.IsBackground = true; // Start the thread newWindowThread.Start(); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This sample creates a thread, marks it as single threaded apartment state, and starts the Dispatcher on that thread. That is the minimum requirements to get a Window displaying and handling messages correctly, but, unfortunately, has some serious flaws. The first issue – the created thread will run continuously until the application shuts down, given the code in the sample. The problem is that the ThreadStart delegate used ends with running the Dispatcher. However, nothing ever stops the Dispatcher processing. The thread was created as a Background thread, which prevents it from keeping the application alive, but the Dispatcher will continue to pump dispatcher frames until the application shuts down. In order to fix this, we need to call Dispatcher.InvokeShutdown after the Window is closed. This would require modifying the above sample to subscribe to the Window’s Closed event, and, at that point, shutdown the Dispatcher: // Create a thread Thread newWindowThread = new Thread(new ThreadStart( () => { Window1 tempWindow = new Window1(); // When the window closes, shut down the dispatcher tempWindow.Closed += (s,e) => Dispatcher.CurrentDispatcher.BeginInvokeShutdown(DispatcherPriority.Background); tempWindow.Show(); // Start the Dispatcher Processing System.Windows.Threading.Dispatcher.Run(); })); // Setup and start thread as before This eliminates the first issue. Now, when the Window is closed, the new thread’s Dispatcher will shut itself down, which in turn will cause the thread to complete. The above code will work correctly for most situations. However, there is still a potential problem which could arise depending on the content of the Window1 class. This is particularly nasty, as the code could easily work for most windows, but fail on others. The problem is, at the point where the Window is constructed, there is no active SynchronizationContext. This is unlikely to be a problem in most cases, but is an absolute requirement if there is code within the constructor of Window1 which relies on a context being in place. While this sounds like an edge case, it’s fairly common. For example, if a BackgroundWorker is started within the constructor, or a TaskScheduler is built using TaskScheduler.FromCurrentSynchronizationContext() with the expectation of synchronizing work to the UI thread, an exception will be raised at some point. Both of these classes rely on the existence of a proper context being installed to SynchronizationContext.Current, which happens automatically, but not until Dispatcher.Run is called. In the above case, SynchronizationContext.Current will return null during the Window’s construction, which can cause exceptions to occur or unexpected behavior. Luckily, this is fairly easy to correct. We need to do three things, in order, prior to creating our Window: Create and initialize the Dispatcher for the new thread manually Create a synchronization context for the thread which uses the Dispatcher Install the synchronization context Creating the Dispatcher is quite simple – The Dispatcher.CurrentDispatcher property gets the current thread’s Dispatcher and “creates a new Dispatcher if one is not already associated with the thread.” Once we have the correct Dispatcher, we can create a SynchronizationContext which uses the dispatcher by creating a DispatcherSynchronizationContext. Finally, this synchronization context can be installed as the current thread’s context via SynchronizationContext.SetSynchronizationContext. These three steps can easily be added to the above via a single line of code: // Create a thread Thread newWindowThread = new Thread(new ThreadStart( () => { // Create our context, and install it: SynchronizationContext.SetSynchronizationContext( new DispatcherSynchronizationContext( Dispatcher.CurrentDispatcher)); Window1 tempWindow = new Window1(); // When the window closes, shut down the dispatcher tempWindow.Closed += (s,e) => Dispatcher.CurrentDispatcher.BeginInvokeShutdown(DispatcherPriority.Background); tempWindow.Show(); // Start the Dispatcher Processing System.Windows.Threading.Dispatcher.Run(); })); // Setup and start thread as before This now forces the synchronization context to be in place before the Window is created and correctly shuts down the Dispatcher when the window closes. However, there are quite a few steps. In my next post, I’ll show how to make this operation more reusable by creating a class with a far simpler API…

Read the article

Computer crashes on resume from standby almost every time

- by Los Frijoles

I am running Ubuntu 12.04 on a Core i5 2500K and ASRock Z68 Pro3-M motherboard (no graphics card, hd is a WD Green 1TB, and cd drive is some cheap lite-on drive). Since installing 12.04, my computer has been freezing after resume, but not every time. When I start to resume, it starts going normally with a blinking cursor on the screen and then sometimes it will continue on to the gnome 3 unlock screen. Most of the time, however, it will blink for a little bit and then the monitor will flip modes and shut off due to no signal. Pressing keys on the keyboard gets no response (num lock light doesn't respond, Ctrl-Alt-F1 fails to drop it into a terminal, Ctrl-Alt-Backspace doesn't work) and so I assume the computer is crashed. The worst part is, the logs look entirely normal. Here is my system log during one of these crashes and my subsequent hard poweroff and restart: Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-2, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-2, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-1, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[12419]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory Jun 6 21:54:43 kcuzner-desktop udevd[10448]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory Jun 6 22:09:01 kcuzner-desktop CRON[9061]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 22:17:01 kcuzner-desktop CRON[22142]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jun 6 22:39:01 kcuzner-desktop CRON[26909]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 22:54:21 kcuzner-desktop kernel: [57905.560822] show_signal_msg: 36 callbacks suppressed Jun 6 22:54:21 kcuzner-desktop kernel: [57905.560828] chromium-browse[9139]: segfault at 0 ip 00007f3a78efade0 sp 00007fff7e2d2c18 error 4 in chromium-browser[7f3a76604000+412b000] Jun 6 23:05:43 kcuzner-desktop kernel: [58586.415158] chromium-browse[21025]: segfault at 0 ip 00007f3a78efade0 sp 00007fff7e2d2c18 error 4 in chromium-browser[7f3a76604000+412b000] Jun 6 23:09:01 kcuzner-desktop CRON[13542]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jun 6 23:12:43 kcuzner-desktop kernel: [59006.317590] usb 2-1.7: USB disconnect, device number 8 Jun 6 23:12:43 kcuzner-desktop kernel: [59006.319672] sd 7:0:0:0: [sdg] Synchronizing SCSI cache Jun 6 23:12:43 kcuzner-desktop kernel: [59006.319737] sd 7:0:0:0: [sdg] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Jun 6 23:17:01 kcuzner-desktop CRON[26580]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jun 6 23:19:04 kcuzner-desktop acpid: client connected from 29925[0:0] Jun 6 23:19:04 kcuzner-desktop acpid: 1 client rule loaded Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30131 of process 30131 (n/a) owned by '104' high priority at nice level -11. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 1 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30162 of process 30131 (n/a) owned by '104' RT at priority 5. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 2 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30163 of process 30131 (n/a) owned by '104' RT at priority 5. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 3 threads of 1 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/HFPAG Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/A2DPSource Jun 6 23:19:07 kcuzner-desktop bluetoothd[1140]: Endpoint registered: sender=:1.239 path=/MediaEndpoint/A2DPSink Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Successfully made thread 30166 of process 30166 (n/a) owned by '104' high priority at nice level -11. Jun 6 23:19:07 kcuzner-desktop rtkit-daemon[1835]: Supervising 4 threads of 2 processes of 1 users. Jun 6 23:19:07 kcuzner-desktop pulseaudio[30166]: [pulseaudio] pid.c: Daemon already running. Jun 6 23:19:10 kcuzner-desktop acpid: client 2942[0:0] has disconnected Jun 6 23:19:10 kcuzner-desktop acpid: client 29925[0:0] has disconnected Jun 6 23:19:10 kcuzner-desktop acpid: client connected from 1286[0:0] Jun 6 23:19:10 kcuzner-desktop acpid: 1 client rule loaded Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/HFPAG Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/A2DPSource Jun 6 23:19:31 kcuzner-desktop bluetoothd[1140]: Endpoint unregistered: sender=:1.239 path=/MediaEndpoint/A2DPSink Jun 6 23:28:12 kcuzner-desktop kernel: imklog 5.8.6, log source = /proc/kmsg started. Jun 6 23:28:12 kcuzner-desktop rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1053" x-info="http://www.rsyslog.com"] start Jun 6 23:28:12 kcuzner-desktop rsyslogd: rsyslogd's groupid changed to 103 Jun 6 23:28:12 kcuzner-desktop rsyslogd: rsyslogd's userid changed to 101 Jun 6 23:28:12 kcuzner-desktop rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ] Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Ericsson MBM Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Sierra Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Generic Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Huawei Jun 6 23:28:12 kcuzner-desktop modem-manager[1070]: <info> Loaded plugin Linktop Jun 6 23:28:12 kcuzner-desktop bluetoothd[1072]: Failed to init gatt_example plugin Jun 6 23:28:12 kcuzner-desktop bluetoothd[1072]: Listening for HCI events on hci0 Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> NetworkManager (version 0.9.4.0) is starting... Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> Read config file /etc/NetworkManager/NetworkManager.conf Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> VPN: loaded org.freedesktop.NetworkManager.pptp Jun 6 23:28:12 kcuzner-desktop NetworkManager[1080]: <info> DNS: loaded plugin dnsmasq Jun 6 23:28:12 kcuzner-desktop kernel: [ 0.000000] Initializing cgroup subsys cpuset Jun 6 23:28:12 kcuzner-desktop kernel: [ 0.000000] Initializing cgroup subsys cpu Sorry it's so huge; the restart happens at 23:28:12 I believe and all I see is that chromium segfaulted a few times. I wouldn't think a segfault from an individual program on the computer would crash it, but could that be the issue?

Read the article

Cost Comparison Hard Disk Drive to Solid State Drive on Price per Gigabyte - dispelling a myth!

- by tonyrogerson

It is often said that Hard Disk Drive storage is significantly cheaper per GiByte than Solid State Devices – this is wholly inaccurate within the database space. People need to look at the cost of the complete solution and not just a single component part in isolation to what is really required to meet the business requirement. Buying a single Hitachi Ultrastar 600GB 3.5” SAS 15Krpm hard disk drive will cost approximately £239.60 (http://scan.co.uk, 22nd March 2012) compared to an OCZ 600GB Z-Drive R4 CM84 PCIe costing £2,316.54 (http://scan.co.uk, 22nd March 2012); I’ve not included FusionIO ioDrive because there is no public pricing available for it – something I never understand and personally when companies do this I immediately think what are they hiding, luckily in FusionIO’s case the product is proven though is expensive compared to OCZ enterprise offerings. On the face of it the single 15Krpm hard disk has a price per GB of £0.39, the SSD £3.86; this is what you will see in the press and this is what sales people will use in comparing the two technologies – do not be fooled by this bullshit people! What is the requirement? The requirement is the database will have a static size of 400GB kept static through archiving so growth and trim will balance the database size, the client requires resilience, there will be several hundred call centre staff querying the database where queries will read a small amount of data but there will be no hot spot in the data so the randomness will come across the entire 400GB of the database, estimates predict that the IOps required will be approximately 4,000IOps at peak times, because it’s a call centre system the IO latency is important and must remain below 5ms per IO. The balance between read and write is 70% read, 30% write. The requirement is now defined and we have three of the most important pieces of the puzzle – space required, estimated IOps and maximum latency per IO. Something to consider with regard SQL Server; write activity requires synchronous IO to the storage media specifically the transaction log; that means the write thread will wait until the IO is completed and hardened off until the thread can continue execution, the requirement has stated that 30% of the system activity will be write so we can expect a high amount of synchronous activity. The hardware solution needs to be defined; two possible solutions: hard disk or solid state based; the real question now is how many hard disks are required to achieve the IO throughput, the latency and resilience, ditto for the solid state. Hard Drive solution On a test on an HP DL380, P410i controller using IOMeter against a single 15Krpm 146GB SAS drive, the throughput given on a transfer size of 8KiB against a 40GiB file on a freshly formatted disk where the partition is the only partition on the disk thus the 40GiB file is on the outer edge of the drive so more sectors can be read before head movement is required: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 3,733 IOps at an average latency of 34.06ms (34 MiB/s). The same test was done on the same disk but the test file was 130GiB: For 100% sequential IO at a queue depth of 16 with 8 worker threads 43,537 IOps at an average latency of 2.93ms (340 MiB/s), for 100% random IO at the same queue depth and worker threads 528 IOps at an average latency of 217.49ms (4 MiB/s). From the result it is clear random performance gets worse as the disk fills up – I’m currently writing an article on short stroking which will cover this in detail. Given the work load is random in nature looking at the random performance of the single drive when only 40 GiB of the 146 GB is used gives near the IOps required but the latency is way out. Luckily I have tested 6 x 15Krpm 146GB SAS 15Krpm drives in a RAID 0 using the same test methodology, for the same test above on a 130 GiB for each drive added the performance boost is near linear, for each drive added throughput goes up by 5 MiB/sec, IOps by 700 IOps and latency reducing nearly 50% per drive added (172 ms, 94 ms, 65 ms, 47 ms, 37 ms, 30 ms). This is because the same 130GiB is spread out more as you add drives 130 / 1, 130 / 2, 130 / 3 etc. so implicit short stroking is occurring because there is less file on each drive so less head movement required. The best latency is still 30 ms but we have the IOps required now, but that’s on a 130GiB file and not the 400GiB we need. Some reality check here: a) the drive randomness is more likely to be 50/50 and not a full 100% but the above has highlighted the effect randomness has on the drive and the more a drive fills with data the worse the effect. For argument sake let us assume that for the given workload we need 8 disks to do the job, for resilience reasons we will need 16 because we need to RAID 1+0 them in order to get the throughput and the resilience, RAID 5 would degrade performance. Cost for hard drives: 16 x £239.60 = £3,833.60 For the hard drives we will need disk controllers and a separate external disk array because the likelihood is that the server itself won’t take the drives, a quick spec off DELL for a PowerVault MD1220 which gives the dual pathing with 16 disks 146GB 15Krpm 2.5” disks is priced at £7,438.00, note its probably more once we had two controller cards to sit in the server in, racking etc. Minimum cost taking the DELL quote as an example is therefore: {Cost of Hardware} / {Storage Required} £7,438.60 / 400 = £18.595 per GB £18.59 per GiB is a far cry from the £0.39 we had been told by the salesman and the myth. Yes, the storage array is composed of 16 x 146 disks in RAID 10 (therefore 8 usable) giving an effective usable storage availability of 1168GB but the actual storage requirement is only 400 and the extra disks have had to be purchased to get the IOps up. Solid State Drive solution A single card significantly exceeds the IOps and latency required, for resilience two will be required. ( £2,316.54 * 2 ) / 400 = £11.58 per GB With the SSD solution only two PCIe sockets are required, no external disk units, no additional controllers, no redundant controllers etc. Conclusion I hope by showing you an example that the myth that hard disk drives are cheaper per GiB than Solid State has now been dispelled - £11.58 per GB for SSD compared to £18.59 for Hard Disk. I’ve not even touched on the running costs, compare the costs of running 18 hard disks, that’s a lot of heat and power compared to two PCIe cards!Just a quick note: I've left a fair amount of information out due to this being a blog! If in doubt, email me :)I'll also deal with the myth that SSD's wear out at a later date as well - that's just way over done still, yes, 5 years ago, but now - no.

Read the article

Das T5-4 TPC-H Ergebnis naeher betrachtet

- by Stefan Hinker

Inzwischen haben vermutlich viele das neue TPC-H Ergebnis der SPARC T5-4 gesehen, das am 7. Juni bei der TPC eingereicht wurde. Die wesentlichen Punkte dieses Benchmarks wurden wie gewohnt bereits von unserer Benchmark-Truppe auf "BestPerf" zusammengefasst. Es gibt aber noch einiges mehr, das eine naehere Betrachtung lohnt. Skalierbarkeit Das TPC raet von einem Vergleich von TPC-H Ergebnissen in unterschiedlichen Groessenklassen ab. Aber auch innerhalb der 3000GB-Klasse ist es interessant: SPARC T4-4 mit 4 CPUs (32 Cores mit 3.0 GHz) liefert 205,792 QphH. SPARC T5-4 mit 4 CPUs (64 Cores mit 3.6 GHz) liefert 409,721 QphH. Das heisst, es fehlen lediglich 1863 QphH oder 0.45% zu 100% Skalierbarkeit, wenn man davon ausgeht, dass die doppelte Anzahl Kerne das doppelte Ergebnis liefern sollte. Etwas anspruchsvoller, koennte man natuerlich auch einen Faktor von 2.4 erwarten, wenn man die hoehere Taktrate mit beruecksichtigt. Das wuerde die Latte auf 493901 QphH legen. Dann waere die SPARC T5-4 bei 83%. Damit stellt sich die Frage: Was hat hier nicht skaliert? Vermutlich der Plattenspeicher! Auch hier lohnt sich eine naehere Betrachtung: Plattenspeicher Im Bericht auf BestPerf und auch im Full Disclosure Report der TPC stehen einige interessante Details zum Plattenspeicher und der Konfiguration. In der Konfiguration der SPARC T4-4 wurden 12 2540-M2 Arrays verwendet, die jeweils ca. 1.5 GB/s Durchsatz liefert, insgesamt also eta 18 GB/s. Dabei waren die Arrays offensichtlich mit jeweils 2 Kabeln pro Array direkt an die 24 8GBit FC-Ports des Servers angeschlossen. Mit den 2x 8GBit Ports pro Array koennte man so ein theoretisches Maximum von 2GB/s erreichen. Tatsaechlich wurden 1.5GB/s geliefert, was so ziemlich dem realistischen Maximum entsprechen duerfte. Fuer den Lauf mit der SPARC T5-4 wurden doppelt so viele Platten verwendet. Dafuer wurden die 2540-M2 Arrays mit je einem zusaetzlichen Plattentray erweitert. Mit dieser Konfiguration wurde dann (laut BestPerf) ein Maximaldurchsatz von 33 GB/s erreicht - nicht ganz das doppelte des SPARC T4-4 Laufs. Um tatsaechlich den doppelten Durchsatz (36 GB/s) zu liefern, haette jedes der 12 Arrays 3 GB/s ueber seine 4 8GBit Ports liefern muessen. Im FDR stehen nur 12 dual-port FC HBAs, was die Verwendung der Brocade FC Switches erklaert: Es wurden alle 4 8GBit ports jedes Arrays an die Switches angeschlossen, die die Datenstroeme dann in die 24 16GBit HBA ports des Servers buendelten. Das theoretische Maximum jedes Storage-Arrays waere nun 4 GB/s. Wenn man jedoch den Protokoll- und "Realitaets"-Overhead mit einrechnet, sind die tatsaechlich gelieferten 2.75 GB/s gar nicht schlecht. Mit diesen Zahlen im Hinterkopf ist die Verdopplung des SPARC T4-4 Ergebnisses eine gute Leistung - und gleichzeitig eine gute Erklaerung, warum nicht bis zum 2.4-fachen skaliert wurde. Nebenbei bemerkt: Weder die SPARC T4-4 noch die SPARC T5-4 hatten in der gemessenen Konfiguration irgendwelche Flash-Devices. Mitbewerb Seit die T4 Systeme auf dem Markt sind, bemuehen sich unsere Mitbewerber redlich darum, ueberall den Eindruck zu hinterlassen, die Leistung des SPARC CPU-Kerns waere weiterhin mangelhaft. Auch scheinen sie ueberzeugt zu sein, dass (ueber)grosse Caches und hohe Taktraten die einzigen Schluessel zu echter Server Performance seien. Wenn ich mir nun jedoch die oeffentlichen TPC-H Ergebnisse ansehe, sehe ich dies: TPC-H @3000GB, Non-Clustered Systems System QphH SPARC T5-4 3.6 GHz SPARC T5 4/64 – 2048 GB 409,721.8 SPARC T4-4 3.0 GHz SPARC T4 4/32 – 1024 GB 205,792.0 IBM Power 780 4.1 GHz POWER7 8/32 – 1024 GB 192,001.1 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64 – 512 GB 162,601.7 Kurz zusammengefasst: Mit 32 Kernen (mit 3 GHz und 4MB L3 Cache), liefert die SPARC T4-4 mehr QphH@3000GB ab als IBM mit ihrer 32 Kern Power7 (bei 4.1 GHz und 32MB L3 Cache) und auch mehr als HP mit einem 64 Kern Intel Xeon System (2.27 GHz und 24MB L3 Cache). Ich frage mich, wo genau SPARC hier mangelhaft ist? Nun koennte man natuerlich argumentieren, dass beide Ergebnisse nicht gerade neu sind. Nun, in Ermangelung neuerer Ergebnisse kann man ja mal ein wenig spekulieren: IBMs aktueller Performance Report listet die o.g. IBM Power 780 mit einem rPerf Wert von 425.5. Ein passendes Nachfolgesystem mit Power7+ CPUs waere die Power 780+ mit 64 Kernen, verfuegbar mit 3.72 GHz. Sie wird mit einem rPerf Wert von 690.1 angegeben, also 1.62x mehr. Wenn man also annimmt, dass Plattenspeicher nicht der limitierende Faktor ist (IBM hat mit 177 SSDs getestet, sie duerfen das gerne auf 400 erhoehen) und IBMs eigene Leistungsabschaetzung zugrunde legt, darf man ein theoretisches Ergebnis von 311398 QphH@3000GB erwarten. Das waere dann allerdings immer noch weit von dem Ergebnis der SPARC T5-4 entfernt, und gerade in der von IBM so geschaetzen "per core" Metric noch weniger vorteilhaft. In der x86-Welt sieht es nicht besser aus. Leider gibt es von Intel keine so praktischen rPerf-Tabellen. Daher muss ich hier fuer eine Schaetzung auf SPECint_rate2006 zurueckgreifen. (Ich bin kein grosser Fan von solchen Kreuz- und Querschaetzungen. Insb. SPECcpu ist nicht besonders geeignet, um Datenbank-Leistung abzuschaetzen, da fast kein IO im Spiel ist.) Das o.g. HP System wird bei SPEC mit 1580 CINT2006_rate gelistet. Das bis einschl. 2013-06-14 beste Resultat fuer den neuen Intel Xeon E7-4870 mit 8 CPUs ist 2180 CINT2006_rate. Das ist immerhin 1.38x besser. (Wenn man nur die Taktrate beruecksichtigen wuerde, waere man bei 1.32x.) Hier weiter zu rechnen, ist muessig, aber fuer die ungeduldigen Leser hier eine kleine tabellarische Zusammenfassung: TPC-H @3000GB Performance Spekulationen System QphH* Verbesserung gegenueber der frueheren Generation SPARC T4-4 32 cores SPARC T4 205,792 2x SPARC T5-464 cores SPARC T5 409,721 IBM Power 780 32 cores Power7 192,001 1.62x IBM Power 780+ 64 cores Power7+ 311,398* HP ProLiant DL980 G764 cores Intel Xeon X7560 162,601 1.38x HP ProLiant DL980 G780 cores Intel Xeon E7-4870 224,348* * Keine echten Resultate - spekulative Werte auf der Grundlage von rPerf (Power7+) oder SPECint_rate2006 (HP) Natuerlich sind IBM oder HP herzlich eingeladen, diese Werte zu widerlegen. Aber stand heute warte ich noch auf aktuelle Benchmark Veroffentlichungen in diesem Datensegment. Was koennen wir also zusammenfassen? Es gibt einige Hinweise, dass der Plattenspeicher der begrenzende Faktor war, der die SPARC T5-4 daran hinderte, auf jenseits von 2x zu skalieren Der Mythos, dass SPARC Kerne keine Leistung bringen, ist genau das - ein Mythos. Wie sieht es umgekehrt eigentlich mit einem TPC-H Ergebnis fuer die Power7+ aus? Cache ist nicht der magische Performance-Schalter, fuer den ihn manche Leute offenbar halten. Ein System, eine CPU-Architektur und ein Betriebsystem jenseits einer gewissen Grenze zu skalieren ist schwer. In der x86-Welt scheint es noch ein wenig schwerer zu sein. Was fehlt? Nun, das Thema Preis/Leistung ueberlasse ich gerne den Verkaeufern ;-) Und zu guter Letzt: Nein, ich habe mich nicht ins Marketing versetzen lassen. Aber manchmal kann ich mich einfach nicht zurueckhalten... Disclosure Statements The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 409,721.8 QphH@3000GB, $3.94/QphH@3000GB, available 9/24/13, 4 processors, 64 cores, 512 threads; SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads. SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of June 18, 2013 from www.spec.org. HP ProLiant DL980 G7 (2.27 GHz, Intel Xeon X7560): 1580 SPECint_rate2006; HP ProLiant DL980 G7 (2.4 GHz, Intel Xeon E7-4870): 2180 SPECint_rate2006,

Read the article

Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)

- by Tim Cooper

We've just taken delivery of a powerful 32-core AMD Opteron server with 128Gb. We have 2 x 6272 CPU's with 16 cores each. We are running a big long-running java task on 30 threads. We have the NUMA optimisations for Linux and java turned on. Our Java threads are mainly using objects that are private to that thread, sometimes reading memory that other threads will be reading, and very very occasionally writing or locking shared objects. We can't explain why the CPU cores are 25% idle. Below is a dump of "top": top - 23:06:38 up 1 day, 23 min, 3 users, load average: 10.84, 10.27, 9.62 Tasks: 676 total, 1 running, 675 sleeping, 0 stopped, 0 zombie Cpu(s): 64.5%us, 1.3%sy, 0.0%ni, 32.9%id, 1.3%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131652664k used, 485504k free, 92340k buffers Swap: 5701624k total, 230252k used, 5471372k free, 13444344k cached ... top - 22:37:39 up 23:54, 3 users, load average: 7.83, 8.70, 9.27 Tasks: 678 total, 1 running, 677 sleeping, 0 stopped, 0 zombie Cpu0 : 75.8%us, 2.0%sy, 0.0%ni, 22.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 77.2%us, 1.3%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 77.3%us, 1.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 77.8%us, 1.0%sy, 0.0%ni, 21.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 76.9%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 76.3%us, 2.0%sy, 0.0%ni, 21.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 12.6%us, 3.0%sy, 0.0%ni, 84.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 8.6%us, 2.0%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 77.0%us, 2.0%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 77.6%us, 1.7%sy, 0.0%ni, 20.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 75.7%us, 2.0%sy, 0.0%ni, 21.4%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 76.6%us, 2.3%sy, 0.0%ni, 21.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 76.2%us, 2.6%sy, 0.0%ni, 15.9%id, 5.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 76.6%us, 2.0%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu16 : 73.6%us, 2.6%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu17 : 74.5%us, 2.3%sy, 0.0%ni, 23.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu18 : 73.9%us, 2.3%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu19 : 72.9%us, 2.6%sy, 0.0%ni, 24.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu20 : 72.8%us, 2.6%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu21 : 72.7%us, 2.3%sy, 0.0%ni, 25.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu22 : 72.5%us, 2.6%sy, 0.0%ni, 24.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu23 : 73.0%us, 2.3%sy, 0.0%ni, 24.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu24 : 74.7%us, 2.7%sy, 0.0%ni, 22.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu25 : 74.5%us, 2.6%sy, 0.0%ni, 22.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu26 : 73.7%us, 2.0%sy, 0.0%ni, 24.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu27 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu28 : 74.1%us, 2.3%sy, 0.0%ni, 23.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu29 : 74.0%us, 2.0%sy, 0.0%ni, 24.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu30 : 73.2%us, 2.3%sy, 0.0%ni, 24.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu31 : 73.1%us, 2.0%sy, 0.0%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132138168k total, 131711704k used, 426464k free, 88336k buffers Swap: 5701624k total, 229572k used, 5472052k free, 13745596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13865 root 20 0 122g 112g 3.1g S 2334.3 89.6 20726:49 java 27139 jayen 20 0 15428 1728 952 S 2.6 0.0 0:04.21 top 27161 sysadmin 20 0 15428 1712 940 R 1.0 0.0 0:00.28 top 33 root 20 0 0 0 0 S 0.3 0.0 0:06.24 ksoftirqd/7 131 root 20 0 0 0 0 S 0.3 0.0 0:09.52 events/0 1858 root 20 0 0 0 0 S 0.3 0.0 1:35.14 kondemand/0 A dump of the java stack confirms that none of the threads are anywhere near the few places where locks are used, nor are they anywhere near any disk or network i/o. I had trouble finding a clear explanation of what 'top' means by "idle" versus "wait", but I get the impression that "idle" means "no more threads that need to be run" but this doesn't make sense in our case. We're using a "Executors.newFixedThreadPool(30)". There are a large number of tasks pending and each task lasts for 10 seconds or so. I suspect that the explanation requires a good understanding of NUMA. Is the "idle" state what you see when a CPU is waiting for a non-local access? If not, then what is the explanation?

Read the article

parallel_for_each from amp.h – part 1

- by Daniel Moth

This posts assumes that you've read my other C++ AMP posts on index<N> and extent<N>, as well as about the restrict modifier. It also assumes you are familiar with C++ lambdas (if not, follow my links to C++ documentation). Basic structure and parameters Now we are ready for part 1 of the description of the new overload for the concurrency::parallel_for_each function. The basic new parallel_for_each method signature returns void and accepts two parameters: a grid<N> (think of it as an alias to extent) a restrict(direct3d) lambda, whose signature is such that it returns void and accepts an index of the same rank as the grid So it looks something like this (with generous returns for more palatable formatting) assuming we are dealing with a 2-dimensional space: // some_code_A parallel_for_each( g, // g is of type grid<2> [ ](index<2> idx) restrict(direct3d) { // kernel code } ); // some_code_B The parallel_for_each will execute the body of the lambda (which must have the restrict modifier), on the GPU. We also call the lambda body the "kernel". The kernel will be executed multiple times, once per scheduled GPU thread. The only difference in each execution is the value of the index object (aka as the GPU thread ID in this context) that gets passed to your kernel code. The number of GPU threads (and the values of each index) is determined by the grid object you pass, as described next. You know that grid is simply a wrapper on extent. In this context, one way to think about it is that the extent generates a number of index objects. So for the example above, if your grid was setup by some_code_A as follows: extent<2> e(2,3); grid<2> g(e); ...then given that: e.size()==6, e[0]==2, and e[1]=3 ...the six index<2> objects it generates (and hence the values that your lambda would receive) are: (0,0) (1,0) (0,1) (1,1) (0,2) (1,2) So what the above means is that the lambda body with the algorithm that you wrote will get executed 6 times and the index<2> object you receive each time will have one of the values just listed above (of course, each one will only appear once, the order is indeterminate, and they are likely to call your code at the same exact time). Obviously, in real GPU programming, you'd typically be scheduling thousands if not millions of threads, not just 6. If you've been following along you should be thinking: "that is all fine and makes sense, but what can I do in the kernel since I passed nothing else meaningful to it, and it is not returning any values out to me?" Passing data in and out It is a good question, and in data parallel algorithms indeed you typically want to pass some data in, perform some operation, and then typically return some results out. The way you pass data into the kernel, is by capturing variables in the lambda (again, if you are not familiar with them, follow the links about C++ lambdas), and the way you use data after the kernel is done executing is simply by using those same variables. In the example above, the lambda was written in a fairly useless way with an empty capture list: [ ](index<2> idx) restrict(direct3d), where the empty square brackets means that no variables were captured. If instead I write it like this [&](index<2> idx) restrict(direct3d), then all variables in the some_code_A region are made available to the lambda by reference, but as soon as I try to use any of those variables in the lambda, I will receive a compiler error. This has to do with one of the direct3d restrictions, where only one type can be capture by reference: objects of the new concurrency::array class that I'll introduce in the next post (suffice for now to think of it as a container of data). If I write the lambda line like this [=](index<2> idx) restrict(direct3d), all variables in the some_code_A region are made available to the lambda by value. This works for some types (e.g. an integer), but not for all, as per the restrictions for direct3d. In particular, no useful data classes work except for one new type we introduce with C++ AMP: objects of the new concurrency::array_view class, that I'll introduce in the post after next. Also note that if you capture some variable by value, you could use it as input to your algorithm, but you wouldn’t be able to observe changes to it after the parallel_for_each call (e.g. in some_code_B region since it was passed by value) – the exception to this rule is the array_view since (as we'll see in a future post) it is a wrapper for data, not a container. Finally, for completeness, you can write your lambda, e.g. like this [av, &ar](index<2> idx) restrict(direct3d) where av is a variable of type array_view and ar is a variable of type array - the point being you can be very specific about what variables you capture and how. So it looks like from a large data perspective you can only capture array and array_view objects in the lambda (that is how you pass data to your kernel) and then use the many threads that call your code (each with a unique index) to perform some operation. You can also capture some limited types by value, as input only. When the last thread completes execution of your lambda, the data in the array_view or array are ready to be used in the some_code_B region. We'll talk more about all this in future posts… (a)synchronous Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality. That's all for now, we'll revisit the parallel_for_each description, once we introduce properly array and array_view – coming next. Comments about this post by Daniel Moth welcome at the original blog.

Read the article

Yet another C# Deadlock Debugging Question

- by Roo

Hi All, I have a multi-threaded application build in C# using VS2010 Professional. It's quite a large application and we've experienced the classing GUI cross-threading and deadlock issues before, but in the past month we've noticed the appears to lock up when left idle for around 20-30 minutes. The application is irresponsive and although it will repaint itself when other windows are dragged in front of the application and over it, the GUI still appears to be locked... interstingly (unlike if the GUI thread is being used for a considerable amount of time) the Close, Maximise and minimise buttons are also irresponsive and when clicked the little (Not Responding...) text is not displayed in the title of the application i.e. Windows still seems to think it's running fine. If I break/pause the application using the debugger, and view the threads that are running. There are 3 threads of our managed code that are running, and a few other worker threads whom the source code cannot be displayed for. The 3 threads that run are: The main/GUI thread A thread that loops indefinitely A thread that loops indefinitely If I step into threads 2 and 3, they appear to be looping correctly. They do not share locks (even with the main GUI thread) and they are not using the GUI thread at all. When stepping into the main/GUI thread however, it's broken on Application.Run... This problem screams deadlock to me, but what I don't understand is if it's deadlock, why can't I see the line of code the main/GUI thread is hanging on? Any help will be greatly appreciated! Let me know if you need more information... Cheers, Roo -----------------------------------------------------SOLUTION-------------------------------------------------- Okay, so the problem is now solved. Thanks to everyone for their suggestions! Much appreciated! I've marked the answer that solved my initial problem of determining where on the main/UI thread the application hangs (I handn't turned off the "Enable Just My Code" option). The overall issue I was experiencing was indeed Deadlock, however. After obtaining the call-stack and popping the top half of it into Google I came across this which explains exactly what I was experiencing... http://timl.net/ This references a lovely guide to debugging the issue... http://www.aaronlerch.com/blog/2008/12/15/debugging-ui/ This identified a control I was constructing off the GUI thread. I did know this, however, and was marshalling calls correctly, but what I didn't realise was that behind the scenes this Control was subscribing to an event or set of events that are triggered when e.g. a Windows session is unlocked or the screensaver exits. These calls are always made on the main/UI thread and were blocking when it saw the call was made on the incorrect thread. Kim explains in more detail here... http://krgreenlee.blogspot.com/2007/09/onuserpreferencechanged-hang.html In the end I found an alternative solution which did not require this Control off the main/UI thread. That appears to have solved the problem and the application no longer hangs. I hope this helps anyone who's confronted by a similar problem. Thanks again to everyone on here who helped! (and indirectly, the delightful bloggers I've referenced above!) Roo -----------------------------------------------------SOLUTION II-------------------------------------------------- Aren't threading issues delightful...you think you've solved it, and a month down the line it pops back up again. I still believe the solution above resolved an issue that would cause simillar behaviour, but we encountered the problem again. As we spent a while debugging this, I thought I'd update this question with our (hopefully) final solution: The problem appears to have been a bug in the Infragistics components in the WinForms 2010.1 release (no hot fixes). We had been running from around the time the freeze issue appeared (but had also added a bunch of other stuff too). After upgrading to WinForms 2010.3, we've yet to reproduce the issue (deja vu). See my question here for a bit more information: 'http://stackoverflow.com/questions/4077822/net-4-0-and-the-dreaded-onuserpreferencechanged-hang'. Hans has given a nice summary of the general issue. I hope this adds a little to the suggestions/information surrounding the nutorious OnUserPreferenceChanged Hang (or whatever you'd like to call it). Cheers, Roo

Read the article

TStringList and TThread that does not free all of its memory

- by VanillaH

Version used: Delphi 7. I'm working on a program that does a simple for loop on a Virtual ListView. The data is stored in the following record: type TList=record Item:Integer; SubItem1:String; SubItem2:String; end; Item is the index. SubItem1 the status of the operations (success or not). SubItem2 the path to the file. The for loop loads each file, does a few operations and then, save it. The operations take place in a TStringList. Files are about 2mb each. Now, if I do the operations on the main form, it works perfectly. Multi-threaded, there is a huge memory problem. Somehow, the TStringList doesn't seem to be freed completely. After 3-4k files, I get an EOutofMemory exception. Sometimes, the software is stuck to 500-600mb, sometimes not. In any case, the TStringList always return an EOutofMemory exception and no file can be loaded anymore. On computers with more memory, it takes longer to get the exception. The same thing happens with other components. For instance, if I use THTTPSend from Synapse, well, after a while, the software cannot create any new threads because the memory consumption is too high. It's around 500-600mb while it should be, max, 100mb. On the main form, everything works fine. I guess the mistake is on my side. Maybe I don't understand threads enough. I tried to free everything on the Destroy event. I tried FreeAndNil procedure. I tried with only one thread at a time. I tried freeing the thread manually (no FreeOnTerminate...) No luck. So here is the thread code. It's only the basic idea; not the full code with all the operations. If I remove the LoadFile prodecure, everything works good. A thread is created for each file, according to a thread pool. unit OperationsFiles; interface uses Classes, SysUtils, Windows; type TOperationFile = class(TThread) private Position : Integer; TPath, StatusMessage: String; FileStringList: TStringList; procedure UpdateStatus; procedure LoadFile; protected procedure Execute; override; public constructor Create(Path: String; LNumber: Integer); end; implementation uses Form1; procedure TOperationFile.LoadFile; begin try FileStringList.LoadFromFile(TPath); // Operations... StatusMessage := 'Success'; except on E : Exception do StatusMessage := E.ClassName; end; end; constructor TOperationFile.Create(Path : String; LNumber: Integer); begin inherited Create(False); TPath := Path; Position := LNumber; FreeOnTerminate := True; end; procedure TOperationFile.UpdateStatus; begin FileList[Position].SubItem1 := StatusMessage; Form1.ListView4.UpdateItems(Position,Position); end; procedure TOperationFile.Execute; begin FileStringList:= TStringList.Create; LoadFile; Synchronize(UpdateStatus); FileStringList.Free; end; end. What could be the problem? I thought at one point that, maybe, too many threads are created. If a user loads 1 million files, well, ultimately, 1 million threads is going to be created -- although, only 50 threads are created and running at the same time. Thanks for your input.

Read the article

11gR2 ???:Oracle Cluster Health Monitor(CHM)??

- by JaneZhang(???)

Cluster Health Monitor(????CHM)???Oracle?????,?????????????(CPU????SWAP????I/O?????)??????CHM?????????? ??????????????????????Hang?????(Eviction)????????????????,??????CHM????????????????????,????????????? CHM???????????: 11.2.0.2 ?????? Oracle Grid Infrastructure for Linux (???Linux Itanium) ?Solaris (Sparc 64 ? x86-64) 11.2.0.3 ????? Oracle Grid Infrastructure for AIX ? Windows (???Windows Itanium)? ????,???????????CHM?????(ora.crf)???: $ crsctl stat res -t -init -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS Cluster Resources ora.crf ONLINE ONLINE rac1 CHM????????: 1). System Monitor Service(osysmond):?????????????,osysmond????????????????cluster logger service,???????????????????CHM????? $ ps -ef|grep osysmond root 7984 1 0 Jun05 ? 01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin 2). Cluster Logger Service(ologgerd):???????,ologgerd ???????(master),???????(standby)??ologgerd???????????????,?????????? ???: $ ps -ef|grep ologgerd root 8257 1 0 Jun05 ? 00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2 ???: $ ps -ef|grep ologgerd root 8353 1 0 Jun05 ? 00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1 CHM Repository:?????????,?????,????Grid Infrastructure home ? ,??1 GB ?????,???????????0.5GB???? ?????OCLUMON??????????????????(??????3????)? ??????????????: $ oclumon manage -get reppath CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2 Done $ oclumon manage -get repsize CHM Repository Size = 68082 <====???? Done ????: $ oclumon manage -repos reploc /shared/oracle/chm ????: $ oclumon manage -repos resize 68083 <==?3600(??) ? 259200(3?)?? rac1 --> retention check successful New retention is 68083 and will use 1073750609 bytes of disk space CRS-9115-Cluster Health Monitor repository size change completed on all nodes. Done ??CHM???????????: 1. ?????Grid_home/bin/diagcollection.pl: 1). ??,??cluster logger service????: $ oclumon manage -get master Master = rac2 2).?root??????rac2???????: # <Grid_home>/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration inc_time?????????????,???MM/DD/YYYY24HH:MM:SS, duration?????????????????? ??:# diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime 06/15/201215:30:00 -incidentduration 00:05 3).????????,CHM?????????chmosData_rac2_20120615_1537.tar.gz? 2. ??????CHM?????????oclumon: $oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h] -s??????,-e?????? $ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt $ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00" >/tmp/chm1.txt $ oclumon dumpnodeview -allnodes -last "00:15:00" >/tmp/chm1.txt ???/tmp/chm1.txt??????:----------------------------------------Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880----------------------------------------SYSTEM:#cpus: 1 cpu: 17.96 cpuq: 5 physmemfree: 32240 physmemtotal: 2065856 mcache: 1064024 swapfree: 3988376 swaptotal: 4192956 ior: 57 iow: 59 ios: 10 swpin: 0 swpout: 0 pgin: 57 pgout: 59 netr: 65.767 netw: 34.871 procs: 183 rtprocs: 10 #fds: 4902 #sysfdlimit: 6815744 #disks: 4 #nics: 3 nicErrors: 0TOP CONSUMERS:topcpu: 'mrtg(32385) 64.70' topprivmem: 'ologgerd(8353) 84068' topshm: 'oracle(8760) 329452' topfd: 'ohasd.bin(6627) 720' topthread: 'crsd.bin(8235) 44'PROCESSES:name: 'mrtg' pid: 32385 #procfdlimit: 65536 cpuusage: 64.70 privmem: 1160 shm: 1584 #fd: 5 #threads: 1 priority: 20 nice: 0name: 'oracle' pid: 32381 #procfdlimit: 65536 cpuusage: 0.29 privmem: 1456 shm: 12444 #fd: 32 #threads: 1 priority: 15 nice: 0...name: 'oracle' pid: 8756 #procfdlimit: 65536 cpuusage: 0.0 privmem: 2892 shm: 24356 #fd: 47 #threads: 1 priority: 16 nice: 0----------------------------------------Node: rac2 Clock: '06-15-12 07.40.02' SerialNo:168878----------------------------------------SYSTEM:#cpus: 1 cpu: 40.72 cpuq: 8 physmemfree: 34072 physmemtotal: 2065856 mcache: 1005636 swapfree: 3991808 swaptotal: 4192956 ior: 54 iow: 104 ios: 11 swpin: 0 swpout: 0 pgin: 54 pgout: 104 netr: 77.817 netw: 33.008 procs: 178 rtprocs: 10 #fds: 4948 #sysfdlimit: 6815744 #disks: 4 #nics: 4 nicErrors: 0TOP CONSUMERS:topcpu: 'orarootagent.bi(8490) 1.59' topprivmem: 'ologgerd(8257) 83108' topshm: 'oracle(8873) 324868' topfd: 'ohasd.bin(6744) 720' topthread: 'crsd.bin(8362) 47'PROCESSES:name: 'oracle' pid: 9040 #procfdlimit: 65536 cpuusage: 0.19 privmem: 6040 shm: 121712 #fd: 33 #threads: 1 priority: 16 nice: 0... ??CHM?????,???Oracle????: http://docs.oracle.com/cd/E11882_01/rac.112/e16794/troubleshoot.htm#CWADD92242 Oracle® Clusterware Administration and Deployment Guide 11g Release 2 (11.2) Part Number E16794-17 ?? My Oracle Support??: Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)

Read the article

MySQL is running VERY slow

- by user1032531

I have two servers: a VPS and a laptop. I recently re-built both of them, and MySQL is running about 20 times slower on the laptop. Both servers used to run CentOS 5.8 and I think MySQL 5.1, and the laptop used to do great so I do not think it is the hardware. For the VPS, my provider installed CentOS 6.4, and then I installed MySQL 5.1.69 using yum with the CentOS repo. For the laptop, I installed CentOS 6.4 basic server and then installed MySQL 5.1.69 using yum with the CentOS repo. my.cnf for both servers are identical, and I have shown below. For both servers, I've also included below the output from SHOW VARIABLES; as well as output from sysbench, file system information, and cpu information. I have tried adding skip-name-resolve, but it didn't help. The matrix below shows the SHOW VARIABLES output from both servers which is different. Again, MySQL was installed the same way, so I do not know why it is different, but it is and I think this might be why the laptop is executing MySQL so slowly. Why is the laptop running MySQL slowly, and how do I fix it? Differences between SHOW VARIABLES on both servers +---------------------------+-----------------------+-------------------------+ | Variable | Value-VPS | Value-Laptop | +---------------------------+-----------------------+-------------------------+ | hostname | vps.site1.com | laptop.site2.com | | max_binlog_cache_size | 4294963200 | 18446744073709500000 | | max_seeks_for_key | 4294967295 | 18446744073709500000 | | max_write_lock_count | 4294967295 | 18446744073709500000 | | myisam_max_sort_file_size | 2146435072 | 9223372036853720000 | | myisam_mmap_size | 4294967295 | 18446744073709500000 | | plugin_dir | /usr/lib/mysql/plugin | /usr/lib64/mysql/plugin | | pseudo_thread_id | 7568 | 2 | | system_time_zone | EST | PDT | | thread_stack | 196608 | 262144 | | timestamp | 1372252112 | 1372252046 | | version_compile_machine | i386 | x86_64 | +---------------------------+-----------------------+-------------------------+ my.cnf for both servers [root@server1 ~]# cat /etc/my.cnf [mysqld] datadir=/var/lib/mysql socket=/var/lib/mysql/mysql.sock user=mysql # Disabling symbolic-links is recommended to prevent assorted security risks symbolic-links=0 [mysqld_safe] log-error=/var/log/mysqld.log pid-file=/var/run/mysqld/mysqld.pid innodb_strict_mode=on sql_mode=TRADITIONAL # sql_mode=STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE character-set-server=utf8 collation-server=utf8_general_ci log=/var/log/mysqld_all.log [root@server1 ~]# VPS SHOW VARIABLES Info Same as Laptop shown below but changes per above matrix (removed to allow me to be under the 30000 characters as required by ServerFault) Laptop SHOW VARIABLES Info auto_increment_increment 1 auto_increment_offset 1 autocommit ON automatic_sp_privileges ON back_log 50 basedir /usr/ big_tables OFF binlog_cache_size 32768 binlog_direct_non_transactional_updates OFF binlog_format STATEMENT bulk_insert_buffer_size 8388608 character_set_client utf8 character_set_connection utf8 character_set_database latin1 character_set_filesystem binary character_set_results utf8 character_set_server latin1 character_set_system utf8 character_sets_dir /usr/share/mysql/charsets/ collation_connection utf8_general_ci collation_database latin1_swedish_ci collation_server latin1_swedish_ci completion_type 0 concurrent_insert 1 connect_timeout 10 datadir /var/lib/mysql/ date_format %Y-%m-%d datetime_format %Y-%m-%d %H:%i:%s default_week_format 0 delay_key_write ON delayed_insert_limit 100 delayed_insert_timeout 300 delayed_queue_size 1000 div_precision_increment 4 engine_condition_pushdown ON error_count 0 event_scheduler OFF expire_logs_days 0 flush OFF flush_time 0 foreign_key_checks ON ft_boolean_syntax + -><()~*:""&| ft_max_word_len 84 ft_min_word_len 4 ft_query_expansion_limit 20 ft_stopword_file (built-in) general_log OFF general_log_file /var/run/mysqld/mysqld.log group_concat_max_len 1024 have_community_features YES have_compress YES have_crypt YES have_csv YES have_dynamic_loading YES have_geometry YES have_innodb YES have_ndbcluster NO have_openssl DISABLED have_partitioning YES have_query_cache YES have_rtree_keys YES have_ssl DISABLED have_symlink DISABLED hostname server1.site2.com identity 0 ignore_builtin_innodb OFF init_connect init_file init_slave innodb_adaptive_hash_index ON innodb_additional_mem_pool_size 1048576 innodb_autoextend_increment 8 innodb_autoinc_lock_mode 1 innodb_buffer_pool_size 8388608 innodb_checksums ON innodb_commit_concurrency 0 innodb_concurrency_tickets 500 innodb_data_file_path ibdata1:10M:autoextend innodb_data_home_dir innodb_doublewrite ON innodb_fast_shutdown 1 innodb_file_io_threads 4 innodb_file_per_table OFF innodb_flush_log_at_trx_commit 1 innodb_flush_method innodb_force_recovery 0 innodb_lock_wait_timeout 50 innodb_locks_unsafe_for_binlog OFF innodb_log_buffer_size 1048576 innodb_log_file_size 5242880 innodb_log_files_in_group 2 innodb_log_group_home_dir ./ innodb_max_dirty_pages_pct 90 innodb_max_purge_lag 0 innodb_mirrored_log_groups 1 innodb_open_files 300 innodb_rollback_on_timeout OFF innodb_stats_method nulls_equal innodb_stats_on_metadata ON innodb_support_xa ON innodb_sync_spin_loops 20 innodb_table_locks ON innodb_thread_concurrency 8 innodb_thread_sleep_delay 10000 innodb_use_legacy_cardinality_algorithm ON insert_id 0 interactive_timeout 28800 join_buffer_size 131072 keep_files_on_create OFF key_buffer_size 8384512 key_cache_age_threshold 300 key_cache_block_size 1024 key_cache_division_limit 100 language /usr/share/mysql/english/ large_files_support ON large_page_size 0 large_pages OFF last_insert_id 0 lc_time_names en_US license GPL local_infile ON locked_in_memory OFF log OFF log_bin OFF log_bin_trust_function_creators OFF log_bin_trust_routine_creators OFF log_error /var/log/mysqld.log log_output FILE log_queries_not_using_indexes OFF log_slave_updates OFF log_slow_queries OFF log_warnings 1 long_query_time 10.000000 low_priority_updates OFF lower_case_file_system OFF lower_case_table_names 0 max_allowed_packet 1048576 max_binlog_cache_size 18446744073709547520 max_binlog_size 1073741824 max_connect_errors 10 max_connections 151 max_delayed_threads 20 max_error_count 64 max_heap_table_size 16777216 max_insert_delayed_threads 20 max_join_size 18446744073709551615 max_length_for_sort_data 1024 max_long_data_size 1048576 max_prepared_stmt_count 16382 max_relay_log_size 0 max_seeks_for_key 18446744073709551615 max_sort_length 1024 max_sp_recursion_depth 0 max_tmp_tables 32 max_user_connections 0 max_write_lock_count 18446744073709551615 min_examined_row_limit 0 multi_range_count 256 myisam_data_pointer_size 6 myisam_max_sort_file_size 9223372036853727232 myisam_mmap_size 18446744073709551615 myisam_recover_options OFF myisam_repair_threads 1 myisam_sort_buffer_size 8388608 myisam_stats_method nulls_unequal myisam_use_mmap OFF net_buffer_length 16384 net_read_timeout 30 net_retry_count 10 net_write_timeout 60 new OFF old OFF old_alter_table OFF old_passwords OFF open_files_limit 1024 optimizer_prune_level 1 optimizer_search_depth 62 optimizer_switch index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on pid_file /var/run/mysqld/mysqld.pid plugin_dir /usr/lib64/mysql/plugin port 3306 preload_buffer_size 32768 profiling OFF profiling_history_size 15 protocol_version 10 pseudo_thread_id 3 query_alloc_block_size 8192 query_cache_limit 1048576 query_cache_min_res_unit 4096 query_cache_size 0 query_cache_type ON query_cache_wlock_invalidate OFF query_prealloc_size 8192 rand_seed1 rand_seed2 range_alloc_block_size 4096 read_buffer_size 131072 read_only OFF read_rnd_buffer_size 262144 relay_log relay_log_index relay_log_info_file relay-log.info relay_log_purge ON relay_log_space_limit 0 report_host report_password report_port 3306 report_user rpl_recovery_rank 0 secure_auth OFF secure_file_priv server_id 0 skip_external_locking ON skip_name_resolve OFF skip_networking OFF skip_show_database OFF slave_compressed_protocol OFF slave_exec_mode STRICT slave_load_tmpdir /tmp slave_max_allowed_packet 1073741824 slave_net_timeout 3600 slave_skip_errors OFF slave_transaction_retries 10 slow_launch_time 2 slow_query_log OFF slow_query_log_file /var/run/mysqld/mysqld-slow.log socket /var/lib/mysql/mysql.sock sort_buffer_size 2097144 sql_auto_is_null ON sql_big_selects ON sql_big_tables OFF sql_buffer_result OFF sql_log_bin ON sql_log_off OFF sql_log_update ON sql_low_priority_updates OFF sql_max_join_size 18446744073709551615 sql_mode sql_notes ON sql_quote_show_create ON sql_safe_updates OFF sql_select_limit 18446744073709551615 sql_slave_skip_counter sql_warnings OFF ssl_ca ssl_capath ssl_cert ssl_cipher ssl_key storage_engine MyISAM sync_binlog 0 sync_frm ON system_time_zone PDT table_definition_cache 256 table_lock_wait_timeout 50 table_open_cache 64 table_type MyISAM thread_cache_size 0 thread_handling one-thread-per-connection thread_stack 262144 time_format %H:%i:%s time_zone SYSTEM timed_mutexes OFF timestamp 1372254399 tmp_table_size 16777216 tmpdir /tmp transaction_alloc_block_size 8192 transaction_prealloc_size 4096 tx_isolation REPEATABLE-READ unique_checks ON updatable_views_with_limit YES version 5.1.69 version_comment Source distribution version_compile_machine x86_64 version_compile_os redhat-linux-gnu wait_timeout 28800 warning_count 0 VPS Sysbench Info [root@vps ~]# cat sysbench.txt sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 8 Doing OLTP test. Running mixed OLTP test Doing read-only test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 7 times) Done. OLTP test statistics: queries performed: read: 1449966 write: 0 other: 207138 total: 1657104 transactions: 103569 (1726.01 per sec.) deadlocks: 0 (0.00 per sec.) read/write requests: 1449966 (24164.08 per sec.) other operations: 207138 (3452.01 per sec.) Test execution summary: total time: 60.0050s total number of events: 103569 total time taken by event execution: 479.1544 per-request statistics: min: 1.98ms avg: 4.63ms max: 330.73ms approx. 95 percentile: 8.26ms Threads fairness: events (avg/stddev): 12946.1250/381.09 execution time (avg/stddev): 59.8943/0.00 [root@vps ~]# Laptop Sysbench Info [root@server1 ~]# cat sysbench.txt sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 8 Doing OLTP test. Running mixed OLTP test Doing read-only test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 7 times) Done. OLTP test statistics: queries performed: read: 634718 write: 0 other: 90674 total: 725392 transactions: 45337 (755.56 per sec.) deadlocks: 0 (0.00 per sec.) read/write requests: 634718 (10577.78 per sec.) other operations: 90674 (1511.11 per sec.) Test execution summary: total time: 60.0048s total number of events: 45337 total time taken by event execution: 479.4912 per-request statistics: min: 2.04ms avg: 10.58ms max: 85.56ms approx. 95 percentile: 19.70ms Threads fairness: events (avg/stddev): 5667.1250/42.18 execution time (avg/stddev): 59.9364/0.00 [root@server1 ~]# VPS File Info [root@vps ~]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/simfs simfs 20971520 16187440 4784080 78% / none tmpfs 6224432 4 6224428 1% /dev none tmpfs 6224432 0 6224432 0% /dev/shm [root@vps ~]# Laptop File Info [root@server1 ~]# df -T Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_server1-lv_root ext4 72383800 4243964 64462860 7% / tmpfs tmpfs 956352 0 956352 0% /dev/shm /dev/sdb1 ext4 495844 60948 409296 13% /boot [root@server1 ~]# VPS CPU Info Removed to stay under the 30000 character limit required by ServerFault Laptop CPU Info [root@server1 ~]# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz stepping : 13 cpu MHz : 800.000 cache size : 2048 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority bogomips : 3591.39 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz stepping : 13 cpu MHz : 800.000 cache size : 2048 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida dts tpr_shadow vnmi flexpriority bogomips : 3591.39 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: [root@server1 ~]# EDIT New Info requested by shakalandy [root@localhost ~]# cat /proc/meminfo MemTotal: 2044804 kB MemFree: 761464 kB Buffers: 68868 kB Cached: 369708 kB SwapCached: 0 kB Active: 881080 kB Inactive: 246016 kB Active(anon): 688312 kB Inactive(anon): 4416 kB Active(file): 192768 kB Inactive(file): 241600 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 4095992 kB SwapFree: 4095992 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 688428 kB Mapped: 65156 kB Shmem: 4216 kB Slab: 92428 kB SReclaimable: 31260 kB SUnreclaim: 61168 kB KernelStack: 2392 kB PageTables: 28356 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 5118392 kB Committed_AS: 1530212 kB VmallocTotal: 34359738367 kB VmallocUsed: 343604 kB VmallocChunk: 34359372920 kB HardwareCorrupted: 0 kB AnonHugePages: 520192 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 8556 kB DirectMap2M: 2078720 kB [root@localhost ~]# ps aux | grep mysql root 2227 0.0 0.0 108332 1504 ? S 07:36 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/localhost.badobe.com.pid mysql 2319 0.1 24.5 1470068 501360 ? Sl 07:36 0:57 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/localhost.badobe.com.err --pid-file=/var/lib/mysql/localhost.badobe.com.pid root 3579 0.0 0.1 201840 3028 pts/0 S+ 07:40 0:00 mysql -u root -p root 13887 0.0 0.1 201840 3036 pts/3 S+ 18:08 0:00 mysql -uroot -px xxxxxxxxxx root 14449 0.0 0.0 103248 840 pts/2 S+ 18:16 0:00 grep mysql [root@localhost ~]# ps aux | grep mysql root 2227 0.0 0.0 108332 1504 ? S 07:36 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/localhost.badobe.com.pid mysql 2319 0.1 24.5 1470068 501356 ? Sl 07:36 0:57 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/localhost.badobe.com.err --pid-file=/var/lib/mysql/localhost.badobe.com.pid root 3579 0.0 0.1 201840 3028 pts/0 S+ 07:40 0:00 mysql -u root -p root 13887 0.0 0.1 201840 3048 pts/3 S+ 18:08 0:00 mysql -uroot -px xxxxxxxxxx root 14470 0.0 0.0 103248 840 pts/2 S+ 18:16 0:00 grep mysql [root@localhost ~]# vmstat 1 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 0 742172 76376 371064 0 0 6 6 78 202 2 1 97 1 0 0 0 0 742164 76380 371060 0 0 0 16 191 467 2 1 93 5 0 0 0 0 742164 76380 371064 0 0 0 0 148 388 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 159 418 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 145 380 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 166 429 2 1 97 0 0 1 0 0 742164 76380 371064 0 0 0 0 148 373 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 149 382 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 168 408 2 0 97 0 0 0 0 0 742164 76380 371064 0 0 0 0 165 394 2 1 98 0 0 0 0 0 742164 76380 371064 0 0 0 0 159 354 2 1 98 0 0 0 0 0 742164 76388 371060 0 0 0 16 180 447 2 0 91 6 0 0 0 0 742164 76388 371064 0 0 0 0 143 344 2 1 98 0 0 0 1 0 742784 76416 370044 0 0 28 580 360 678 3 1 74 23 0 1 0 0 744768 76496 367772 0 0 40 1036 437 865 3 1 53 43 0 0 1 0 747248 76596 365412 0 0 48 1224 561 923 3 2 53 43 0 0 1 0 749232 76696 363092 0 0 32 1132 512 883 3 2 52 44 0 0 1 0 751340 76772 361020 0 0 32 1008 472 872 2 1 52 45 0 0 1 0 753448 76840 358540 0 0 36 1088 512 860 2 1 51 46 0 0 1 0 755060 76936 357636 0 0 28 1012 481 922 2 2 52 45 0 0 1 0 755060 77064 357988 0 0 12 896 444 902 2 1 53 45 0 0 1 0 754688 77148 358448 0 0 16 1096 506 1007 1 1 56 42 0 0 2 0 754192 77268 358932 0 0 12 1060 481 957 1 2 53 44 0 0 1 0 753696 77380 359392 0 0 12 1052 512 1025 2 1 55 42 0 0 1 0 751028 77480 359828 0 0 8 984 423 909 2 2 52 45 0 0 1 0 750524 77620 360200 0 0 8 788 367 869 1 2 54 44 0 0 1 0 749904 77700 360664 0 0 8 928 439 924 2 2 55 43 0 0 1 0 749408 77796 361084 0 0 12 976 468 967 1 1 56 43 0 0 1 0 748788 77896 361464 0 0 12 992 453 944 1 2 54 43 0 1 1 0 748416 77992 361996 0 0 12 784 392 868 2 1 52 46 0 0 1 0 747920 78092 362336 0 0 4 896 382 874 1 1 52 46 0 0 1 0 745252 78172 362780 0 0 12 1040 444 923 1 1 56 42 0 0 1 0 744764 78288 363220 0 0 8 1024 448 934 2 1 55 43 0 0 1 0 744144 78408 363668 0 0 8 1000 461 982 2 1 53 44 0 0 1 0 743648 78488 364148 0 0 8 872 443 888 2 1 54 43 0 0 1 0 743152 78548 364468 0 0 16 1020 511 995 2 1 55 43 0 0 1 0 742656 78632 365024 0 0 12 928 431 913 1 2 53 44 0 0 1 0 742160 78728 365468 0 0 12 996 470 955 2 2 54 44 0 1 1 0 739492 78840 365896 0 0 8 988 447 939 1 2 52 46 0 0 1 0 738872 78996 366352 0 0 12 972 442 928 1 1 55 44 0 1 1 0 738244 79148 366812 0 0 8 948 549 1126 2 2 54 43 0 0 1 0 737624 79312 367188 0 0 12 996 456 953 2 2 54 43 0 0 1 0 736880 79456 367660 0 0 12 960 444 918 1 1 53 46 0 0 1 0 736260 79584 368124 0 0 8 884 414 921 1 1 54 44 0 0 1 0 735648 79716 368488 0 0 12 976 450 955 2 1 56 41 0 0 1 0 733104 79840 368988 0 0 12 932 453 918 1 2 55 43 0 0 1 0 732608 79996 369356 0 0 16 916 444 889 1 2 54 43 0 1 1 0 731476 80128 369800 0 0 16 852 514 978 2 2 54 43 0 0 1 0 731244 80252 370200 0 0 8 904 398 870 2 1 55 43 0 1 1 0 730624 80384 370612 0 0 12 1032 447 977 1 2 57 41 0 0 1 0 730004 80524 371096 0 0 12 984 469 941 2 2 52 45 0 0 1 0 729508 80636 371544 0 0 12 928 438 922 2 1 52 46 0 0 1 0 728888 80756 371948 0 0 16 972 439 943 2 1 55 43 0 0 1 0 726468 80900 372272 0 0 8 960 545 1024 2 1 54 43 0 1 1 0 726344 81024 372272 0 0 8 464 490 1057 1 2 53 44 0 0 1 0 726096 81148 372276 0 0 4 328 441 1063 2 1 53 45 0 1 1 0 726096 81256 372292 0 0 0 296 387 975 1 1 53 45 0 0 1 0 725848 81380 372284 0 0 4 332 425 1034 2 1 54 44 0 1 1 0 725848 81496 372300 0 0 4 308 386 992 2 1 54 43 0 0 1 0 725600 81616 372296 0 0 4 328 404 1060 1 1 54 44 0 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 1 0 725600 81732 372296 0 0 4 328 439 1011 1 1 53 44 0 0 1 0 725476 81848 372308 0 0 0 316 441 1023 2 2 52 46 0 1 1 0 725352 81972 372300 0 0 4 344 451 1021 1 1 55 43 0 2 1 0 725228 82088 372320 0 0 0 328 427 1058 1 1 54 44 0 1 1 0 724980 82220 372300 0 0 4 336 419 999 2 1 54 44 0 1 1 0 724980 82328 372320 0 0 4 320 430 1019 1 1 54 44 0 1 1 0 724732 82436 372328 0 0 0 388 363 942 2 1 54 44 0 1 1 0 724608 82560 372312 0 0 4 308 419 993 1 2 54 44 0 1 0 0 724360 82684 372320 0 0 0 304 421 1028 2 1 55 42 0 1 0 0 724360 82684 372388 0 0 0 0 158 416 2 1 98 0 0 1 1 0 724236 82720 372360 0 0 0 6464 243 855 3 2 84 12 0 1 0 0 724112 82748 372360 0 0 0 5356 266 895 3 1 84 12 0 2 1 0 724112 82764 372380 0 0 0 3052 221 511 2 2 93 4 0 1 0 0 724112 82796 372372 0 0 0 4548 325 1067 2 2 81 16 0 1 0 0 724112 82816 372368 0 0 0 3240 259 829 3 1 90 6 0 1 0 0 724112 82836 372380 0 0 0 3260 309 822 3 2 88 8 0 1 1 0 724112 82876 372364 0 0 0 4680 326 978 3 1 77 19 0 1 0 0 724112 82884 372380 0 0 0 512 207 508 2 1 95 2 0 1 0 0 724112 82884 372388 0 0 0 0 138 361 2 1 98 0 0 1 0 0 724112 82884 372388 0 0 0 0 158 397 2 1 98 0 0 1 0 0 724112 82884 372388 0 0 0 0 146 395 2 1 98 0 0 2 0 0 724112 82884 372388 0 0 0 0 160 395 2 1 98 0 0 1 0 0 724112 82884 372388 0 0 0 0 163 382 1 1 98 0 0 1 0 0 724112 82884 372388 0 0 0 0 176 422 2 1 98 0 0 1 0 0 724112 82884 372388 0 0 0 0 134 351 2 1 98 0 0 0 0 0 724112 82884 372388 0 0 0 0 190 429 2 1 97 0 0 0 0 0 724104 82884 372392 0 0 0 0 139 358 2 1 98 0 0 0 0 0 724848 82884 372392 0 0 0 4 211 432 2 1 97 0 0 1 0 0 724980 82884 372392 0 0 0 0 166 370 2 1 98 0 0 0 0 0 724980 82884 372392 0 0 0 0 164 397 2 1 98 0 0 ^C [root@localhost ~]#

Read the article

SQL SERVER – Reducing CXPACKET Wait Stats for High Transactional Database

- by pinaldave

While engaging in a performance tuning consultation for a client, a situation occurred where they were facing a lot of CXPACKET Waits Stats. The client asked me if I could help them reduce this huge number of wait stats. I usually receive this kind of request from other client as well, but the important thing to understand is whether this question has any merits or benefits, or not. Before we continue the resolution, let us understand what CXPACKET Wait Stats are. The official definition suggests that CXPACKET Wait Stats occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if a conflict concerning this wait type develops into a problem. (from BOL) In simpler words, when a parallel operation is created for SQL Query, there are multiple threads for a single query. Each query deals with a different set of the data (or rows). Due to some reasons, one or more of the threads lag behind, creating the CXPACKET Wait Stat. Threads which came first have to wait for the slower thread to finish. The Wait by a specific completed thread is called CXPACKET Wait Stat. Note that CXPACKET Wait is done by completed thread and not the one which are unfinished. “Note that not all the CXPACKET wait types are bad. You might experience a case when it totally makes sense. There might also be cases when this is also unavoidable. If you remove this particular wait type for any query, then that query may run slower because the parallel operations are disabled for the query.” Now let us see what the best practices to reduce the CXPACKET Wait Stats are. The suggestions, with which you will find that if you search online through the browser, would play a major role as and might be asked about their jobs In addition, might tell you that you should set ‘maximum degree of parallelism’ to 1. I do agree with these suggestions, too; however, I think this is not the final resolutions. As soon as you set your entire query to run on single CPU, you will get a very bad performance from the queries which are actually performing okay when using parallelism. The best suggestion to this is that you set ‘the maximum degree of parallelism’ to a lower number or 1 (be very careful with this – it can create more problems) but tune the queries which can be benefited from multiple CPU’s. You can use query hint OPTION (MAXDOP 0) to run the server to use parallelism. Here is the two-quick script which helps to resolve these issues: Change MAXDOP at Server Level EXEC sys.sp_configure N'max degree of parallelism', N'1' GO RECONFIGURE WITH OVERRIDE GO Run Query with all the CPU (using parallelism) USE AdventureWorks GO SELECT * FROM Sales.SalesOrderDetail ORDER BY ProductID OPTION (MAXDOP 0) GO Below is the blog post which will help you to find all the parallel query in your server. SQL SERVER – Find Queries using Parallelism from Cached Plan Please note running Queries in single CPU may worsen your performance and it is not recommended at all. Infect this can be very bad advise. I strongly suggest that you identify the queries which are offending and tune them instead of following any other suggestions. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL White Papers, SQLAuthority News, T SQL, Technology

Read the article

Innovative SPARC: Lighting a Fire Under Oracle's New Hardware Business

- by Paulo Folgado

"There's a certain level of things you can do with commercially available parts," says Oracle Executive Vice President Mike Splain. But, he notes, you can do so much more if you design the parts yourself. Mike Splain,EVP, OracleYou can, for example, design cryptographic accelerators into your microprocessors so customers can run their networks fully encrypted if they choose.Of course, it helps if you've already built multiple processing "cores" into those chips so they can handle all that encrypting and decrypting while still getting their other work done.System on a ChipAs the leader of Oracle Microelectronics, Mike knows how implementing clever innovations in silicon can give systems a real competitive advantage.The SPARC microprocessors that his team designed at Sun pioneered the concept of multiple cores several years ago, and the UltraSPARC T2 processor--the industry's first "system on a chip"--packs up to eight cores per chip, each running as many as eight threads at once. That's the most cores and threads of any general-purpose processor. Looking back, Mike points out that the real value of large enterprise-class servers was their ability to run a lot of very large applications in parallel."The beauty of our CMT [chip multi-threading] machines is you can get that same kind of parallel-processing capability at a much lower cost and in a much smaller footprint," he says.The Whole StackWhat has Mike excited these days is that suddenly the opportunity to innovate is much bigger as part of Oracle."In my group, we used to look up the software stack and say, 'We can do any innovation we want, provided the only thing we have to change is what's in the Solaris operating system'--or maybe Java," he says. "If we wanted to change things beyond that, we'd have to go outside the walls of Sun and we'd have to convince the vendors: 'You have to align with us, you have to test with us, you have to build for us, and then you'll reap the benefits.' Now we get access to the entire stack. We can look all the way through the stack and say, 'Okay, what would make the database go faster? What would make the middleware go faster?'"Changing the WorldMike and his microelectronics team also like the fact that Oracle is not just any software company. We're #1 in database, middleware, business intelligence, and more."We're like all the other engineers from Sun; we believe we can change the world, if we can just figure out how to get people to pay attention to us," he says. "Now there's a mechanism at Oracle--much more so than we ever had at Sun."He notes, too, that every innovation in SPARC has involved some combination of hardware and softwareoptimization."Take our cryptography framework, for example. Sure, we can accelerate rapidly, but the Solaris OS has to provide the right set of interfaces that applications can tap into," Mike says. "Same thing with our multicore architecture. We have to have software that can utilize all those threads and run in parallel." His engineers, he points out, have never been interested in producing chips that sell as mere components."Our chips are always designed to go into systems and be combined with various pieces of software," he says. "Our job is to enable the creation of systems."

Read the article

WebLogic stuck thread protection

- by doublep

By default WebLogic kills stuck threads after 15 min (600 s), this is controlled by StuckThreadMaxTime parameter. However, I cannot find more details on how exactly "stuckness" is defined. Specifically: What is the point at which 15 min countdown begins. Request processing start? Last wait()-like method? Something else? Does this apply only to request-processing threads or to all threads? I.e. can a request-processing thread "escape" this protection by spawning a worker thread for a long task? Especially, can it delegate response writing to such a worker without 15 min countdown? My usecase is download of huge files through a permission system. Since a user needs to be authenticated and have permissions to view a file, I cannot (or at least don't know how) leave this to a simple HTTP server, e.g. Apache. And because files can be huge, download could (at least in theory) take more than 15 minutes.

Read the article

Nonblocking Tcp server

- by hoodoos

It's not a question really, i'm just looking for some guidelines :) I'm currently writing some abstract tcp server which should use as low number of threads as it can. Currently it works this way. I have a thread doing listening and some worker threads. Listener thread is just sits and wait for clients to connect I expect to have a single listener thread per server instance. Worker threads are doing all read/write/processing job on clients socket. So my problem is in building efficient worker process. And I came to some problem I can't really solve yet. Worker code is something like that(code is really simple just to show a place where i have my problem): List<Socket> readSockets = new List<Socket>(); List<Socket> writeSockets = new List<Socket>(); List<Socket> errorSockets = new List<Socket>(); while( true ){ Socket.Select( readSockets, writeSockets, errorSockets, 10 ); foreach( readSocket in readSockets ){ // do reading here } foreach( writeSocket in writeSockets ){ // do writing here } // POINT2 and here's the problem i will describe below } it works all smothly accept for 100% CPU utilization because of while loop being cycling all over again, if I have my clients doing send-receive-disconnect routine it's not that painful, but if I try to keep alive doing send-receive-send-receive all over again it really eats up all CPU. So my first idea was to put a sleep there, I check if all sockets have their data send and then putting Thread.Sleep in POINT2 just for 10ms, but this 10ms later on produces a huge delay of that 10ms when I want to receive next command from client socket.. For example if I don't try to "keep alive" commands are being executed within 10-15ms and with keep alive it becomes worse by atleast 10ms :( Maybe it's just a poor architecture? What can be done so my processor won't get 100% utilization and my server to react on something appear in client socket as soon as possible? Maybe somebody can point a good example of nonblocking server and architecture it should maintain?

Read the article

multi thread apps crashes in release mode

- by etzarfat

Hello, I'm using Visual Studio 2008 (programming in c). I've a weird problem I worte a program that has 2 threads that runs simultaneously, a recording thread (using audio card to record into memory) and a translation thread (using a speech engine to recognize the words). when I run my program in debug mode (aka setting a breakpoint in the code) it runs great, however when I run in debug mode or release mode (outside the visual studio enviroment) it crashes and give me the following exception: "Unhandled exception at 0x7c911129 in LowLevel.exe: 0xC0000005: Access violation reading location 0x014c7245." My stack looks: LowLevel.exe!__set_flsgetvalue() Line 256 + 0xc bytes C LowLevel.exe!_isleadbyte_l(int c=4359676, localeinfo_struct * plocinfo=0x00000001) Line 57 C++ 014b00d8() LowLevel.exe!PlayDateOfExam(int option=1) Line 2240 + 0x7 bytes C++ LowLevel.exe!NSCThread(void * arg=0x00000000) Line 1585 + 0xb bytes C++ kernel32.dll!7c80b729() winmm.dll!76b5b294() I uses the following file in my project "nsc.lib" and WinMM.lib" I'm not really familiar with threads I used a sample (which works great) and built on it. I saw a similiar question year on the forum but I didn't really understand the answers since I'm not familiar with with threads. Can someone help me? Thanks

Read the article

evaluation of a java thread dump

- by raticulin

I got a thread dump of one of my processes. It has a bunch of these threads. I guess they are keeping a bunch of memory so I am getting OOM. "Thread-8264" prio=6 tid=0x4c94ac00 nid=0xf3c runnable [0x4fe7f000] java.lang.Thread.State: RUNNABLE at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:223) - locked <0x0c9bc640 (a java.util.zip.Inflater) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:235) at com.my.ZipExtractorCommonsCompress.extract(ZipExtractorCommonsCompress.java:48) at com.my.CustomThreadedExtractorWrapper$ExtractionThread.run(CustomThreadedExtractorWrapper.java:151) Locked ownable synchronizers: - None "Thread-8241" prio=6 tid=0x4c94a400 nid=0xb8c runnable [0x4faef000] java.lang.Thread.State: RUNNABLE at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:223) - locked <0x0c36b808 (a java.util.zip.Inflater) at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:235) at com.my.ZipExtractorCommonsCompress.extract(ZipExtractorCommonsCompress.java:48) at com.my.CustomThreadedExtractorWrapper$ExtractionThread.run(CustomThreadedExtractorWrapper.java:151) Locked ownable synchronizers: - None I am trying to find out how it arrived to this situation. CustomThreadedExtractorWrapper is a wrapper class that fires a thread to do some work (ExtractionThread, which uses ZipExtractorCommonsCompress to extract zip contents from a compressed stream). If the task is taking too long, ExtractionThread.interrupt(); is called to cancel the operation. I can see in my logs that the cancellation happened 25 times. And I see 21 of these threads in my dump. My questions: What is the status of these threads? Alive and running? Blocked somehow? They did not die with .interrupt() apparently? Is there a sure way to really kill a thread? What does really mean 'locked ' in the stack trace? Line 223 in Inflater.java is: public synchronized int inflate(byte[] b, int off, int len) { ... //return is line 223 return inflateBytes(b, off, len); }

Read the article

Hooking thread exit

- by mackenir

Is there a way for me to hook the exit of managed threads (i.e. run some code on a thread, just before it exits?) I've developed a mechanism for hooking thread exit that works for some threads. Step 1: develop a 'hook' STA COM class that takes a callback function and calls it in its destructor. Step 2: create a ThreadStatic instance of this object on the thread I want to hook, and pass the object a managed delegate converted to an unmanaged function pointer. The delegate then gets called on thread exit (since the CLR calls IUnknown::Release on all STA COM RCWs as part of thread exit). This mechanism works on, for example, worker threads that I create in code using the Thread class. However, it doesn't seem to work for the application's main thread (be it a console or windows app). The 'hook' COM object seems to be deleted too late in the shutdown process and the attempt to call the delegate fails. (The reason I want to implement this facility is so I can run some native COM code on the exiting thread that works with STA COM objects that were created on the thread, before it's 'too late' (i.e. before the thread has exited, and it's no longer possible to work with STA COM objects on that thread.))

Read the article

Surprising results with .NET multi-theading algorithm

- by Myles J

Hi, I've recently wrote a C# console time tabling algorithm that is based on a combination of a genetic algorithm with a few brute force routines thrown in. The initial results were promising but I figured I could improve the performance by splitting the brute force routines up to run in parallel on multi processor architectures. To do this I used the well documented Producer/Consumer model (as documented in this fantastic article http://www.albahari.com/threading/part2.aspx#_ProducerConsumerQWaitHandle). I changed my code to create one thread per logical processor during the brute force routines. The performance gains on my work station were very pleasing. I am running Windows XP on the following hardware: Intel Core 2 Quad CPU 2.33 GHz 3.49 GB RAM Initial tests indicated average performance gains of approx 40% when using 4 threads. The next step was to deploy the new multi-threading version of the algorithm to our higher spec UAT server. Here is the spec of our UAT server: Windows 2003 Server R2 Enterprise x64 8 cpu (Quad-Core) AMD Opteron 2.70 GHz 255 GB RAM After running the first round of tests we were all extremely surprised to find that the algorithm actually runs slower on the high spec W2003 server than on my local XP work station! In fact the tests seem to indicate that it doesn't matter how many threads are generated (tests were ran with the app spawning between 2 to 32 threads). The algorithm always runs significantly slower on the UAT W2003 server? How could this be? Surely the app should run faster on a 8 cpu (Quad-Core) than my 2 Quad work station? Why are we seeing no performance gains with the multi-threading on the W2003 server whilst the XP workstation tests show gains of up to 40%? Any help or pointers would be appreciated. Regards Myles

Read the article

Does this use of Monitor.Wait/Pulse have a race condition?

- by jw

I have a simple producer/consumer scenario, where there is only ever a single item being produced/consumed. Also, the producer waits for the worker thread to finish before continuing. I realize that kind of obviates the whole point of multithreading, but please just assume it really needs to be this way (: This code doesn't compile, but I hope you get the idea: // m_data is initially null // This could be called by any number of producer threads simultaneously void SetData(object foo) { lock(x) // Line A { assert(m_data == null); m_data = foo; Monitor.Pulse(x) // Line B while(m_data != null) Monitor.Wait(x) // Line C } } // This is only ever called by a single worker thread void UseData() { lock(x) // Line D { while(m_data == null) Monitor.Wait(x) // Line E // here, do something with m_data m_data = null; Monitor.Pulse(x) // Line F } } Here is the situation that I am not sure about: Suppose many threads call SetData() with different inputs. Only one of them will get inside the lock, and the rest will be blocked on Line A. Suppose the one that got inside the lock sets m_data and makes its way to Line C. Question: Could the Wait() on Line C allow another thread at Line A to obtain the lock and overwrite m_data before the worker thread even gets to it? Supposing that doesn't happen, and the worker thread processes the original m_data, and eventually makes its way to Line F, what happens when that Pulse() goes off? Will only the thread waiting on Line C be able to get the lock? Or will it be competing with all the other threads waiting on Line A as well? Essentially, I want to know if Pulse()/Wait() communicate with each other specially "under the hood" or if they are on the same level with lock(). The solution to these problems, if they exist, is obvious of course - just surround SetData() with another lock - say, lock(y). I'm just curious if it's even an issue to begin with.

Read the article

Solving problems involving more complex data structures with CUDA

- by Nils

So I read a bit about CUDA and GPU programming. I noticed a few things such that access to global memory is slow (therefore shared memory should be used) and that the execution path of threads in a warp should not diverge. I also looked at the (dense) matrix multiplication example, described in the programmers manual and the nbody problem. And the trick with the implementation seems to be the same: Arrange the calculation in a grid (which it already is in case of the matrix mul); then subdivide the grid into smaller tiles; fetch the tiles into shared memory and let the threads calculate as long as possible, until it needs to reload data from the global memory into shared memory. In case of the nbody problem the calculation for each body-body interaction is exactly the same (page 682): bodyBodyInteraction(float4 bi, float4 bj, float3 ai) It takes two bodies and an acceleration vectors. The body vector has four components it's position and the weight. When reading the paper, the calculation is understood easily. But what is if we have a more complex object, with a dynamic data structure? For now just assume that we have an object (similar to the body object presented in the paper) which has a list of other objects attached and the number of objects attached is different in each thread. How could I implement that without having the execution paths of the threads to diverge? I'm also looking for literature which explains how different algorithms involving more complex data structures can be effectively implemented in CUDA.

Read the article

Rails model relations depending on count of nested relations

- by Lowgain

I am putting together a messaging system for a rails app I am working on. I am building it in a similar fashion to facebook's system, so messages are grouped into threads, etc. My related models are: MsgThread - main container of a thread Message - each message/reply in thread Recipience - ties to user to define which users should subscribe to this thread Read - determines whether or not a user has read a specific message My relationships look like class User < ActiveRecord::Base #stuff... has_many :msg_threads, :foreign_key => 'originator_id' #threads the user has started has_many :recipiences has_many :subscribed_threads, :through => :recipiences, :source => :msg_thread #threads the user is subscribed to end class MsgThread < ActiveRecord::Base has_many :messages has_many :recipiences belongs_to :originator, :class_name => "User", :foreign_key => "originator_id" end class Recipience < ActiveRecord::Base belongs_to :user belongs_to :msg_thread end class Message < ActiveRecord::Base belongs_to :msg_thread belongs_to :author, :class_name => "User", :foreign_key => "author_id" end class Read < ActiveRecord::Base belongs_to :user belongs_to :message end I'd like to create a new selector in the user sort of like: has_many :updated_threads, :through => :recipiencies, :source => :msg_thread, :conditions => {THREAD CONTAINS MESSAGES WHICH ARE UNREAD (have no 'read' models tying a user to a message)} I was thinking of either writing a long condition with multiple joins, or possibly writing giving the model an updated_threads method to return this, but I'd like to see if there is an easier way first. Any ideas? Also, if there is something fundamentally wrong with my structure for this functionality let me know! Thanks!!

Search Results

Search found 3641 results on 146 pages for 'threads'.

Page 46/146 | < Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >

- by user65374

- by Singlet

- by user1684726

- by Rob Stevenson-Leggett

- by Reed

- by Los Frijoles

- by tonyrogerson

- by Stefan Hinker

- by Tim Cooper

- by Daniel Moth

- by Roo

- by VanillaH

- by JaneZhang(???)

- by user1032531

- by pinaldave

- by Paulo Folgado

- by doublep

- by hoodoos

- by etzarfat

- by raticulin

- by mackenir

- by Myles J

- by jw

- by Nils

- by Lowgain

< Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >