Search Results

Search found 854 results on 35 pages for 'cores'.

Page 3/35 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Processes sharing cores on Ubuntu system

    - by muckabout
    My coworkers and I share an 8-core server running Ubuntu for our batch processes. I tend to run 4 processes at a time, each of which consumes 100% CPU per core when nothing else is running. When a coworker runs his processes (typically about 4 at a time), his also get 100% per. However, when both of us run ours (he always goes first), his still get 100% and mine seem to divide the remaining processing power and linger in the 10-40% range. I even reniced his process to a lower value and it did not change. What are the issues that may cause this?

    Read the article

  • Can processor cores thrash each other's caches?

    - by Jørgen Fogh
    If more than one core on a processor is accessing the same memory address, will they thrash each other's caches or will some snooping protocol allow each to keep the data in L1-cache? I am interested in a general answer as well as answers for specific processors. How many layers of cache are invalidated? Will accessing another address within the same cache-line invalidate the entire line? What can you do to alleviate these problems?

    Read the article

  • Any rerefence of CPU world statistics?

    - by Áxel Costas Pena
    I am looking for any referencee about computer power statistics across the world. My main interest is about real computing capabilities, so I'd prefer information about real processor power, and even best if it includes also other critical hardware statistics, like RAM memory, but if it isn't possible, maybe statistics about brand/model distribution will be also useful. I've Googled for some minutes and I've found nothing related.

    Read the article

  • Do Hyper-V guests see multiple CPUs (sockets) or multiple CPU cores when assigned more than 1 vCPU?

    - by Filip Kierzek
    I have SQL Server 2008 Express running on Hyper-V based virtual machine with two vCPU-s. I've just been reading up on SQL Server 2012 Express and noticed that it's CPU is "Limited to lesser of 1 Socket or 4 cores" (http://msdn.microsoft.com/en-us/library/cc645993(v=SQL.110).aspx) My question is how do the SQL Server 2012 limits on CPUs/Cores translate into vCPU-s? Are they "processors" or are they "cores"?

    Read the article

  • How many virtual processors or cores should I assign to my Guest OS?

    - by reidLinden
    I've just received an upgraded Host machine, and am looking to push some of those advances to my workstations Guest OS(s). In particular, I used to have a single processor, with 2 cores, so my Guest OS only had 1/1. Now, I've got a single processor with 8 cores, so I'm curious about what would be recommended for my Guest OS now? 1 processor/4 cores? 2 processors/2 cores? 4 processors/1 core? My instinct says to stick with the number of physical processors (or less), but, is that based on reality? I spent a good while looking for an answer to this, but perhaps my google-karma isn't in my favor today.

    Read the article

  • Les processeurs multi-cores pourraient gagner en performances, grâce à des communications internes six fois plus rapides

    Les processeurs multi-cores pourraient gagner en performances, grâce à des communications internes six fois plus rapides Des chercheurs américains, de l'Université de la Caroline du Nord, on fait une découverte qui pourrait améliorer les performances des puces multicores. En effet, la majorité des processeurs actuels de 4 ou 8 coeurs, connaissent des difficultés à voir leurs cores communiquer directement entre eux. Mais une alternative serait possible. Au lieu de devoir passer par la mémoire pour envoyer et récupérer des données, plusieurs coeurs travaillant sur la même tâche pourraient bénéficier d'une accélération matérielle. En effet, les ingénieurs développant le projet ont mis au point une HAQu (h...

    Read the article

  • Why lock-free data structures just aren't lock-free enough

    - by Alex.Davies
    Today's post will explore why the current ways to communicate between threads don't scale, and show you a possible way to build scalable parallel programming on top of shared memory. The problem with shared memory Soon, we will have dozens, hundreds and then millions of cores in our computers. It's inevitable, because individual cores just can't get much faster. At some point, that's going to mean that we have to rethink our architecture entirely, as millions of cores can't all access a shared memory space efficiently. But millions of cores are still a long way off, and in the meantime we'll see machines with dozens of cores, struggling with shared memory. Alex's tip: The best way for an application to make use of that increasing parallel power is to use a concurrency model like actors, that deals with synchronisation issues for you. Then, the maintainer of the actors framework can find the most efficient way to coordinate access to shared memory to allow your actors to pass messages to each other efficiently. At the moment, NAct uses the .NET thread pool and a few locks to marshal messages. It works well on dual and quad core machines, but it won't scale to more cores. Every time we use a lock, our core performs an atomic memory operation (eg. CAS) on a cell of memory representing the lock, so it's sure that no other core can possibly have that lock. This is very fast when the lock isn't contended, but we need to notify all the other cores, in case they held the cell of memory in a cache. As the number of cores increases, the total cost of a lock increases linearly. A lot of work has been done on "lock-free" data structures, which avoid locks by using atomic memory operations directly. These give fairly dramatic performance improvements, particularly on systems with a few (2 to 4) cores. The .NET 4 concurrent collections in System.Collections.Concurrent are mostly lock-free. However, lock-free data structures still don't scale indefinitely, because any use of an atomic memory operation still involves every core in the system. A sync-free data structure Some concurrent data structures are possible to write in a completely synchronization-free way, without using any atomic memory operations. One useful example is a single producer, single consumer (SPSC) queue. It's easy to write a sync-free fixed size SPSC queue using a circular buffer*. Slightly trickier is a queue that grows as needed. You can use a linked list to represent the queue, but if you leave the nodes to be garbage collected once you're done with them, the GC will need to involve all the cores in collecting the finished nodes. Instead, I've implemented a proof of concept inspired by this intel article which reuses the nodes by putting them in a second queue to send back to the producer. * In all these cases, you need to use memory barriers correctly, but these are local to a core, so don't have the same scalability problems as atomic memory operations. Performance tests I tried benchmarking my SPSC queue against the .NET ConcurrentQueue, and against a standard Queue protected by locks. In some ways, this isn't a fair comparison, because both of these support multiple producers and multiple consumers, but I'll come to that later. I started on my dual-core laptop, running a simple test that had one thread producing 64 bit integers, and another consuming them, to measure the pure overhead of the queue. So, nothing very interesting here. Both concurrent collections perform better than the lock-based one as expected, but there's not a lot to choose between the ConcurrentQueue and my SPSC queue. I was a little disappointed, but then, the .NET Framework team spent a lot longer optimising it than I did. So I dug out a more powerful machine that Red Gate's DBA tools team had been using for testing. It is a 6 core Intel i7 machine with hyperthreading, adding up to 12 logical cores. Now the results get more interesting. As I increased the number of producer-consumer pairs to 6 (to saturate all 12 logical cores), the locking approach was slow, and got even slower, as you'd expect. What I didn't expect to be so clear was the drop-off in performance of the lock-free ConcurrentQueue. I could see the machine only using about 20% of available CPU cycles when it should have been saturated. My interpretation is that as all the cores used atomic memory operations to safely access the queue, they ended up spending most of the time notifying each other about cache lines that need invalidating. The sync-free approach scaled perfectly, despite still working via shared memory, which after all, should still be a bottleneck. I can't quite believe that the results are so clear, so if you can think of any other effects that might cause them, please comment! Obviously, this benchmark isn't realistic because we're only measuring the overhead of the queue. Any real workload, even on a machine with 12 cores, would dwarf the overhead, and there'd be no point worrying about this effect. But would that be true on a machine with 100 cores? Still to be solved. The trouble is, you can't build many concurrent algorithms using only an SPSC queue to communicate. In particular, I can't see a way to build something as general purpose as actors on top of just SPSC queues. Fundamentally, an actor needs to be able to receive messages from multiple other actors, which seems to need an MPSC queue. I've been thinking about ways to build a sync-free MPSC queue out of multiple SPSC queues and some kind of sign-up mechanism. Hopefully I'll have something to tell you about soon, but leave a comment if you have any ideas.

    Read the article

  • Why do some software not get load balanced even when there are multiple cores?

    - by Nav
    While VTune Analyzer was running on a blade server with 8 cores, I observed the cpu useage percentage using mpstat -P ALL 1. mpstat showed me that VTune was taking up 100% of a single core, while all other cores were idle. Why does that happen? Shouldn't the OS (RHEL Server 5.2) automatically distribute load across cores? The same happened when I tried running MATLAB (even after enabling multithreading support in the MATLAB settings). p.s: I'm a developer. Not a sys admin. So felt it better to ask here rather than at serverfault.

    Read the article

  • Restrict whole system on certain cores except a few process?

    - by icando
    Hi I am running some latency sensitive program on a Linux machine (more specifically, CentOS 6), and I don't want the threads of the process being preempted. So in my plan, the first step is to set cpu affinity of the threads so that threads are running on separate cores, so they don't preempt each other. Then the second step is to make sure other processes in the system not running on these cores. So my question is: is it possible to restrict the whole system running on certain cores, except this process? This should apply to any newly created processes in the future.

    Read the article

  • tomcat multithreading problem

    - by jutky
    Hi all I'm writing a java application that runs in Tomcat, on a multi-core hardware. The application executes an algorithm and returns the answer to the user. The problem is that even when I run two requests simultaneously, the tomcat process uses at most one CPU core. As far as I understand each request in Tomcat is executed in separate thread, and JVM should run each thread on separate CPU core. What could be the problem that bounds the JVM or Tomcat to use no more than one core? Thanks in advance.

    Read the article

  • What is the *correct* term for a program that makes use of multiple hardware processor cores?

    - by Ryan Thompson
    I want to say that my program is capable of splitting some work across multiple CPU cores on a single system. What is the simple term for this? It's not multi-threaded, because that doesn't automatically imply that the threads run in parallel. It's not multi-process, because multiprocessing seems to be a property of a computer system, not a program. "capable of parallel operation" seems too wordy, and with all the confusion of terminology, I'm not even sure if it's accurate. So is there a simple term for this?

    Read the article

  • Output the number of cores and speed of a server?

    - by Sam
    I have access to another college's standalone server and am running several experiments on it. However, I don't know how many cores or the speed of the cores in the machine. Is there a way to get that information through the command line? Right now I'm accessing it through SSH.

    Read the article

  • Recommend a free temperature-monitoring utility for cores + video card, on Vista?

    - by smci
    Looking for your recommendations for a free temperature-monitoring utility, for my PC (Core 2) and graphics card for Vista. (Question reposted with the hyperlinks now I have 10 reputation). I don't want all the geeky details, I don't overclock, I don't see the need to mess with my fan speeds or motherboard settings, I just want something fairly basic to help with basic troubleshooting of intermittent overheats on video card and/or mobo: must run on Windows Vista (yes, don't laugh). ideally displays temperature when minimized to toolbar, and/or: automatically alerts me when temperature on either core or the video card exceeds a threshold ideally measures temperature of video card and system as well, not just the cores. HDD temperature is not necessary I think. logging is nice, graphs are also nice portability to Linux and Mac is nice Apparently Everest is the best paid option, but I'm not prepared to spend $40. I found the following free options, but no head-to-head at-a-glance comparison: CoreTemp (only does cores, not video card?) Open Hardware Monitor (nice graphs, displays when minimized to toolbar, no alerts) RealTemp (has alerts, works minimized, lightweight install) HWMonitor (no alerts, CNET: "[free version is] simple but effective") from CPUID CPUCool (not free: 21-day trialware, then $18) SpeedFan from Almico (too geeky, detail overload; CNET: "most users won't be able to make head or tail of the data this utility provides") Motherboard Monitor (CNET: not recommended, requires expert knowledge of your mobo, dangerous) Intel Thermal Analysis Tool (only does cores, not video card? has logging) Useful discussions I found: hardwarecanucks.com , superuser.com 1, 2 , forums.techarena.in (Update: I downloaded Real Temp 3.60 and it meets all my needs, the customizable alert temperature is great. Open Hardware Monitor seems to be the other one that mostly meets my needs, except no alerts; but it is portable. I tried SpeedFan but the interface is very cluttered, too much unnecessary detail (needs a Basic/Advanced mode and a revamp of the interface.) The answer to my underlying issue is nVidia Geforce LE 7500 video card which runs very hot.)

    Read the article

  • More CPU cores may not always lead to better performance – MAXDOP and query memory distribution in spotlight

    - by sqlworkshops
    More hardware normally delivers better performance, but there are exceptions where it can hinder performance. Understanding these exceptions and working around it is a major part of SQL Server performance tuning.   When a memory allocating query executes in parallel, SQL Server distributes memory to each task that is executing part of the query in parallel. In our example the sort operator that executes in parallel divides the memory across all tasks assuming even distribution of rows. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.   In reality, how often are column values evenly distributed, think about an example; are employees working for your company distributed evenly across all the Zip codes or mainly concentrated in the headquarters? What happens when you sort result set based on Zip codes? Do all products in the catalog sell equally or are few products hot selling items?   One of my customers tested the below example on a 24 core server with various MAXDOP settings and here are the results:MAXDOP 1: CPU time = 1185 ms, elapsed time = 1188 msMAXDOP 4: CPU time = 1981 ms, elapsed time = 1568 msMAXDOP 8: CPU time = 1918 ms, elapsed time = 1619 msMAXDOP 12: CPU time = 2367 ms, elapsed time = 2258 msMAXDOP 16: CPU time = 2540 ms, elapsed time = 2579 msMAXDOP 20: CPU time = 2470 ms, elapsed time = 2534 msMAXDOP 0: CPU time = 2809 ms, elapsed time = 2721 ms - all 24 cores.In the above test, when the data was evenly distributed, the elapsed time of parallel query was always lower than serial query.   Why does the query get slower and slower with more CPU cores / higher MAXDOP? Maybe you can answer this question after reading the article; let me know: [email protected].   Well you get the point, let’s see an example.   The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.   Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001. update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1 update Employees set Zip = 2001 where EmployeeID % 50 != 1 go update statistics Employees with fullscan go   Let’s create the temporary table #FireDrill with all possible Zip codes. drop table #FireDrill go create table #FireDrill (Zip int primary key) insert into #FireDrill select distinct Zip from Employees update statistics #FireDrill with fullscan go  Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --First serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 1) goThe query took 1011 ms to complete.   The execution plan shows the 77816 KB of memory was granted while the estimated rows were 799624.  No Sort Warnings in SQL Server Profiler.  Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 0) go The query took 1912 ms to complete.  The execution plan shows the 79360 KB of memory was granted while the estimated rows were 799624.  The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead. Sort properties shows the rows are unevenly distributed over the 4 threads.   Sort Warnings in SQL Server Profiler.   Intermediate Summary: The reason for the higher duration with parallel plan was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001. Now let’s update the Employees table and distribute employees evenly across all Zip codes.   update Employees set Zip = EmployeeID / 400 + 1 go update statistics Employees with fullscan go  Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 1) go   The query took 751 ms to complete.  The execution plan shows the 77816 KB of memory was granted while the estimated rows were 784707.  No Sort Warnings in SQL Server Profiler.   Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 0) go The query took 661 ms to complete.  The execution plan shows the 79360 KB of memory was granted while the estimated rows were 784707.  Sort properties shows the rows are evenly distributed over the 4 threads. No Sort Warnings in SQL Server Profiler.    Intermediate Summary: When employees were distributed unevenly, concentrated on 1 Zip code, parallel sort spilled while serial sort performed well without spilling to tempdb. When the employees were distributed evenly across all Zip codes, parallel sort and serial sort did not spill to tempdb. This shows uneven data distribution may affect the performance of some parallel queries negatively. For detailed discussion of memory allocation, refer to webcasts available at www.sqlworkshops.com/webcasts.     Some of you might conclude from the above execution times that parallel query is not faster even when there is no spill. Below you can see when we are joining limited amount of Zip codes, parallel query will be fasted since it can use Bitmap Filtering.   Let’s update the Employees table with 49 out of 50 employees located in Zip code 2001. update Employees set Zip = EmployeeID / 400 + 1 where EmployeeID % 50 = 1 update Employees set Zip = 2001 where EmployeeID % 50 != 1 go update statistics Employees with fullscan go  Let’s create the temporary table #FireDrill with limited Zip codes. drop table #FireDrill go create table #FireDrill (Zip int primary key) insert into #FireDrill select distinct Zip       from Employees where Zip between 1800 and 2001 update statistics #FireDrill with fullscan go  Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 1) go The query took 989 ms to complete.  The execution plan shows the 77816 KB of memory was granted while the estimated rows were 785594. No Sort Warnings in SQL Server Profiler.  Now let’s execute the query in parallel with MAXDOP 0. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 0) go The query took 1799 ms to complete.  The execution plan shows the 79360 KB of memory was granted while the estimated rows were 785594.  Sort Warnings in SQL Server Profiler.    The estimated number of rows between serial and parallel plan are the same. The parallel plan has slightly more memory granted due to additional overhead.  Intermediate Summary: The reason for the higher duration with parallel plan even with limited amount of Zip codes was sort spill. This is due to uneven distribution of employees over Zip codes, especially concentration of 49 out of 50 employees in Zip code 2001.   Now let’s update the Employees table and distribute employees evenly across all Zip codes. update Employees set Zip = EmployeeID / 400 + 1 go update statistics Employees with fullscan go Let’s execute the query serially with MAXDOP 1. --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --Serially with MAXDOP 1 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 1) go The query took 250  ms to complete.  The execution plan shows the 9016 KB of memory was granted while the estimated rows were 79973.8.  No Sort Warnings in SQL Server Profiler.  Now let’s execute the query in parallel with MAXDOP 0.  --Example provided by www.sqlworkshops.com --Execute query with uneven Zip code distribution --In parallel with MAXDOP 0 set statistics time on go declare @EmployeeID int, @EmployeeName varchar(48),@zip int select @EmployeeName = e.EmployeeName, @zip = e.Zip from Employees e       inner join #FireDrill fd on (e.Zip = fd.Zip)       order by e.Zip option (maxdop 0) go The query took 85 ms to complete.  The execution plan shows the 13152 KB of memory was granted while the estimated rows were 784707.  No Sort Warnings in SQL Server Profiler.    Here you see, parallel query is much faster than serial query since SQL Server is using Bitmap Filtering to eliminate rows before the hash join.   Parallel queries are very good for performance, but in some cases it can hinder performance. If one identifies the reason for these hindrances, then it is possible to get the best out of parallelism. I covered many aspects of monitoring and tuning parallel queries in webcasts (www.sqlworkshops.com/webcasts) and articles (www.sqlworkshops.com/articles). I suggest you to watch the webcasts and read the articles to better understand how to identify and tune parallel query performance issues.   Summary: One has to avoid sort spill over tempdb and the chances of spills are higher when a query executes in parallel with uneven data distribution. Parallel query brings its own advantage, reduced elapsed time and reduced work with Bitmap Filtering. So it is important to understand how to avoid spills over tempdb and when to execute a query in parallel.   I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.   Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.   Disclaimer and copyright information:This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to R Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.   Register for the upcoming 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Hands-on Workshop in London, United Kingdom during March 15-17, 2011, click here to register / Microsoft UK TechNet.These are hands-on workshops with a maximum of 12 participants and not lectures. For consulting engagements click here.   R Meyyappan [email protected] LinkedIn: http://at.linkedin.com/in/rmeyyappan  

    Read the article

  • Is there any difference between processor and core?

    - by Salvador
    The following two command seems to give me different information about the same hardware srs@ubuntu:~$ cat /proc/cpuinfo | grep -e processor -e cores processor : 0 cpu cores : 4 processor : 1 cpu cores : 4 processor : 2 cpu cores : 4 processor : 3 cpu cores : 4 srs@ubuntu:~$ sudo dmidecode -t processor # dmidecode 2.9 SMBIOS 2.6 present. Handle 0x0004, DMI type 4, 42 bytes Processor Information Socket Designation: LGA1155 Type: Central Processor Family: <OUT OF SPEC> Manufacturer: Intel ID: A7 06 02 00 FF FB EB BF Version: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz Voltage: 1.0 V External Clock: 100 MHz Max Speed: 3800 MHz Current Speed: 3300 MHz Status: Populated, Enabled Upgrade: Other L1 Cache Handle: 0x0005 L2 Cache Handle: 0x0006 L3 Cache Handle: 0x0007 Serial Number: To Be Filled By O.E.M. Asset Tag: To Be Filled By O.E.M. Part Number: To Be Filled By O.E.M. Core Count: 4 Core Enabled: 1 Characteristics: 64-bit capable Until today I thought I had a single processor with 4 independent cores. I also thought that within each core can be used different threads.

    Read the article

  • How to optimally configure memcache running on 16 cores 144G ram server?

    - by Ivko Maksimovic
    Memcache is the only important app running on the server Server has 16 cores and 144G RAM Memcache is given 135G Memcache runs at 32 threads Gigabit network, test shows at least 300Mbit/s availability on network port 600 connections 3000 requests per second Say that memcache (memory) usage is at 50% - it's definitely not full As we increase number of requests towards server, requests slow down (from 8ms to 100ms per request) but server load remains 0.00. We suspect this can be solved by adjusting configuration but we don't understand many of the configuration parameters (besides, maybe, the number of threads). Any ideas?

    Read the article

  • How do I make use of multiple cores in Large SQL Server Queries?

    - by Jonathan Beerhalter
    I have two SQL Servers, one for production, and one as an archive. Every night, we've got a SQL job that runs and copies the days production data over to the archive. As we've grown, this process takes longer and longer and longer. When I watch the utilization on the archive server running the archival process, I see that it only ever makes use of a single core. And since this box has eight cores, this is a huge waste of resources. The job runs at 3AM, so it's free to take any and all resources it can find. So what I need to do if figure out how to structure SQL Server jobs so they can take advantage of multiple cores, but I can't find any literature on tackling this problem. We're running SQL Server 2005, but I could certainly push for an upgrade if 2008 takes of this problem.

    Read the article

  • Unlocking AMD Phenom II X2 550 Black Edition cores - what are the risks?

    - by Vilx-
    I've got the above mentioned CPU and a GigaByte GA-MA790XT-UD4P motherboard, which should be capable of unlocking the extra two cores - if I'm lucky and they're not faulty. The Internet is full of instructions on how to do that. What I don't have is spare money to buy new hardware if I brick something. What are the risks when attempting to do this? Is it completely safe, or can I be left with an expensive pile of junk? What should I keep in mind when doing so? Bigger cooler maybe (I'm running with the default box cooler)? Should I lower the frequencies too? I've never done any OC before.

    Read the article

  • How much processor speed and cores do I need for these tasks?

    - by ajay
    I am planning to buy a new laptop as I find my current one very slow. My question here is specifically related to RAM size and CPU power. I will mostly be doing development (not much games). I would be dabbling in distributed computing, multithreaded and data intensive parallelizable tasks on multi-cores. For e.g. I would want to be able to Concurrent programming in Scala/Java/Clojure etc. and be able to see parallelization. Furthermore, I would want the RAM to be enough. But from a developer machine standpoint, do you think 4GB RAM and 2.53GHz Dual Core processor would be enough. I'm basically looking at this model: http://store.apple.com/us/configure/MC118LL/A?mco=MTM3NDcyODk (link dead)

    Read the article

  • How to decide the optimal number of ruby thin/mongrel instances for a server, number of cores?

    - by Amala
    We are trying to deploy mongrel instances on a machine. What is the optimal number of mongrel instances for a server? Since an instance can handle concurrent connections, I do not see any benefit in starting more than 1 per core. Any more than that and the threads will just fight for CPU. Our predecessors have assigned 10 instances for 4 cores, but I think it will just cause CPU contention. Any definitive answers / opinions? I have seen this question: How many mongrel instances? But it is really not specific enough.

    Read the article

  • Das T5-4 TPC-H Ergebnis naeher betrachtet

    - by Stefan Hinker
    Inzwischen haben vermutlich viele das neue TPC-H Ergebnis der SPARC T5-4 gesehen, das am 7. Juni bei der TPC eingereicht wurde.  Die wesentlichen Punkte dieses Benchmarks wurden wie gewohnt bereits von unserer Benchmark-Truppe auf  "BestPerf" zusammengefasst.  Es gibt aber noch einiges mehr, das eine naehere Betrachtung lohnt. Skalierbarkeit Das TPC raet von einem Vergleich von TPC-H Ergebnissen in unterschiedlichen Groessenklassen ab.  Aber auch innerhalb der 3000GB-Klasse ist es interessant: SPARC T4-4 mit 4 CPUs (32 Cores mit 3.0 GHz) liefert 205,792 QphH. SPARC T5-4 mit 4 CPUs (64 Cores mit 3.6 GHz) liefert 409,721 QphH. Das heisst, es fehlen lediglich 1863 QphH oder 0.45% zu 100% Skalierbarkeit, wenn man davon ausgeht, dass die doppelte Anzahl Kerne das doppelte Ergebnis liefern sollte.  Etwas anspruchsvoller, koennte man natuerlich auch einen Faktor von 2.4 erwarten, wenn man die hoehere Taktrate mit beruecksichtigt.  Das wuerde die Latte auf 493901 QphH legen.  Dann waere die SPARC T5-4 bei 83%.  Damit stellt sich die Frage: Was hat hier nicht skaliert?  Vermutlich der Plattenspeicher!  Auch hier lohnt sich eine naehere Betrachtung: Plattenspeicher Im Bericht auf BestPerf und auch im Full Disclosure Report der TPC stehen einige interessante Details zum Plattenspeicher und der Konfiguration.   In der Konfiguration der SPARC T4-4 wurden 12 2540-M2 Arrays verwendet, die jeweils ca. 1.5 GB/s Durchsatz liefert, insgesamt also eta 18 GB/s.  Dabei waren die Arrays offensichtlich mit jeweils 2 Kabeln pro Array direkt an die 24 8GBit FC-Ports des Servers angeschlossen.  Mit den 2x 8GBit Ports pro Array koennte man so ein theoretisches Maximum von 2GB/s erreichen.  Tatsaechlich wurden 1.5GB/s geliefert, was so ziemlich dem realistischen Maximum entsprechen duerfte. Fuer den Lauf mit der SPARC T5-4 wurden doppelt so viele Platten verwendet.  Dafuer wurden die 2540-M2 Arrays mit je einem zusaetzlichen Plattentray erweitert.  Mit dieser Konfiguration wurde dann (laut BestPerf) ein Maximaldurchsatz von 33 GB/s erreicht - nicht ganz das doppelte des SPARC T4-4 Laufs.  Um tatsaechlich den doppelten Durchsatz (36 GB/s) zu liefern, haette jedes der 12 Arrays 3 GB/s ueber seine 4 8GBit Ports liefern muessen.  Im FDR stehen nur 12 dual-port FC HBAs, was die Verwendung der Brocade FC Switches erklaert: Es wurden alle 4 8GBit ports jedes Arrays an die Switches angeschlossen, die die Datenstroeme dann in die 24 16GBit HBA ports des Servers buendelten.  Das theoretische Maximum jedes Storage-Arrays waere nun 4 GB/s.  Wenn man jedoch den Protokoll- und "Realitaets"-Overhead mit einrechnet, sind die tatsaechlich gelieferten 2.75 GB/s gar nicht schlecht.  Mit diesen Zahlen im Hinterkopf ist die Verdopplung des SPARC T4-4 Ergebnisses eine gute Leistung - und gleichzeitig eine gute Erklaerung, warum nicht bis zum 2.4-fachen skaliert wurde. Nebenbei bemerkt: Weder die SPARC T4-4 noch die SPARC T5-4 hatten in der gemessenen Konfiguration irgendwelche Flash-Devices. Mitbewerb Seit die T4 Systeme auf dem Markt sind, bemuehen sich unsere Mitbewerber redlich darum, ueberall den Eindruck zu hinterlassen, die Leistung des SPARC CPU-Kerns waere weiterhin mangelhaft.  Auch scheinen sie ueberzeugt zu sein, dass (ueber)grosse Caches und hohe Taktraten die einzigen Schluessel zu echter Server Performance seien.  Wenn ich mir nun jedoch die oeffentlichen TPC-H Ergebnisse ansehe, sehe ich dies: TPC-H @3000GB, Non-Clustered Systems System QphH SPARC T5-4 3.6 GHz SPARC T5 4/64 – 2048 GB 409,721.8 SPARC T4-4 3.0 GHz SPARC T4 4/32 – 1024 GB 205,792.0 IBM Power 780 4.1 GHz POWER7 8/32 – 1024 GB 192,001.1 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64 – 512 GB 162,601.7 Kurz zusammengefasst: Mit 32 Kernen (mit 3 GHz und 4MB L3 Cache), liefert die SPARC T4-4 mehr QphH@3000GB ab als IBM mit ihrer 32 Kern Power7 (bei 4.1 GHz und 32MB L3 Cache) und auch mehr als HP mit einem 64 Kern Intel Xeon System (2.27 GHz und 24MB L3 Cache).  Ich frage mich, wo genau SPARC hier mangelhaft ist? Nun koennte man natuerlich argumentieren, dass beide Ergebnisse nicht gerade neu sind.  Nun, in Ermangelung neuerer Ergebnisse kann man ja mal ein wenig spekulieren: IBMs aktueller Performance Report listet die o.g. IBM Power 780 mit einem rPerf Wert von 425.5.  Ein passendes Nachfolgesystem mit Power7+ CPUs waere die Power 780+ mit 64 Kernen, verfuegbar mit 3.72 GHz.  Sie wird mit einem rPerf Wert von  690.1 angegeben, also 1.62x mehr.  Wenn man also annimmt, dass Plattenspeicher nicht der limitierende Faktor ist (IBM hat mit 177 SSDs getestet, sie duerfen das gerne auf 400 erhoehen) und IBMs eigene Leistungsabschaetzung zugrunde legt, darf man ein theoretisches Ergebnis von 311398 QphH@3000GB erwarten.  Das waere dann allerdings immer noch weit von dem Ergebnis der SPARC T5-4 entfernt, und gerade in der von IBM so geschaetzen "per core" Metric noch weniger vorteilhaft. In der x86-Welt sieht es nicht besser aus.  Leider gibt es von Intel keine so praktischen rPerf-Tabellen.  Daher muss ich hier fuer eine Schaetzung auf SPECint_rate2006 zurueckgreifen.  (Ich bin kein grosser Fan von solchen Kreuz- und Querschaetzungen.  Insb. SPECcpu ist nicht besonders geeignet, um Datenbank-Leistung abzuschaetzen, da fast kein IO im Spiel ist.)  Das o.g. HP System wird bei SPEC mit 1580 CINT2006_rate gelistet.  Das bis einschl. 2013-06-14 beste Resultat fuer den neuen Intel Xeon E7-4870 mit 8 CPUs ist 2180 CINT2006_rate.  Das ist immerhin 1.38x besser.  (Wenn man nur die Taktrate beruecksichtigen wuerde, waere man bei 1.32x.)  Hier weiter zu rechnen, ist muessig, aber fuer die ungeduldigen Leser hier eine kleine tabellarische Zusammenfassung: TPC-H @3000GB Performance Spekulationen System QphH* Verbesserung gegenueber der frueheren Generation SPARC T4-4 32 cores SPARC T4 205,792 2x SPARC T5-464 cores SPARC T5 409,721 IBM Power 780 32 cores Power7 192,001 1.62x IBM Power 780+ 64 cores Power7+  311,398* HP ProLiant DL980 G764 cores Intel Xeon X7560 162,601 1.38x HP ProLiant DL980 G780 cores Intel Xeon E7-4870    224,348* * Keine echten Resultate  - spekulative Werte auf der Grundlage von rPerf (Power7+) oder SPECint_rate2006 (HP) Natuerlich sind IBM oder HP herzlich eingeladen, diese Werte zu widerlegen.  Aber stand heute warte ich noch auf aktuelle Benchmark Veroffentlichungen in diesem Datensegment. Was koennen wir also zusammenfassen? Es gibt einige Hinweise, dass der Plattenspeicher der begrenzende Faktor war, der die SPARC T5-4 daran hinderte, auf jenseits von 2x zu skalieren Der Mythos, dass SPARC Kerne keine Leistung bringen, ist genau das - ein Mythos.  Wie sieht es umgekehrt eigentlich mit einem TPC-H Ergebnis fuer die Power7+ aus? Cache ist nicht der magische Performance-Schalter, fuer den ihn manche Leute offenbar halten. Ein System, eine CPU-Architektur und ein Betriebsystem jenseits einer gewissen Grenze zu skalieren ist schwer.  In der x86-Welt scheint es noch ein wenig schwerer zu sein. Was fehlt?  Nun, das Thema Preis/Leistung ueberlasse ich gerne den Verkaeufern ;-) Und zu guter Letzt: Nein, ich habe mich nicht ins Marketing versetzen lassen.  Aber manchmal kann ich mich einfach nicht zurueckhalten... Disclosure Statements The views expressed on this blog are my own and do not necessarily reflect the views of Oracle. TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 409,721.8 QphH@3000GB, $3.94/QphH@3000GB, available 9/24/13, 4 processors, 64 cores, 512 threads; SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads. SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of June 18, 2013 from www.spec.org. HP ProLiant DL980 G7 (2.27 GHz, Intel Xeon X7560): 1580 SPECint_rate2006; HP ProLiant DL980 G7 (2.4 GHz, Intel Xeon E7-4870): 2180 SPECint_rate2006,

    Read the article

  • 32 core (each physical core) 2.2 GhZ or 12 core (6 physical cores) 3.0GHZ?

    - by Tejaswi Rana
    I am working on a multithreaded application (Forex trading app built on C#) and had the client upgrade from the 12 core 3.0GHZ machine (Intel) to a 32 core 2.2 Ghz machine (AMD). The PassMark benchmark results were significantly higher when using multicores doing Integer, Floating and other calculations while for a single core calculation it was a bit slower than the pack (others that were being compared to with similar config as the 12 core one). Oh it also comes with 64 GB RAM (4 times as the other one) and a much faster SSD. So after configuring and running the application on that machine, not only did it not perform as well, it was significantly slower. We're talking about 30seconds - 1 minute slower on an app that usually completes processing within 5-20 secs. The application uses MAX DEGREE of PARALLELISM (TPL) which I've tried setting to number of cores and also half of that. I've also tried running single threaded and without setting any limits in parallel threading. While it may be the hardware has some issues, I am wondering if the CPU processing speed is the issue. I can overclock to 3.0 GHZ. But is that even a good idea? Server Info - AMD http://www.passmark.com/forum/showthread.php?4013-AMD-Dual-6272-performance-is-60-lower-than-benchmarks Seems that benchmark was wrong to start with - officially. Intel i7 3930k OS (same in both) Windows 7 Professional 64-bit

    Read the article

  • Why does this Java code not utilize all CPU cores?

    - by ReneS
    The attached simple Java code should load all available cpu core when starting it with the right parameters. So for instance, you start it with java VMTest 8 int 0 and it will start 8 threads that do nothing else than looping and adding 2 to an integer. Something that runs in registers and not even allocates new memory. The problem we are facing now is, that we do not get a 24 core machine loaded (AMD 2 sockets with 12 cores each), when running this simple program (with 24 threads of course). Similar things happen with 2 programs each 12 threads or smaller machines. So our suspicion is that the JVM (Sun JDK 6u20 on Linux x64) does not scale well. Did anyone see similar things or has the ability to run it and report whether or not it runs well on his/her machine (= 8 cores only please)? Ideas? I tried that on Amazon EC2 with 8 cores too, but the virtual machine seems to run different from a real box, so the loading behaves totally strange. package com.test; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; public class VMTest { public class IntTask implements Runnable { @Override public void run() { int i = 0; while (true) { i = i + 2; } } } public class StringTask implements Runnable { @Override public void run() { int i = 0; String s; while (true) { i++; s = "s" + Integer.valueOf(i); } } } public class ArrayTask implements Runnable { private final int size; public ArrayTask(int size) { this.size = size; } @Override public void run() { int i = 0; String[] s; while (true) { i++; s = new String[size]; } } } public void doIt(String[] args) throws InterruptedException { final String command = args[1].trim(); ExecutorService executor = Executors.newFixedThreadPool(Integer.valueOf(args[0])); for (int i = 0; i < Integer.valueOf(args[0]); i++) { Runnable runnable = null; if (command.equalsIgnoreCase("int")) { runnable = new IntTask(); } else if (command.equalsIgnoreCase("string")) { runnable = new StringTask(); } Future<?> submit = executor.submit(runnable); } executor.awaitTermination(1, TimeUnit.HOURS); } public static void main(String[] args) throws InterruptedException { if (args.length < 3) { System.err.println("Usage: VMTest threadCount taskDef size"); System.err.println("threadCount: Number 1..n"); System.err.println("taskDef: int string array"); System.err.println("size: size of memory allocation for array, "); System.exit(-1); } new VMTest().doIt(args); } }

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >