Search Results

Search found 19055 results on 763 pages for 'high performance'.

Page 112/763 | < Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119  | Next Page >

  • AWS Free Usage Tier + Cloudflare... possible?

    - by crashintoty
    If I throw my MySQL/PHP app up on a Amazon EC2 instance (using their AWS Free Usage Tier program) and couple it with CloudFlare (the free plan of course) roughly how many daily visitors can I comfortably handle before performance starts to suffer? Just looking for a rough estimate or educated guess - I understand this setup might be less than ideal but I'm still very curious nonetheless. Thanks in advance

    Read the article

  • Clean install vs disk image

    - by Thanos
    Once a year I am making a clean install on windows, in order to keep my system fast. After posting a question on making a bootable windows usb with exe programs where I was adviced to make a disk image, a new question rose. What is the difference in making a disk image and performing a clean install on windows? Which is better in terms of speed, general performance, value for time and transfering between different computers?

    Read the article

  • How to measure TCP connection time in Linux

    - by Paul Draper
    I want to measure the overhead in creating a TCP connection. I know of many tools like hping and netperf, but they seem oriented at measuring latency. I want to know how long the 3-way handshake takes, and allocating any buffers, etc., and then closing it. So I want to open a real, legitimate TCP connection, and then close it. Are there any tools that will do that and help me measure performance?

    Read the article

  • How to get Ubuntu to perform better on an older computer?

    - by alex
    Ubuntu 9.1 runs quite slugglish on my old laptop from 2004. Slower than Windows XP that was on there. It has 512mb RAM and probably 1.2ghz (can't remember) CPU. I have turned off Visual Effects under Appearance Preferences. Are there any other tricks to get better performance, or do I just need a better computer to try Ubuntu? Thanks

    Read the article

  • Can a SQL Server have a CPU bottleneck when Processor Time is under 30%

    - by Sleepless
    Is it in principle possible for the CPU to be the bottleneck on a SQL Server if the Performance Counter Processor:Processor Time is constantly under 30% on all cores? Or does low Processor Time automatically allow me to rule out the CPU as a potential trouble source? I am asking this because SQL Nexus lists CPU as the top bottleneck on a server with low Processor Time values.

    Read the article

  • Performance in backpropagation algorithm

    - by Taban
    I've written a matlab program for standard backpropagation algorithm, it is my homework and I should not use matlab toolbox, so I write the entire code by myself. This link helped me for backpropagation algorithm. I have a data set of 40 random number and initial weights randomly. As output, I want to see a diagram that shows the performance. I used mse and plot function to see performance for 20 epochs but the result is this: I heard that performance should go up through backpropagation, so I want to know is there any problem with my code or this result is normal because local minimums. This is my code: Hidden_node=inputdlg('Enter the number of Hidden nodes'); a=0.5;%initialize learning rate hiddenn=str2num(Hidden_node{1,1}); randn('seed',0); %creating data set s=2; N=10; m=[5 -5 5 5;-5 -5 5 -5]; S = s*eye(2); [l,c] = size(m); x = []; % Creating the training set for i = 1:c x = [x mvnrnd(m(:,i)',S,N)']; end % target value toutput=[ones(1,N) zeros(1,N) ones(1,N) zeros(1,N)]; for epoch=1:20; %number of epochs for kk=1:40; %number of patterns %initial weights of hidden layer for ii=1 : 2; for jj=1 :hiddenn; whidden{ii,jj}=rand(1); end end initial the wights of output layer for ii=1 : hiddenn; woutput{ii,1}=rand(1); end for ii=1:hiddenn; x1=x(1,kk); x2=x(2,kk); w1=whidden{1,ii}; w2=whidden{2,ii}; activation{1,ii}=(x1(1,1)*w1(1,1))+(x2(1,1)*w2(1,1)); end %calculate output of hidden nodes for ii=1:hiddenn; hidden_to_out{1,ii}=logsig(activation{1,ii}); end activation_O{1,1}=0; for jj=1:hiddenn; activation_O{1,1} = activation_O{1,1}+(hidden_to_out{1,jj}*woutput{jj,1}); end %calculate output out{1,1}=logsig(activation_O{1,1}); out_for_plot(1,kk)= out{1,ii}; %calculate error for output node delta_out{1,1}=(toutput(1,kk)-out{1,1}); %update weight of output node for ii=1:hiddenn; woutput{ii,jj}=woutput{ii,jj}+delta_out{1,jj}*hidden_to_out{1,ii}*dlogsig(activation_O{1,jj},logsig(activation_O{1,jj}))*a; end %calculate error of hidden nodes for ii=1:hiddenn; delta_hidden{1,ii}=woutput{ii,1}*delta_out{1,1}; end %update weight of hidden nodes for ii=1:hiddenn; for jj=1:2; whidden{jj,ii}= whidden{jj,ii}+(delta_hidden{1,ii}*dlogsig(activation{1,ii},logsig(activation{1,ii}))*x(jj,kk)*a); end end a=a/(1.1);%decrease learning rate end %calculate performance e=toutput(1,kk)-out_for_plot(1,1); perf(1,epoch)=mse(e); end plot(perf); Thanks a lot.

    Read the article

  • Will high reputation in Programmers help to get a good job?

    - by Lorenzo
    In reference to this question, do you think that having a high reputation on this site will help to get a good job? Aside silly and humorous questions, on Programmers we can see a lot of high quality theory questions. I think that, if Stack Overflow will eventually evolve in "strictly programming related" (which usually is "strictly coding related"), the questions on Programmers will be much more interesting and meaningful ("Stack Overflow" = "I have this specific coding/implementation issue"; "Programmers" = "Best practices, team shaping, paradigms, CS theory"). So could high reputation on this site help (or at least be a good reference)? And then, more o less than Stack Overflow?

    Read the article

  • implementing a high level function in a script to call a low level function in the game engine

    - by eat_a_lemon
    In my 2d game engine I have a function that does pathfinding, find_shortest_path. It executes for each time step in the game loop and calculates the next coordinate pair in the series of coordinates to reach the destination object. Now I want to call this function in a scripting language and have it only return the last coordinate pair result. I want the game engine to go about the business of rendering the incremental steps but I don't want the high level script to care about the rendering. The high level script is only for ai game logic. Now I know how to bind a method from C to python but how can I signal and coordinate the wait time between the incremental steps without the high level function returning until its time for the last step?

    Read the article

  • optimizing iPhone OpenGL ES fill rate

    - by NateS
    I have an Open GL ES game on the iPhone. My framerate is pretty sucky, ~20fps. Using the Xcode OpenGL ES performance tool on an iPhone 3G, it shows: Renderer Utilization: 95% to 99% Tiler Utilization: ~27% I am drawing a lot of pretty large images with a lot of blending. If I reduce the number of images drawn, framerates go from ~20 to ~40, though the performance tool results stay about the same (renderer still maxed). I think I'm being limited by the fill rate of the iPhone 3G, but I'm not sure. My questions are: How can I determine with more granularity where the bottleneck is? That is my biggest problem, I just don't know what is taking all the time. If it is fillrate, is there anything I do to improve it besides just drawing less? I am using texture atlases. I have tried to minimize image binds, though it isn't always possible (drawing order, not everything fits on one 1024x1024 texture, etc). Every frame I do 10 image binds. This seem pretty reasonable, but I could be mistaken. I'm using vertex arrays and glDrawArrays. I don't really have a lot of geometry. I can try to be more precise if needed. Each image is 2 triangles and I try to batch things were possible, though often (maybe half the time) images are drawn with individual glDrawArrays calls. Besides the images, I have ~60 triangles worth of geometry being rendered in ~6 glDrawArrays calls. I often glTranslate before calling glDrawArrays. Would it improve the framerate to switch to VBOs? I don't think it is a huge amount of geometry, but maybe it is faster for other reasons? Are there certain things to watch out for that could reduce performance? Eg, should I avoid glTranslate, glColor4g, etc? I'm using glScissor in a 3 places per frame. Each use consists of 2 glScissor calls, one to set it up, and one to reset it to what it was. I don't know if there is much of a performance impact here. If I used PVRTC would it be able to render faster? Currently all my images are GL_RGBA. I don't have memory issues. Here is a rough idea of what I'm drawing, in this order: 1) Switch to perspective matrix. 2) Draw a full screen background image 3) Draw a full screen image with translucency (this one has a scrolling texture). 4) Draw a few sprites. 5) Switch to ortho matrix. 6) Draw a few sprites. 7) Switch to perspective matrix. 8) Draw sprites and some other textured geometry. 9) Switch to ortho matrix. 10) Draw a few sprites (eg, game HUD). Steps 1-6 draw a bunch of background stuff. 8 draws most of the game content. 10 draws the HUD. As you can see, there are many layers, some of them full screen and some of the sprites are pretty large (1/4 of the screen). The layers use translucency, so I have to draw them in back-to-front order. This is further complicated by needing to draw various layers in ortho and others in perspective. I will gladly provide additional information if reqested. Thanks in advance for any performance tips or general advice on my problem!

    Read the article

  • Performance of Cluster Shared Volume file copy from SAN

    - by Sequenzia
    I am hoping someone can help me out with a strange issue. We are running a Microsoft Failover Cluster with Server 2008 R2 and an Equallogic PS4000 SAN. Our main configuration has 2 Dell Poweredge T710 Servers in the cluster. We have CSV and Quorm setup. The servers each have 10 Broadcom 1Gb NICs. Right now 4 of the NICS are on the iSCSI network for accessing the SAN. They use MPIO and the Dell HIT pack. We have 5 VMs running on each node and everything runs smooth. No noticeable performance issues or anything. From the SAN I can see the 4 iSCSI connections from each server to each volume (CSV and Quorm). Again, it seems to perform great. The problem I am running into is with backups. I have tried a few backup programs like backupchain and Veeam. The problem is both of them are very very slow to backup the VMs. For instance I have a 500GB (fixed disc) VHD that’s running on the cluster. It takes over 18 hours to backup that VHD and that’s with compression and depuping turned off which is supposed to be the fasted. We also have a separate server that is just for backups. It has a lot of directed attached storage. As part of the troubleshooting I decided to bring that server into the cluster as a node. It now has access to the CSV and can read from C:\clusterstorage\volume1 which is where our VHDs live. This backup server only has 2 NICs. 1 NIC is going to the iSCSI network and the other is just on the main network. It has Intel NICS in it without any sort of MPIO or teaming. So with the 3rd server now in the cluster I started doing some benchmarking. I have a test VHD that’s about 7GBs that’s stored in the CSV. I have tested file copying that VHD from all 3 servers to directed attached storage in the respective server. The 2 Dell servers that are the main nodes in the cluster (they house the VMs) are reading that file at about 20Mbs/Sec. Which at that rate is way to slow for the backups. The other server which only has 1 NIC to the SAN is reading at around 100Mbs/Sec. I spent a few hours on the phone with Dell today about this . We went through all kind of tests and he was pretty dumb founded. He really has no idea why that server with only 1 NIC is reading about 5 times as fast as the servers with 4 NICS and MPIO. We looked at the network utilization of the NICs while the file copy was going on. The servers with the 4 NICs had a small increase of activity during the file copy but they only went up to around 8-10% on all 4 NICs. The other server with the 1 NIC jumped up to over 80% during the file copy. I plan on doing some more testing after hours and calling Dell back tomorrow but I really am confused (and so is Dell’s support rep) why I cannot get faster file copy access to the CSV on those servers. Anyone have any input on this? Any feedback would be greatly appreciated. Thanks in advance.

    Read the article

  • Language+IDE for teaching high school students?

    - by daveagp
    I'm investigating languages and IDEs for a project involving teaching high-school students (around grade 11). It will teach basics of programming as an introduction to computer science (e.g., including how numbers/strings/characters are represented, using procedures and arrays, control flow, a little bit of algorithms, only very basic I/O). The non-negotiable requirements for this project are: a free up-to-date cross-platform IDE (Win & Mac incl. 64-bit) with debug a compiler where it's easy to learn from your mistakes together with the IDE, a gentle installation+learning curve So far, the best options I see are the following. Are there others I should know about? I am giving a short explanation with each one to generally show what I am looking for. In order from most to least promising: Pascal + FreePascal IDE (it seems a little buggy but actively developed?) Python + Eclipse + PyDev (good but features are overwhelming/hard to navigate) Groovy + Eclipse ('') Python + IDLE (looks unnatural to do debugging, to me) Pascal + Lazarus (IDE overwhelming, e.g. not obvious how to "start from scratch") Preferably, as a rule of thumb, the language should be direct enough that you don't need to wrap every program in a class, don't need to reference a System object to println, etc. I tried a little bit to see if there is something in JavaScript or (non-Visual) Basic along the lines of what I want, but found nothing so far. I would say that C/C++/C#, Java, Ruby, Lisp, VB do not fit my criteria for languages for this project. To reiterate my questions: are any of those 5 options really awesome or un-awesome? Are there other options which are even MORE awesome? Anything for Basic or JavaScript which meets all of the criteria? Thanks!

    Read the article

  • java distributed cache for low latency, high availability

    - by Shahbaz
    I've never used distributed caches/DHTs like memcached, jboss cache, ehcache, etc. I'm wondering which, if any, is appropriate for my use. First, I'm not doing web applications (as most of these project seem to be geared towards web apps). I write servers (Order Management Systems actually) for financial trading firms. The servers themselves are not too complicated. They need to receive information (market data, orders, executions, etc.) rout them to their destination while possibly transforming some of these messages. I am looking at these products to solve the following problems: * Safe repository of the state of the server. I'd rather build the logic of my application as a bunch of transformers (similar to Apache Camel) and store the state in a 'safe' place * This repository should be distributed: in case one of these data stores crashes, one or two more should be up and I should be able to switch to them seamlessly * This repository should be fast. Single digits milliseconds count here, in other words, systems which consume/process this data are automated systems, not humans clicking on links. This system needs to have high-throughput and low latency. By sending my data outside the process, I am necessarily slowing performance, but I am trying to balance absolute raw speed and absolute protection of data. * This repository should be safe. Similar to the point about several on-line backups, this system needs to write data to disk (potentially more than one disk). I'd really like to stop writing my own 'transaction servers.' Am I correct to be looking into projects such as jboss cache, ehcache, etc.? Thanks

    Read the article

  • Optimising RSS parsing on App Engine to avoid high CPU warnings

    - by Danny Tuppeny
    I'm pulling some RSS feeds into a datastore in App Engine to serve up to an iPhone app. I use cron to schedule updating the RSS every x minutes. Each task only parses one RSS feed (which has 15-20 items). I frequently get warnings about high CPU usage in the App Engine dashboard, so I'm looking for ways to optimise my code. Currently, I use minidom (since it's already there on App Engine), but I suspect it's not very efficient! Here's the code: dom = minidom.parseString(urlfetch.fetch(url).content) if dom: items = [] for node in dom.getElementsByTagName('item'): item = RssItem( key_name = self.getText(node.getElementsByTagName('guid')[0].childNodes), title = self.getText(node.getElementsByTagName('title')[0].childNodes), description = self.getText(node.getElementsByTagName('description')[0].childNodes), modified = datetime.now(), link = self.getText(node.getElementsByTagName('link')[0].childNodes), categories = [self.getText(category.childNodes) for category in node.getElementsByTagName('category')] ); items.append(item); db.put(items); def getText(self, nodelist): rc = '' for node in nodelist: if node.nodeType == node.TEXT_NODE: rc = rc + node.data return rc There isn't much going on, but the scripts often take 2-6 seconds CPU time, which seems a bit excessive for looping through 20ish items and reading a few attributes. What can I do to make this faster? Is there anything particularly bad in the above code, or should I change to another way of parsing? Are there are any libraries (that work on App Engine) that would be better, or would I be better parsing the RSS myself?

    Read the article

  • Threads, Sockets, and Designing Low-Latency, High Concurrency Servers

    - by lazyconfabulator
    I've been thinking a lot lately about low-latency, high concurrency servers. Specifically, http servers. http servers (fast ones, anyway) can serve thousands of users simultaneously, with very little latency. So how do they do it? As near as I can tell, they all use events. Cherokee and Lighttpd use libevent. Nginx uses it's own event library performing much the same function of libevent, that is, picking a platform optimal strategy for polling events (like kqueue on *bsd, epoll on linux, /dev/poll on Solaris, etc). They all also seem to employ a strategy of multiprocess or multithread once the connection is made - using worker threads to handle the more cpu intensive tasks while another thread continues to listen and handle connections (via events). This is the extent of my understanding and ability to grok the thousand line sources of these applications. What I really want are finer details about how this all works. In examples of using events I've seen (and written) the events are handling both input and output. To this end, do the workers employ some sort of input/output queue to the event handling thread? Or are these worker threads handling their own input and output? I imagine a fixed amount of worker threads are spawned, and connections are lined up and served on demand, but how does the event thread feed these connections to the workers? I've read about FIFO queues and circular buffers, but I've yet to see any implementations to work from. Are there any? Do any use compare-and-swap instructions to avoid locking or is locking less detrimental to event polling than I think? Or have I misread the design entirely? Ultimately, I'd like to take enough away to improve some of my own event-driven network services. Bonus points to anyone providing solid implementation details (especially for stuff like low-latency queues) in C, as that's the language my network services are written in.

    Read the article

  • Which linear programming package should I use for high numbers of constraints and "warm starts"

    - by davidsd
    I have a "continuous" linear programming problem that involves maximizing a linear function over a curved convex space. In typical LP problems, the convex space is a polytope, but in this case the convex space is piecewise curved -- that is, it has faces, edges, and vertices, but the edges aren't straight and the faces aren't flat. Instead of being specified by a finite number of linear inequalities, I have a continuously infinite number. I'm currently dealing with this by approximating the surface by a polytope, which means discretizing the continuously infinite constraints into a very large finite number of constraints. I'm also in the situation where I'd like to know how the answer changes under small perturbations to the underlying problem. Thus, I'd like to be able to supply an initial condition to the solver based on a nearby solution. I believe this capability is called a "warm start." Can someone help me distinguish between the various LP packages out there? I'm not so concerned with user-friendliness as speed (for large numbers of constraints), high-precision arithmetic, and warm starts. Thanks!

    Read the article

  • High Runtime for Dictionary.Add for a large amount of items

    - by aaginor
    Hi folks, I have a C#-Application that stores data from a TextFile in a Dictionary-Object. The amount of data to be stored can be rather large, so it takes a lot of time inserting the entries. With many items in the Dictionary it gets even worse, because of the resizing of internal array, that stores the data for the Dictionary. So I initialized the Dictionary with the amount of items that will be added, but this has no impact on speed. Here is my function: private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections) { Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count); foreach (NodeConnection con in connections) { ... resultSet.Add(nodeIdPair, newEdge); } return resultSet; } In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found, that the Average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary(); (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (ALTHOUGH I initialized it with the desired size). For the first 20k items, the average time for Add is 0.03 ms for each item. Any idea, how to make the add-operation faster? Thanks in advance, Frank

    Read the article

  • Alternatives to LINQ To SQL on high loaded pages

    - by Alex
    To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying. But, there's one great problem: it doesn't work well on high loaded requests. I have some actions in my ASP.NET MVC project, that are called hundreds times every minute. I used to have LINQ to SQL there, but since the amount of requests is gigantic, LINQ TO SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increase some value by one with every request. var stat = DB.Stats.First(); stat.Visits++; // .... DB.SubmitChanges(); But while ASP.NET was working on those //... instructions, the stats.Visits value stored in the table got changed. I found a solution, I created a stored procedure UPDATE Stats SET Visits=Visits+1 It works well. Unfortunately now I'm getting more and more moments like that. And it sucks to create stored procedures for all cases. So my question is, how to solve this problem? Are there any alternatives that can work here? I hear that Stackoverflow works with LINQ to SQL. And it's more loaded than my site.

    Read the article

  • How to convert string with double high/wide characters to normal string [VC++6]

    - by Shaitan00
    My application typically recieves a string in the following format: " Item $5.69 " Some contants I always expect: - the LENGHT always 20 characters - the start index of the text always [5] - and most importantly the index of the DECIMAL for the price always [14] In order to identify this string correctly I validate all the expected contants listed above .... Some of my clients have now started sending the string with Doube-High / Double-Wide values (pair of characters which represent a single readable character) similar to the following: " Item $x80x90.x81x91x82x92 " For testing I simply scan the string character-by-character, compare char[i] and char[i+1] and replace these pairs with their corresponding single character when a match is found (works fine) as follows: [Code] for (int i=0; i < sData.length(); i++) { char ch = sData[i] & 0xFF; char ch2 = sData[i+1] & 0xFF; if (ch == '\x80' && ch2 == '\x90') zData.replace("\x80\x90", "0"); else if (ch == '\x81' && ch2 == '\x91') zData.replace("\x81\x91", "1"); else if (ch == '\x82' && ch2 == '\x92') zData.replace("\x82\x92", "2"); ... ... ... } [/Code] But the result is something like this: " Item $5.69 " Notice how this no longer matches my expectation: the lenght is now 17 (instead of 20) due to the 3 conversions and the decimal is now at index 13 (instead of 14) due to the conversion of the "5" before the decimal point. Ideally I would like to convert the string to a normal readable format keeping the constants (length, index of text, index of decimal) at the same place (so the rest of my application is re-usable) ... or any other suggestion (I'm pretty much stuck with this)... Is there a STANDARD way of dealing with these type of characters? Any help would be greatly appreciated, I've been stuck on this for a while now ... Thanks,

    Read the article

  • node.js UDP data lost at high package rates

    - by koleto
    I am observing a significant data-lost on a UDP connection with node.js 0.6.18 and 0.8.0 . It appears at high packet rates about 1200 packet per second with frames about 1500 byte limit. Each data packages has a incrementing number so it easy to track the number of lost packages. var server = dgram.createSocket("udp4"); server.on("message", function (message, rinfo) { //~processData(message); //~ writeData(message, null, 5000); }).bind(10001); On the receiving callback I tested two cases I first saved 5000 packages in a file. The result ware no dropped packages. After I have included a data processing routine and got about 50% drop rate. What I expected was that the process data routine should be completely asynchronous and should not introduce dead time to the system, since it is a simple parser to process binary data in the package and to emits events to a further processing routine. It seems that the parsing routine introduce dead time in which the event handler is unable to handle each packets. At the low package rates (< 1200 packages/sec) there are no data lost observed! Is this a bug or I am doing something wrong?

    Read the article

  • C++ bughunt - High-score insertion in a vector crashes the program

    - by Francisco P.
    Hello, everyone! I have a game I'm working on. My players are stored in a vector, and, at the end of the game, the game crashes when trying to insert the high-scores in the correct positions. Here's what I have (please ignore the portuguese comments, the code is pretty straightforward :P): //TOTAL_HIGHSCORES is the max. number of hiscores that i'm willing to store. This is set as 10. bool Game::updateHiScores() { bool stopIterating; bool scoresChanged = false; //Se ainda nao existirem TOTAL_HISCORES melhores pontuacoes ou se a pontuacao for melhor que uma das existentes for (size_t i = 0; i < players.size(); ++i) { //&& !(players[i].isAI()) if (players[i].getScoreValue() > 0 && (hiScores.size() < TOTAL_HISCORES || hiScores.back() < players[i].getScore())) { scoresChanged = true; if(hiScores.empty() || hiScores.back() >= players[i].getScore()) hiScores.push_back(players[i].getScore()); else { //Ciclo que encontra e insere a pontuacao no lugar desejado stopIterating = false; for(vector<Score>::iterator it = hiScores.begin(); it < hiScores.end() && !(stopIterating); ++it) { if(*it <= players[i].getScore()) { //E inserida na posicao 'it' o Score correspondente hiScores.insert(it, players[i].getScore()); //Verifica se o comprimento do vector esta dentro do desejado, se nao estiver, este e rectificado if (hiScores.size() > TOTAL_HISCORES) hiScores.pop_back(); stopIterating = true; } } } } } if (scoresChanged) sort(hiScores.begin(), hiScores.end(), higher); return scoresChanged; } What am I doing wrong here? Thanks for your time, fellas.

    Read the article

  • Improving SAS multipath to JBOD performance on Linux

    - by user36825
    Hello all I'm trying to optimize a storage setup on some Sun hardware with Linux. Any thoughts would be greatly appreciated. We have the following hardware: Sun Blade X6270 2* LSISAS1068E SAS controllers 2* Sun J4400 JBODs with 1 TB disks (24 disks per JBOD) Fedora Core 12 2.6.33 release kernel from FC13 (also tried with latest 2.6.31 kernel from FC12, same results) Here's the datasheet for the SAS hardware: http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf It's using PCI Express 1.0a, 8x lanes. With a bandwidth of 250 MB/sec per lane, we should be able to do 2000 MB/sec per SAS controller. Each controller can do 3 Gb/sec per port and has two 4 port PHYs. We connect both PHYs from a controller to a JBOD. So between the JBOD and the controller we have 2 PHYs * 4 SAS ports * 3 Gb/sec = 24 Gb/sec of bandwidth, which is more than the PCI Express bandwidth. With write caching enabled and when doing big writes, each disk can sustain about 80 MB/sec (near the start of the disk). With 24 disks, that means we should be able to do 1920 MB/sec per JBOD. multipath { rr_min_io 100 uid 0 path_grouping_policy multibus failback manual path_selector "round-robin 0" rr_weight priorities alias somealias no_path_retry queue mode 0644 gid 0 wwid somewwid } I tried values of 50, 100, 1000 for rr_min_io, but it doesn't seem to make much difference. Along with varying rr_min_io I tried adding some delay between starting the dd's to prevent all of them writing over the same PHY at the same time, but this didn't make any difference, so I think the I/O's are getting properly spread out. According to /proc/interrupts, the SAS controllers are using a "IR-IO-APIC-fasteoi" interrupt scheme. For some reason only core #0 in the machine is handling these interrupts. I can improve performance slightly by assigning a separate core to handle the interrupts for each SAS controller: echo 2 /proc/irq/24/smp_affinity echo 4 /proc/irq/26/smp_affinity Using dd to write to the disk generates "Function call interrupts" (no idea what these are), which are handled by core #4, so I keep other processes off this core too. I run 48 dd's (one for each disk), assigning them to cores not dealing with interrupts like so: taskset -c somecore dd if=/dev/zero of=/dev/mapper/mpathx oflag=direct bs=128M oflag=direct prevents any kind of buffer cache from getting involved. None of my cores seem maxed out. The cores dealing with interrupts are mostly idle and all the other cores are waiting on I/O as one would expect. Cpu0 : 0.0%us, 1.0%sy, 0.0%ni, 91.2%id, 7.5%wa, 0.0%hi, 0.2%si, 0.0%st Cpu1 : 0.0%us, 0.8%sy, 0.0%ni, 93.0%id, 0.2%wa, 0.0%hi, 6.0%si, 0.0%st Cpu2 : 0.0%us, 0.6%sy, 0.0%ni, 94.4%id, 0.1%wa, 0.0%hi, 4.8%si, 0.0%st Cpu3 : 0.0%us, 7.5%sy, 0.0%ni, 36.3%id, 56.1%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 1.3%sy, 0.0%ni, 85.7%id, 4.9%wa, 0.0%hi, 8.1%si, 0.0%st Cpu5 : 0.1%us, 5.5%sy, 0.0%ni, 36.2%id, 58.3%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 5.0%sy, 0.0%ni, 36.3%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 5.1%sy, 0.0%ni, 36.3%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 0.1%us, 8.3%sy, 0.0%ni, 27.2%id, 64.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 0.1%us, 7.9%sy, 0.0%ni, 36.2%id, 55.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 0.0%us, 7.8%sy, 0.0%ni, 36.2%id, 56.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 0.0%us, 7.3%sy, 0.0%ni, 36.3%id, 56.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu12 : 0.0%us, 5.6%sy, 0.0%ni, 33.1%id, 61.2%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 0.1%us, 5.3%sy, 0.0%ni, 36.1%id, 58.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 0.0%us, 4.9%sy, 0.0%ni, 36.4%id, 58.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 0.1%us, 5.4%sy, 0.0%ni, 36.5%id, 58.1%wa, 0.0%hi, 0.0%si, 0.0%st Given all this, the throughput reported by running "dstat 10" is in the range of 2200-2300 MB/sec. Given the math above I would expect something in the range of 2*1920 ~= 3600+ MB/sec. Does anybody have any idea where my missing bandwidth went? Thanks!

    Read the article

  • Slow filesystem access

    - by danneh3826
    I'm trying to diagnose a slow filesystem issue on a server I look after. It's been ongoing for quite some time, and I've run out of ideas as to what I can try. Here's the thick of it. The server itself is a Dell Poweredge T310. It has 4 SAS hard drives in it, configured at RAID5, and is running Citrix XenServer 5.6. The VM is a (relatively) old Debian 5.0.6 installation. It's given 4 cores, and 4Gb's of RAM. It has 3 volumes. A 10Gb volume (ext3) for the system, 980Gb volume (xfs) for data (~94% full), and another 200Gb volume (xfs) for data (~13% full). Now here's the weird thing. Read/write access to the 980Gb volume is really slow. I might get 5Mb/s out of it if I'm lucky. At first I figured it was actually disk access in the system, or at a hypervisor level, but ruled that out entirely as other VMs on the same host are running perfectly fine (a good couple hundred Mb/s disk r/w access). So then I started to target this particular VM. I started thinking it was XFS, but to prove it I wasn't going to attempt to change the filesystem on the 980Gb drive, with years and years of billions of files on there. So I provisioned the 200Gb drive, and did the same read/write test (basically dd), and got a good couple hundred Mb/s r/w access to it. So that ruled out the VM, the hardware, and the filesystem type. There's also a lot of these in /var/log/kern.log; (sorry, this is quite long) Sep 4 10:16:59 uriel kernel: [32571790.564689] httpd: page allocation failure. order:5, mode:0x4020 Sep 4 10:16:59 uriel kernel: [32571790.564693] Pid: 7318, comm: httpd Not tainted 2.6.32-4-686-bigmem #1 Sep 4 10:16:59 uriel kernel: [32571790.564696] Call Trace: Sep 4 10:16:59 uriel kernel: [32571790.564705] [<c1092a4d>] ? __alloc_pages_nodemask+0x476/0x4e0 Sep 4 10:16:59 uriel kernel: [32571790.564711] [<c1092ac3>] ? __get_free_pages+0xc/0x17 Sep 4 10:16:59 uriel kernel: [32571790.564716] [<c10b632e>] ? __kmalloc+0x30/0x128 Sep 4 10:16:59 uriel kernel: [32571790.564722] [<c11dd774>] ? pskb_expand_head+0x4f/0x157 Sep 4 10:16:59 uriel kernel: [32571790.564727] [<c11ddbbf>] ? __pskb_pull_tail+0x41/0x1fb Sep 4 10:16:59 uriel kernel: [32571790.564732] [<c11e4882>] ? dev_queue_xmit+0xe4/0x38e Sep 4 10:16:59 uriel kernel: [32571790.564738] [<c1205902>] ? ip_finish_output+0x0/0x5c Sep 4 10:16:59 uriel kernel: [32571790.564742] [<c12058c7>] ? ip_finish_output2+0x187/0x1c2 Sep 4 10:16:59 uriel kernel: [32571790.564747] [<c1204dc8>] ? ip_local_out+0x15/0x17 Sep 4 10:16:59 uriel kernel: [32571790.564751] [<c12055a9>] ? ip_queue_xmit+0x31e/0x379 Sep 4 10:16:59 uriel kernel: [32571790.564758] [<c1279a90>] ? _spin_lock_bh+0x8/0x1e Sep 4 10:16:59 uriel kernel: [32571790.564767] [<eda15a8d>] ? __nf_ct_refresh_acct+0x66/0xa4 [nf_conntrack] Sep 4 10:16:59 uriel kernel: [32571790.564773] [<c103bf42>] ? _local_bh_enable_ip+0x16/0x6e Sep 4 10:16:59 uriel kernel: [32571790.564779] [<c1214593>] ? tcp_transmit_skb+0x595/0x5cc Sep 4 10:16:59 uriel kernel: [32571790.564785] [<c1005c4f>] ? xen_restore_fl_direct_end+0x0/0x1 Sep 4 10:16:59 uriel kernel: [32571790.564791] [<c12165ea>] ? tcp_write_xmit+0x7a3/0x874 Sep 4 10:16:59 uriel kernel: [32571790.564796] [<c121203a>] ? tcp_ack+0x1611/0x1802 Sep 4 10:16:59 uriel kernel: [32571790.564801] [<c10055ec>] ? xen_force_evtchn_callback+0xc/0x10 Sep 4 10:16:59 uriel kernel: [32571790.564806] [<c121392f>] ? tcp_established_options+0x1d/0x8b Sep 4 10:16:59 uriel kernel: [32571790.564811] [<c1213be4>] ? tcp_current_mss+0x38/0x53 Sep 4 10:16:59 uriel kernel: [32571790.564816] [<c1216701>] ? __tcp_push_pending_frames+0x1e/0x50 Sep 4 10:16:59 uriel kernel: [32571790.564821] [<c1212246>] ? tcp_data_snd_check+0x1b/0xd2 Sep 4 10:16:59 uriel kernel: [32571790.564825] [<c1212de3>] ? tcp_rcv_established+0x5d0/0x626 Sep 4 10:16:59 uriel kernel: [32571790.564831] [<c121902c>] ? tcp_v4_do_rcv+0x15f/0x2cf Sep 4 10:16:59 uriel kernel: [32571790.564835] [<c1219561>] ? tcp_v4_rcv+0x3c5/0x5c0 Sep 4 10:16:59 uriel kernel: [32571790.564841] [<c120197e>] ? ip_local_deliver_finish+0x10c/0x18c Sep 4 10:16:59 uriel kernel: [32571790.564846] [<c12015a4>] ? ip_rcv_finish+0x2c4/0x2d8 Sep 4 10:16:59 uriel kernel: [32571790.564852] [<c11e3b71>] ? netif_receive_skb+0x3bb/0x3d6 Sep 4 10:16:59 uriel kernel: [32571790.564864] [<ed823efc>] ? xennet_poll+0x9b8/0xafc [xen_netfront] Sep 4 10:16:59 uriel kernel: [32571790.564869] [<c11e40ee>] ? net_rx_action+0x96/0x194 Sep 4 10:16:59 uriel kernel: [32571790.564874] [<c103bd4c>] ? __do_softirq+0xaa/0x151 Sep 4 10:16:59 uriel kernel: [32571790.564878] [<c103be24>] ? do_softirq+0x31/0x3c Sep 4 10:16:59 uriel kernel: [32571790.564883] [<c103befa>] ? irq_exit+0x26/0x58 Sep 4 10:16:59 uriel kernel: [32571790.564890] [<c118ff9f>] ? xen_evtchn_do_upcall+0x12c/0x13e Sep 4 10:16:59 uriel kernel: [32571790.564896] [<c1008c3f>] ? xen_do_upcall+0x7/0xc Sep 4 10:16:59 uriel kernel: [32571790.564899] Mem-Info: Sep 4 10:16:59 uriel kernel: [32571790.564902] DMA per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564905] CPU 0: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564908] CPU 1: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564911] CPU 2: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564914] CPU 3: hi: 0, btch: 1 usd: 0 Sep 4 10:16:59 uriel kernel: [32571790.564916] Normal per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564919] CPU 0: hi: 186, btch: 31 usd: 175 Sep 4 10:16:59 uriel kernel: [32571790.564922] CPU 1: hi: 186, btch: 31 usd: 165 Sep 4 10:16:59 uriel kernel: [32571790.564925] CPU 2: hi: 186, btch: 31 usd: 30 Sep 4 10:16:59 uriel kernel: [32571790.564928] CPU 3: hi: 186, btch: 31 usd: 140 Sep 4 10:16:59 uriel kernel: [32571790.564931] HighMem per-cpu: Sep 4 10:16:59 uriel kernel: [32571790.564933] CPU 0: hi: 186, btch: 31 usd: 159 Sep 4 10:16:59 uriel kernel: [32571790.564936] CPU 1: hi: 186, btch: 31 usd: 22 Sep 4 10:16:59 uriel kernel: [32571790.564939] CPU 2: hi: 186, btch: 31 usd: 24 Sep 4 10:16:59 uriel kernel: [32571790.564942] CPU 3: hi: 186, btch: 31 usd: 13 Sep 4 10:16:59 uriel kernel: [32571790.564947] active_anon:485974 inactive_anon:121138 isolated_anon:0 Sep 4 10:16:59 uriel kernel: [32571790.564948] active_file:75215 inactive_file:79510 isolated_file:0 Sep 4 10:16:59 uriel kernel: [32571790.564949] unevictable:0 dirty:516 writeback:15 unstable:0 Sep 4 10:16:59 uriel kernel: [32571790.564950] free:230770 slab_reclaimable:36661 slab_unreclaimable:21249 Sep 4 10:16:59 uriel kernel: [32571790.564952] mapped:20016 shmem:29450 pagetables:5600 bounce:0 Sep 4 10:16:59 uriel kernel: [32571790.564958] DMA free:2884kB min:72kB low:88kB high:108kB active_anon:0kB inactive_anon:0kB active_file:5692kB inactive_file:724kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15872kB mlocked:0kB dirty:8kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:5112kB slab_unreclaimable:156kB kernel_stack:56kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.564964] lowmem_reserve[]: 0 698 4143 4143 Sep 4 10:16:59 uriel kernel: [32571790.564977] Normal free:143468kB min:3344kB low:4180kB high:5016kB active_anon:56kB inactive_anon:2068kB active_file:131812kB inactive_file:131728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:715256kB mlocked:0kB dirty:156kB writeback:0kB mapped:308kB shmem:4kB slab_reclaimable:141532kB slab_unreclaimable:84840kB kernel_stack:1928kB pagetables:22400kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.564983] lowmem_reserve[]: 0 0 27559 27559 Sep 4 10:16:59 uriel kernel: [32571790.564995] HighMem free:776728kB min:512kB low:4636kB high:8760kB active_anon:1943840kB inactive_anon:482484kB active_file:163356kB inactive_file:185588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3527556kB mlocked:0kB dirty:1900kB writeback:60kB mapped:79756kB shmem:117796kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Sep 4 10:16:59 uriel kernel: [32571790.565001] lowmem_reserve[]: 0 0 0 0 Sep 4 10:16:59 uriel kernel: [32571790.565011] DMA: 385*4kB 16*8kB 3*16kB 9*32kB 6*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2900kB Sep 4 10:16:59 uriel kernel: [32571790.565032] Normal: 21505*4kB 6508*8kB 273*16kB 24*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 143412kB Sep 4 10:16:59 uriel kernel: [32571790.565054] HighMem: 949*4kB 8859*8kB 7063*16kB 6186*32kB 4631*64kB 727*128kB 6*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 776604kB Sep 4 10:16:59 uriel kernel: [32571790.565076] 198980 total pagecache pages Sep 4 10:16:59 uriel kernel: [32571790.565079] 14850 pages in swap cache Sep 4 10:16:59 uriel kernel: [32571790.565082] Swap cache stats: add 2556273, delete 2541423, find 82961339/83153719 Sep 4 10:16:59 uriel kernel: [32571790.565085] Free swap = 250592kB Sep 4 10:16:59 uriel kernel: [32571790.565087] Total swap = 385520kB Sep 4 10:16:59 uriel kernel: [32571790.575454] 1073152 pages RAM Sep 4 10:16:59 uriel kernel: [32571790.575458] 888834 pages HighMem Sep 4 10:16:59 uriel kernel: [32571790.575461] 11344 pages reserved Sep 4 10:16:59 uriel kernel: [32571790.575463] 1090481 pages shared Sep 4 10:16:59 uriel kernel: [32571790.575465] 737188 pages non-shared Now, I've no idea what this means. There's plenty of free memory; total used free shared buffers cached Mem: 4247232 3455904 791328 0 5348 736412 -/+ buffers/cache: 2714144 1533088 Swap: 385520 131004 254516 Though now I see the swap is relatively low in size, but would that matter? I've been starting to think about fragmentation, or inode usage on that large partition, but a recent fsck on it showed is as only like 0.5% fragmented. Which leaves me with inode usage, but how much of an effect really would a large inode table or filesystem TOC have? I've love to hear people's opinions on this. It's driving me potty! df -h output; Filesystem Size Used Avail Use% Mounted on /dev/xvda1 9.5G 6.6G 2.4G 74% / tmpfs 2.1G 0 2.1G 0% /lib/init/rw udev 10M 520K 9.5M 6% /dev tmpfs 2.1G 0 2.1G 0% /dev/shm /dev/xvdb 980G 921G 59G 94% /data

    Read the article

  • MySQL Memory usage

    - by Rob Stevenson-Leggett
    Our MySQL server seems to be using a lot of memory. I've tried looking for slow queries and queries with no index and have halved the peak CPU usage and Apache memory usage but the MySQL memory stays constantly at 2.2GB (~51% of available memory on the server). Here's the graph from Plesk. Running top in the SSH window shows the same figures. Does anyone have any ideas on why the memory usage is constant like this and not peaks and troughs with usage of the app? Here's the output of the MySQL Tuning Primer script: -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.0.77-log x86_64 Uptime = 1 days 14 hrs 4 min 21 sec Avg. qps = 22 Total Questions = 3059456 Threads Connected = 13 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1 sec. You have 6 out of 3059477 that take longer than 1 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is NOT enabled. You will not be able to do point in time recovery See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html WORKER THREADS Current thread_cache_size = 0 Current threads_cached = 0 Current threads_per_sec = 2 Historic threads_per_sec = 0 Threads created per/sec are overrunning threads cached You should raise thread_cache_size MAX CONNECTIONS Current max_connections = 100 Current threads_connected = 14 Historic max_used_connections = 20 The number of used connections is 20% of the configured maximum. Your max_connections variable seems to be fine. INNODB STATUS Current InnoDB index space = 6 M Current InnoDB data space = 18 M Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 8 M Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 2.07 G Configured Max Per-thread Buffers : 274 M Configured Max Global Buffers : 2.01 G Configured Max Memory Limit : 2.28 G Physical Memory : 3.84 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 4 M Current key_buffer_size = 7 M Key cache miss rate is 1 : 40 Key buffer free ratio = 81 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is supported but not enabled Perhaps you should set the query_cache_size SORT OPERATIONS Current sort_buffer_size = 2 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 132.00 K You have had 16 queries where a join could not use an index properly You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. If you are unable to optimize your queries you may want to increase your join_buffer_size to accommodate larger joins in one pass. Note! This script will still suggest raising the join_buffer_size when ANY joins not using indexes are found. OPEN FILES LIMIT Current open_files_limit = 1024 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_cache value = 64 tables You have a total of 426 tables You have 64 open tables. Current table_cache hit rate is 1% , while 100% of your table cache is in use You should probably increase your table_cache TEMP TABLES Current max_heap_table_size = 16 M Current tmp_table_size = 32 M Of 15134 temp tables, 9% were created on disk Effective in-memory tmp_table_size is limited to max_heap_table_size. Created disk tmp tables ratio seems fine TABLE SCANS Current read_buffer_size = 128 K Current table scan ratio = 2915 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 142213 Your table locking seems to be fine The app is a facebook game with about 50-100 concurrent users. Thanks, Rob

    Read the article

< Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119  | Next Page >