Search Results

Search found 8687 results on 348 pages for 'per'.

Page 34/348 | < Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >

  • Tips for maximizing Nginx requests/sec?

    - by linkedlinked
    I'm building an analytics package, and project requirements state that I need to support 1 billion hits per day. Yep, "billion". In other words, no less than 12,000 hits per second sustained, and preferably some room to burst. I know I'll need multiple servers for this, but I'm trying to get maximum performance out of each node before "throwing more hardware at it". Right now, I have the hits-tracking portion completed, and well optimized. I pretty much just save the requests straight into Redis (for later processing with Hadoop). The application is Python/Django with a gunicorn for the gateway. My 2GB Ubuntu 10.04 Rackspace server (not a production machine) can serve about 1200 static files per second (benchmarked using Apache AB against a single static asset). To compare, if I swap out the static file link with my tracking link, I still get about 600 requests per second -- I think this means my tracker is well optimized, because it's only a factor of 2 slower than serving static assets. However, when I benchmark with millions of hits, I notice a few things -- No disk usage -- this is expected, because I've turned off all Nginx logs, and my custom code doesn't do anything but save the request details into Redis. Non-constant memory usage -- Presumably due to Redis' memory managing, my memory usage will gradually climb up and then drop back down, but it's never once been my bottleneck. System load hovers around 2-4, the system is still responsive during even my heaviest benchmarks, and I can still manually view http://mysite.com/tracking/pixel with little visible delay while my (other) server performs 600 requests per second. If I run a short test, say 50,000 hits (takes about 2m), I get a steady, reliable 600 requests per second. If I run a longer test (tried up to 3.5m so far), my r/s degrades to about 250. My questions -- a. Does it look like I'm maxing out this server yet? Is 1,200/s static files nginx performance comparable to what others have experienced? b. Are there common nginx tunings for such high-volume applications? I have worker threads set to 64, and gunicorn worker threads set to 8, but tweaking these values doesn't seem to help or harm me much. c. Are there any linux-level settings that could be limiting my incoming connections? d. What could cause my performance to degrade to 250r/s on long-running tests? Again, the memory is not maxing out during these tests, and HDD use is nil. Thanks in advance, all :)

    Read the article

  • what is the main cause of 500 internal server error? [closed]

    - by Usman
    I want to know that I have hosted with a hosting company . My website gives "500 Internal server error many times" I have following Web server statistics :- Web Server Statistics Successful requests: 127,310 (7,504) Average successful requests per day: 814 (1,071) Successful requests for pages: 24,949 (1,309) Average successful requests for pages per day: 159 (186) Failed requests: 3,499 (58) Redirected requests: 10,091 (114) Distinct files requested: 5,791 (556) Distinct hosts served: 5,107 (330) Data transferred: 4.28 gigabytes (190.56 megabytes) Average data transferred per day: 28.03 megabytes (27.22 megabytes) Can you tell me my server condition by seeing this or i have to give another details. Thanks in advance

    Read the article

  • TCP and fair bandwidth sharing

    - by lxgr
    The congestion control algorithm(s) of TCP seem to distribute the available bandwidth fairly between individual TCP flows. Is there some way to enable (or more precisely, enforce) fair bandwidth sharing on a per-host instead of a per-flow basis on a router? There should not be an (easy) way for a user to gain a disproportional bandwidth share by using multiple concurrent TCP flows (the way some download managers and most P2P clients do). I'm currently running a DD-WRT router to share a residential DSL line, and currently it's possible to (inadvertently or maliciously) hog most of the bandwidth by using multiple concurrent connections, which affecty VoIP conversations badly. I've played with the QoS settings a bit, but I'm not sure how to enable fair bandwidth sharing on a per-IP basis (per-service is not an option, as most of the flows are HTTP).

    Read the article

  • Emulating old-school sprite flickering (theory and concept)

    - by Jeffrey Kern
    I'm trying to develop an oldschool NES-style video game, with sprite flickering and graphical slowdown. I've been thinking of what type of logic I should use to enable such effects. I have to consider the following restrictions if I want to go old-school NES style: No more than 64 sprites on the screen at a time No more than 8 sprites per scanline, or for each line on the Y axis If there is too much action going on the screen, the system freezes the image for a frame to let the processor catch up with the action From what I've read up, if there were more than 64 sprites on the screen, the developer would only draw high-priority sprites while ignoring low-priority ones. They could also alternate, drawing each even numbered sprite on opposite frames from odd numbered ones. The scanline issue is interesting. From my testing, it is impossible to get good speed on the XBOX 360 XNA framework by drawing sprites pixel-by-pixel, like the NES did. This is why in old-school games, if there were too many sprites on a single line, some would appear if they were cut in half. For all purposes for this project, I'm making scanlines be 8 pixels tall, and grouping the sprites together per scanline by their Y positioning. So, dumbed down I need to come up with a solution that.... 64 sprites on screen at once 8 sprites per 'scanline' Can draw sprites based on priority Can alternate between sprites per frame Emulate slowdown Here is my current theory First and foremost, a fundamental idea I came up with is addressing sprite priority. Assuming values between 0-255 (0 being low), I can assign sprites priority levels, for instance: 0 to 63 being low 63 to 127 being medium 128 to 191 being high 192 to 255 being maximum Within my data files, I can assign each sprite to be a certain priority. When the parent object is created, the sprite would randomly get assigned a number between its designated range. I would then draw sprites in order from high to low, with the end goal of drawing every sprite. Now, when a sprite gets drawn in a frame, I would then randomly generate it a new priority value within its initial priority level. However, if a sprite doesn't get drawn in a frame, I could add 32 to its current priority. For example, if the system can only draw sprites down to a priority level of 135, a sprite with an initial priority of 45 could then be drawn after 3 frames of not being drawn (45+32+32+32=141) This would, in theory, allow sprites to alternate frames, allow priority levels, and limit sprites to 64 per screen. Now, the interesting question is how do I limit sprites to only 8 per scanline? I'm thinking that if I'm sorting the sprites high-priority to low-priority, iterate through the loop until I've hit 64 sprites drawn. However, I shouldn't just take the first 64 sprites in the list. Before drawing each sprite, I could check to see how many sprites were drawn in it's respective scanline via counter variables . For example: Y-values between 0 to 7 belong to Scanline 0, scanlineCount[0] = 0 Y-values between 8 to 15 belong to Scanline 1, scanlineCount[1] = 0 etc. I could reset the values per scanline for every frame drawn. While going down the sprite list, add 1 to the scanline's respective counter if a sprite gets drawn in that scanline. If it equals 8, don't draw that sprite and go to the sprite with the next lowest priority. SLOWDOWN The last thing I need to do is emulate slowdown. My initial idea was that if I'm drawing 64 sprites per frame and there's still more sprites that need to be drawn, I could pause the rendering by 16ms or so. However, in the NES games I've played, sometimes there's slowdown if there's not any sprite flickering going on whereas the game moves beautifully even if there is some sprite flickering. Perhaps give a value to each object that uses sprites on the screen (like the priority values above), and if the combined values of all objects w/ sprites surpass a threshold, introduce the sprite flickering? IN CONCLUSION... Does everything I wrote actually sound legitimate and could work, or is it a pipe dream? What improvements can you all possibly think with this game programming theory of mine?

    Read the article

  • Ubuntu 12.04 on Amazon EC2: /dev/xvda1 will be checked for errors at next reboot?

    - by cwd
    I'm running the lastest Ubuntu 12.04 AMI (ami-a29943cb) from Canonical on Amazon EC2 and quite often when I log in I get the message: *** /dev/xvda1 will be checked for errors at next reboot *** I have read a bunch of documentation on this and seem to understand that every so many reboots (around 37 see Mount count / Maximum mount count below) Ubuntu wants to check a disk for errors. I can see that by using dumpe2fs -h /dev/xvda1 (reference) to get information such as: Last mounted on: / Filesystem UUID: 1ad27d06-4ecf-493d-bb19-4710c3caf924 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 524288 Block count: 2097152 Reserved block count: 104857 Free blocks: 1778055 Free inodes: 482659 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 511 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 Flex block group size: 16 Filesystem created: Tue Apr 24 03:07:48 2012 Last mount time: Thu Nov 8 03:17:58 2012 Last write time: Tue Apr 24 03:08:52 2012 Mount count: 3 Maximum mount count: 37 Last checked: Tue Apr 24 03:07:48 2012 Check interval: 15552000 (6 months) Next check after: Sun Oct 21 03:07:48 2012 Lifetime writes: 2454 MB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: 0a25e04c-6169-4d68-bfa6-a1acd8e39632 Journal backup: inode blocks Journal features: journal_incompat_revoke Journal size: 128M Journal length: 32768 Journal sequence: 0x0000158b Journal start: 1 I've tried these things to get rid of the message and usually the badblocks is what does it for me: Run this command and reboot: sudo touch /forcefsck Run badblocks to check the disk: badblocks /dev/sda1 Edit /etc/fstab and change the last "0" which is the fs_passno column accordingly and then reboot: The root filesystem should be specified with a fs_passno of 1, and other filesystems should have a fs_passno of 2. I don't understand: If this is a virtual drive shouldn't it be less prone to errors? Was the image created with one of the flags set? If not what is triggering it? Why is fs_passno set to 0 on Amazon EC2 Ubuntu images? This is not the first one that is like this.

    Read the article

  • Site too large to officially use Google Analytics?

    - by Jeff Atwood
    We just got this email from the Google Analytics team: We love that you love our product and use it as much as you do. We have observed however, that a website you are tracking with Google Analytics is sending over 1 million hits per day to Google Analytics servers. This is well above the "5 million pageviews per month per account" limit specified in the Google Analytics Terms of Service. Processing this amount of data multiple times a day takes up valuable resources that enable us to continue to develop the product for all Google Analytics users. Processing this amount of data multiple times a day takes up valuable resources that enable us to continue to develop the product for all Google Analytics users. As such, starting August 23rd, 2010, the metrics in your reports will be updated once a day, as opposed to multiple times during the course of the day. You will continue to receive all the reports and features in Google Analytics as usual. The only change will be that data for a given day will appear the following day. We trust you understand the reasons for this change. I totally respect this decision, and I think it's very generous to not kick us out. But how do we do this the right way -- what's the official, blessed Google way to use Google Analytics if you're a "whale" website with lots of hits per day? Or, are there other analytics services that would be more appropriate for very large websites?

    Read the article

  • Oracle Partner Network Specialized

    - by luca.maghernino(at)oracle.com
    Eventi specialized Eventi di specializzazione Il prezzo a listino del training è di 2.700 euro a partecipante. Per i nostri Partner che aderiscono a questa iniziativa il costo è di 700 euro per partecipante. Il numero massimo di partecipanti per ciascuna sessione è di 15 persone. Per iscriverti clicca sulla data di tuo interesse: Codice Corso Data Location D64292GC10 OPN Oracle BI EE 10.1.3 Implementation Boot Camp Ed 1 (5 gg) 28 febbraio Milano D50102GC10 Oracle Database 11g: Workshop di amministrazione Ed 2 PRV 21 marzo -- 21 marzo Empoli D64735GC10 OPN Oracle ECM 10g R3 Implementation Boot Camp Ed 1 PRV (3 gg) 28 marzo Milano D50317GC20 Oracle Database 11g: Performance Tuning Ed 1 PRV (5 gg) 4 aprile -- 4 aprile Milano D53946GC10 Oracle SOA Suite 11g: Build Composite Applications (5 gg) 18 aprile Milano D50081GC20 Oracle Database 11g: New Features for Administrators DBA Release 2 (5 gg) * 09 maggio Milano *Oracle Database 11g: New Features for Administrators DBA Release 2: questo corso si rivoge ad amministratori di database in possesso della certificazione Orale Certified Professional 10g che desiderano effettuare l'upgrade al livello Oracle Certified Professional 11g ed è propedeutico al superamento dell'esame 1Z0_050 Oracle Database 11g: New Features for Administrators oppure ad amministratori di database che hanno una buona conoscenza della versione 10g e desiderano aggiornare le proprie competenze alla release 11g.

    Read the article

  • Quoting people for website dev. work

    - by Jason
    Hi All, I have recently given some quotes to a few people. And I need some advice about how things should be done... Q1: I've seen, heard of and read about a lot of developers using free resource sites online to obtain free Privacy Policy, Disclaimers etc for their/customers websites. A customer I quoted the other day expected me to write/get a disclaimer for their site. Who in their right mind would expect a document like that from a Web Developer? I just told them that they need to sort that stuff out themselves with a Lawyer or something, and then to send it to me so I can paste it on a webpage for them. Q2: If you're charging per hour, and you estimate that the project would take 1week to finish (including testing/releasing), but you soon realise that you'll require more time, do you RE-quote them? Or do you just finish off the site at the original quote price? Q3: How do you figure out how much you will charge your customers? Do you charge per-feature, or per hour, or per day, or all of the above? Thanks :)

    Read the article

  • How to handle growing QA reporting requirements?

    - by Phillip Jackson
    Some Background: Our company is growing very quickly - in 3 years we've tripled in size and there are no signs of stopping any time soon. Our marketing department has expanded and our IT requirements have as well. When I first arrived everything was managed in Dreamweaver and Excel spreadsheets and we've worked hard to implement bug tracking, version control, continuous integration, and multi-stage deployment. It's been a long hard road, and now we need to get more organized. The Problem at Hand: Management would like to track, per-developer, who is generating the most issues at the QA stage (post unit testing, regression, and post-production issues specifically). This becomes a fine balance because many issues can't be reported granularly (e.g. per-url or per-"page") but yet that's how Management would like reporting to be broken down. Further, severity has to be taken into account. We have drafted standards for each of these areas specific to our environment. Developers don't want to be nicked for 100+ instances of an issue if it was a problem with an include or inheritance... I had a suggestion to "score" bugs based on severity... but nobody likes that. We can't enter issues for every individual module affected by a global issue. [UPDATED] The Actual Questions: How do medium sized businesses and code shops handle bug tracking, reporting, and providing useful metrics to management? What kinds of KPIs are better metrics for employee performance? What is the most common way to provide per-developer reporting as far as time-to-close, reopens, etc.? Do large enterprises ignore the efforts of the individuals and rather focus on the team? Some other questions: Is this too granular of reporting? Is this considered 'blame culture'? If you were the developer working in this environment, what would you define as a measureable goal for this year to track your progress, with the reward of achieving the goal a bonus?

    Read the article

  • Algorithm to figure out appointment times?

    - by Rachel
    I have a weird situation where a client would like a script that automatically sets up thousands of appointments over several days. The tricky part is the appointments are for a variety of US time zones, and I need to take the consumer's local time zone into account when generating appointment dates and times for each record. Appointment Rules: Appointments should be set from 8AM to 8PM Eastern Standard Time, with breaks from 12P-2P and 4P-6P. This leaves a total of 8 hours per day available for setting appointments. Appointments should be scheduled 5 minutes apart. 8 hours of 5-minute intervals means 96 appointments per day. There will be 5 users at a time handling appointments. 96 appointments per day multiplied by 5 users equals 480, so the maximum number of appointments that can be set per day is 480. Now the tricky requirement: Appointments are restricted to 8am to 8pm in the consumer's local time zone. This means that the earliest time allowed for each appointment is different depending on the consumer's time zone: Eastern: 8A Central: 9A Mountain: 10A Pacific: 11A Alaska: 12P Hawaii or Undefined: 2P Arizona: 10A or 11A based on current Daylight Savings Time Assuming a data set can be several thousand records, and each record will contain a timezone value, is there an algorithm I could use to determine a Date and Time for every record that matches the rules above?

    Read the article

  • Why is my root filesystem always scanned at boot?

    - by luri
    I always have a pause at boot saying my filesystems are being checked (with a "press C to cancel" note, too). Actually (seeing boot.log) I think it's the / fs, which is located at /dev/sdb5 Several questions altoghether, here (hope this does not break any rule): Is this normal? Can I (or even should I) prevent this anyhow? According to boot.log (below) the fs does not seem to be 'clean', or, at least, it's in an state or condition that makes fsck always can it for errors for a while (just a few seconds). How can I fix it? Edit: This is my boot.log: fsck desde util-linux-ng 2.17.2 udevd[515]: can not read '/etc/udev/rules.d/z80_user.rules' /dev/sdb5: 249045/32841728 ficheros (0.3% no contiguos), 20488485/131338752 bloques init: ureadahead-other main process (1111) terminated with status 4 init: ureadahead-other main process (1116) terminated with status 4 Password: * Starting AppArmor profiles [160G Skipping profile in /etc/apparmor.d/disable: usr.bin.firefox [154G[ OK ] * Setting sensors limits [160G [154G[ OK ] And this is dumpe2fs results for the filesystem being checked (well, the relevant part of the log): Filesystem volume name: <none> Last mounted on: / Filesystem UUID: 42509bf9-f3e6-460a-8947-ec0f5c1fbcc8 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 32841728 Block count: 131338752 Reserved block count: 6566937 Free blocks: 110850356 Free inodes: 32592701 First block: 0 Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 992 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 Flex block group size: 16 Filesystem created: Fri Dec 10 19:44:15 2010 Last mount time: Mon Feb 14 17:00:02 2011 Last write time: Mon Feb 14 16:59:45 2011 Mount count: 1 Maximum mount count: 33 Last checked: Mon Feb 14 16:59:45 2011 Check interval: 15552000 (6 months) Next check after: Sat Aug 13 17:59:45 2011 Lifetime writes: 331 GB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 First orphan inode: 28049496 Default directory hash: half_md4 Directory Hash Seed: d3d24459-514b-4413-b840-e970b766095b Journal backup: inode blocks Journal features: journal_incompat_revoke Tamaño de fichero de transacciones: 128M Journal length: 32768 Journal sequence: 0x0005e0c4 Journal start: 1 This is the relevant (at least I think this is the fs being checked) line in fstab: #Entry for /dev/sdb5 : UUID=42509bf9-f3e6-460a-8947-ec0f5c1fbcc8 / ext4 errors=remount-ro 0 1

    Read the article

  • Red Samurai Performance Audit Tool – OOW 2013 release (v 1.1)

    - by JuergenKress
    We are running our Red Samurai Performance Audit tool and monitoring ADF performance in various projects already for about one year and the half. It helps us a lot to understand ADF performance bottlenecks and tune slow ADF BC View Objects or optimise large ADF BC fetches from DB. There is special update implemented for OOW'13 - advanced ADF BC statistics are collected directly from your application ADF BC runtime and later displayed as graphical information in the dashboard. I will be attending OOW'13 in San Francisco, feel free to stop me and ask about this tool - I will be happy to give it away and explain how to use it in your project. Original audit screen with ADF BC performance issues, this is part of our Audit console application: Audit console v1.1 is improved with one more tab - Statistics. This tab displays all SQL Selects statements produced by ADF BC over time, logged users, AM access load distribution and number of AM activations along with user sessions. Available graphs: Daily Queries  - total number of SQL selects per day Hourly Queries - Last 48 Hours Logged Users - total number of user sessions per day SQL Selects per Application Module - workload per Application Module Number of Activations and User sessions - last 48 hours - displays stress load Read the complete article here. WebLogic Partner Community For regular information become a member in the WebLogic Partner Community please visit: http://www.oracle.com/partners/goto/wls-emea ( OPN account required). If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Wiki Technorati Tags: Red Samurai,ADF performance,WebLogic,WebLogic Community,Oracle,OPN,Jürgen Kress

    Read the article

  • Maintaining a main project line with satellite projects

    - by NickLarsen
    Some projects I work on have a main line of features, but are customizable per customer. Up until now those customizations have been implemented as preferences, but now there are 2 problems with the system... The settings page is getting out of control with features. There are probably some improvements that could be made to the settings UI, but regardless, it is quite cumbersome setting up new instances for new customers. Customers have started asking for customizations which would be more easily maintained as separate threads instead of having tons of customizations code. Optimally I am envisioning some kind of source control in which features are either in the main project line and customizations per customer are maintained in a repo per customer set up. The customizations per project would need to remain separate but if a bug is found and fixed in a particular project, I would need to percolate the fix back to the main line and into all of the other customer repos. The problem is I have never seen this done before, and before spending time trying to find source control that can accommodate this scenario and implement it, I figure it best to ask if anyone has something less complicated or knows of a source control product which can handle this with very little hair pulling.

    Read the article

  • Frame Independent Movement

    - by ShrimpCrackers
    I've read two other threads here on movement: Time based movement Vs Frame rate based movement?, and Fixed time step vs Variable time step but I think I'm lacking a basic understanding of frame independent movement because I don't understand what either of those threads are talking about. I'm following along with lazyfoo's SDL tutorials and came upon the frame independent lesson. http://lazyfoo.net/SDL_tutorials/lesson32/index.php I'm not sure what the movement part of the code is trying to say but I think it's this (please correct me if I'm wrong): In order to have frame independent movement, we need to find out how far an object (ex. sprite) moves within a certain time frame, for example 1 second. If the dot moves at 200 pixels per second, then I need to calculate how much it moves within that second by multiplying 200 pps by 1/1000 of a second. Is that right? The lesson says: "velocity in pixels per second * time since last frame in seconds. So if the program runs at 200 frames per second: 200 pps * 1/200 seconds = 1 pixel" But...I thought we were multiplying 200 pps by 1/1000th of a second. What is this business with frames per second? I'd appreciate if someone could give me a little bit more detailed explanation as to how frame independent movement works. Thank you.

    Read the article

  • Working as a contractor--questions part II

    - by universe_hacker
    This is a follow-up to this discussion: Working as a contractor--questions I still would like to get more info on the following points, when working as a contractor as opposed to direct employee: Housing: for short-term contracts (let's say 6 months or less) that are not in your home area, where and how do you search for short-term, flexible housing? Especially since an employer would typically want you to start immediately, so you don't have the time to go out and explore. Also, would you typically look for furnished apartments because the cost transporting your own furniture for a few months is not justified? Work hours and pay: are contractors more strictly supervised (as far as getting specified work done) because they get paid by the hour? There are also supposed to get overtime pay (at a higher rate) if they work more than 40 hours per week, does this really happen? Or do they work unpaid extra hours just like many regular employees? Some potential employers have mentioned paying the "per diem", which is essentially a non-taxable daily allowance, which is supposed to be used for living expenses. This money gets subtracted from you per-hour rate, but the advantage is that you pay less tax. However, from the information I have seen, the per diem can only be paid if you maintain a "permanent" residence you intend to return to. Is this checked in practice, and if yes, how?

    Read the article

  • New SPC2 benchmark- The 7420 KILLS it !!!

    - by user12620172
    This is pretty sweet. The new SPC2 benchmark came out last week, and the 7420 not only came in 2nd of ALL speed scores, but came in #1 for price per MBPS. Check out this table. The 7420 score of 10,704 makes it really fast, but that's not the best part. The price one would have to pay in order to beat it is ridiculous. You can go see for yourself at http://www.storageperformance.org/results/benchmark_results_spc2The only system on the whole page that beats it was over twice the price per MBPS. Very sweet for Oracle. So let's see, the 7420 is the fastest per $. The 7420 is the cheapest per MBPS. The 7420 has incredible, built-in features, management services, analytics, and protocols. It's extremely stable and as a cluster has no single point of failure. It won the Storage Magazine award for best NAS system this year. So how long will it be before it's the number 1 NAS system in the market? What are the biggest hurdles still stopping the widespread adoption of the ZFSSA? From what I see, it's three things: 1. Administrator's comfort level with older legacy systems. 2. Politics 3. Past issues with Oracle Support.   I see all of these issues crop up regularly. Number 1 just takes time and education. Number 3 takes time with our new, better, and growing support team. many of them came from Oracle and there were growing pains when they went from a straight software-model to having to also support hardware. Number 2 is tricky, but it's the job of the sales teams to break through the internal politics and help their clients see the value in oracle hardware systems. Benchmarks like this will help.

    Read the article

  • Single CAS web application in a cluster

    - by Dolf Dijkstra
    Recently a customer wanted to set up a cluster of CAS nodes to be used together with WebCenter Sites. In the process of setting this up they realized that they needed to create a web application per managed server. They did not want to have this management burden but would like to have one web application deployed to multiple nodes. The reason that there is a need for a unique application per node is that the web-application contains information that needs to be unique per node, the postfix for the ticket id.  My customer would like to externalize the node specific configuration to either a specific classpath per managed server or to system properties set at startup.It turns out that the postfix for ticket ids is managed through a property host.name and that this property can be externalized.The host.name property is used in: /webapps/cas/WEB-INF/spring-configuration/uniqueIdGenerators.xmlIt is set in /webapps/cas/WEB-INF/spring-configuration/propertyFileConfigurer.xmlin a PropertyPlaceholderConfigurer.The documentation for PropertyPlaceholderConfigurer:http://static.springsource.org/spring/docs/2.0.x/api/org/springframework/beans/factory/config/PropertyPlaceholderConfigurer.htmlThis indicates that the properties defined through the PropertyPlaceHolderConfigurer can be externalized.To enable this externalization you would need to change host.properties so it is generic for all the managed servers and thus can be reused for all the managed servers: host.name=${cluster.node.id}Next step is to change the startup scripts for the managed servers and add a system property for -Dcluster.node.id=<something unique and stable>.Viola, the postfix is externalized and the web application can be shared amongst the cluster nodes.

    Read the article

  • General monitoring for SQL Server Analysis Services using Performance Monitor

    - by Testas
    A recent customer engagement required a setup of a monitoring solution for SSAS, due to the time restrictions placed upon this, native Windows Performance Monitor (Perfmon) and SQL Server Profiler Monitoring Tools was used as using a third party tool would have meant the customer providing an additional monitoring server that was not available.I wanted to outline the performance monitoring counters that was used to monitor the system on which SSAS was running. Due to the slow query performance that was occurring during certain scenarios, perfmon was used to establish if any pressure was being placed on the Disk, CPU or Memory subsystem when concurrent connections access the same query, and Profiler to pinpoint how the query was being managed within SSAS, profiler I will leave for another blogThis guide is not designed to provide a definitive list of what should be used when monitoring SSAS, different situations may require the addition or removal of counters as presented by the situation. However I hope that it serves as a good basis for starting your monitoring of SSAS. I would also like to acknowledge Chris Webb’s awesome chapters from “Expert Cube Development” that also helped shape my monitoring strategy:http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!6657.entrySimulating ConnectionsTo simulate the additional connections to the SSAS server whilst monitoring, I used ascmd to simulate multiple connections to the typical and worse performing queries that were identified by the customer. A similar sript can be downloaded from codeplex at http://www.codeplex.com/SQLSrvAnalysisSrvcs.     File name: ASCMD_StressTestingScripts.zip. Performance MonitorWithin performance monitor,  a counter log was created that contained the list of counters below. The important point to note when running the counter log is that the RUN AS property within the counter log properties should be changed to an account that has rights to the SSAS instance when monitoring MSAS counters. Failure to do so means that the counter log runs under the system account, no errors or warning are given while running the counter log, and it is not until you need to view the MSAS counters that they will not be displayed if run under the default account that has no right to SSAS. If your connection simulation takes hours, this could prove quite frustrating if not done beforehand JThe counters used……  Object Counter Instance Justification System Processor Queue legnth N/A Indicates how many threads are waiting for execution against the processor. If this counter is consistently higher than around 5 when processor utilization approaches 100%, then this is a good indication that there is more work (active threads) available (ready for execution) than the machine's processors are able to handle. System Context Switches/sec N/A Measures how frequently the processor has to switch from user- to kernel-mode to handle a request from a thread running in user mode. The heavier the workload running on your machine, the higher this counter will generally be, but over long term the value of this counter should remain fairly constant. If this counter suddenly starts increasing however, it may be an indicating of a malfunctioning device, especially if the Processor\Interrupts/sec\(_Total) counter on your machine shows a similar unexplained increase Process % Processor Time sqlservr Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process % Processor Time msmdsrv Definately should be used if Processor\% Processor Time\(_Total) is maxing at 100% to assess the effect of the SQL Server process on the processor Process Working Set sqlservr If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Process Working Set msmdsrv If the Memory\Available bytes counter is decreaing this counter can be run to indicate if the process is consuming larger and larger amounts of RAM. Process(instance)\Working Set measures the size of the working set for each process, which indicates the number of allocated pages the process can address without generating a page fault. Processor % Processor Time _Total and individual cores measures the total utilization of your processor by all running processes. If multi-proc then be mindful only an average is provided Processor % Privileged Time _Total To see how the OS is handling basic IO requests. If kernel mode utilization is high, your machine is likely underpowered as it's too busy handling basic OS housekeeping functions to be able to effectively run other applications. Processor % User Time _Total To see how the applications is interacting from a processor perspective, a high percentage utilisation determine that the server is dealing with too many apps and may require increasing thje hardware or scaling out Processor Interrupts/sec _Total  The average rate, in incidents per second, at which the processor received and serviced hardware interrupts. Shoulr be consistant over time but a sudden unexplained increase could indicate a device malfunction which can be confirmed using the System\Context Switches/sec counter Memory Pages/sec N/A Indicates the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays, this is the primary counter to watch for indication of possible insufficient RAM to meet your server's needs. A good idea here is to configure a perfmon alert that triggers when the number of pages per second exceeds 50 per paging disk on your system. May also want to see the configuration of the page file on the Server Memory Available Mbytes N/A is the amount of physical memory, in bytes, available to processes running on the computer. if this counter is greater than 10% of the actual RAM in your machine then you probably have more than enough RAM. monitor it regularly to see if any downward trend develops, and set an alert to trigger if it drops below 2% of the installed RAM. Physical Disk Disk Transfers/sec for each physical disk If it goes above 10 disk I/Os per second then you've got poor response time for your disk. Physical Disk Idle Time _total If Disk Transfers/sec is above  25 disk I/Os per second use this counter. which measures the percent time that your hard disk is idle during the measurement interval, and if you see this counter fall below 20% then you've likely got read/write requests queuing up for your disk which is unable to service these requests in a timely fashion. Physical Disk Disk queue legnth For the OLAP and SQL physical disk A value that is consistently less than 2 means that the disk system is handling the IO requests against the physical disk Network Interface Bytes Total/sec For the NIC Should be monitored over a period of time to see if there is anb increase/decrease in network utilisation Network Interface Current Bandwidth For the NIC is an estimate of the current bandwidth of the network interface in bits per second (BPS). MSAS 2005: Memory Memory Limit High KB N/A Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Limit Low KB N/A Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini MSAS 2005: Memory Memory Usage KB N/A Displays the memory usage of the server process. MSAS 2005: Memory File Store KB N/A Displays the amount of memory that is reserved for the Cache. Note if total memory limit in the msmdsrv.ini is set to 0, no memory is reserved for the cache MSAS 2005: Storage Engine Query Queries from Cache Direct / sec N/A Displays the rate of queries answered from the cache directly MSAS 2005: Storage Engine Query Queries from Cache Filtered / Sec N/A Displays the Rate of queries answered by filtering existing cache entry. MSAS 2005: Storage Engine Query Queries from File / Sec N/A Displays the Rate of queries answered from files. MSAS 2005: Storage Engine Query Average time /query N/A Displays the average time of a query MSAS 2005: Connection Current connections N/A Displays the number of connections against the SSAS instance MSAS 2005: Connection Requests / sec N/A Displays the rate of query requests per second MSAS 2005: Locks Current Lock Waits N/A Displays thhe number of connections waiting on a lock MSAS 2005: Threads Query Pool job queue Length N/A The number of queries in the job queue MSAS 2005:Proc Aggregations Temp file bytes written/sec N/A Shows the number of bytes of data processed in a temporary file MSAS 2005:Proc Aggregations Temp file rows written/sec N/A Shows the number of bytes of data processed in a temporary file 

    Read the article

  • MySQL Cluster 7.2: Over 8x Higher Performance than Cluster 7.1

    - by Mat Keep
    0 0 1 893 5092 Homework 42 11 5974 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin; mso-ansi-language:EN-US;} Summary The scalability enhancements delivered by extensions to multi-threaded data nodes enables MySQL Cluster 7.2 to deliver over 8x higher performance than the previous MySQL Cluster 7.1 release on a recent benchmark What’s New in MySQL Cluster 7.2 MySQL Cluster 7.2 was released as GA (Generally Available) in February 2012, delivering many enhancements to performance on complex queries, new NoSQL Key / Value API, cross-data center replication and ease-of-use. These enhancements are summarized in the Figure below, and detailed in the MySQL Cluster New Features whitepaper Figure 1: Next Generation Web Services, Cross Data Center Replication and Ease-of-Use Once of the key enhancements delivered in MySQL Cluster 7.2 is extensions made to the multi-threading processes of the data nodes. Multi-Threaded Data Node Extensions The MySQL Cluster 7.2 data node is now functionally divided into seven thread types: 1) Local Data Manager threads (ldm). Note – these are sometimes also called LQH threads. 2) Transaction Coordinator threads (tc) 3) Asynchronous Replication threads (rep) 4) Schema Management threads (main) 5) Network receiver threads (recv) 6) Network send threads (send) 7) IO threads Each of these thread types are discussed in more detail below. MySQL Cluster 7.2 increases the maximum number of LDM threads from 4 to 16. The LDM contains the actual data, which means that when using 16 threads the data is more heavily partitioned (this is automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and REDO log. The number of LDM partitions per data node is not dynamically configurable, but it is possible, however, to map more than one partition onto each LDM thread, providing flexibility in modifying the number of LDM threads. The TC domain stores the state of in-flight transactions. This means that every new transaction can easily be assigned to a new TC thread. Testing has shown that in most cases 1 TC thread per 2 LDM threads is sufficient, and in many cases even 1 TC thread per 4 LDM threads is also acceptable. Testing also demonstrated that in some instances where the workload needed to sustain very high update loads it is necessary to configure 3 to 4 TC threads per 4 LDM threads. In the previous MySQL Cluster 7.1 release, only one TC thread was available. This limit has been increased to 16 TC threads in MySQL Cluster 7.2. The TC domain also manages the Adaptive Query Localization functionality introduced in MySQL Cluster 7.2 that significantly enhanced complex query performance by pushing JOIN operations down to the data nodes. Asynchronous Replication was separated into its own thread with the release of MySQL Cluster 7.1, and has not been modified in the latest 7.2 release. To scale the number of TC threads, it was necessary to separate the Schema Management domain from the TC domain. The schema management thread has little load, so is implemented with a single thread. The Network receiver domain was bound to 1 thread in MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to increase the number of recv threads to 8. This enables each receive thread to service one or more sockets used to communicate with other nodes the Cluster. The Network send thread is a new thread type introduced in MySQL Cluster 7.2. Previously other threads handled the sending operations themselves, which can provide for lower latency. To achieve highest throughput however, it has been necessary to create dedicated send threads, of which 8 can be configured. It is still possible to configure MySQL Cluster 7.2 to a legacy mode that does not use any of the send threads – useful for those workloads that are most sensitive to latency. The IO Thread is the final thread type and there have been no changes to this domain in MySQL Cluster 7.2. Multiple IO threads were already available, which could be configured to either one thread per open file, or to a fixed number of IO threads that handle the IO traffic. Except when using compression on disk, the IO threads typically have a very light load. Benchmarking the Scalability Enhancements The scalability enhancements discussed above have made it possible to scale CPU usage of each data node to more than 5x of that possible in MySQL Cluster 7.1. In addition, a number of bottlenecks have been removed, making it possible to scale data node performance by even more than 5x. Figure 2: MySQL Cluster 7.2 Delivers 8.4x Higher Performance than 7.1 The flexAsynch benchmark was used to compare MySQL Cluster 7.2 performance to 7.1 across an 8-node Intel Xeon x5670-based cluster of dual socket commodity servers (6 cores each). As the results demonstrate, MySQL Cluster 7.2 delivers over 8x higher performance per data nodes than MySQL Cluster 7.1. More details of this and other benchmarks will be published in a new whitepaper – coming soon, so stay tuned! In a following blog post, I’ll provide recommendations on optimum thread configurations for different types of server processor. You can also learn more from the Best Practices Guide to Optimizing Performance of MySQL Cluster Conclusion MySQL Cluster has achieved a range of impressive benchmark results, and set in context with the previous 7.1 release, is able to deliver over 8x higher performance per node. As a result, the multi-threaded data node extensions not only serve to increase performance of MySQL Cluster, they also enable users to achieve significantly improved levels of utilization from current and future generations of massively multi-core, multi-thread processor designs.

    Read the article

  • OpenGL ES 2.0: Using VBOs?

    - by Bunkai.Satori
    OpenGL VBOs (vertex buffer objects) have been developed to improve performance of OpenGL (OpenGL ES 2.0 in my case). The logic is that with the help of VBOs, the data does not need to be copied from client memory to graphics card on per frame basis. However, as I see it, the game scene changes continuously: position of objects change, their scaling and rotating change, they get animated, they explode, they get spawn or disappear. In such highly dynamic environment, such as computer game scene is, what is the point of using VBOs, if the VBOs would need to be constructed on per-frame basis anyway? Can you please help me to understand how to practically take beneif of VBOs in computer games? Can there be more vertex based VBOs (say one per one object) or there must be always exactly only one VBO present for each draw cycle?

    Read the article

  • Geometry instancing in OpenGL ES 2.0

    - by seahorse
    I am planning to do geometry instancing in OpenGL ES 2.0 Basically I plan to render the same geometry(a chair) maybe 1000 times in my scene. What is the best way to do this in OpenGL ES 2.0? I am considering passing model view mat4 as an attribute. Since attributes are per vertex data do I need to pass this same mat4, three times for each vertex of the same triangle(since modelview remains constant across vertices of the triangle). That would amount to a lot of extra data sent to the GPU( 2 extra vertices*16 floats*(Number of triangles) amount of extra data). Or should I be sending the mat4 only once per triangle?But how is that possible using attributes since attributes are defined as "per vertex" data? What is the best and efficient way to do instancing in OpenGL ES 2.0?

    Read the article

  • Something other than Vertex Welding with Texture Atlas?

    - by Tim Winter
    What options (in C# with XNA) would there be for texture usage in a procedural generated 3D world made of cubes to increase performance? Yes, it's like Minecraft. I've been doing a texture atlas and rendering faces individually (4 vertices per face), but I've also read in a couple places about using texture wrapping with two 1D atlases to merge adjacent faces with the same texture. If two or more adjacent faces share the same image, it'd be quite easy to wrap in this way reducing vertices by a large amount. My problem with this is having too many textures, swapping too often, and many image related things like non-power of 2 images. Is there a middle ground option between the 1D texture atlas trick and rendering 4 vertices per cube face? This is a picture of what I have currently (in wireframe). 4 vertices per face seems extremely inefficient to me.

    Read the article

  • PTLQueue : a scalable bounded-capacity MPMC queue

    - by Dave
    Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

    Read the article

  • Is there a "golden ratio" in coding?

    - by badallen
    My coworkers and I often come up with silly ideas such as adding entries to Urban Dictionary that are inappropriate but completely make sense if you are a developer. Or making rap songs that are about delegates, reflections or closures in JS... Anyhow, here is what I brought up this afternoon which was immediately dismissed to be a stupid idea. So I want to see if I can get redemptions here. My idea is coming up with a Golden Ratio (or in the neighborhood of) between the number of classes per project versus the number of methods/functions per class versus the number of lines per method/function. I know this is silly and borderline, if not completely, useless, but just think of all the legacy methods or classes you have encountered that are absolutely horrid - like methods with 10000 lines or classes with 10000 methods. So Golden Ratio, anyone? :)

    Read the article

< Previous Page | 30 31 32 33 34 35 36 37 38 39 40 41  | Next Page >