Search Results

Search found 7671 results on 307 pages for 'slow browsing'.

Page 306/307 | < Previous Page | 302 303 304 305 306 307  | Next Page >

  • Toorcon14

    - by danx
    Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI §6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI §6.1). But §6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI §11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

    Read the article

  • Using BPEL Performance Statistics to Diagnose Performance Bottlenecks

    - by fip
    Tuning performance of Oracle SOA 11G applications could be challenging. Because SOA is a platform for you to build composite applications that connect many applications and "services", when the overall performance is slow, the bottlenecks could be anywhere in the system: the applications/services that SOA connects to, the infrastructure database, or the SOA server itself.How to quickly identify the bottleneck becomes crucial in tuning the overall performance. Fortunately, the BPEL engine in Oracle SOA 11G (and 10G, for that matter) collects BPEL Engine Performance Statistics, which show the latencies of low level BPEL engine activities. The BPEL engine performance statistics can make it a bit easier for you to identify the performance bottleneck. Although the BPEL engine performance statistics are always available, the access to and interpretation of them are somewhat obscure in the early and current (PS5) 11G versions. This blog attempts to offer instructions that help you to enable, retrieve and interpret the performance statistics, before the future versions provides a more pleasant user experience. Overview of BPEL Engine Performance Statistics  SOA BPEL has a feature of collecting some performance statistics and store them in memory. One MBean attribute, StatLastN, configures the size of the memory buffer to store the statistics. This memory buffer is a "moving window", in a way that old statistics will be flushed out by the new if the amount of data exceeds the buffer size. Since the buffer size is limited by StatLastN, impacts of statistics collection on performance is minimal. By default StatLastN=-1, which means no collection of performance data. Once the statistics are collected in the memory buffer, they can be retrieved via another MBean oracle.as.soainfra.bpel:Location=[Server Name],name=BPELEngine,type=BPELEngine.> My friend in Oracle SOA development wrote this simple 'bpelstat' web app that looks up and retrieves the performance data from the MBean and displays it in a human readable form. It does not have beautiful UI but it is fairly useful. Although in Oracle SOA 11.1.1.5 onwards the same statistics can be viewed via a more elegant UI under "request break down" at EM -> SOA Infrastructure -> Service Engines -> BPEL -> Statistics, some unsophisticated minds like mine may still prefer the simplicity of the 'bpelstat' JSP. One thing that simple JSP does do well is that you can save the page and send it to someone to further analyze Follows are the instructions of how to install and invoke the BPEL statistic JSP. My friend in SOA Development will soon blog about interpreting the statistics. Stay tuned. Step1: Enable BPEL Engine Statistics for Each SOA Servers via Enterprise Manager First st you need to set the StatLastN to some number as a way to enable the collection of BPEL Engine Performance Statistics EM Console -> soa-infra(Server Name) -> SOA Infrastructure -> SOA Administration -> BPEL Properties Click on "More BPEL Configuration Properties" Click on attribute "StatLastN", set its value to some integer number. Typically you want to set it 1000 or more. Step 2: Download and Deploy bpelstat.war File to Admin Server, Note: the WAR file contains a JSP that does NOT have any security restriction. You do NOT want to keep in your production server for a long time as it is a security hazard. Deactivate the war once you are done. Download the bpelstat.war to your local PC At WebLogic Console, Go to Deployments -> Install Click on the "upload your file(s)" Click the "Browse" button to upload the deployment to Admin Server Accept the uploaded file as the path, click next Check the default option "Install this deployment as an application" Check "AdminServer" as the target server Finish the rest of the deployment with default settings Console -> Deployments Check the box next to "bpelstat" application Click on the "Start" button. It will change the state of the app from "prepared" to "active" Step 3: Invoke the BPEL Statistic Tool The BPELStat tool merely call the MBean of BPEL server and collects and display the in-memory performance statics. You usually want to do that after some peak loads. Go to http://<admin-server-host>:<admin-server-port>/bpelstat Enter the correct admin hostname, port, username and password Enter the SOA Server Name from which you want to collect the performance statistics. For example, SOA_MS1, etc. Click Submit Keep doing the same for all SOA servers. Step 3: Interpret the BPEL Engine Statistics You will see a few categories of BPEL Statistics from the JSP Page. First it starts with the overall latency of BPEL processes, grouped by synchronous and asynchronous processes. Then it provides the further break down of the measurements through the life time of a BPEL request, which is called the "request break down". 1. Overall latency of BPEL processes The top of the page shows that the elapse time of executing the synchronous process TestSyncBPELProcess from the composite TestComposite averages at about 1543.21ms, while the elapse time of executing the asynchronous process TestAsyncBPELProcess from the composite TestComposite2 averages at about 1765.43ms. The maximum and minimum latency were also shown. Synchronous process statistics <statistics>     <stats key="default/TestComposite!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestSyncBPELProcess" min="1234" max="4567" average="1543.21" count="1000">     </stats> </statistics> Asynchronous process statistics <statistics>     <stats key="default/TestComposite2!2.0.2-ScopedJMSOSB*soa_bfba2527-a9ba-41a7-95c5-87e49c32f4ff/TestAsyncBPELProcess" min="2234" max="3234" average="1765.43" count="1000">     </stats> </statistics> 2. Request break down Under the overall latency categorized by synchronous and asynchronous processes is the "Request breakdown". Organized by statistic keys, the Request breakdown gives finer grain performance statistics through the life time of the BPEL requests.It uses indention to show the hierarchy of the statistics. Request breakdown <statistics>     <stats key="eng-composite-request" min="0" max="0" average="0.0" count="0">         <stats key="eng-single-request" min="22" max="606" average="258.43" count="277">             <stats key="populate-context" min="0" max="0" average="0.0" count="248"> Please note that in SOA 11.1.1.6, the statistics under Request breakdown is aggregated together cross all the BPEL processes based on statistic keys. It does not differentiate between BPEL processes. If two BPEL processes happen to have the statistic that share same statistic key, the statistics from two BPEL processes will be aggregated together. Keep this in mind when we go through more details below. 2.1 BPEL process activity latencies A very useful measurement in the Request Breakdown is the performance statistics of the BPEL activities you put in your BPEL processes: Assign, Invoke, Receive, etc. The names of the measurement in the JSP page directly come from the names to assign to each BPEL activity. These measurements are under the statistic key "actual-perform" Example 1:  Follows is the measurement for BPEL activity "AssignInvokeCreditProvider_Input", which looks like the Assign activity in a BPEL process that assign an input variable before passing it to the invocation:                                <stats key="AssignInvokeCreditProvider_Input" min="1" max="8" average="1.9" count="153">                                     <stats key="sensor-send-activity-data" min="0" max="1" average="0.0" count="306">                                     </stats>                                     <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153">                                     </stats>                                     <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306">                                     </stats>                                 </stats> Note: because as previously mentioned that the statistics cross all BPEL processes are aggregated together based on statistic keys, if two BPEL processes happen to name their Invoke activity the same name, they will show up at one measurement (i.e. statistic key). Example 2: Follows is the measurement of BPEL activity called "InvokeCreditProvider". You can not only see that by average it takes 3.31ms to finish this call (pretty fast) but also you can see from the further break down that most of this 3.31 ms was spent on the "invoke-service".                                  <stats key="InvokeCreditProvider" min="1" max="13" average="3.31" count="153">                                     <stats key="initiate-correlation-set-again" min="0" max="0" average="0.0" count="153">                                     </stats>                                     <stats key="invoke-service" min="1" max="13" average="3.08" count="153">                                         <stats key="prep-call" min="0" max="1" average="0.04" count="153">                                         </stats>                                     </stats>                                     <stats key="initiate-correlation-set" min="0" max="0" average="0.0" count="153">                                     </stats>                                     <stats key="sensor-send-activity-data" min="0" max="0" average="0.0" count="306">                                     </stats>                                     <stats key="sensor-send-variable-data" min="0" max="0" average="0.0" count="153">                                     </stats>                                     <stats key="monitor-send-activity-data" min="0" max="0" average="0.0" count="306">                                     </stats>                                     <stats key="update-audit-trail" min="0" max="2" average="0.03" count="153">                                     </stats>                                 </stats> 2.2 BPEL engine activity latency Another type of measurements under Request breakdown are the latencies of underlying system level engine activities. These activities are not directly tied to a particular BPEL process or process activity, but they are critical factors in the overall engine performance. These activities include the latency of saving asynchronous requests to database, and latency of process dehydration. My friend Malkit Bhasin is working on providing more information on interpreting the statistics on engine activities on his blog (https://blogs.oracle.com/malkit/). I will update this blog once the information becomes available. Update on 2012-10-02: My friend Malkit Bhasin has published the detail interpretation of the BPEL service engine statistics at his blog http://malkit.blogspot.com/2012/09/oracle-bpel-engine-soa-suite.html.

    Read the article

  • Toorcon 15 (2013)

    - by danx
    The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

    Read the article

  • ASP.NET Frameworks and Raw Throughput Performance

    - by Rick Strahl
    A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point  or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services  (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012  removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line  looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this:   It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST   [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type.   ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results:   The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET  Web Api   Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

    Read the article

  • Help with Collision Resolution?

    - by Milo
    I'm trying to learn about physics by trying to make a simplified GTA 2 clone. My only problem is collision resolution. Everything else works great. I have a rigid body class and from there cars and a wheel class: class RigidBody extends Entity { //linear private Vector2D velocity = new Vector2D(); private Vector2D forces = new Vector2D(); private OBB2D predictionRect = new OBB2D(new Vector2D(), 1.0f, 1.0f, 0.0f); private float mass; private Vector2D deltaVec = new Vector2D(); private Vector2D v = new Vector2D(); //angular private float angularVelocity; private float torque; private float inertia; //graphical private Vector2D halfSize = new Vector2D(); private Bitmap image; private Matrix mat = new Matrix(); private float[] Vector2Ds = new float[2]; private Vector2D tangent = new Vector2D(); private static Vector2D worldRelVec = new Vector2D(); private static Vector2D relWorldVec = new Vector2D(); private static Vector2D pointVelVec = new Vector2D(); public RigidBody() { //set these defaults so we don't get divide by zeros mass = 1.0f; inertia = 1.0f; setLayer(LAYER_OBJECTS); } protected void rectChanged() { if(getWorld() != null) { getWorld().updateDynamic(this); } } //intialize out parameters public void initialize(Vector2D halfSize, float mass, Bitmap bitmap) { //store physical parameters this.halfSize = halfSize; this.mass = mass; image = bitmap; inertia = (1.0f / 20.0f) * (halfSize.x * halfSize.x) * (halfSize.y * halfSize.y) * mass; RectF rect = new RectF(); float scalar = 10.0f; rect.left = (int)-halfSize.x * scalar; rect.top = (int)-halfSize.y * scalar; rect.right = rect.left + (int)(halfSize.x * 2.0f * scalar); rect.bottom = rect.top + (int)(halfSize.y * 2.0f * scalar); setRect(rect); predictionRect.set(rect); } public void setLocation(Vector2D position, float angle) { getRect().set(position, getWidth(), getHeight(), angle); rectChanged(); } public void setPredictionLocation(Vector2D position, float angle) { getPredictionRect().set(position, getWidth(), getHeight(), angle); } public void setPredictionCenter(Vector2D center) { getPredictionRect().moveTo(center); } public void setPredictionAngle(float angle) { predictionRect.setAngle(angle); } public Vector2D getPosition() { return getRect().getCenter(); } public OBB2D getPredictionRect() { return predictionRect; } @Override public void update(float timeStep) { doUpdate(false,timeStep); } public void doUpdate(boolean prediction, float timeStep) { //integrate physics //linear Vector2D acceleration = Vector2D.scalarDivide(forces, mass); if(prediction) { Vector2D velocity = Vector2D.add(this.velocity, Vector2D.scalarMultiply(acceleration, timeStep)); Vector2D c = getRect().getCenter(); c = Vector2D.add(getRect().getCenter(), Vector2D.scalarMultiply(velocity , timeStep)); setPredictionCenter(c); //forces = new Vector2D(0,0); //clear forces } else { velocity.x += (acceleration.x * timeStep); velocity.y += (acceleration.y * timeStep); //velocity = Vector2D.add(velocity, Vector2D.scalarMultiply(acceleration, timeStep)); Vector2D c = getRect().getCenter(); v.x = getRect().getCenter().getX() + (velocity.x * timeStep); v.y = getRect().getCenter().getY() + (velocity.y * timeStep); deltaVec.x = v.x - c.x; deltaVec.y = v.y - c.y; deltaVec.normalize(); setCenter(v.x, v.y); forces.x = 0; //clear forces forces.y = 0; } //angular float angAcc = torque / inertia; if(prediction) { float angularVelocity = this.angularVelocity + angAcc * timeStep; setPredictionAngle(getAngle() + angularVelocity * timeStep); //torque = 0; //clear torque } else { angularVelocity += angAcc * timeStep; setAngle(getAngle() + angularVelocity * timeStep); torque = 0; //clear torque } } public void updatePrediction(float timeStep) { doUpdate(true, timeStep); } //take a relative Vector2D and make it a world Vector2D public Vector2D relativeToWorld(Vector2D relative) { mat.reset(); Vector2Ds[0] = relative.x; Vector2Ds[1] = relative.y; mat.postRotate(JMath.radToDeg(getAngle())); mat.mapVectors(Vector2Ds); relWorldVec.x = Vector2Ds[0]; relWorldVec.y = Vector2Ds[1]; return new Vector2D(Vector2Ds[0], Vector2Ds[1]); } //take a world Vector2D and make it a relative Vector2D public Vector2D worldToRelative(Vector2D world) { mat.reset(); Vector2Ds[0] = world.x; Vector2Ds[1] = world.y; mat.postRotate(JMath.radToDeg(-getAngle())); mat.mapVectors(Vector2Ds); return new Vector2D(Vector2Ds[0], Vector2Ds[1]); } //velocity of a point on body public Vector2D pointVelocity(Vector2D worldOffset) { tangent.x = -worldOffset.y; tangent.y = worldOffset.x; return Vector2D.add( Vector2D.scalarMultiply(tangent, angularVelocity) , velocity); } public void applyForce(Vector2D worldForce, Vector2D worldOffset) { //add linear force forces.x += worldForce.x; forces.y += worldForce.y; //add associated torque torque += Vector2D.cross(worldOffset, worldForce); } @Override public void draw( GraphicsContext c) { c.drawRotatedScaledBitmap(image, getPosition().x, getPosition().y, getWidth(), getHeight(), getAngle()); } public Vector2D getVelocity() { return velocity; } public void setVelocity(Vector2D velocity) { this.velocity = velocity; } public Vector2D getDeltaVec() { return deltaVec; } } Vehicle public class Wheel { private Vector2D forwardVec; private Vector2D sideVec; private float wheelTorque; private float wheelSpeed; private float wheelInertia; private float wheelRadius; private Vector2D position = new Vector2D(); public Wheel(Vector2D position, float radius) { this.position = position; setSteeringAngle(0); wheelSpeed = 0; wheelRadius = radius; wheelInertia = (radius * radius) * 1.1f; } public void setSteeringAngle(float newAngle) { Matrix mat = new Matrix(); float []vecArray = new float[4]; //forward Vector vecArray[0] = 0; vecArray[1] = 1; //side Vector vecArray[2] = -1; vecArray[3] = 0; mat.postRotate(newAngle / (float)Math.PI * 180.0f); mat.mapVectors(vecArray); forwardVec = new Vector2D(vecArray[0], vecArray[1]); sideVec = new Vector2D(vecArray[2], vecArray[3]); } public void addTransmissionTorque(float newValue) { wheelTorque += newValue; } public float getWheelSpeed() { return wheelSpeed; } public Vector2D getAnchorPoint() { return position; } public Vector2D calculateForce(Vector2D relativeGroundSpeed, float timeStep, boolean prediction) { //calculate speed of tire patch at ground Vector2D patchSpeed = Vector2D.scalarMultiply(Vector2D.scalarMultiply( Vector2D.negative(forwardVec), wheelSpeed), wheelRadius); //get velocity difference between ground and patch Vector2D velDifference = Vector2D.add(relativeGroundSpeed , patchSpeed); //project ground speed onto side axis Float forwardMag = new Float(0.0f); Vector2D sideVel = velDifference.project(sideVec); Vector2D forwardVel = velDifference.project(forwardVec, forwardMag); //calculate super fake friction forces //calculate response force Vector2D responseForce = Vector2D.scalarMultiply(Vector2D.negative(sideVel), 2.0f); responseForce = Vector2D.subtract(responseForce, forwardVel); float topSpeed = 500.0f; //calculate torque on wheel wheelTorque += forwardMag * wheelRadius; //integrate total torque into wheel wheelSpeed += wheelTorque / wheelInertia * timeStep; //top speed limit (kind of a hack) if(wheelSpeed > topSpeed) { wheelSpeed = topSpeed; } //clear our transmission torque accumulator wheelTorque = 0; //return force acting on body return responseForce; } public void setTransmissionTorque(float newValue) { wheelTorque = newValue; } public float getTransmissionTourque() { return wheelTorque; } public void setWheelSpeed(float speed) { wheelSpeed = speed; } } //our vehicle object public class Vehicle extends RigidBody { private Wheel [] wheels = new Wheel[4]; private boolean throttled = false; public void initialize(Vector2D halfSize, float mass, Bitmap bitmap) { //front wheels wheels[0] = new Wheel(new Vector2D(halfSize.x, halfSize.y), 0.45f); wheels[1] = new Wheel(new Vector2D(-halfSize.x, halfSize.y), 0.45f); //rear wheels wheels[2] = new Wheel(new Vector2D(halfSize.x, -halfSize.y), 0.75f); wheels[3] = new Wheel(new Vector2D(-halfSize.x, -halfSize.y), 0.75f); super.initialize(halfSize, mass, bitmap); } public void setSteering(float steering) { float steeringLock = 0.13f; //apply steering angle to front wheels wheels[0].setSteeringAngle(steering * steeringLock); wheels[1].setSteeringAngle(steering * steeringLock); } public void setThrottle(float throttle, boolean allWheel) { float torque = 85.0f; throttled = true; //apply transmission torque to back wheels if (allWheel) { wheels[0].addTransmissionTorque(throttle * torque); wheels[1].addTransmissionTorque(throttle * torque); } wheels[2].addTransmissionTorque(throttle * torque); wheels[3].addTransmissionTorque(throttle * torque); } public void setBrakes(float brakes) { float brakeTorque = 15.0f; //apply brake torque opposing wheel vel for (Wheel wheel : wheels) { float wheelVel = wheel.getWheelSpeed(); wheel.addTransmissionTorque(-wheelVel * brakeTorque * brakes); } } public void doUpdate(float timeStep, boolean prediction) { for (Wheel wheel : wheels) { float wheelVel = wheel.getWheelSpeed(); //apply negative force to naturally slow down car if(!throttled && !prediction) wheel.addTransmissionTorque(-wheelVel * 0.11f); Vector2D worldWheelOffset = relativeToWorld(wheel.getAnchorPoint()); Vector2D worldGroundVel = pointVelocity(worldWheelOffset); Vector2D relativeGroundSpeed = worldToRelative(worldGroundVel); Vector2D relativeResponseForce = wheel.calculateForce(relativeGroundSpeed, timeStep,prediction); Vector2D worldResponseForce = relativeToWorld(relativeResponseForce); applyForce(worldResponseForce, worldWheelOffset); } //no throttling yet this frame throttled = false; if(prediction) { super.updatePrediction(timeStep); } else { super.update(timeStep); } } @Override public void update(float timeStep) { doUpdate(timeStep,false); } public void updatePrediction(float timeStep) { doUpdate(timeStep,true); } public void inverseThrottle() { float scalar = 0.2f; for(Wheel wheel : wheels) { wheel.setTransmissionTorque(-wheel.getTransmissionTourque() * scalar); wheel.setWheelSpeed(-wheel.getWheelSpeed() * 0.1f); } } } And my big hack collision resolution: private void update() { camera.setPosition((vehicle.getPosition().x * camera.getScale()) - ((getWidth() ) / 2.0f), (vehicle.getPosition().y * camera.getScale()) - ((getHeight() ) / 2.0f)); //camera.move(input.getAnalogStick().getStickValueX() * 15.0f, input.getAnalogStick().getStickValueY() * 15.0f); if(input.isPressed(ControlButton.BUTTON_GAS)) { vehicle.setThrottle(1.0f, false); } if(input.isPressed(ControlButton.BUTTON_STEAL_CAR)) { vehicle.setThrottle(-1.0f, false); } if(input.isPressed(ControlButton.BUTTON_BRAKE)) { vehicle.setBrakes(1.0f); } vehicle.setSteering(input.getAnalogStick().getStickValueX()); //vehicle.update(16.6666666f / 1000.0f); boolean colided = false; vehicle.updatePrediction(16.66666f / 1000.0f); List<Entity> buildings = world.queryStaticSolid(vehicle,vehicle.getPredictionRect()); if(buildings.size() > 0) { colided = true; } if(!colided) { vehicle.update(16.66f / 1000.0f); } else { Vector2D delta = vehicle.getDeltaVec(); vehicle.setVelocity(Vector2D.negative(vehicle.getVelocity().multiply(0.2f)). add(delta.multiply(-1.0f))); vehicle.inverseThrottle(); } } Here is OBB public class OBB2D { // Corners of the box, where 0 is the lower left. private Vector2D corner[] = new Vector2D[4]; private Vector2D center = new Vector2D(); private Vector2D extents = new Vector2D(); private RectF boundingRect = new RectF(); private float angle; //Two edges of the box extended away from corner[0]. private Vector2D axis[] = new Vector2D[2]; private double origin[] = new double[2]; public OBB2D(Vector2D center, float w, float h, float angle) { set(center,w,h,angle); } public OBB2D(float left, float top, float width, float height) { set(new Vector2D(left + (width / 2), top + (height / 2)),width,height,0.0f); } public void set(Vector2D center,float w, float h,float angle) { Vector2D X = new Vector2D( (float)Math.cos(angle), (float)Math.sin(angle)); Vector2D Y = new Vector2D((float)-Math.sin(angle), (float)Math.cos(angle)); X = X.multiply( w / 2); Y = Y.multiply( h / 2); corner[0] = center.subtract(X).subtract(Y); corner[1] = center.add(X).subtract(Y); corner[2] = center.add(X).add(Y); corner[3] = center.subtract(X).add(Y); computeAxes(); extents.x = w / 2; extents.y = h / 2; computeDimensions(center,angle); } private void computeDimensions(Vector2D center,float angle) { this.center.x = center.x; this.center.y = center.y; this.angle = angle; boundingRect.left = Math.min(Math.min(corner[0].x, corner[3].x), Math.min(corner[1].x, corner[2].x)); boundingRect.top = Math.min(Math.min(corner[0].y, corner[1].y),Math.min(corner[2].y, corner[3].y)); boundingRect.right = Math.max(Math.max(corner[1].x, corner[2].x), Math.max(corner[0].x, corner[3].x)); boundingRect.bottom = Math.max(Math.max(corner[2].y, corner[3].y),Math.max(corner[0].y, corner[1].y)); } public void set(RectF rect) { set(new Vector2D(rect.centerX(),rect.centerY()),rect.width(),rect.height(),0.0f); } // Returns true if other overlaps one dimension of this. private boolean overlaps1Way(OBB2D other) { for (int a = 0; a < axis.length; ++a) { double t = other.corner[0].dot(axis[a]); // Find the extent of box 2 on axis a double tMin = t; double tMax = t; for (int c = 1; c < corner.length; ++c) { t = other.corner[c].dot(axis[a]); if (t < tMin) { tMin = t; } else if (t > tMax) { tMax = t; } } // We have to subtract off the origin // See if [tMin, tMax] intersects [0, 1] if ((tMin > 1 + origin[a]) || (tMax < origin[a])) { // There was no intersection along this dimension; // the boxes cannot possibly overlap. return false; } } // There was no dimension along which there is no intersection. // Therefore the boxes overlap. return true; } //Updates the axes after the corners move. Assumes the //corners actually form a rectangle. private void computeAxes() { axis[0] = corner[1].subtract(corner[0]); axis[1] = corner[3].subtract(corner[0]); // Make the length of each axis 1/edge length so we know any // dot product must be less than 1 to fall within the edge. for (int a = 0; a < axis.length; ++a) { axis[a] = axis[a].divide((axis[a].length() * axis[a].length())); origin[a] = corner[0].dot(axis[a]); } } public void moveTo(Vector2D center) { Vector2D centroid = (corner[0].add(corner[1]).add(corner[2]).add(corner[3])).divide(4.0f); Vector2D translation = center.subtract(centroid); for (int c = 0; c < 4; ++c) { corner[c] = corner[c].add(translation); } computeAxes(); computeDimensions(center,angle); } // Returns true if the intersection of the boxes is non-empty. public boolean overlaps(OBB2D other) { if(right() < other.left()) { return false; } if(bottom() < other.top()) { return false; } if(left() > other.right()) { return false; } if(top() > other.bottom()) { return false; } if(other.getAngle() == 0.0f && getAngle() == 0.0f) { return true; } return overlaps1Way(other) && other.overlaps1Way(this); } public Vector2D getCenter() { return center; } public float getWidth() { return extents.x * 2; } public float getHeight() { return extents.y * 2; } public void setAngle(float angle) { set(center,getWidth(),getHeight(),angle); } public float getAngle() { return angle; } public void setSize(float w,float h) { set(center,w,h,angle); } public float left() { return boundingRect.left; } public float right() { return boundingRect.right; } public float bottom() { return boundingRect.bottom; } public float top() { return boundingRect.top; } public RectF getBoundingRect() { return boundingRect; } public boolean overlaps(float left, float top, float right, float bottom) { if(right() < left) { return false; } if(bottom() < top) { return false; } if(left() > right) { return false; } if(top() > bottom) { return false; } return true; } }; What I do is when I predict a hit on the car, I force it back. It does not work that well and seems like a bad idea. What could I do to have more proper collision resolution. Such that if I hit a wall I will never get stuck in it and if I hit the side of a wall I can steer my way out of it. Thanks I found this nice ppt. It talks about pulling objects apart and calculating new velocities. How could I calc new velocities in my case? http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CC8QFjAB&url=http%3A%2F%2Fcoitweb.uncc.edu%2F~tbarnes2%2FGameDesignFall05%2FSlides%2FCh4.2-CollDet.ppt&ei=x4ucULy5M6-N0QGRy4D4Cg&usg=AFQjCNG7FVDXWRdLv8_-T5qnFyYld53cTQ&cad=rja

    Read the article

  • C#/.NET Little Wonders: The ConcurrentDictionary

    - by James Michael Hare
    Once again we consider some of the lesser known classes and keywords of C#.  In this series of posts, we will discuss how the concurrent collections have been developed to help alleviate these multi-threading concerns.  Last week’s post began with a general introduction and discussed the ConcurrentStack<T> and ConcurrentQueue<T>.  Today's post discusses the ConcurrentDictionary<T> (originally I had intended to discuss ConcurrentBag this week as well, but ConcurrentDictionary had enough information to create a very full post on its own!).  Finally next week, we shall close with a discussion of the ConcurrentBag<T> and BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. Recap As you'll recall from the previous post, the original collections were object-based containers that accomplished synchronization through a Synchronized member.  While these were convenient because you didn't have to worry about writing your own synchronization logic, they were a bit too finely grained and if you needed to perform multiple operations under one lock, the automatic synchronization didn't buy much. With the advent of .NET 2.0, the original collections were succeeded by the generic collections which are fully type-safe, but eschew automatic synchronization.  This cuts both ways in that you have a lot more control as a developer over when and how fine-grained you want to synchronize, but on the other hand if you just want simple synchronization it creates more work. With .NET 4.0, we get the best of both worlds in generic collections.  A new breed of collections was born called the concurrent collections in the System.Collections.Concurrent namespace.  These amazing collections are fine-tuned to have best overall performance for situations requiring concurrent access.  They are not meant to replace the generic collections, but to simply be an alternative to creating your own locking mechanisms. Among those concurrent collections were the ConcurrentStack<T> and ConcurrentQueue<T> which provide classic LIFO and FIFO collections with a concurrent twist.  As we saw, some of the traditional methods that required calls to be made in a certain order (like checking for not IsEmpty before calling Pop()) were replaced in favor of an umbrella operation that combined both under one lock (like TryPop()). Now, let's take a look at the next in our series of concurrent collections!For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. ConcurrentDictionary – the fully thread-safe dictionary The ConcurrentDictionary<TKey,TValue> is the thread-safe counterpart to the generic Dictionary<TKey, TValue> collection.  Obviously, both are designed for quick – O(1) – lookups of data based on a key.  If you think of algorithms where you need lightning fast lookups of data and don’t care whether the data is maintained in any particular ordering or not, the unsorted dictionaries are generally the best way to go. Note: as a side note, there are sorted implementations of IDictionary, namely SortedDictionary and SortedList which are stored as an ordered tree and a ordered list respectively.  While these are not as fast as the non-sorted dictionaries – they are O(log2 n) – they are a great combination of both speed and ordering -- and still greatly outperform a linear search. Now, once again keep in mind that if all you need to do is load a collection once and then allow multi-threaded reading you do not need any locking.  Examples of this tend to be situations where you load a lookup or translation table once at program start, then keep it in memory for read-only reference.  In such cases locking is completely non-productive. However, most of the time when we need a concurrent dictionary we are interleaving both reads and updates.  This is where the ConcurrentDictionary really shines!  It achieves its thread-safety with no common lock to improve efficiency.  It actually uses a series of locks to provide concurrent updates, and has lockless reads!  This means that the ConcurrentDictionary gets even more efficient the higher the ratio of reads-to-writes you have. ConcurrentDictionary and Dictionary differences For the most part, the ConcurrentDictionary<TKey,TValue> behaves like it’s Dictionary<TKey,TValue> counterpart with a few differences.  Some notable examples of which are: Add() does not exist in the concurrent dictionary. This means you must use TryAdd(), AddOrUpdate(), or GetOrAdd().  It also means that you can’t use a collection initializer with the concurrent dictionary. TryAdd() replaced Add() to attempt atomic, safe adds. Because Add() only succeeds if the item doesn’t already exist, we need an atomic operation to check if the item exists, and if not add it while still under an atomic lock. TryUpdate() was added to attempt atomic, safe updates. If we want to update an item, we must make sure it exists first and that the original value is what we expected it to be.  If all these are true, we can update the item under one atomic step. TryRemove() was added to attempt atomic, safe removes. To safely attempt to remove a value we need to see if the key exists first, this checks for existence and removes under an atomic lock. AddOrUpdate() was added to attempt an thread-safe “upsert”. There are many times where you want to insert into a dictionary if the key doesn’t exist, or update the value if it does.  This allows you to make a thread-safe add-or-update. GetOrAdd() was added to attempt an thread-safe query/insert. Sometimes, you want to query for whether an item exists in the cache, and if it doesn’t insert a starting value for it.  This allows you to get the value if it exists and insert if not. Count, Keys, Values properties take a snapshot of the dictionary. Accessing these properties may interfere with add and update performance and should be used with caution. ToArray() returns a static snapshot of the dictionary. That is, the dictionary is locked, and then copied to an array as a O(n) operation.  GetEnumerator() is thread-safe and efficient, but allows dirty reads. Because reads require no locking, you can safely iterate over the contents of the dictionary.  The only downside is that, depending on timing, you may get dirty reads. Dirty reads during iteration The last point on GetEnumerator() bears some explanation.  Picture a scenario in which you call GetEnumerator() (or iterate using a foreach, etc.) and then, during that iteration the dictionary gets updated.  This may not sound like a big deal, but it can lead to inconsistent results if used incorrectly.  The problem is that items you already iterated over that are updated a split second after don’t show the update, but items that you iterate over that were updated a split second before do show the update.  Thus you may get a combination of items that are “stale” because you iterated before the update, and “fresh” because they were updated after GetEnumerator() but before the iteration reached them. Let’s illustrate with an example, let’s say you load up a concurrent dictionary like this: 1: // load up a dictionary. 2: var dictionary = new ConcurrentDictionary<string, int>(); 3:  4: dictionary["A"] = 1; 5: dictionary["B"] = 2; 6: dictionary["C"] = 3; 7: dictionary["D"] = 4; 8: dictionary["E"] = 5; 9: dictionary["F"] = 6; Then you have one task (using the wonderful TPL!) to iterate using dirty reads: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); And one task to attempt updates in a separate thread (probably): 1: // attempt updates in a separate thread 2: var updateTask = new Task(() => 3: { 4: // iterates, and updates the value by one 5: foreach (var pair in dictionary) 6: { 7: dictionary[pair.Key] = pair.Value + 1; 8: } 9: }); Now that we’ve done this, we can fire up both tasks and wait for them to complete: 1: // start both tasks 2: updateTask.Start(); 3: iterationTask.Start(); 4:  5: // wait for both to complete. 6: Task.WaitAll(updateTask, iterationTask); Now, if I you didn’t know about the dirty reads, you may have expected to see the iteration before the updates (such as A:1, B:2, C:3, D:4, E:5, F:6).  However, because the reads are dirty, we will quite possibly get a combination of some updated, some original.  My own run netted this result: 1: F:6 2: E:6 3: D:5 4: C:4 5: B:3 6: A:2 Note that, of course, iteration is not in order because ConcurrentDictionary, like Dictionary, is unordered.  Also note that both E and F show the value 6.  This is because the output task reached F before the update, but the updates for the rest of the items occurred before their output (probably because console output is very slow, comparatively). If we want to always guarantee that we will get a consistent snapshot to iterate over (that is, at the point we ask for it we see precisely what is in the dictionary and no subsequent updates during iteration), we should iterate over a call to ToArray() instead: 1: // attempt iteration in a separate thread 2: var iterationTask = new Task(() => 3: { 4: // iterates using a dirty read 5: foreach (var pair in dictionary.ToArray()) 6: { 7: Console.WriteLine(pair.Key + ":" + pair.Value); 8: } 9: }); The atomic Try…() methods As you can imagine TryAdd() and TryRemove() have few surprises.  Both first check the existence of the item to determine if it can be added or removed based on whether or not the key currently exists in the dictionary: 1: // try add attempts an add and returns false if it already exists 2: if (dictionary.TryAdd("G", 7)) 3: Console.WriteLine("G did not exist, now inserted with 7"); 4: else 5: Console.WriteLine("G already existed, insert failed."); TryRemove() also has the virtue of returning the value portion of the removed entry matching the given key: 1: // attempt to remove the value, if it exists it is removed and the original is returned 2: int removedValue; 3: if (dictionary.TryRemove("C", out removedValue)) 4: Console.WriteLine("Removed C and its value was " + removedValue); 5: else 6: Console.WriteLine("C did not exist, remove failed."); Now TryUpdate() is an interesting creature.  You might think from it’s name that TryUpdate() first checks for an item’s existence, and then updates if the item exists, otherwise it returns false.  Well, note quite... It turns out when you call TryUpdate() on a concurrent dictionary, you pass it not only the new value you want it to have, but also the value you expected it to have before the update.  If the item exists in the dictionary, and it has the value you expected, it will update it to the new value atomically and return true.  If the item is not in the dictionary or does not have the value you expected, it is not modified and false is returned. 1: // attempt to update the value, if it exists and if it has the expected original value 2: if (dictionary.TryUpdate("G", 42, 7)) 3: Console.WriteLine("G existed and was 7, now it's 42."); 4: else 5: Console.WriteLine("G either didn't exist, or wasn't 7."); The composite Add methods The ConcurrentDictionary also has composite add methods that can be used to perform updates and gets, with an add if the item is not existing at the time of the update or get. The first of these, AddOrUpdate(), allows you to add a new item to the dictionary if it doesn’t exist, or update the existing item if it does.  For example, let’s say you are creating a dictionary of counts of stock ticker symbols you’ve subscribed to from a market data feed: 1: public sealed class SubscriptionManager 2: { 3: private readonly ConcurrentDictionary<string, int> _subscriptions = new ConcurrentDictionary<string, int>(); 4:  5: // adds a new subscription, or increments the count of the existing one. 6: public void AddSubscription(string tickerKey) 7: { 8: // add a new subscription with count of 1, or update existing count by 1 if exists 9: var resultCount = _subscriptions.AddOrUpdate(tickerKey, 1, (symbol, count) => count + 1); 10:  11: // now check the result to see if we just incremented the count, or inserted first count 12: if (resultCount == 1) 13: { 14: // subscribe to symbol... 15: } 16: } 17: } Notice the update value factory Func delegate.  If the key does not exist in the dictionary, the add value is used (in this case 1 representing the first subscription for this symbol), but if the key already exists, it passes the key and current value to the update delegate which computes the new value to be stored in the dictionary.  The return result of this operation is the value used (in our case: 1 if added, existing value + 1 if updated). Likewise, the GetOrAdd() allows you to attempt to retrieve a value from the dictionary, and if the value does not currently exist in the dictionary it will insert a value.  This can be handy in cases where perhaps you wish to cache data, and thus you would query the cache to see if the item exists, and if it doesn’t you would put the item into the cache for the first time: 1: public sealed class PriceCache 2: { 3: private readonly ConcurrentDictionary<string, double> _cache = new ConcurrentDictionary<string, double>(); 4:  5: // adds a new subscription, or increments the count of the existing one. 6: public double QueryPrice(string tickerKey) 7: { 8: // check for the price in the cache, if it doesn't exist it will call the delegate to create value. 9: return _cache.GetOrAdd(tickerKey, symbol => GetCurrentPrice(symbol)); 10: } 11:  12: private double GetCurrentPrice(string tickerKey) 13: { 14: // do code to calculate actual true price. 15: } 16: } There are other variations of these two methods which vary whether a value is provided or a factory delegate, but otherwise they work much the same. Oddities with the composite Add methods The AddOrUpdate() and GetOrAdd() methods are totally thread-safe, on this you may rely, but they are not atomic.  It is important to note that the methods that use delegates execute those delegates outside of the lock.  This was done intentionally so that a user delegate (of which the ConcurrentDictionary has no control of course) does not take too long and lock out other threads. This is not necessarily an issue, per se, but it is something you must consider in your design.  The main thing to consider is that your delegate may get called to generate an item, but that item may not be the one returned!  Consider this scenario: A calls GetOrAdd and sees that the key does not currently exist, so it calls the delegate.  Now thread B also calls GetOrAdd and also sees that the key does not currently exist, and for whatever reason in this race condition it’s delegate completes first and it adds its new value to the dictionary.  Now A is done and goes to get the lock, and now sees that the item now exists.  In this case even though it called the delegate to create the item, it will pitch it because an item arrived between the time it attempted to create one and it attempted to add it. Let’s illustrate, assume this totally contrived example program which has a dictionary of char to int.  And in this dictionary we want to store a char and it’s ordinal (that is, A = 1, B = 2, etc).  So for our value generator, we will simply increment the previous value in a thread-safe way (perhaps using Interlocked): 1: public static class Program 2: { 3: private static int _nextNumber = 0; 4:  5: // the holder of the char to ordinal 6: private static ConcurrentDictionary<char, int> _dictionary 7: = new ConcurrentDictionary<char, int>(); 8:  9: // get the next id value 10: public static int NextId 11: { 12: get { return Interlocked.Increment(ref _nextNumber); } 13: } Then, we add a method that will perform our insert: 1: public static void Inserter() 2: { 3: for (int i = 0; i < 26; i++) 4: { 5: _dictionary.GetOrAdd((char)('A' + i), key => NextId); 6: } 7: } Finally, we run our test by starting two tasks to do this work and get the results… 1: public static void Main() 2: { 3: // 3 tasks attempting to get/insert 4: var tasks = new List<Task> 5: { 6: new Task(Inserter), 7: new Task(Inserter) 8: }; 9:  10: tasks.ForEach(t => t.Start()); 11: Task.WaitAll(tasks.ToArray()); 12:  13: foreach (var pair in _dictionary.OrderBy(p => p.Key)) 14: { 15: Console.WriteLine(pair.Key + ":" + pair.Value); 16: } 17: } If you run this with only one task, you get the expected A:1, B:2, ..., Z:26.  But running this in parallel you will get something a bit more complex.  My run netted these results: 1: A:1 2: B:3 3: C:4 4: D:5 5: E:6 6: F:7 7: G:8 8: H:9 9: I:10 10: J:11 11: K:12 12: L:13 13: M:14 14: N:15 15: O:16 16: P:17 17: Q:18 18: R:19 19: S:20 20: T:21 21: U:22 22: V:23 23: W:24 24: X:25 25: Y:26 26: Z:27 Notice that B is 3?  This is most likely because both threads attempted to call GetOrAdd() at roughly the same time and both saw that B did not exist, thus they both called the generator and one thread got back 2 and the other got back 3.  However, only one of those threads can get the lock at a time for the actual insert, and thus the one that generated the 3 won and the 3 was inserted and the 2 got discarded.  This is why on these methods your factory delegates should be careful not to have any logic that would be unsafe if the value they generate will be pitched in favor of another item generated at roughly the same time.  As such, it is probably a good idea to keep those generators as stateless as possible. Summary The ConcurrentDictionary is a very efficient and thread-safe version of the Dictionary generic collection.  It has all the benefits of type-safety that it’s generic collection counterpart does, and in addition is extremely efficient especially when there are more reads than writes concurrently. Tweet Technorati Tags: C#, .NET, Concurrent Collections, Collections, Little Wonders, Black Rabbit Coder,James Michael Hare

    Read the article

  • Much Ado About Nothing: Stub Objects

    - by user9154181
    The Solaris 11 link-editor (ld) contains support for a new type of object that we call a stub object. A stub object is a shared object, built entirely from mapfiles, that supplies the same linking interface as the real object, while containing no code or data. Stub objects cannot be executed — the runtime linker will kill any process that attempts to load one. However, you can link to a stub object as a dependency, allowing the stub to act as a proxy for the real version of the object. You may well wonder if there is a point to producing an object that contains nothing but linking interface. As it turns out, stub objects are very useful for building large bodies of code such as Solaris. In the last year, we've had considerable success in applying them to one of our oldest and thorniest build problems. In this discussion, I will describe how we came to invent these objects, and how we apply them to building Solaris. This posting explains where the idea for stub objects came from, and details our long and twisty journey from hallway idea to standard link-editor feature. I expect that these details are mainly of interest to those who work on Solaris and its makefiles, those who have done so in the past, and those who work with other similar bodies of code. A subsequent posting will omit the history and background details, and instead discuss how to build and use stub objects. If you are mainly interested in what stub objects are, and don't care about the underlying software war stories, I encourage you to skip ahead. The Long Road To Stubs This all started for me with an email discussion in May of 2008, regarding a change request that was filed in 2002, entitled: 4631488 lib/Makefile is too patient: .WAITs should be reduced This CR encapsulates a number of cronic issues with Solaris builds: We build Solaris with a parallel make (dmake) that tries to build as much of the code base in parallel as possible. There is a lot of code to build, and we've long made use of parallelized builds to get the job done quicker. This is even more important in today's world of massively multicore hardware. Solaris contains a large number of executables and shared objects. Executables depend on shared objects, and shared objects can depend on each other. Before you can build an object, you need to ensure that the objects it needs have been built. This implies a need for serialization, which is in direct opposition to the desire to build everying in parallel. To accurately build objects in the right order requires an accurate set of make rules defining the things that depend on each other. This sounds simple, but the reality is quite complex. In practice, having programmers explicitly specify these dependencies is a losing strategy: It's really hard to get right. It's really easy to get it wrong and never know it because things build anyway. Even if you get it right, it won't stay that way, because dependencies between objects can change over time, and make cannot help you detect such drifing. You won't know that you got it wrong until the builds break. That can be a long time after the change that triggered the breakage happened, making it hard to connect the cause and the effect. Usually this happens just before a release, when the pressure is on, its hard to think calmly, and there is no time for deep fixes. As a poor compromise, the libraries in core Solaris were built using a set of grossly incomplete hand written rules, supplemented with a number of dmake .WAIT directives used to group the libraries into sets of non-interacting groups that can be built in parallel because we think they don't depend on each other. From time to time, someone will suggest that we could analyze the built objects themselves to determine their dependencies and then generate make rules based on those relationships. This is possible, but but there are complications that limit the usefulness of that approach: To analyze an object, you have to build it first. This is a classic chicken and egg scenario. You could analyze the results of a previous build, but then you're not necessarily going to get accurate rules for the current code. It should be possible to build the code without having a built workspace available. The analysis will take time, and remember that we're constantly trying to make builds faster, not slower. By definition, such an approach will always be approximate, and therefore only incremantally more accurate than the hand written rules described above. The hand written rules are fast and cheap, while this idea is slow and complex, so we stayed with the hand written approach. Solaris was built that way, essentially forever, because these are genuinely difficult problems that had no easy answer. The makefiles were full of build races in which the right outcomes happened reliably for years until a new machine or a change in build server workload upset the accidental balance of things. After figuring out what had happened, you'd mutter "How did that ever work?", add another incomplete and soon to be inaccurate make dependency rule to the system, and move on. This was not a satisfying solution, as we tend to be perfectionists in the Solaris group, but we didn't have a better answer. It worked well enough, approximately. And so it went for years. We needed a different approach — a new idea to cut the Gordian Knot. In that discussion from May 2008, my fellow linker-alien Rod Evans had the initial spark that lead us to a game changing series of realizations: The link-editor is used to link objects together, but it only uses the ELF metadata in the object, consisting of symbol tables, ELF versioning sections, and similar data. Notably, it does not look at, or understand, the machine code that makes an object useful at runtime. If you had an object that only contained the ELF metadata for a dependency, but not the code or data, the link-editor would find it equally useful for linking, and would never know the difference. Call it a stub object. In the core Solaris OS, we require all objects to be built with a link-editor mapfile that describes all of its publically available functions and data. Could we build a stub object using the mapfile for the real object? It ought to be very fast to build stub objects, as there are no input objects to process. Unlike the real object, stub objects would not actually require any dependencies, and so, all of the stubs for the entire system could be built in parallel. When building the real objects, one could link against the stub objects instead of the real dependencies. This means that all the real objects can be built built in parallel too, without any serialization. We could replace a system that requires perfect makefile rules with a system that requires no ordering rules whatsoever. The results would be considerably more robust. We immediately realized that this idea had potential, but also that there were many details to sort out, lots of work to do, and that perhaps it wouldn't really pan out. As is often the case, it would be necessary to do the work and see how it turned out. Following that conversation, I set about trying to build a stub object. We determined that a faithful stub has to do the following: Present the same set of global symbols, with the same ELF versioning, as the real object. Functions are simple — it suffices to have a symbol of the right type, possibly, but not necessarily, referencing a null function in its text segment. Copy relocations make data more complicated to stub. The possibility of a copy relocation means that when you create a stub, the data symbols must have the actual size of the real data. Any error in this will go uncaught at link time, and will cause tragic failures at runtime that are very hard to diagnose. For reasons too obscure to go into here, involving tentative symbols, it is also important that the data reside in bss, or not, matching its placement in the real object. If the real object has more than one symbol pointing at the same data item, we call these aliased symbols. All data symbols in the stub object must exhibit the same aliasing as the real object. We imagined the stub library feature working as follows: A command line option to ld tells it to produce a stub rather than a real object. In this mode, only mapfiles are examined, and any object or shared libraries on the command line are are ignored. The extra information needed (function or data, size, and bss details) would be added to the mapfile. When building the real object instead of the stub, the extra information for building stubs would be validated against the resulting object to ensure that they match. In exploring these ideas, I immediately run headfirst into the reality of the original mapfile syntax, a subject that I would later write about as The Problem(s) With Solaris SVR4 Link-Editor Mapfiles. The idea of extending that poor language was a non-starter. Until a better mapfile syntax became available, which seemed unlikely in 2008, the solution could not involve extentions to the mapfile syntax. Instead, we cooked up the idea (hack) of augmenting mapfiles with stylized comments that would carry the necessary information. A typical definition might look like: # DATA(i386) __iob 0x3c0 # DATA(amd64,sparcv9) __iob 0xa00 # DATA(sparc) __iob 0x140 iob; A further problem then became clear: If we can't extend the mapfile syntax, then there's no good way to extend ld with an option to produce stub objects, and to validate them against the real objects. The idea of having ld read comments in a mapfile and parse them for content is an unacceptable hack. The entire point of comments is that they are strictly for the human reader, and explicitly ignored by the tool. Taking all of these speed bumps into account, I made a new plan: A perl script reads the mapfiles, generates some small C glue code to produce empty functions and data definitions, compiles and links the stub object from the generated glue code, and then deletes the generated glue code. Another perl script used after both objects have been built, to compare the real and stub objects, using data from elfdump, and validate that they present the same linking interface. By June 2008, I had written the above, and generated a stub object for libc. It was a useful prototype process to go through, and it allowed me to explore the ideas at a deep level. Ultimately though, the result was unsatisfactory as a basis for real product. There were so many issues: The use of stylized comments were fine for a prototype, but not close to professional enough for shipping product. The idea of having to document and support it was a large concern. The ideal solution for stub objects really does involve having the link-editor accept the same arguments used to build the real object, augmented with a single extra command line option. Any other solution, such as our prototype script, will require makefiles to be modified in deeper ways to support building stubs, and so, will raise barriers to converting existing code. A validation script that rederives what the linker knew when it built an object will always be at a disadvantage relative to the actual linker that did the work. A stub object should be identifyable as such. In the prototype, there was no tag or other metadata that would let you know that they weren't real objects. Being able to identify a stub object in this way means that the file command can tell you what it is, and that the runtime linker can refuse to try and run a program that loads one. At that point, we needed to apply this prototype to building Solaris. As you might imagine, the task of modifying all the makefiles in the core Solaris code base in order to do this is a massive task, and not something you'd enter into lightly. The quality of the prototype just wasn't good enough to justify that sort of time commitment, so I tabled the project, putting it on my list of long term things to think about, and moved on to other work. It would sit there for a couple of years. Semi-coincidentally, one of the projects I tacked after that was to create a new mapfile syntax for the Solaris link-editor. We had wanted to do something about the old mapfile syntax for many years. Others before me had done some paper designs, and a great deal of thought had already gone into the features it should, and should not have, but for various reasons things had never moved beyond the idea stage. When I joined Sun in late 2005, I got involved in reviewing those things and thinking about the problem. Now in 2008, fresh from relearning for the Nth time why the old mapfile syntax was a huge impediment to linker progress, it seemed like the right time to tackle the mapfile issue. Paving the way for proper stub object support was not the driving force behind that effort, but I certainly had them in mind as I moved forward. The new mapfile syntax, which we call version 2, integrated into Nevada build snv_135 in in February 2010: 6916788 ld version 2 mapfile syntax PSARC/2009/688 Human readable and extensible ld mapfile syntax In order to prove that the new mapfile syntax was adequate for general purpose use, I had also done an overhaul of the ON consolidation to convert all mapfiles to use the new syntax, and put checks in place that would ensure that no use of the old syntax would creep back in. That work went back into snv_144 in June 2010: 6916796 OSnet mapfiles should use version 2 link-editor syntax That was a big putback, modifying 517 files, adding 18 new files, and removing 110 old ones. I would have done this putback anyway, as the work was already done, and the benefits of human readable syntax are obvious. However, among the justifications listed in CR 6916796 was this We anticipate adding additional features to the new mapfile language that will be applicable to ON, and which will require all sharable object mapfiles to use the new syntax. I never explained what those additional features were, and no one asked. It was premature to say so, but this was a reference to stub objects. By that point, I had already put together a working prototype link-editor with the necessary support for stub objects. I was pleased to find that building stubs was indeed very fast. On my desktop system (Ultra 24), an amd64 stub for libc can can be built in a fraction of a second: % ptime ld -64 -z stub -o stubs/libc.so.1 -G -hlibc.so.1 \ -ztext -zdefs -Bdirect ... real 0.019708910 user 0.010101680 sys 0.008528431 In order to go from prototype to integrated link-editor feature, I knew that I would need to prove that stub objects were valuable. And to do that, I knew that I'd have to switch the Solaris ON consolidation to use stub objects and evaluate the outcome. And in order to do that experiment, ON would first need to be converted to version 2 mapfiles. Sub-mission accomplished. Normally when you design a new feature, you can devise reasonably small tests to show it works, and then deploy it incrementally, letting it prove its value as it goes. The entire point of stub objects however was to demonstrate that they could be successfully applied to an extremely large and complex code base, and specifically to solve the Solaris build issues detailed above. There was no way to finesse the matter — in order to move ahead, I would have to successfully use stub objects to build the entire ON consolidation and demonstrate their value. In software, the need to boil the ocean can often be a warning sign that things are trending in the wrong direction. Conversely, sometimes progress demands that you build something large and new all at once. A big win, or a big loss — sometimes all you can do is try it and see what happens. And so, I spent some time staring at ON makefiles trying to get a handle on how things work, and how they'd have to change. It's a big and messy world, full of complex interactions, unspecified dependencies, special cases, and knowledge of arcane makefile features... ...and so, I backed away, put it down for a few months and did other work... ...until the fall, when I felt like it was time to stop thinking and pondering (some would say stalling) and get on with it. Without stubs, the following gives a simplified high level view of how Solaris is built: An initially empty directory known as the proto, and referenced via the ROOT makefile macro is established to receive the files that make up the Solaris distribution. A top level setup rule creates the proto area, and performs operations needed to initialize the workspace so that the main build operations can be launched, such as copying needed header files into the proto area. Parallel builds are launched to build the kernel (usr/src/uts), libraries (usr/src/lib), and commands. The install makefile target builds each item and delivers a copy to the proto area. All libraries and executables link against the objects previously installed in the proto, implying the need to synchronize the order in which things are built. Subsequent passes run lint, and do packaging. Given this structure, the additions to use stub objects are: A new second proto area is established, known as the stub proto and referenced via the STUBROOT makefile macro. The stub proto has the same structure as the real proto, but is used to hold stub objects. All files in the real proto are delivered as part of the Solaris product. In contrast, the stub proto is used to build the product, and then thrown away. A new target is added to library Makefiles called stub. This rule builds the stub objects. The ld command is designed so that you can build a stub object using the same ld command line you'd use to build the real object, with the addition of a single -z stub option. This means that the makefile rules for building the stub objects are very similar to those used to build the real objects, and many existing makefile definitions can be shared between them. A new target is added to the Makefiles called stubinstall which delivers the stub objects built by the stub rule into the stub proto. These rules reuse much of existing plumbing used by the existing install rule. The setup rule runs stubinstall over the entire lib subtree as part of its initialization. All libraries and executables link against the objects in the stub proto rather than the main proto, and can therefore be built in parallel without any synchronization. There was no small way to try this that would yield meaningful results. I would have to take a leap of faith and edit approximately 1850 makefiles and 300 mapfiles first, trusting that it would all work out. Once the editing was done, I'd type make and see what happened. This took about 6 weeks to do, and there were many dark days when I'd question the entire project, or struggle to understand some of the many twisted and complex situations I'd uncover in the makefiles. I even found a couple of new issues that required changes to the new stub object related code I'd added to ld. With a substantial amount of encouragement and help from some key people in the Solaris group, I eventually got the editing done and stub objects for the entire workspace built. I found that my desktop system could build all the stub objects in the workspace in roughly a minute. This was great news, as it meant that use of the feature is effectively free — no one was likely to notice or care about the cost of building them. After another week of typing make, fixing whatever failed, and doing it again, I succeeded in getting a complete build! The next step was to remove all of the make rules and .WAIT statements dedicated to controlling the order in which libraries under usr/src/lib are built. This came together pretty quickly, and after a few more speed bumps, I had a workspace that built cleanly and looked like something you might actually be able to integrate someday. This was a significant milestone, but there was still much left to do. I turned to doing full nightly builds. Every type of build (open, closed, OpenSolaris, export, domestic) had to be tried. Each type failed in a new and unique way, requiring some thinking and rework. As things came together, I became aware of things that could have been done better, simpler, or cleaner, and those things also required some rethinking, the seeking of wisdom from others, and some rework. After another couple of weeks, it was in close to final form. My focus turned towards the end game and integration. This was a huge workspace, and needed to go back soon, before changes in the gate would made merging increasingly difficult. At this point, I knew that the stub objects had greatly simplified the makefile logic and uncovered a number of race conditions, some of which had been there for years. I assumed that the builds were faster too, so I did some builds intended to quantify the speedup in build time that resulted from this approach. It had never occurred to me that there might not be one. And so, I was very surprised to find that the wall clock build times for a stock ON workspace were essentially identical to the times for my stub library enabled version! This is why it is important to always measure, and not just to assume. One can tell from first principles, based on all those removed dependency rules in the library makefile, that the stub object version of ON gives dmake considerably more opportunities to overlap library construction. Some hypothesis were proposed, and shot down: Could we have disabled dmakes parallel feature? No, a quick check showed things being build in parallel. It was suggested that we might be I/O bound, and so, the threads would be mostly idle. That's a plausible explanation, but system stats didn't really support it. Plus, the timing between the stub and non-stub cases were just too suspiciously identical. Are our machines already handling as much parallelism as they are capable of, and unable to exploit these additional opportunities? Once again, we didn't see the evidence to back this up. Eventually, a more plausible and obvious reason emerged: We build the libraries and commands (usr/src/lib, usr/src/cmd) in parallel with the kernel (usr/src/uts). The kernel is the long leg in that race, and so, wall clock measurements of build time are essentially showing how long it takes to build uts. Although it would have been nice to post a huge speedup immediately, we can take solace in knowing that stub objects simplify the makefiles and reduce the possibility of race conditions. The next step in reducing build time should be to find ways to reduce or overlap the uts part of the builds. When that leg of the build becomes shorter, then the increased parallelism in the libs and commands will pay additional dividends. Until then, we'll just have to settle for simpler and more robust. And so, I integrated the link-editor support for creating stub objects into snv_153 (November 2010) with 6993877 ld should produce stub objects PSARC/2010/397 ELF Stub Objects followed by the work to convert the ON consolidation in snv_161 (February 2011) with 7009826 OSnet should use stub objects 4631488 lib/Makefile is too patient: .WAITs should be reduced This was a huge putback, with 2108 modified files, 8 new files, and 2 removed files. Due to the size, I was allowed a window after snv_160 closed in which to do the putback. It went pretty smoothly for something this big, a few more preexisting race conditions would be discovered and addressed over the next few weeks, and things have been quiet since then. Conclusions and Looking Forward Solaris has been built with stub objects since February. The fact that developers no longer specify the order in which libraries are built has been a big success, and we've eliminated an entire class of build error. That's not to say that there are no build races left in the ON makefiles, but we've taken a substantial bite out of the problem while generally simplifying and improving things. The introduction of a stub proto area has also opened some interesting new possibilities for other build improvements. As this article has become quite long, and as those uses do not involve stub objects, I will defer that discussion to a future article.

    Read the article

  • PTLQueue : a scalable bounded-capacity MPMC queue

    - by Dave
    Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

    Read the article

  • Informed TDD &ndash; Kata &ldquo;To Roman Numerals&rdquo;

    - by Ralf Westphal
    Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/05/28/informed-tdd-ndash-kata-ldquoto-roman-numeralsrdquo.aspxIn a comment on my article on what I call Informed TDD (ITDD) reader gustav asked how this approach would apply to the kata “To Roman Numerals”. And whether ITDD wasn´t a violation of TDD´s principle of leaving out “advanced topics like mocks”. I like to respond with this article to his questions. There´s more to say than fits into a commentary. Mocks and TDD I don´t see in how far TDD is avoiding or opposed to mocks. TDD and mocks are orthogonal. TDD is about pocess, mocks are about structure and costs. Maybe by moving forward in tiny red+green+refactor steps less need arises for mocks. But then… if the functionality you need to implement requires “expensive” resource access you can´t avoid using mocks. Because you don´t want to constantly run all your tests against the real resource. True, in ITDD mocks seem to be in almost inflationary use. That´s not what you usually see in TDD demonstrations. However, there´s a reason for that as I tried to explain. I don´t use mocks as proxies for “expensive” resource. Rather they are stand-ins for functionality not yet implemented. They allow me to get a test green on a high level of abstraction. That way I can move forward in a top-down fashion. But if you think of mocks as “advanced” or if you don´t want to use a tool like JustMock, then you don´t need to use mocks. You just need to stand the sight of red tests for a little longer ;-) Let me show you what I mean by that by doing a kata. ITDD for “To Roman Numerals” gustav asked for the kata “To Roman Numerals”. I won´t explain the requirements again. You can find descriptions and TDD demonstrations all over the internet, like this one from Corey Haines. Now here is, how I would do this kata differently. 1. Analyse A demonstration of TDD should never skip the analysis phase. It should be made explicit. The requirements should be formalized and acceptance test cases should be compiled. “Formalization” in this case to me means describing the API of the required functionality. “[D]esign a program to work with Roman numerals” like written in this “requirement document” is not enough to start software development. Coding should only begin, if the interface between the “system under development” and its context is clear. If this interface is not readily recognizable from the requirements, it has to be developed first. Exploration of interface alternatives might be in order. It might be necessary to show several interface mock-ups to the customer – even if that´s you fellow developer. Designing the interface is a task of it´s own. It should not be mixed with implementing the required functionality behind the interface. Unfortunately, though, this happens quite often in TDD demonstrations. TDD is used to explore the API and implement it at the same time. To me that´s a violation of the Single Responsibility Principle (SRP) which not only should hold for software functional units but also for tasks or activities. In the case of this kata the API fortunately is obvious. Just one function is needed: string ToRoman(int arabic). And it lives in a class ArabicRomanConversions. Now what about acceptance test cases? There are hardly any stated in the kata descriptions. Roman numerals are explained, but no specific test cases from the point of view of a customer. So I just “invent” some acceptance test cases by picking roman numerals from a wikipedia article. They are supposed to be just “typical examples” without special meaning. Given the acceptance test cases I then try to develop an understanding of the problem domain. I´ll spare you that. The domain is trivial and is explain in almost all kata descriptions. How roman numerals are built is not difficult to understand. What´s more difficult, though, might be to find an efficient solution to convert into them automatically. 2. Solve The usual TDD demonstration skips a solution finding phase. Like the interface exploration it´s mixed in with the implementation. But I don´t think this is how it should be done. I even think this is not how it really works for the people demonstrating TDD. They´re simplifying their true software development process because they want to show a streamlined TDD process. I doubt this is helping anybody. Before you code you better have a plan what to code. This does not mean you have to do “Big Design Up-Front”. It just means: Have a clear picture of the logical solution in your head before you start to build a physical solution (code). Evidently such a solution can only be as good as your understanding of the problem. If that´s limited your solution will be limited, too. Fortunately, in the case of this kata your understanding does not need to be limited. Thus the logical solution does not need to be limited or preliminary or tentative. That does not mean you need to know every line of code in advance. It just means you know the rough structure of your implementation beforehand. Because it should mirror the process described by the logical or conceptual solution. Here´s my solution approach: The arabic “encoding” of numbers represents them as an ordered set of powers of 10. Each digit is a factor to multiply a power of ten with. The “encoding” 123 is the short form for a set like this: {1*10^2, 2*10^1, 3*10^0}. And the number is the sum of the set members. The roman “encoding” is different. There is no base (like 10 for arabic numbers), there are just digits of different value, and they have to be written in descending order. The “encoding” XVI is short for [10, 5, 1]. And the number is still the sum of the members of this list. The roman “encoding” thus is simpler than the arabic. Each “digit” can be taken at face value. No multiplication with a base required. But what about IV which looks like a contradiction to the above rule? It is not – if you accept roman “digits” not to be limited to be single characters only. Usually I, V, X, L, C, D, M are viewed as “digits”, and IV, IX etc. are viewed as nuisances preventing a simple solution. All looks different, though, once IV, IX etc. are taken as “digits”. Then MCMLIV is just a sum: M+CM+L+IV which is 1000+900+50+4. Whereas before it would have been understood as M-C+M+L-I+V – which is more difficult because here some “digits” get subtracted. Here´s the list of roman “digits” with their values: {1, I}, {4, IV}, {5, V}, {9, IX}, {10, X}, {40, XL}, {50, L}, {90, XC}, {100, C}, {400, CD}, {500, D}, {900, CM}, {1000, M} Since I take IV, IX etc. as “digits” translating an arabic number becomes trivial. I just need to find the values of the roman “digits” making up the number, e.g. 1954 is made up of 1000, 900, 50, and 4. I call those “digits” factors. If I move from the highest factor (M=1000) to the lowest (I=1) then translation is a two phase process: Find all the factors Translate the factors found Compile the roman representation Translation is just a look-up. Finding, though, needs some calculation: Find the highest remaining factor fitting in the value Remember and subtract it from the value Repeat with remaining value and remaining factors Please note: This is just an algorithm. It´s not code, even though it might be close. Being so close to code in my solution approach is due to the triviality of the problem. In more realistic examples the conceptual solution would be on a higher level of abstraction. With this solution in hand I finally can do what TDD advocates: find and prioritize test cases. As I can see from the small process description above, there are two aspects to test: Test the translation Test the compilation Test finding the factors Testing the translation primarily means to check if the map of factors and digits is comprehensive. That´s simple, even though it might be tedious. Testing the compilation is trivial. Testing factor finding, though, is a tad more complicated. I can think of several steps: First check, if an arabic number equal to a factor is processed correctly (e.g. 1000=M). Then check if an arabic number consisting of two consecutive factors (e.g. 1900=[M,CM]) is processed correctly. Then check, if a number consisting of the same factor twice is processed correctly (e.g. 2000=[M,M]). Finally check, if an arabic number consisting of non-consecutive factors (e.g. 1400=[M,CD]) is processed correctly. I feel I can start an implementation now. If something becomes more complicated than expected I can slow down and repeat this process. 3. Implement First I write a test for the acceptance test cases. It´s red because there´s no implementation even of the API. That´s in conformance with “TDD lore”, I´d say: Next I implement the API: The acceptance test now is formally correct, but still red of course. This will not change even now that I zoom in. Because my goal is not to most quickly satisfy these tests, but to implement my solution in a stepwise manner. That I do by “faking” it: I just “assume” three functions to represent the transformation process of my solution: My hypothesis is that those three functions in conjunction produce correct results on the API-level. I just have to implement them correctly. That´s what I´m trying now – one by one. I start with a simple “detail function”: Translate(). And I start with all the test cases in the obvious equivalence partition: As you can see I dare to test a private method. Yes. That´s a white box test. But as you´ll see it won´t make my tests brittle. It serves a purpose right here and now: it lets me focus on getting one aspect of my solution right. Here´s the implementation to satisfy the test: It´s as simple as possible. Right how TDD wants me to do it: KISS. Now for the second equivalence partition: translating multiple factors. (It´a pattern: if you need to do something repeatedly separate the tests for doing it once and doing it multiple times.) In this partition I just need a single test case, I guess. Stepping up from a single translation to multiple translations is no rocket science: Usually I would have implemented the final code right away. Splitting it in two steps is just for “educational purposes” here. How small your implementation steps are is a matter of your programming competency. Some “see” the final code right away before their mental eye – others need to work their way towards it. Having two tests I find more important. Now for the next low hanging fruit: compilation. It´s even simpler than translation. A single test is enough, I guess. And normally I would not even have bothered to write that one, because the implementation is so simple. I don´t need to test .NET framework functionality. But again: if it serves the educational purpose… Finally the most complicated part of the solution: finding the factors. There are several equivalence partitions. But still I decide to write just a single test, since the structure of the test data is the same for all partitions: Again, I´m faking the implementation first: I focus on just the first test case. No looping yet. Faking lets me stay on a high level of abstraction. I can write down the implementation of the solution without bothering myself with details of how to actually accomplish the feat. That´s left for a drill down with a test of the fake function: There are two main equivalence partitions, I guess: either the first factor is appropriate or some next. The implementation seems easy. Both test cases are green. (Of course this only works on the premise that there´s always a matching factor. Which is the case since the smallest factor is 1.) And the first of the equivalence partitions on the higher level also is satisfied: Great, I can move on. Now for more than a single factor: Interestingly not just one test becomes green now, but all of them. Great! You might say, then I must have done not the simplest thing possible. And I would reply: I don´t care. I did the most obvious thing. But I also find this loop very simple. Even simpler than a recursion of which I had thought briefly during the problem solving phase. And by the way: Also the acceptance tests went green: Mission accomplished. At least functionality wise. Now I´ve to tidy up things a bit. TDD calls for refactoring. Not uch refactoring is needed, because I wrote the code in top-down fashion. I faked it until I made it. I endured red tests on higher levels while lower levels weren´t perfected yet. But this way I saved myself from refactoring tediousness. At the end, though, some refactoring is required. But maybe in a different way than you would expect. That´s why I rather call it “cleanup”. First I remove duplication. There are two places where factors are defined: in Translate() and in Find_factors(). So I factor the map out into a class constant. Which leads to a small conversion in Find_factors(): And now for the big cleanup: I remove all tests of private methods. They are scaffolding tests to me. They only have temporary value. They are brittle. Only acceptance tests need to remain. However, I carry over the single “digit” tests from Translate() to the acceptance test. I find them valuable to keep, since the other acceptance tests only exercise a subset of all roman “digits”. This then is my final test class: And this is the final production code: Test coverage as reported by NCrunch is 100%: Reflexion Is this the smallest possible code base for this kata? Sure not. You´ll find more concise solutions on the internet. But LOC are of relatively little concern – as long as I can understand the code quickly. So called “elegant” code, however, often is not easy to understand. The same goes for KISS code – especially if left unrefactored, as it is often the case. That´s why I progressed from requirements to final code the way I did. I first understood and solved the problem on a conceptual level. Then I implemented it top down according to my design. I also could have implemented it bottom-up, since I knew some bottom of the solution. That´s the leaves of the functional decomposition tree. Where things became fuzzy, since the design did not cover any more details as with Find_factors(), I repeated the process in the small, so to speak: fake some top level, endure red high level tests, while first solving a simpler problem. Using scaffolding tests (to be thrown away at the end) brought two advantages: Encapsulation of the implementation details was not compromised. Naturally private methods could stay private. I did not need to make them internal or public just to be able to test them. I was able to write focused tests for small aspects of the solution. No need to test everything through the solution root, the API. The bottom line thus for me is: Informed TDD produces cleaner code in a systematic way. It conforms to core principles of programming: Single Responsibility Principle and/or Separation of Concerns. Distinct roles in development – being a researcher, being an engineer, being a craftsman – are represented as different phases. First find what, what there is. Then devise a solution. Then code the solution, manifest the solution in code. Writing tests first is a good practice. But it should not be taken dogmatic. And above all it should not be overloaded with purposes. And finally: moving from top to bottom through a design produces refactored code right away. Clean code thus almost is inevitable – and not left to a refactoring step at the end which is skipped often for different reasons.   PS: Yes, I have done this kata several times. But that has only an impact on the time needed for phases 1 and 2. I won´t skip them because of that. And there are no shortcuts during implementation because of that.

    Read the article

  • Metro apps crash on startup, driver or permissions issue?

    - by Vee
    After installing Win8 x64 RC, Metro apps worked correctly, but desktop OpenGL apps were slow and unresponsive. I installed the latest Win8 nVidia drivers, and the OpenGL apps started working correctly. At the same time, because of annoying permission messages, I changed the C:\ drive and all its files ownerships to my user, and gave it full permission. I restarted my pc after installing the drivers, and now Metro apps only show the splash screen, then crash. I tried installing other versions of the nVidia drivers, with the same result. My GPU is a GeForce GTX275. Is this a known problem with nVidia drivers? Or maybe changing the ownership of C:\ is the real problem? Thank you. More information (after looking in the event viewer) I've managed to find the problem and the error in the Event Viewer. I still cannot solve it. Here's the information I found by opening the Mail app and letting it crash: Log Name: Microsoft-Windows-TWinUI/Operational Source: Microsoft-Windows-Immersive-Shell Date: 07/06/2012 15.54.17 Event ID: 5961 Task Category: (5961) Level: Error Keywords: User: VEE-PC\Vittorio Computer: vee-pc Description: Activation of the app microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail for the Windows.Launch contract failed with error: The app didn't start.. Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Microsoft-Windows-Immersive-Shell" Guid="{315A8872-923E-4EA2-9889-33CD4754BF64}" /> <EventID>5961</EventID> <Version>0</Version> <Level>2</Level> <Task>5961</Task> <Opcode>0</Opcode> <Keywords>0x4000000000000000</Keywords> <TimeCreated SystemTime="2012-06-07T13:54:17.472416600Z" /> <EventRecordID>6524</EventRecordID> <Correlation /> <Execution ProcessID="3008" ThreadID="6756" /> <Channel>Microsoft-Windows-TWinUI/Operational</Channel> <Computer>vee-pc</Computer> <Security UserID="S-1-5-21-2753614643-3522538917-4071044258-1001" /> </System> <EventData> <Data Name="AppId">microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail</Data> <Data Name="ContractId">Windows.Launch</Data> <Data Name="ErrorCode">-2144927141</Data> </EventData> </Event> Found other stuff, this is another error that appears when opening a Metro app: Log Name: Application Source: ESENT Date: 07/06/2012 16.01.00 Event ID: 490 Task Category: General Level: Error Keywords: Classic User: N/A Computer: vee-pc Description: svchost (1376) SRUJet: An attempt to open the file "C:\Windows\system32\SRU\SRU.log" for read / write access failed with system error 5 (0x00000005): "Access is denied. ". The open file operation will fail with error -1032 (0xfffffbf8). Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="ESENT" /> <EventID Qualifiers="0">490</EventID> <Level>2</Level> <Task>1</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2012-06-07T14:01:00.000000000Z" /> <EventRecordID>11854</EventRecordID> <Channel>Application</Channel> <Computer>vee-pc</Computer> <Security /> </System> <EventData> <Data>svchost</Data> <Data>1376</Data> <Data>SRUJet: </Data> <Data>C:\Windows\system32\SRU\SRU.log</Data> <Data>-1032 (0xfffffbf8)</Data> <Data>5 (0x00000005)</Data> <Data>Access is denied. </Data> </EventData> </Event> After changing permissions again (adding Everyone and Creator Owner to System32), the "access denied to sru.log" error disappears, but this one appears in its place: Log Name: Application Source: Microsoft-Windows-Immersive-Shell Date: 07/06/2012 16.16.34 Event ID: 2486 Task Category: (2414) Level: Error Keywords: (64),Process Lifetime Manager User: VEE-PC\Vittorio Computer: vee-pc Description: App microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail did not launch within its allotted time. Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Microsoft-Windows-Immersive-Shell" Guid="{315A8872-923E-4EA2-9889-33CD4754BF64}" /> <EventID>2486</EventID> <Version>0</Version> <Level>2</Level> <Task>2414</Task> <Opcode>0</Opcode> <Keywords>0x2000000000000042</Keywords> <TimeCreated SystemTime="2012-06-07T14:16:34.616499600Z" /> <EventRecordID>11916</EventRecordID> <Correlation /> <Execution ProcessID="3008" ThreadID="6996" /> <Channel>Application</Channel> <Computer>vee-pc</Computer> <Security UserID="S-1-5-21-2753614643-3522538917-4071044258-1001" /> </System> <EventData> <Data Name="ApplicationId">microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail</Data> </EventData> </Event> Now I'm stuck. It tells me "Activation of app microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail failed with error: The app didn't start. See the Microsoft-Windows-TWinUI/Operational log for additional information." but I can't find the Microsoft-Windows-TWinUI/Operational log. I'm starting a bounty. I found the TWinUI/Operational log. It only tells me: Log Name: Microsoft-Windows-TWinUI/Operational Source: Microsoft-Windows-Immersive-Shell Date: 07/06/2012 16.28.57 Event ID: 5961 Task Category: (5961) Level: Error Keywords: User: VEE-PC\Vittorio Computer: vee-pc Description: Activation of the app microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail for the Windows.BackgroundTasks contract failed with error: The app didn't start.. Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Microsoft-Windows-Immersive-Shell" Guid="{315A8872-923E-4EA2-9889-33CD4754BF64}" /> <EventID>5961</EventID> <Version>0</Version> <Level>2</Level> <Task>5961</Task> <Opcode>0</Opcode> <Keywords>0x4000000000000000</Keywords> <TimeCreated SystemTime="2012-06-07T14:28:57.238140800Z" /> <EventRecordID>6536</EventRecordID> <Correlation /> <Execution ProcessID="3008" ThreadID="2624" /> <Channel>Microsoft-Windows-TWinUI/Operational</Channel> <Computer>vee-pc</Computer> <Security UserID="S-1-5-21-2753614643-3522538917-4071044258-1001" /> </System> <EventData> <Data Name="AppId">microsoft.windowscommunicationsapps_8wekyb3d8bbwe!Microsoft.WindowsLive.Mail</Data> <Data Name="ContractId">Windows.BackgroundTasks</Data> <Data Name="ErrorCode">-2144927141</Data> </EventData> </Event> I need to go deeper. I found a forum thread that told me to look for "DCOM" errors. I found this one related to the app crash "The server Microsoft.WindowsLive.Mail.wwa did not register with DCOM within the required timeout." Log Name: System Source: Microsoft-Windows-DistributedCOM Date: 07/06/2012 16.46.45 Event ID: 10010 Task Category: None Level: Error Keywords: Classic User: VEE-PC\Vittorio Computer: vee-pc Description: The server Microsoft.WindowsLive.Mail.wwa did not register with DCOM within the required timeout. Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Microsoft-Windows-DistributedCOM" Guid="{1B562E86-B7AA-4131-BADC-B6F3A001407E}" EventSourceName="DCOM" /> <EventID Qualifiers="0">10010</EventID> <Version>0</Version> <Level>2</Level> <Task>0</Task> <Opcode>0</Opcode> <Keywords>0x8080000000000000</Keywords> <TimeCreated SystemTime="2012-06-07T14:46:45.586943800Z" /> <EventRecordID>2763</EventRecordID> <Correlation /> <Execution ProcessID="804" ThreadID="2364" /> <Channel>System</Channel> <Computer>vee-pc</Computer> <Security UserID="S-1-5-21-2753614643-3522538917-4071044258-1001" /> </System> <EventData> <Data Name="param1">Microsoft.WindowsLive.Mail.wwa</Data> </EventData> </Event>

    Read the article

  • high load average, high wait, dmesg raid error messages (debian nfs server)

    - by John Stumbles
    Debian 6 on HP proliant (2 CPU) with raid (2*1.5T RAID1 + 2*2T RAID1 joined RAID0 to make 3.5T) running mainly nfs & imapd (plus samba for windows share & local www for previewing web pages); with local ubuntu desktop client mounting $HOME, laptops accessing imap & odd files (e.g. videos) via nfs/smb; boxes connected 100baseT or wifi via home router/switch uname -a Linux prole 2.6.32-5-686 #1 SMP Wed Jan 11 12:29:30 UTC 2012 i686 GNU/Linux Setup has been working for months but prone to intermittently going very slow (user experience on desktop mounting $HOME from server, or laptop playing videos) and now consistently so bad I've had to delve into it to try to find what's wrong(!) Server seems OK at low load e.g. (laptop) client (with $HOME on local disk) connecting to server's imapd and nfs mounting RAID to access 1 file: top shows load ~ 0.1 or less, 0 wait but when (desktop) client mounts $HOME and starts user KDE session (all accessing server) then top shows e.g. top - 13:41:17 up 3:43, 3 users, load average: 9.29, 9.55, 8.27 Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie Cpu(s): 0.4%us, 0.4%sy, 0.0%ni, 49.0%id, 49.7%wa, 0.0%hi, 0.5%si, 0.0%st Mem: 903856k total, 851784k used, 52072k free, 171152k buffers Swap: 0k total, 0k used, 0k free, 476896k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3935 root 20 0 2456 1088 784 R 2 0.1 0:00.02 top 1 root 20 0 2028 680 584 S 0 0.1 0:01.14 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 4 root 20 0 0 0 0 S 0 0.0 0:00.12 ksoftirqd/0 5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 7 root 20 0 0 0 0 S 0 0.0 0:00.16 ksoftirqd/1 8 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1 9 root 20 0 0 0 0 S 0 0.0 0:00.42 events/0 10 root 20 0 0 0 0 S 0 0.0 0:02.26 events/1 11 root 20 0 0 0 0 S 0 0.0 0:00.00 cpuset 12 root 20 0 0 0 0 S 0 0.0 0:00.00 khelper 13 root 20 0 0 0 0 S 0 0.0 0:00.00 netns 14 root 20 0 0 0 0 S 0 0.0 0:00.00 async/mgr 15 root 20 0 0 0 0 S 0 0.0 0:00.00 pm 16 root 20 0 0 0 0 S 0 0.0 0:00.02 sync_supers 17 root 20 0 0 0 0 S 0 0.0 0:00.02 bdi-default 18 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/0 19 root 20 0 0 0 0 S 0 0.0 0:00.00 kintegrityd/1 20 root 20 0 0 0 0 S 0 0.0 0:00.02 kblockd/0 21 root 20 0 0 0 0 S 0 0.0 0:00.08 kblockd/1 22 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpid 23 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_notify 24 root 20 0 0 0 0 S 0 0.0 0:00.00 kacpi_hotplug 25 root 20 0 0 0 0 S 0 0.0 0:00.00 kseriod 28 root 20 0 0 0 0 S 0 0.0 0:04.19 kondemand/0 29 root 20 0 0 0 0 S 0 0.0 0:02.93 kondemand/1 30 root 20 0 0 0 0 S 0 0.0 0:00.00 khungtaskd 31 root 20 0 0 0 0 S 0 0.0 0:00.18 kswapd0 32 root 25 5 0 0 0 S 0 0.0 0:00.00 ksmd 33 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/0 34 root 20 0 0 0 0 S 0 0.0 0:00.00 aio/1 35 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/0 36 root 20 0 0 0 0 S 0 0.0 0:00.00 crypto/1 203 root 20 0 0 0 0 S 0 0.0 0:00.00 ksuspend_usbd 204 root 20 0 0 0 0 S 0 0.0 0:00.00 khubd 205 root 20 0 0 0 0 S 0 0.0 0:00.00 ata/0 206 root 20 0 0 0 0 S 0 0.0 0:00.00 ata/1 207 root 20 0 0 0 0 S 0 0.0 0:00.14 ata_aux 208 root 20 0 0 0 0 S 0 0.0 0:00.01 scsi_eh_0 dmesg suggests there's a disk problem: .............. (previous episode) [13276.966004] raid1:md0: read error corrected (8 sectors at 489900360 on sdc7) [13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror [13279.569186] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13279.569211] ata4.00: irq_stat 0x40000008 [13279.569230] ata4.00: failed command: READ FPDMA QUEUED [13279.569257] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13279.569262] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13279.569306] ata4.00: status: { DRDY ERR } [13279.569321] ata4.00: error: { UNC } [13279.575362] ata4.00: configured for UDMA/133 [13279.575388] ata4: EH complete [13283.169224] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13283.169246] ata4.00: irq_stat 0x40000008 [13283.169263] ata4.00: failed command: READ FPDMA QUEUED [13283.169289] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13283.169294] res 41/40:00:07:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13283.169331] ata4.00: status: { DRDY ERR } [13283.169345] ata4.00: error: { UNC } [13283.176071] ata4.00: configured for UDMA/133 [13283.176104] ata4: EH complete [13286.224814] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13286.224837] ata4.00: irq_stat 0x40000008 [13286.224853] ata4.00: failed command: READ FPDMA QUEUED [13286.224879] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13286.224884] res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13286.224922] ata4.00: status: { DRDY ERR } [13286.224935] ata4.00: error: { UNC } [13286.231277] ata4.00: configured for UDMA/133 [13286.231303] ata4: EH complete [13288.802623] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13288.802646] ata4.00: irq_stat 0x40000008 [13288.802662] ata4.00: failed command: READ FPDMA QUEUED [13288.802688] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13288.802693] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13288.802731] ata4.00: status: { DRDY ERR } [13288.802745] ata4.00: error: { UNC } [13288.808901] ata4.00: configured for UDMA/133 [13288.808927] ata4: EH complete [13291.380430] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13291.380453] ata4.00: irq_stat 0x40000008 [13291.380470] ata4.00: failed command: READ FPDMA QUEUED [13291.380496] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13291.380501] res 41/40:00:05:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13291.380577] ata4.00: status: { DRDY ERR } [13291.380594] ata4.00: error: { UNC } [13291.386517] ata4.00: configured for UDMA/133 [13291.386543] ata4: EH complete [13294.347147] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 [13294.347169] ata4.00: irq_stat 0x40000008 [13294.347186] ata4.00: failed command: READ FPDMA QUEUED [13294.347211] ata4.00: cmd 60/08:00:00:6a:05/00:00:23:00:00/40 tag 0 ncq 4096 in [13294.347217] res 41/40:00:06:6a:05/00:00:23:00:00/40 Emask 0x409 (media error) <F> [13294.347254] ata4.00: status: { DRDY ERR } [13294.347268] ata4.00: error: { UNC } [13294.353556] ata4.00: configured for UDMA/133 [13294.353583] sd 3:0:0:0: [sdc] Unhandled sense code [13294.353590] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [13294.353599] sd 3:0:0:0: [sdc] Sense Key : Medium Error [current] [descriptor] [13294.353610] Descriptor sense data with sense descriptors (in hex): [13294.353616] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 [13294.353635] 23 05 6a 06 [13294.353644] sd 3:0:0:0: [sdc] Add. Sense: Unrecovered read error - auto reallocate failed [13294.353657] sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 23 05 6a 00 00 00 08 00 [13294.353675] end_request: I/O error, dev sdc, sector 587557382 [13294.353726] ata4: EH complete [13294.366953] raid1:md0: read error corrected (8 sectors at 489900544 on sdc7) [13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror and they're happening quite frequently, which I guess is liable to account for the performance problem(?) # dmesg | grep mirror [12433.561822] raid1: sdc7: redirecting sector 489900464 to another mirror [12449.428933] raid1: sdb7: redirecting sector 489900504 to another mirror [12464.807016] raid1: sdb7: redirecting sector 489900512 to another mirror [12480.196222] raid1: sdb7: redirecting sector 489900520 to another mirror [12495.585413] raid1: sdb7: redirecting sector 489900528 to another mirror [12510.974424] raid1: sdb7: redirecting sector 489900536 to another mirror [12526.374933] raid1: sdb7: redirecting sector 489900544 to another mirror [12542.619938] raid1: sdc7: redirecting sector 489900608 to another mirror [12559.431328] raid1: sdc7: redirecting sector 489900616 to another mirror [12576.553866] raid1: sdc7: redirecting sector 489900624 to another mirror [12592.065265] raid1: sdc7: redirecting sector 489900632 to another mirror [12607.621121] raid1: sdc7: redirecting sector 489900640 to another mirror [12623.165856] raid1: sdc7: redirecting sector 489900648 to another mirror [12638.699474] raid1: sdc7: redirecting sector 489900656 to another mirror [12655.610881] raid1: sdc7: redirecting sector 489900664 to another mirror [12672.255617] raid1: sdc7: redirecting sector 489900672 to another mirror [12672.288746] raid1: sdc7: redirecting sector 489900680 to another mirror [12672.332376] raid1: sdc7: redirecting sector 489900688 to another mirror [12672.362935] raid1: sdc7: redirecting sector 489900696 to another mirror [12674.201177] raid1: sdc7: redirecting sector 489900704 to another mirror [12698.045050] raid1: sdc7: redirecting sector 489900712 to another mirror [12698.089309] raid1: sdc7: redirecting sector 489900720 to another mirror [12698.111999] raid1: sdc7: redirecting sector 489900728 to another mirror [12698.134006] raid1: sdc7: redirecting sector 489900736 to another mirror [12719.034376] raid1: sdc7: redirecting sector 489900744 to another mirror [12734.545775] raid1: sdc7: redirecting sector 489900752 to another mirror [12734.590014] raid1: sdc7: redirecting sector 489900760 to another mirror [12734.624050] raid1: sdc7: redirecting sector 489900768 to another mirror [12734.647308] raid1: sdc7: redirecting sector 489900776 to another mirror [12734.664657] raid1: sdc7: redirecting sector 489900784 to another mirror [12734.710642] raid1: sdc7: redirecting sector 489900792 to another mirror [12734.721919] raid1: sdc7: redirecting sector 489900800 to another mirror [12734.744732] raid1: sdc7: redirecting sector 489900808 to another mirror [12734.779330] raid1: sdc7: redirecting sector 489900816 to another mirror [12782.604564] raid1: sdb7: redirecting sector 1242934216 to another mirror [12798.264153] raid1: sdc7: redirecting sector 1242935080 to another mirror [13245.832193] raid1: sdb7: redirecting sector 489898296 to another mirror [13261.376929] raid1: sdb7: redirecting sector 489898304 to another mirror [13276.966043] raid1: sdb7: redirecting sector 489898312 to another mirror [13294.366992] raid1: sdc7: redirecting sector 489898496 to another mirror although the arrays are still running on all disks - they haven't given up on any yet: # cat /proc/mdstat Personalities : [raid1] [raid0] md10 : active raid0 md0[0] md1[1] 3368770048 blocks super 1.2 512k chunks md1 : active raid1 sde2[2] sdd2[1] 1464087824 blocks super 1.2 [2/2] [UU] md0 : active raid1 sdb7[0] sdc7[2] 1904684920 blocks super 1.2 [2/2] [UU] unused devices: <none> So I think I have some idea what the problem is but I am not a linux sysadmin expert by the remotest stretch of the imagination and would really appreciate some clue checking here with my diagnosis and what do I need to do: obviously I need to source another drive for sdc. (I'm guessing I could buy a larger drive if the price is right: I'm thinking that one day I'll need to grow the size of the array and that would be one less drive to replace with a larger one) then use mdadm to fail out the existing sdc, remove it and fit the new drive fdisk the new drive with the same size partition for the array as the old one had use mdadm to add the new drive into the array that sound OK?

    Read the article

  • C# .Net 3.5 Asynchronous Socket Server Performance Problem

    - by iBrAaAa
    I'm developing an Asynchronous Game Server using .Net Socket Asynchronous Model( BeginAccept/EndAccept...etc.) The problem I'm facing is described like that: When I have only one client connected, the server response time is very fast but once a second client connects, the server response time increases too much. I've measured the time from a client sends a message to the server until it gets the reply in both cases. I found that the average time in case of one client is about 17ms and in case of 2 clients about 280ms!!! What I really see is that: When 2 clients are connected and only one of them is moving(i.e. requesting service from the server) it is equivalently equal to the case when only one client is connected(i.e. fast response). However, when the 2 clients move at the same time(i.e. requests service from the server at the same time) their motion becomes very slow (as if the server replies each one of them in order i.e. not simultaneously). Basically, what I am doing is that: When a client requests a permission for motion from the server and the server grants him the request, the server then broadcasts the new position of the client to all the players. So if two clients are moving in the same time, the server is eventually trying to broadcast to both clients the new position of each of them at the same time. EX: Client1 asks to go to position (2,2) Client2 asks to go to position (5,5) Server sends to each of Client1 & Client2 the same two messages: message1: "Client1 at (2,2)" message2: "Client2 at (5,5)" I believe that the problem comes from the fact that Socket class is thread safe according MSDN documentation http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.aspx. (NOT SURE THAT IT IS THE PROBLEM) Below is the code for the server: /// /// This class is responsible for handling packet receiving and sending /// public class NetworkManager { /// /// An integer to hold the server port number to be used for the connections. Its default value is 5000. /// private readonly int port = 5000; /// /// hashtable contain all the clients connected to the server. /// key: player Id /// value: socket /// private readonly Hashtable connectedClients = new Hashtable(); /// /// An event to hold the thread to wait for a new client /// private readonly ManualResetEvent resetEvent = new ManualResetEvent(false); /// /// keeps track of the number of the connected clients /// private int clientCount; /// /// The socket of the server at which the clients connect /// private readonly Socket mainSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp); /// /// The socket exception that informs that a client is disconnected /// private const int ClientDisconnectedErrorCode = 10054; /// /// The only instance of this class. /// private static readonly NetworkManager networkManagerInstance = new NetworkManager(); /// /// A delegate for the new client connected event. /// /// the sender object /// the event args public delegate void NewClientConnected(Object sender, SystemEventArgs e); /// /// A delegate for the position update message reception. /// /// the sender object /// the event args public delegate void PositionUpdateMessageRecieved(Object sender, PositionUpdateEventArgs e); /// /// The event which fires when a client sends a position message /// public PositionUpdateMessageRecieved PositionUpdateMessageEvent { get; set; } /// /// keeps track of the number of the connected clients /// public int ClientCount { get { return clientCount; } } /// /// A getter for this class instance. /// /// only instance. public static NetworkManager NetworkManagerInstance { get { return networkManagerInstance; } } private NetworkManager() {} /// Starts the game server and holds this thread alive /// public void StartServer() { //Bind the mainSocket to the server IP address and port mainSocket.Bind(new IPEndPoint(IPAddress.Any, port)); //The server starts to listen on the binded socket with max connection queue //1024 mainSocket.Listen(1024); //Start accepting clients asynchronously mainSocket.BeginAccept(OnClientConnected, null); //Wait until there is a client wants to connect resetEvent.WaitOne(); } /// /// Receives connections of new clients and fire the NewClientConnected event /// private void OnClientConnected(IAsyncResult asyncResult) { Interlocked.Increment(ref clientCount); ClientInfo newClient = new ClientInfo { WorkerSocket = mainSocket.EndAccept(asyncResult), PlayerId = clientCount }; //Add the new client to the hashtable and increment the number of clients connectedClients.Add(newClient.PlayerId, newClient); //fire the new client event informing that a new client is connected to the server if (NewClientEvent != null) { NewClientEvent(this, System.EventArgs.Empty); } newClient.WorkerSocket.BeginReceive(newClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), newClient); //Start accepting clients asynchronously again mainSocket.BeginAccept(OnClientConnected, null); } /// Waits for the upcoming messages from different clients and fires the proper event according to the packet type. /// /// private void WaitForData(IAsyncResult asyncResult) { ClientInfo sendingClient = null; try { //Take the client information from the asynchronous result resulting from the BeginReceive sendingClient = asyncResult.AsyncState as ClientInfo; // If client is disconnected, then throw a socket exception // with the correct error code. if (!IsConnected(sendingClient.WorkerSocket)) { throw new SocketException(ClientDisconnectedErrorCode); } //End the pending receive request sendingClient.WorkerSocket.EndReceive(asyncResult); //Fire the appropriate event FireMessageTypeEvent(sendingClient.ConvertBytesToPacket() as BasePacket); // Begin receiving data from this client sendingClient.WorkerSocket.BeginReceive(sendingClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), sendingClient); } catch (SocketException e) { if (e.ErrorCode == ClientDisconnectedErrorCode) { // Close the socket. if (sendingClient.WorkerSocket != null) { sendingClient.WorkerSocket.Close(); sendingClient.WorkerSocket = null; } // Remove it from the hash table. connectedClients.Remove(sendingClient.PlayerId); if (ClientDisconnectedEvent != null) { ClientDisconnectedEvent(this, new ClientDisconnectedEventArgs(sendingClient.PlayerId)); } } } catch (Exception e) { // Begin receiving data from this client sendingClient.WorkerSocket.BeginReceive(sendingClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), sendingClient); } } /// /// Broadcasts the input message to all the connected clients /// /// public void BroadcastMessage(BasePacket message) { byte[] bytes = message.ConvertToBytes(); foreach (ClientInfo client in connectedClients.Values) { client.WorkerSocket.BeginSend(bytes, 0, bytes.Length, SocketFlags.None, SendAsync, client); } } /// /// Sends the input message to the client specified by his ID. /// /// /// The message to be sent. /// The id of the client to receive the message. public void SendToClient(BasePacket message, int id) { byte[] bytes = message.ConvertToBytes(); (connectedClients[id] as ClientInfo).WorkerSocket.BeginSend(bytes, 0, bytes.Length, SocketFlags.None, SendAsync, connectedClients[id]); } private void SendAsync(IAsyncResult asyncResult) { ClientInfo currentClient = (ClientInfo)asyncResult.AsyncState; currentClient.WorkerSocket.EndSend(asyncResult); } /// Fires the event depending on the type of received packet /// /// The received packet. void FireMessageTypeEvent(BasePacket packet) { switch (packet.MessageType) { case MessageType.PositionUpdateMessage: if (PositionUpdateMessageEvent != null) { PositionUpdateMessageEvent(this, new PositionUpdateEventArgs(packet as PositionUpdatePacket)); } break; } } } The events fired are handled in a different class, here are the event handling code for the PositionUpdateMessage (Other handlers are irrelevant): private readonly Hashtable onlinePlayers = new Hashtable(); /// /// Constructor that creates a new instance of the GameController class. /// private GameController() { //Start the server server = new Thread(networkManager.StartServer); server.Start(); //Create an event handler for the NewClientEvent of networkManager networkManager.PositionUpdateMessageEvent += OnPositionUpdateMessageReceived; } /// /// this event handler is called when a client asks for movement. /// private void OnPositionUpdateMessageReceived(object sender, PositionUpdateEventArgs e) { Point currentLocation = ((PlayerData)onlinePlayers[e.PositionUpdatePacket.PlayerId]).Position; Point locationRequested = e.PositionUpdatePacket.Position; ((PlayerData)onlinePlayers[e.PositionUpdatePacket.PlayerId]).Position = locationRequested; // Broadcast the new position networkManager.BroadcastMessage(new PositionUpdatePacket { Position = locationRequested, PlayerId = e.PositionUpdatePacket.PlayerId }); }

    Read the article

  • Large Object Heap Fragmentation

    - by Paul Ruane
    The C#/.NET application I am working on is suffering from a slow memory leak. I have used CDB with SOS to try to determine what is happening but the data does not seem to make any sense so I was hoping one of you may have experienced this before. The application is running on the 64 bit framework. It is continuously calculating and serialising data to a remote host and is hitting the Large Object Heap (LOH) a fair bit. However, most of the LOH objects I expect to be transient: once the calculation is complete and has been sent to the remote host, the memory should be freed. What I am seeing, however, is a large number of (live) object arrays interleaved with free blocks of memory, e.g., taking a random segment from the LOH: 0:000> !DumpHeap 000000005b5b1000 000000006351da10 Address MT Size ... 000000005d4f92e0 0000064280c7c970 16147872 000000005e45f880 00000000001661d0 1901752 Free 000000005e62fd38 00000642788d8ba8 1056 <-- 000000005e630158 00000000001661d0 5988848 Free 000000005ebe6348 00000642788d8ba8 1056 000000005ebe6768 00000000001661d0 6481336 Free 000000005f214d20 00000642788d8ba8 1056 000000005f215140 00000000001661d0 7346016 Free 000000005f9168a0 00000642788d8ba8 1056 000000005f916cc0 00000000001661d0 7611648 Free 00000000600591c0 00000642788d8ba8 1056 00000000600595e0 00000000001661d0 264808 Free ... Obviously I would expect this to be the case if my application were creating long-lived, large objects during each calculation. (It does do this and I accept there will be a degree of LOH fragmentation but that is not the problem here.) The problem is the very small (1056 byte) object arrays you can see in the above dump which I cannot see in code being created and which are remaining rooted somehow. Also note that CDB is not reporting the type when the heap segment is dumped: I am not sure if this is related or not. If I dump the marked (<--) object, CDB/SOS reports it fine: 0:015> !DumpObj 000000005e62fd38 Name: System.Object[] MethodTable: 00000642788d8ba8 EEClass: 00000642789d7660 Size: 1056(0x420) bytes Array: Rank 1, Number of elements 128, Type CLASS Element Type: System.Object Fields: None The elements of the object array are all strings and the strings are recognisable as from our application code. Also, I am unable to find their GC roots as the !GCRoot command hangs and never comes back (I have even tried leaving it overnight). So, I would very much appreciate it if anyone could shed any light as to why these small (<85k) object arrays are ending up on the LOH: what situations will .NET put a small object array in there? Also, does anyone happen to know of an alternative way of ascertaining the roots of these objects? Thanks in advance. Update 1 Another theory I came up with late yesterday is that these object arrays started out large but have been shrunk leaving the blocks of free memory that are evident in the memory dumps. What makes me suspicious is that the object arrays always appear to be 1056 bytes long (128 elements), 128 * 8 for the references and 32 bytes of overhead. The idea is that perhaps some unsafe code in a library or in the CLR is corrupting the number of elements field in the array header. Bit of a long shot I know... Update 2 Thanks to Brian Rasmussen (see accepted answer) the problem has been identified as fragmentation of the LOH caused by the string intern table! I wrote a quick test application to confirm this: static void Main() { const int ITERATIONS = 100000; for (int index = 0; index < ITERATIONS; ++index) { string str = "NonInterned" + index; Console.Out.WriteLine(str); } Console.Out.WriteLine("Continue."); Console.In.ReadLine(); for (int index = 0; index < ITERATIONS; ++index) { string str = string.Intern("Interned" + index); Console.Out.WriteLine(str); } Console.Out.WriteLine("Continue?"); Console.In.ReadLine(); } The application first creates and dereferences unique strings in a loop. This is just to prove that the memory does not leak in this scenario. Obviously it should not and it does not. In the second loop, unique strings are created and interned. This action roots them in the intern table. What I did not realise is how the intern table is represented. It appears it consists of a set of pages -- object arrays of 128 string elements -- that are created in the LOH. This is more evident in CDB/SOS: 0:000> .loadby sos mscorwks 0:000> !EEHeap -gc Number of GC Heaps: 1 generation 0 starts at 0x00f7a9b0 generation 1 starts at 0x00e79c3c generation 2 starts at 0x00b21000 ephemeral segment allocation context: none segment begin allocated size 00b20000 00b21000 010029bc 0x004e19bc(5118396) Large object heap starts at 0x01b21000 segment begin allocated size 01b20000 01b21000 01b8ade0 0x00069de0(433632) Total Size 0x54b79c(5552028) ------------------------------ GC Heap Size 0x54b79c(5552028) Taking a dump of the LOH segment reveals the pattern I saw in the leaking application: 0:000> !DumpHeap 01b21000 01b8ade0 ... 01b8a120 793040bc 528 01b8a330 00175e88 16 Free 01b8a340 793040bc 528 01b8a550 00175e88 16 Free 01b8a560 793040bc 528 01b8a770 00175e88 16 Free 01b8a780 793040bc 528 01b8a990 00175e88 16 Free 01b8a9a0 793040bc 528 01b8abb0 00175e88 16 Free 01b8abc0 793040bc 528 01b8add0 00175e88 16 Free total 1568 objects Statistics: MT Count TotalSize Class Name 00175e88 784 12544 Free 793040bc 784 421088 System.Object[] Total 1568 objects Note that the object array size is 528 (rather than 1056) because my workstation is 32 bit and the application server is 64 bit. The object arrays are still 128 elements long. So the moral to this story is to be very careful interning. If the string you are interning is not known to be a member of a finite set then your application will leak due to fragmentation of the LOH, at least in version 2 of the CLR. In our application's case, there is general code in the deserialisation code path that interns entity identifiers during unmarshalling: I now strongly suspect this is the culprit. However, the developer's intentions were obviously good as they wanted to make sure that if the same entity is deserialised multiple times then only one instance of the identifier string will be maintained in memory.

    Read the article

  • Hosted bug tracking system with mercurial repositories (Summary of options & request for opinions)

    - by Mark Booth
    The Question What hosted mercurial repository/bug tracking system or systems have you used? Would you recommend it to others? Are there serious flaws, either in the repository hosting or the bug tracking features that would make it difficult to recommend it? Do you have any other experiences with it or opinions of it that you would like to share? If you have used other non mercurial hosted repository/bug tracking systems, how does it compare? (If I understand correctly, the best format for this type of community-wiki style question is one answer per option, if you have experienced if several) Background I have been looking into options for setting up a bug/issue tracking database and found some valuable advice in this thread and this. But then I got to thinking that a hosted solution might not only solve the problem of tracking bugs, but might also solve the problem we have accessing our mercurial source code repositories while at customer sites around the world. Since we currently have no way to serve mercurial repositories over ssl, when I am at a customer site I have to connect my laptop via VPN to my work network and access the mercurial repositories over a samba share (even if it is just to synce twice a day). This is excruciatingly slow on high latency networks and can be impossible with some customers' firewalls. Even if we could run a TRAC or Redmine server here (thanks turnkey), I'm not sure it would be much quicker as our internet connection is over-stretched as it is. What I would like is for developers to be able to be able to push/pull to/from a remote repository, servicing engineers to be able to pull from a remote repository and for customers (both internal and external) to be able to submit bug/issue reports. Initial options The two options I found were Assembla and Jira. Looking at Assembla I thought the 'group' price looked reasonable, but after enquiring, found that each workspace could only contain a single repository. Since each of our products might have up to a dozen repositories (mostly for libraries) which need to be managed seperately for each product, I could see it getting expensive really quickly. On the plus side, it appears that 'users' are just workspace members, so you can have as many client users (people who can only submit support tickets and track their own tickets) without using up your user allocation. Jira only charges based on the number of users, unfortunately client users also count towards this, if you want them to be able to track their tickets. If you only want clients to be able to submit untracked issues, you can let them submit anonymously, but that doesn't feel very professional to me. More options Looking through MercurialHosting page that @Paidhi suggested, I've added the options which appear to offer private repositories, along with another that I found with a web search. Prices are as per their website today (29th March 2010). Corrections welcome in the future. Anyway, here is my summary, according to the information given on their websites: Assembla, http://www.assembla.com/, looks to be a reasonable price, but suffers only one repository per workspace, so three projects with 6 repos each would use up most of the spaces associated with a $99/month professional account (20 spaces). Bug tracking is based on Trac. Mercurial+Trac support was announced in a blog entry in 2007, but they only list SVN and Git on their Features web page. Cost: $24, $49, $99 & $249/month for 40, 40, unlimited, unlimited users and 1, 10, 20, 100 workspaces. SSL based push/pull? Website https login. BitBucket, http://bitbucket.org/plans/, is primarily a mercurial hosting site for open source projects, with SSL support, but they have an integrated bug tracker and they are cheap for private repositories. It has it’s own issues tracker, but also integrates with Lighthouse & FogBugz. Cost: $0, $5, $12, $50 & $100/month for 1, 5, 15, 25 & 150 private repositories. SSL based push/pull. No https on website login, but supports OpenID, so you can chose an OpenID provider with https login. Codebase HQ, http://www.codebasehq.com/, supports Hg and is almost as cheap as BitBucket. Cost: £5, £13, £21 & £40/month for 3, 15, 30 & 60 active projects, unlimited repositories, unlimited users (except 10 users at £5/month) and 0.5, 2, 4 & 10GB. SSL based push/pull? Website https login? Firefly, http://www.activestate.com/firefly/, by ActiveState looks interesting, but the website is a little light on details, such as whether you can only have one repository per project or not. Cost: $9, $19, & £39/month for 1, 5 & 30 private projects, with a 0.5, 1.5 & 3 GB storage limit. SSL based push/pull? Website https login. Jira, http://www.atlassian.com/software/jira/, isn’t limited by the number of repositories you can have, but by ‘user’. It could work out quite expensive if we want client users to be able to track their issues, since they would need a full user account to be created for them. Also, while there is a Mercurial extension to support jira, there is no ‘Advanced integration’ for Mercurial from Atlassian Fisheye. Cost: $150, $300, $400, $500, $700/month for 10, 25, 50, 100, 100+ users. SSL based push/pull? Website https login. Kiln & FogBugz On Demand, http://fogcreek.com/Kiln/IntrotoOnDemand.html, integrates Kilns mercurial DVCS features with FogBugz, where the combined package is much cheaper than the component parts. Also, the Fogbugz integration is supposedly excellent. *8’) Cost: £30/developer/month ($5/d/m more than either on their own). SSL based push/pull? SourceRepo, http://sourcerepo.com/, also supports HG and is even cheaper than BitBucket & Codebase. Cost: $4, $7 & $13/month for 1, unlimited & unlimited repositories/trac/redmine instances and 500MB, 1GB & 3GB storage. SSL based push/pull. Website https login. Edit: 29th March 2010 & Bounty I split this question into sections, made the questions themselves more explicit, added other options from the research I have done since my first posting and made this community wiki, since I now understand what CW is for. *8') Also, I've added a bounty to encourage people to offer their opinions. At the end of the bounty period, I will award the bounty to whoever writes the best review (good or bad), irrespective of the number of up/down votes it gets. Given that it's probably more important to avoid bad providers than find the absolute best one, 'bad reviews' could be considered more important than good ones.

    Read the article

  • Asset Pipeline acting up

    - by Abram
    Ok, so my asset pipeline has suddenly started acting up on my development machine. JS functions that previously worked are now throwing "not a function" errors.. I know I must be doing something wrong. A minute ago the datatables jquery function was working, then it was throwing an error, then it was working, and now it's not working or throwing an error. Here is my application.js //= require jquery //= require jquery-ui //= require jquery_ujs //= require_self //= require_tree . //= require dataTables/jquery.dataTables //= require dataTables/jquery.dataTables.bootstrap //= require bootstrap //= require bootstrap-tooltip //= require bootstrap-popover //= require bootstrap-tab //= require bootstrap-modal //= require bootstrap-alert //= require bootstrap-dropdown //= require jquery.ui.addresspicker //= require raty //= require jquery.alphanumeric //= require jquery.formrestrict //= require select2 //= require chosen/chosen.jquery //= require highcharts //= require jquery.lazyload Here is some of my layout header: <%= stylesheet_link_tag "application", media: "all" %> <%= yield(:scripthead) %> <%= javascript_include_tag "application" %> <%= csrf_meta_tags %> <%= yield(:head) %> Above I am using the yield to load up online scripts from google as they're only needed on some pages, and generally slow down the site if included in the application layout. I tried removing the yield but things were still broken, even after clearing the browser cache and running rake assets:clean (just to be on the safe side). Here's what shows up between CSS and metatags (for a page with nothin in the yield scripthead): <script src="/assets/jquery.js?body=1" type="text/javascript"></script> <script src="/assets/jquery-ui.js?body=1" type="text/javascript"></script> <script src="/assets/jquery_ujs.js?body=1" type="text/javascript"></script> <script src="/assets/application.js?body=1" type="text/javascript"></script> <script src="/assets/aidmodels.js?body=1" type="text/javascript"></script> <script src="/assets/audio.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-alert.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-dropdown.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-modal.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-popover.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-tab.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-tooltip.js?body=1" type="text/javascript"></script> <script src="/assets/branches.js?body=1" type="text/javascript"></script> <script src="/assets/charts.js?body=1" type="text/javascript"></script> <script src="/assets/chosen/backup_chosen.jquery.js?body=1" type="text/javascript"></script> <script src="/assets/chosen/chosen.jquery.js?body=1" type="text/javascript"></script> <script src="/assets/consumers.js?body=1" type="text/javascript"></script> <script src="/assets/dispensers.js?body=1" type="text/javascript"></script> <script src="/assets/favorites.js?body=1" type="text/javascript"></script> <script src="/assets/features.js?body=1" type="text/javascript"></script> <script src="/assets/generic_styles.js?body=1" type="text/javascript"></script> <script src="/assets/gmaps4rails/gmaps4rails.base.js?body=1" type="text/javascript"></script> <script src="/assets/gmaps4rails/gmaps4rails.bing.js?body=1" type="text/javascript"></script> <script src="/assets/gmaps4rails/gmaps4rails.googlemaps.js?body=1" type="text/javascript"></script> <script src="/assets/gmaps4rails/gmaps4rails.mapquest.js?body=1" type="text/javascript"></script> <script src="/assets/gmaps4rails/gmaps4rails.openlayers.js?body=1" type="text/javascript"></script> <script src="/assets/highcharts.js?body=1" type="text/javascript"></script> <script src="/assets/jquery-ui-1.8.18.custom.min.js?body=1" type="text/javascript"></script> <script src="/assets/jquery.alphanumeric.js?body=1" type="text/javascript"></script> <script src="/assets/jquery.formrestrict.js?body=1" type="text/javascript"></script> <script src="/assets/jquery.lazyload.js?body=1" type="text/javascript"></script> <script src="/assets/jquery.ui.addresspicker.js?body=1" type="text/javascript"></script> <script src="/assets/likes.js?body=1" type="text/javascript"></script> <script src="/assets/messages.js?body=1" type="text/javascript"></script> <script src="/assets/overalls.js?body=1" type="text/javascript"></script> <script src="/assets/pages.js?body=1" type="text/javascript"></script> <script src="/assets/questions.js?body=1" type="text/javascript"></script> <script src="/assets/raty.js?body=1" type="text/javascript"></script> <script src="/assets/reviews.js?body=1" type="text/javascript"></script> <script src="/assets/sessions.js?body=1" type="text/javascript"></script> <script src="/assets/styles.js?body=1" type="text/javascript"></script> <script src="/assets/tickets.js?body=1" type="text/javascript"></script> <script src="/assets/universities.js?body=1" type="text/javascript"></script> <script src="/assets/users.js?body=1" type="text/javascript"></script> <script src="/assets/dataTables/jquery.dataTables.js?body=1" type="text/javascript"></script> <script src="/assets/dataTables/jquery.dataTables.bootstrap.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-transition.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-affix.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-button.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-carousel.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-collapse.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-scrollspy.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap-typeahead.js?body=1" type="text/javascript"></script> <script src="/assets/bootstrap.js?body=1" type="text/javascript"></script> <script src="/assets/select2.js?body=1" type="text/javascript"></script> From application.rb: config.assets.initialize_on_precompile = false # Enable the asset pipeline config.assets.enabled = true config.action_controller.assets_dir = "#{File.dirname(File.dirname(__FILE__))}/public" # Version of your assets, change this if you want to expire all your assets config.assets.version = '1.0' I'm sorry, I'm not sure what else to include to help with this puzzle, but any advise would be appreciated. I was having no problems before I started trying to upload to heroku and now everything's gone haywire. EDIT: In the console at the moment I'm seeing Uncaught TypeError: Cannot read property 'Constructor' of undefined bootstrap-popover.js:33 Uncaught ReferenceError: google is not defined jquery.ui.addresspicker.js:25 Uncaught TypeError: Object [object Object] has no method 'popover' overall:476

    Read the article

  • Compass - Lucene Full text search. Structure and Best Practice.

    - by Rob
    Hi, I have played about with the tutorial and Compass itself for a bit now. I have just started to ramp up the use of it and have found that the performance slows drastically. I am certain that this is due to my mappings and the relationships that I have between entities and was looking for suggestions about how this should be best done. Also as a side question I wanted to know if a variable is in an @searchableComponent but is not defined as @searchable when the component object is pulled out of Compass results will you be able to access that variable? I have 3 main classes that I want to search on - Provider, Location and Activity. They are all inter-related - a Provider can have many locations and activites and has an address; A Location has 1 provider, many activities and an address; An activity has 1 provider and many locations. I have a join table between activity and Location called ActivityLocation that can later provider additional information about the relationship. My classes are mapped to compass as shown below for provider location activity and address. This works but gives a huge index and searches on it are comparatively slow, any advice would be great. Cheers, Rob @Searchable public class AbstractActivity extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="actID") private Integer activityId =-1; @SearchableComponent() private Provider provider; @SearchableComponent(prefix = "activity") private Category category; private String status; @SearchableProperty (name = "activityName") @SearchableMetaData (name = "activityshortactivityName") private String activityName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "activityshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "activityabRating") private Integer abRating; private String contactName; private String phoneNumber; private String faxNumber; private String email; private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "activitycompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "activityisprivate") private Boolean isprivate= false; private Boolean subs= false; private Boolean newsfeed= true; private Set news = new HashSet(0); private Set ActivitySession = new HashSet(0); private Set payments = new HashSet(0); private Set userclub = new HashSet(0); private Set ActivityOpeningTimes = new HashSet(0); private Set Events = new HashSet(0); private User creator; private Set userInterested = new HashSet(0); boolean freeEdit = false; private Integer activityType =0; @SearchableComponent (maxDepth=2) private Set activityLocations = new HashSet(0); private Double userRating = -1.00; Getters and Setters .... @Searchable public class AbstractLocation extends LightEntity implements Serializable { /** * Serialisation ID */ private static final long serialVersionUID = 3445339493203407152L; @SearchableId (name="locationID") private Integer locationId; @SearchableComponent (prefix = "location") private Category category; @SearchableComponent (maxDepth=1) private Provider provider; @SearchableProperty (name = "status") @SearchableMetaData (name = "locationstatus") private String status; @SearchableProperty private String locationName; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "locationshortDescription") private String shortDescription; @SearchableProperty (name = "abRating") @SearchableMetaData (name = "locationabRating") private Integer abRating; private Integer boolUseProviderDetails; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "locationcontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "locationphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "locationfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "locationemail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "locationurl") private String url; @SearchableProperty (name = "completed") @SearchableMetaData (name = "locationcompleted") private Boolean completed= false; @SearchableProperty (name = "isprivate") @SearchableMetaData (name = "locationisprivate") private Boolean isprivate= false; @SearchableComponent private Set activityLocations = new HashSet(0); private Set LocationOpeningTimes = new HashSet(0); private Set events = new HashSet(0); @SearchableProperty (name = "adult_cost") @SearchableMetaData (name = "locationadult_cost") private String adult_cost =""; @SearchableProperty (name = "child_cost") @SearchableMetaData (name = "locationchild_cost") private String child_cost =""; @SearchableProperty (name = "family_cost") @SearchableMetaData (name = "locationfamily_cost") private String family_cost =""; private Double userRating = -1.00; private Set costs = new HashSet(0); private String cost_caveats =""; Getters and Setters .... @Searchable public class AbstractActivitylocations implements java.io.Serializable { /** * */ private static final long serialVersionUID = 1365110541466870626L; @SearchableId (name="id") private Integer id; @SearchableComponent private Activity activity; @SearchableComponent private Location location; Getters and Setters..... @Searchable public class AbstractProvider extends LightEntity implements Serializable { private static final long serialVersionUID = 3060354043429663058L; @SearchableId private Integer providerId = -1; @SearchableComponent (prefix = "provider") private Category category; @SearchableProperty (name = "businessName") @SearchableMetaData (name = "providerbusinessName") private String businessName; @SearchableProperty (name = "contactName") @SearchableMetaData (name = "providercontactName") private String contactName; @SearchableComponent private Address address; @SearchableProperty (name = "phoneNumber") @SearchableMetaData (name = "providerphoneNumber") private String phoneNumber; @SearchableProperty (name = "faxNumber") @SearchableMetaData (name = "providerfaxNumber") private String faxNumber; @SearchableProperty (name = "email") @SearchableMetaData (name = "provideremail") private String email; @SearchableProperty (name = "url") @SearchableMetaData (name = "providerurl") private String url; @SearchableProperty (name = "status") @SearchableMetaData (name = "providerstatus") private String status; @SearchableProperty (name = "notes") @SearchableMetaData (name = "providernotes") private String notes; @SearchableProperty (name = "shortDescription") @SearchableMetaData (name = "providershortDescription") private String shortDescription; private Boolean completed = false; private Boolean isprivate = false; private Double userRating = -1.00; private Integer ABRating = 1; @SearchableComponent private Set locations = new HashSet(0); @SearchableComponent private Set activities = new HashSet(0); private Set ProviderOpeningTimes = new HashSet(0); private User creator; boolean freeEdit = false; Getters and Setters... Thanks for reading!! Rob

    Read the article

  • Something is making my page perform an Ajax call multiple times... [read: I've never been more frust

    - by Jack Webb-Heller
    NOTE: This is a long question. I've explained all the 'basics' at the top and then there's some further (optional) information for if you need it. Hi folks Basically last night this started happening at about 9PM whilst I was trying to restructure my code to make it a bit nicer for the designer to add a few bits to. I tried to fix it until 2AM at which point I gave up. Came back to it this morning, still baffled. I'll be honest with you, I'm a pretty bad Javascript developer. Since starting this project Javascript has been completely new to me and I've just learn as I went along. So please forgive me if my code structure is really bad (perhaps give a couple of pointers on how to improve it?). So, to the problem: to reproduce it, visit http://furnace.howcode.com (it's far from complete). This problem is a little confusing but I'd really appreciate the help. So in the second column you'll see three tabs The 'Newest' tab is selected by default. Scroll to the bottom, and 3 further results should be dynamically fetched via Ajax. Now click on the 'Top Rated' tab. You'll see all the results, but ordered by rating Scroll to the bottom of 'Top Rated'. You'll see SIX results returned. This is where it goes wrong. Only a further three should be returned (there are 18 entries in total). If you're observant you'll notice two 'blocks' of 3 returned. The first 'block' is the second page of results from the 'Newest' tab. The second block is what I just want returned. Did that make any sense? Never mind! So basically I checked this out in Firebug. What happens is, from a 'Clean' page (first load, nothing done) it calls ONE POST request to http://furnace.howcode.com/code/loadmore . But every time you load a new one of the tabs, it makes an ADDITIONAL POST request each time where there should normally only be ONE. So, can you help me? I'd really appreciate it! At this point you could start independent investigation or read on for a little further (optional) information. Thanks! Jack Further Info (may be irrelevant but here for reference): It's almost like there's some Javascript code or something being left behind that duplicates it each time. I thought it might be this code that I use to detect when the browser is scrolled to the bottom: var col = $('#col2'); col.scroll(function(){ if (col.outerHeight() == (col.get(0).scrollHeight - col.scrollTop())) loadMore(1); }); So what I thought was that code was left behind, and so every time you scroll #col2 (which contains different data for each tab) it detected that and added it for #newest as well. So, I made each tab click give #col2 a dynamic class - either .newestcol, .featuredcol, or .topratedcol. And then I changed the var col=$('.newestcol');dynamically so it would only detect it individually for each tab (makin' any sense?!). But hey, that didn't do anything. Another useful tidbit: here's the PHP for http://furnace.howcode.com/code/loadmore: $kind = $this->input->post('kind'); if ($kind == 1){ // kind is 1 - newest $start = $this->input->post('currentpage'); $data['query'] = "SELECT code.id AS codeid, code.title AS codetitle, code.summary AS codesummary, code.author AS codeauthor, code.rating AS rating, code.date, code_tags.*, tags.*, users.firstname AS authorname, users.id AS authorid, GROUP_CONCAT(tags.tag SEPARATOR ', ') AS taggroup FROM code, code_tags, tags, users WHERE users.id = code.author AND code_tags.code_id = code.id AND tags.id = code_tags.tag_id GROUP BY code_id ORDER BY date DESC LIMIT $start, 15 "; $this->load->view('code/ajaxlist',$data); } elseif ($kind == 2) { // kind is 2 - featured So my jQuery code sends a variable 'kind'. If it's 1, it runs the query for Newest, etc. etc. The PHP code for furnace.howcode.com/code/ajaxlist is: <?php // Our query base // SELECT * FROM code ORDER BY date DESC $query = $this->db->query($query); foreach($query->result() as $row) { ?> <script type="text/javascript"> $('#title-<?php echo $row->codeid;?>').click(function() { var form_data = { id: <?php echo $row->codeid; ?> }; $('#col3').fadeOut('slow', function() { $.ajax({ url: "<?php echo site_url('code/viewajax');?>", type: 'POST', data: form_data, success: function(msg) { $('#col3').html(msg); $('#col3').fadeIn('fast'); } }); }); }); </script> <div class="result"> <div class="resulttext"> <div id="title-<?php echo $row->codeid; ?>" class="title"> <?php echo anchor('#',$row->codetitle); ?> </div> <div class="summary"> <?php echo $row->codesummary; ?> </div> <!-- Now insert the 5-star rating system --> <?php include($_SERVER['DOCUMENT_ROOT']."/fivestars/5star.php");?> <div class="bottom"> <div class="author"> Submitted by <?php echo anchor('auth/profile/'.$row->authorid,''.$row->authorname);?> </div> <?php // Now we need to take the GROUP_CONCATted tags and split them using the magic of PHP into seperate tags $tagarray = explode(", ", $row->taggroup); foreach ($tagarray as $tag) { ?> <div class="tagbutton" href="#"> <span><?php echo $tag; ?></span> </div> <?php } ?> </div> </div> </div> <?php } echo "&nbsp;";?> <script type="text/javascript"> var newpage = <?php echo $this->input->post('currentpage') + 15;?>; </script> So that's everything in PHP. The rest you should be able to view with Firebug or by viewing the Source code. I've put all the Tab/clicking/Ajaxloading bits in the tags at the very bottom. There's a comment before it all kicks off. Thanks so much for your help!

    Read the article

  • rotating bitmaps. In code.

    - by Marco van de Voort
    Is there a faster way to rotate a large bitmap by 90 or 270 degrees than simply doing a nested loop with inverted coordinates? The bitmaps are 8bpp and typically 2048*2400*8bpp Currently I do this by simply copying with argument inversion, roughly (pseudo code: for x = 0 to 2048-1 for y = 0 to 2048-1 dest[x][y]=src[y][x]; (In reality I do it with pointers, for a bit more speed, but that is roughly the same magnitude) GDI is quite slow with large images, and GPU load/store times for textures (GF7 cards) are in the same magnitude as the current CPU time. Any tips, pointers? An in-place algorithm would even be better, but speed is more important than being in-place. Target is Delphi, but it is more an algorithmic question. SSE(2) vectorization no problem, it is a big enough problem for me to code it in assembler Duplicates How do you rotate a two dimensional array?. Follow up to Nils' answer Image 2048x2700 - 2700x2048 Compiler Turbo Explorer 2006 with optimization on. Windows: Power scheme set to "Always on". (important!!!!) Machine: Core2 6600 (2.4 GHz) time with old routine: 32ms (step 1) time with stepsize 8 : 12ms time with stepsize 16 : 10ms time with stepsize 32+ : 9ms Meanwhile I also tested on a Athlon 64 X2 (5200+ iirc), and the speed up there was slightly more than a factor four (80 to 19 ms). The speed up is well worth it, thanks. Maybe that during the summer months I'll torture myself with a SSE(2) version. However I already thought about how to tackle that, and I think I'll run out of SSE2 registers for an straight implementation: for n:=0 to 7 do begin load r0, <source+n*rowsize> shift byte from r0 into r1 shift byte from r0 into r2 .. shift byte from r0 into r8 end; store r1, <target> store r2, <target+1*<rowsize> .. store r8, <target+7*<rowsize> So 8x8 needs 9 registers, but 32-bits SSE only has 8. Anyway that is something for the summer months :-) Note that the pointer thing is something that I do out of instinct, but it could be there is actually something to it, if your dimensions are not hardcoded, the compiler can't turn the mul into a shift. While muls an sich are cheap nowadays, they also generate more register pressure afaik. The code (validated by subtracting result from the "naieve" rotate1 implementation): const stepsize = 32; procedure rotatealign(Source: tbw8image; Target:tbw8image); var stepsx,stepsy,restx,resty : Integer; RowPitchSource, RowPitchTarget : Integer; pSource, pTarget,ps1,ps2 : pchar; x,y,i,j: integer; rpstep : integer; begin RowPitchSource := source.RowPitch; // bytes to jump to next line. Can be negative (includes alignment) RowPitchTarget := target.RowPitch; rpstep:=RowPitchTarget*stepsize; stepsx:=source.ImageWidth div stepsize; stepsy:=source.ImageHeight div stepsize; // check if mod 16=0 here for both dimensions, if so -> SSE2. for y := 0 to stepsy - 1 do begin psource:=source.GetImagePointer(0,y*stepsize); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(target.imagewidth-(y+1)*stepsize,0); for x := 0 to stepsx - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; // 3 more areas to do, with dimensions // - stepsy*stepsize * restx // right most column of restx width // - stepsx*stepsize * resty // bottom row with resty height // - restx*resty // bottom-right rectangle. restx:=source.ImageWidth mod stepsize; // typically zero because width is // typically 1024 or 2048 resty:=source.Imageheight mod stepsize; if restx>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(source.ImageWidth-restx,0); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(Target.imagewidth-stepsize,Target.imageheight-restx); for y := 0 to stepsy - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize*RowPitchSource); dec(ptarget,stepsize); end; end; if resty>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(0,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,0); for x := 0 to stepsx - 1 do begin for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; if (resty>0) and (restx>0) then begin // another loop less, since only one block psource:=source.GetImagePointer(source.ImageWidth-restx,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,target.ImageHeight-restx); for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; end; end;

    Read the article

  • Suggestions for duplicate file finder algorithm (using C)

    - by Andrei Ciobanu
    Hello, I wanted to write a program that test if two files are duplicates (have exactly the same content). First I test if the files have the same sizes, and if they have i start to compare their contents. My first idea, was to "split" the files into fixed size blocks, then start a thread for every block, fseek to startup character of every block and continue the comparisons in parallel. When a comparison from a thread fails, the other working threads are canceled, and the program exits out of the thread spawning loop. The code looks like this: dupf.h #ifndef __NM__DUPF__H__ #define __NM__DUPF__H__ #define NUM_THREADS 15 #define BLOCK_SIZE 8192 /* Thread argument structure */ struct thread_arg_s { const char *name_f1; /* First file name */ const char *name_f2; /* Second file name */ int cursor; /* Where to seek in the file */ }; typedef struct thread_arg_s thread_arg; /** * 'arg' is of type thread_arg. * Checks if the specified file blocks are * duplicates. */ void *check_block_dup(void *arg); /** * Checks if two files are duplicates */ int check_dup(const char *name_f1, const char *name_f2); /** * Returns a valid pointer to a file. * If the file (given by the path/name 'fname') cannot be opened * in 'mode', the program is interrupted an error message is shown. **/ FILE *safe_fopen(const char *name, const char *mode); #endif dupf.c #include <errno.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include "dupf.h" FILE *safe_fopen(const char *fname, const char *mode) { FILE *f = NULL; f = fopen(fname, mode); if (f == NULL) { char emsg[255]; sprintf(emsg, "FOPEN() %s\t", fname); perror(emsg); exit(-1); } return (f); } void *check_block_dup(void *arg) { const char *name_f1 = NULL, *name_f2 = NULL; /* File names */ FILE *f1 = NULL, *f2 = NULL; /* Streams */ int cursor = 0; /* Reading cursor */ char buff_f1[BLOCK_SIZE], buff_f2[BLOCK_SIZE]; /* Character buffers */ int rchars_1, rchars_2; /* Readed characters */ /* Initializing variables from 'arg' */ name_f1 = ((thread_arg*)arg)->name_f1; name_f2 = ((thread_arg*)arg)->name_f2; cursor = ((thread_arg*)arg)->cursor; /* Opening files */ f1 = safe_fopen(name_f1, "r"); f2 = safe_fopen(name_f2, "r"); /* Setup cursor in files */ fseek(f1, cursor, SEEK_SET); fseek(f2, cursor, SEEK_SET); /* Initialize buffers */ rchars_1 = fread(buff_f1, 1, BLOCK_SIZE, f1); rchars_2 = fread(buff_f2, 1, BLOCK_SIZE, f2); if (rchars_1 != rchars_2) { /* fread failed to read the same portion. * program cannot continue */ perror("ERROR WHEN READING BLOCK"); exit(-1); } while (rchars_1-->0) { if (buff_f1[rchars_1] != buff_f2[rchars_1]) { /* Different characters */ fclose(f1); fclose(f2); pthread_exit("notdup"); } } /* Close streams */ fclose(f1); fclose(f2); pthread_exit("dup"); } int check_dup(const char *name_f1, const char *name_f2) { int num_blocks = 0; /* Number of 'blocks' to check */ int num_tsp = 0; /* Number of threads spawns */ int tsp_iter = 0; /* Iterator for threads spawns */ pthread_t *tsp_threads = NULL; thread_arg *tsp_threads_args = NULL; int tsp_threads_iter = 0; int thread_c_res = 0; /* Thread creation result */ int thread_j_res = 0; /* Thread join res */ int loop_res = 0; /* Function result */ int cursor; struct stat buf_f1; struct stat buf_f2; if (name_f1 == NULL || name_f2 == NULL) { /* Invalid input parameters */ perror("INVALID FNAMES\t"); return (-1); } if (stat(name_f1, &buf_f1) != 0 || stat(name_f2, &buf_f2) != 0) { /* Stat fails */ char emsg[255]; sprintf(emsg, "STAT() ERROR: %s %s\t", name_f1, name_f2); perror(emsg); return (-1); } if (buf_f1.st_size != buf_f2.st_size) { /* File have different sizes */ return (1); } /* Files have the same size, function exec. is continued */ num_blocks = (buf_f1.st_size / BLOCK_SIZE) + 1; num_tsp = (num_blocks / NUM_THREADS) + 1; cursor = 0; for (tsp_iter = 0; tsp_iter < num_tsp; tsp_iter++) { loop_res = 0; /* Create threads array for this spawn */ tsp_threads = malloc(NUM_THREADS * sizeof(*tsp_threads)); if (tsp_threads == NULL) { perror("TSP_THREADS ALLOC FAILURE\t"); return (-1); } /* Create arguments for every thread in the current spawn */ tsp_threads_args = malloc(NUM_THREADS * sizeof(*tsp_threads_args)); if (tsp_threads_args == NULL) { perror("TSP THREADS ARGS ALLOCA FAILURE\t"); return (-1); } /* Initialize arguments and create threads */ for (tsp_threads_iter = 0; tsp_threads_iter < NUM_THREADS; tsp_threads_iter++) { if (cursor >= buf_f1.st_size) { break; } tsp_threads_args[tsp_threads_iter].name_f1 = name_f1; tsp_threads_args[tsp_threads_iter].name_f2 = name_f2; tsp_threads_args[tsp_threads_iter].cursor = cursor; thread_c_res = pthread_create( &tsp_threads[tsp_threads_iter], NULL, check_block_dup, (void*)&tsp_threads_args[tsp_threads_iter]); if (thread_c_res != 0) { perror("THREAD CREATION FAILURE"); return (-1); } cursor+=BLOCK_SIZE; } /* Join last threads and get their status */ while (tsp_threads_iter-->0) { void *thread_res = NULL; thread_j_res = pthread_join(tsp_threads[tsp_threads_iter], &thread_res); if (thread_j_res != 0) { perror("THREAD JOIN FAILURE"); return (-1); } if (strcmp((char*)thread_res, "notdup")==0) { loop_res++; /* Closing other threads and exiting by condition * from loop. */ while (tsp_threads_iter-->0) { pthread_cancel(tsp_threads[tsp_threads_iter]); } } } free(tsp_threads); free(tsp_threads_args); if (loop_res > 0) { break; } } return (loop_res > 0) ? 1 : 0; } The function works fine (at least for what I've tested). Still, some guys from #C (freenode) suggested that the solution is overly complicated, and it may perform poorly because of parallel reading on hddisk. What I want to know: Is the threaded approach flawed by default ? Is fseek() so slow ? Is there a way to somehow map the files to memory and then compare them ?

    Read the article

  • how do you make a "concurrent queue safe" lazy loader (singleton manager) in objective-c

    - by Rich
    Hi, I made this class that turns any object into a singleton, but I know that it's not "concurrent queue safe." Could someone please explain to me how to do this, or better yet, show me the code. To be clear I want to know how to use this with operation queues and dispatch queues (NSOperationQueue and Grand Central Dispatch) on iOS. Thanks in advance, Rich EDIT: I had an idea for how to do it. If someone could confirm it for me I'll do it and post the code. The idea is that proxies make queues all on their own. So if I make a mutable proxy (like Apple does in key-value coding/observing) for any object that it's supposed to return, and always return the same proxy for the same object/identifier pair (using the same kind of lazy loading technique as I used to create the singletons), the proxies would automatically queue up the any messages to the singletons, and make it totally thread safe. IMHO this seems like a lot of work to do, so I don't want to do it if it's not gonna work, or if it's gonna slow my apps down to a crawl. Here's my non-thread safe code: RMSingletonCollector.h // // RMSingletonCollector.h // RMSingletonCollector // // Created by Rich Meade-Miller on 2/11/11. // Copyright 2011 Rich Meade-Miller. All rights reserved. // #import <Foundation/Foundation.h> #import "RMWeakObjectRef.h" struct RMInitializerData { // The method may take one argument. // required SEL designatedInitializer; // data to pass to the initializer or nil. id data; }; typedef struct RMInitializerData RMInitializerData; RMInitializerData RMInitializerDataMake(SEL initializer, id data); @interface NSObject (SingletonCollector) // Returns the selector and data to pass to it (if the selector takes an argument) for use when initializing the singleton. // If you override this DO NOT call super. + (RMInitializerData)designatedInitializerForIdentifier:(NSString *)identifier; @end @interface RMSingletonCollector : NSObject { } + (id)collectionObjectForType:(NSString *)className identifier:(NSString *)identifier; + (id<RMWeakObjectReference>)referenceForObjectOfType:(NSString *)className identifier:(NSString *)identifier; + (void)destroyCollection; + (void)destroyCollectionObjectForType:(NSString *)className identifier:(NSString *)identifier; @end // ==--==--==--==--==Notifications==--==--==--==--== extern NSString *const willDestroySingletonCollection; extern NSString *const willDestroySingletonCollectionObject; RMSingletonCollector.m // // RMSingletonCollector.m // RMSingletonCollector // // Created by Rich Meade-Miller on 2/11/11. // Copyright 2011 Rich Meade-Miller. All rights reserved. // #import "RMSingletonCollector.h" #import <objc/objc-runtime.h> NSString *const willDestroySingletonCollection = @"willDestroySingletonCollection"; NSString *const willDestroySingletonCollectionObject = @"willDestroySingletonCollectionObject"; RMInitializerData RMInitializerDataMake(SEL initializer, id data) { RMInitializerData newData; newData.designatedInitializer = initializer; newData.data = data; return newData; } @implementation NSObject (SingletonCollector) + (RMInitializerData)designatedInitializerForIdentifier:(NSString *)identifier { return RMInitializerDataMake(@selector(init), nil); } @end @interface RMSingletonCollector () + (NSMutableDictionary *)singletonCollection; + (void)setSingletonCollection:(NSMutableDictionary *)newSingletonCollection; @end @implementation RMSingletonCollector static NSMutableDictionary *singletonCollection = nil; + (NSMutableDictionary *)singletonCollection { if (singletonCollection != nil) { return singletonCollection; } NSMutableDictionary *collection = [[NSMutableDictionary alloc] initWithCapacity:1]; [self setSingletonCollection:collection]; [collection release]; return singletonCollection; } + (void)setSingletonCollection:(NSMutableDictionary *)newSingletonCollection { if (newSingletonCollection != singletonCollection) { [singletonCollection release]; singletonCollection = [newSingletonCollection retain]; } } + (id)collectionObjectForType:(NSString *)className identifier:(NSString *)identifier { id obj; NSString *key; if (identifier) { key = [className stringByAppendingFormat:@".%@", identifier]; } else { key = className; } if (obj = [[self singletonCollection] objectForKey:key]) { return obj; } // dynamic creation. // get a class for Class classForName = NSClassFromString(className); if (classForName) { obj = objc_msgSend(classForName, @selector(alloc)); // if the initializer takes an argument... RMInitializerData initializerData = [classForName designatedInitializerForIdentifier:identifier]; if (initializerData.data) { // pass it. obj = objc_msgSend(obj, initializerData.designatedInitializer, initializerData.data); } else { obj = objc_msgSend(obj, initializerData.designatedInitializer); } [singletonCollection setObject:obj forKey:key]; [obj release]; } else { // raise an exception if there is no class for the specified name. NSException *exception = [NSException exceptionWithName:@"com.RMDev.RMSingletonCollector.failed_to_find_class" reason:[NSString stringWithFormat:@"SingletonCollector couldn't find class for name: %@", [className description]] userInfo:nil]; [exception raise]; [exception release]; } return obj; } + (id<RMWeakObjectReference>)referenceForObjectOfType:(NSString *)className identifier:(NSString *)identifier { id obj = [self collectionObjectForType:className identifier:identifier]; RMWeakObjectRef *objectRef = [[RMWeakObjectRef alloc] initWithObject:obj identifier:identifier]; return [objectRef autorelease]; } + (void)destroyCollection { NSDictionary *userInfo = [singletonCollection copy]; [[NSNotificationCenter defaultCenter] postNotificationName:willDestroySingletonCollection object:self userInfo:userInfo]; [userInfo release]; // release the collection and set it to nil. [self setSingletonCollection:nil]; } + (void)destroyCollectionObjectForType:(NSString *)className identifier:(NSString *)identifier { NSString *key; if (identifier) { key = [className stringByAppendingFormat:@".%@", identifier]; } else { key = className; } [[NSNotificationCenter defaultCenter] postNotificationName:willDestroySingletonCollectionObject object:[singletonCollection objectForKey:key] userInfo:nil]; [singletonCollection removeObjectForKey:key]; } @end RMWeakObjectRef.h // // RMWeakObjectRef.h // RMSingletonCollector // // Created by Rich Meade-Miller on 2/12/11. // Copyright 2011 Rich Meade-Miller. All rights reserved. // // In order to offset the performance loss from always having to search the dictionary, I made a retainable, weak object reference class. #import <Foundation/Foundation.h> @protocol RMWeakObjectReference <NSObject> @property (nonatomic, assign, readonly) id objectRef; @property (nonatomic, retain, readonly) NSString *className; @property (nonatomic, retain, readonly) NSString *objectIdentifier; @end @interface RMWeakObjectRef : NSObject <RMWeakObjectReference> { id objectRef; NSString *className; NSString *objectIdentifier; } - (RMWeakObjectRef *)initWithObject:(id)object identifier:(NSString *)identifier; - (void)objectWillBeDestroyed:(NSNotification *)notification; @end RMWeakObjectRef.m // // RMWeakObjectRef.m // RMSingletonCollector // // Created by Rich Meade-Miller on 2/12/11. // Copyright 2011 Rich Meade-Miller. All rights reserved. // #import "RMWeakObjectRef.h" #import "RMSingletonCollector.h" @implementation RMWeakObjectRef @dynamic objectRef; @synthesize className, objectIdentifier; - (RMWeakObjectRef *)initWithObject:(id)object identifier:(NSString *)identifier { if (self = [super init]) { NSString *classNameForObject = NSStringFromClass([object class]); className = classNameForObject; objectIdentifier = identifier; objectRef = object; [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(objectWillBeDestroyed:) name:willDestroySingletonCollectionObject object:object]; [[NSNotificationCenter defaultCenter] addObserver:self selector:@selector(objectWillBeDestroyed:) name:willDestroySingletonCollection object:[RMSingletonCollector class]]; } return self; } - (id)objectRef { if (objectRef) { return objectRef; } objectRef = [RMSingletonCollector collectionObjectForType:className identifier:objectIdentifier]; return objectRef; } - (void)objectWillBeDestroyed:(NSNotification *)notification { objectRef = nil; } - (void)dealloc { [[NSNotificationCenter defaultCenter] removeObserver:self]; [className release]; [super dealloc]; } @end

    Read the article

  • Dealing with external processes

    - by Jesse Aldridge
    I've been working on a gui app that needs to manage external processes. Working with external processes leads to a lot of issues that can make a programmer's life difficult. I feel like maintenence on this app is taking an unacceptably long time. I've been trying to list the things that make working with external processes difficult so that I can come up with ways of mitigating the pain. This kind of turned into a rant which I thought I'd post here in order to get some feedback and to provide some guidance to anybody thinking about sailing into these very murky waters. Here's what I've got so far: Output from the child can get mixed up with output from the parent. This can make both outputs misleading and hard to read. It can be hard to tell what came from where. It becomes harder to figure out what's going on when things are asynchronous. Here's a contrived example: import textwrap, os, time from subprocess import Popen test_path = 'test_file.py' with open(test_path, 'w') as file: file.write(textwrap.dedent(''' import time for i in range(3): print 'Hello %i' % i time.sleep(1)''')) proc = Popen('python -B "%s"' % test_path) for i in range(3): print 'Hello %i' % i time.sleep(1) os.remove(test_path) I guess I could have the child process write its output to a file. But it can be annoying to have to open up a file every time I want to see the result of a print statement. If I have code for the child process I could add a label, something like print 'child: Hello %i', but it can be annoying to do that for every print. And it adds some noise to the output. And of course I can't do it if I don't have access to the code. I could manually manage the process output. But then you open up a huge can of worms with threads and polling and stuff like that. A simple solution is to treat processes like synchronous functions, that is, no further code executes until the process completes. In other words, make the process block. But that doesn't work if you're building a gui app. Which brings me to the next problem... Blocking processes cause the gui to become unresponsive. import textwrap, sys, os from subprocess import Popen from PyQt4.QtGui import * from PyQt4.QtCore import * test_path = 'test_file.py' with open(test_path, 'w') as file: file.write(textwrap.dedent(''' import time for i in range(3): print 'Hello %i' % i time.sleep(1)''')) app = QApplication(sys.argv) button = QPushButton('Launch process') def launch_proc(): # Can't move the window until process completes proc = Popen('python -B "%s"' % test_path) proc.communicate() button.connect(button, SIGNAL('clicked()'), launch_proc) button.show() app.exec_() os.remove(test_path) Qt provides a process wrapper of its own called QProcess which can help with this. You can connect functions to signals to capture output relatively easily. This is what I'm currently using. But I'm finding that all these signals behave suspiciously like goto statements and can lead to spaghetti code. I think I want to get sort-of blocking behavior by having the 'finished' signal from QProcess call a function containing all the code that comes after the process call. I think that should work but I'm still a bit fuzzy on the details... Stack traces get interrupted when you go from the child process back to the parent process. If a normal function screws up, you get a nice complete stack trace with filenames and line numbers. If a subprocess screws up, you'll be lucky if you get any output at all. You end up having to do a lot more detective work everytime something goes wrong. Speaking of which, output has a way of disappearing when dealing external processes. Like if you run something via the windows 'cmd' command, the console will pop up, execute the code, and then disappear before you have a chance to see the output. You have to pass the /k flag to make it stick around. Similar issues seem to crop up all the time. I suppose both problems 3 and 4 have the same root cause: no exception handling. Exception handling is meant to be used with functions, it doesn't work with processes. Maybe there's some way to get something like exception handling for processes? I guess that's what stderr is for? But dealing with two different streams can be annoying in itself. Maybe I should look into this more... Processes can hang and stick around in the background without you realizing it. So you end up yelling at your computer cuz it's going so slow until you finally bring up your task manager and see 30 instances of the same process hanging out in the background. Also, hanging background processes can interefere with other instances of the process in various fun ways, such as causing permissions errors by holding a handle to a file or someting like that. It seems like an easy solution to this would be to have the parent process kill the child process on exit if the child process didn't close itself. But if the parent process crashes, cleanup code might not get called and the child can be left hanging. Also, if the parent waits for the child to complete, and the child is in an infinite loop or something, you can end up with two hanging processes. This problem can tie in to problem 2 for extra fun, causing your gui to stop responding entirely and force you to kill everything with the task manager. F***ing quotes Parameters often need to be passed to processes. This is a headache in itself. Especially if you're dealing with file paths. Say... 'C:/My Documents/whatever/'. If you don't have quotes, the string will often be split at the space and interpreted as two arguments. If you need nested quotes you can use ' and ". But if you need to use more than two layers of quotes, you have to do some nasty escaping, for example: "cmd /k 'python \'path 1\' \'path 2\''". A good solution to this problem is passing parameters as a list rather than as a single string. Subprocess allows you to do this. Can't easily return data from a subprocess. You can use stdout of course. But what if you want to throw a print in there for debugging purposes? That's gonna screw up the parent if it's expecting output formatted a certain way. In functions you can print one string and return another and everything works just fine. Obscure command-line flags and a crappy terminal based help system. These are problems I often run into when using os level apps. Like the /k flag I mentioned, for holding a cmd window open, who's idea was that? Unix apps don't tend to be much friendlier in this regard. Hopefully you can use google or StackOverflow to find the answer you need. But if not, you've got a lot of boring reading and frusterating trial and error to do. External factors. This one's kind of fuzzy. But when you leave the relatively sheltered harbor of your own scripts to deal with external processes you find yourself having to deal with the "outside world" to a much greater extent. And that's a scary place. All sorts of things can go wrong. Just to give a random example: the cwd in which a process is run can modify it's behavior. There are probably other issues, but those are the ones I've written down so far. Any other snags you'd like to add? Any suggestions for dealing with these problems?

    Read the article

  • .Net 3.5 Asynchronous Socket Server Performance Problem

    - by iBrAaAa
    I'm developing an Asynchronous Game Server using .Net Socket Asynchronous Model( BeginAccept/EndAccept...etc.) The problem I'm facing is described like that: When I have only one client connected, the server response time is very fast but once a second client connects, the server response time increases too much. I've measured the time from a client sends a message to the server until it gets the reply in both cases. I found that the average time in case of one client is about 17ms and in case of 2 clients about 280ms!!! What I really see is that: When 2 clients are connected and only one of them is moving(i.e. requesting service from the server) it is equivalently equal to the case when only one client is connected(i.e. fast response). However, when the 2 clients move at the same time(i.e. requests service from the server at the same time) their motion becomes very slow (as if the server replies each one of them in order i.e. not simultaneously). Basically, what I am doing is that: When a client requests a permission for motion from the server and the server grants him the request, the server then broadcasts the new position of the client to all the players. So if two clients are moving in the same time, the server is eventually trying to broadcast to both clients the new position of each of them at the same time. EX: Client1 asks to go to position (2,2) Client2 asks to go to position (5,5) Server sends to each of Client1 & Client2 the same two messages: message1: "Client1 at (2,2)" message2: "Client2 at (5,5)" I believe that the problem comes from the fact that Socket class is thread safe according MSDN documentation http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.aspx. (NOT SURE THAT IT IS THE PROBLEM) Below is the code for the server: /// /// This class is responsible for handling packet receiving and sending /// public class NetworkManager { /// /// An integer to hold the server port number to be used for the connections. Its default value is 5000. /// private readonly int port = 5000; /// /// hashtable contain all the clients connected to the server. /// key: player Id /// value: socket /// private readonly Hashtable connectedClients = new Hashtable(); /// /// An event to hold the thread to wait for a new client /// private readonly ManualResetEvent resetEvent = new ManualResetEvent(false); /// /// keeps track of the number of the connected clients /// private int clientCount; /// /// The socket of the server at which the clients connect /// private readonly Socket mainSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp); /// /// The socket exception that informs that a client is disconnected /// private const int ClientDisconnectedErrorCode = 10054; /// /// The only instance of this class. /// private static readonly NetworkManager networkManagerInstance = new NetworkManager(); /// /// A delegate for the new client connected event. /// /// the sender object /// the event args public delegate void NewClientConnected(Object sender, SystemEventArgs e); /// /// A delegate for the position update message reception. /// /// the sender object /// the event args public delegate void PositionUpdateMessageRecieved(Object sender, PositionUpdateEventArgs e); /// /// The event which fires when a client sends a position message /// public PositionUpdateMessageRecieved PositionUpdateMessageEvent { get; set; } /// /// keeps track of the number of the connected clients /// public int ClientCount { get { return clientCount; } } /// /// A getter for this class instance. /// /// only instance. public static NetworkManager NetworkManagerInstance { get { return networkManagerInstance; } } private NetworkManager() {} /// Starts the game server and holds this thread alive /// public void StartServer() { //Bind the mainSocket to the server IP address and port mainSocket.Bind(new IPEndPoint(IPAddress.Any, port)); //The server starts to listen on the binded socket with max connection queue //1024 mainSocket.Listen(1024); //Start accepting clients asynchronously mainSocket.BeginAccept(OnClientConnected, null); //Wait until there is a client wants to connect resetEvent.WaitOne(); } /// /// Receives connections of new clients and fire the NewClientConnected event /// private void OnClientConnected(IAsyncResult asyncResult) { Interlocked.Increment(ref clientCount); ClientInfo newClient = new ClientInfo { WorkerSocket = mainSocket.EndAccept(asyncResult), PlayerId = clientCount }; //Add the new client to the hashtable and increment the number of clients connectedClients.Add(newClient.PlayerId, newClient); //fire the new client event informing that a new client is connected to the server if (NewClientEvent != null) { NewClientEvent(this, System.EventArgs.Empty); } newClient.WorkerSocket.BeginReceive(newClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), newClient); //Start accepting clients asynchronously again mainSocket.BeginAccept(OnClientConnected, null); } /// Waits for the upcoming messages from different clients and fires the proper event according to the packet type. /// /// private void WaitForData(IAsyncResult asyncResult) { ClientInfo sendingClient = null; try { //Take the client information from the asynchronous result resulting from the BeginReceive sendingClient = asyncResult.AsyncState as ClientInfo; // If client is disconnected, then throw a socket exception // with the correct error code. if (!IsConnected(sendingClient.WorkerSocket)) { throw new SocketException(ClientDisconnectedErrorCode); } //End the pending receive request sendingClient.WorkerSocket.EndReceive(asyncResult); //Fire the appropriate event FireMessageTypeEvent(sendingClient.ConvertBytesToPacket() as BasePacket); // Begin receiving data from this client sendingClient.WorkerSocket.BeginReceive(sendingClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), sendingClient); } catch (SocketException e) { if (e.ErrorCode == ClientDisconnectedErrorCode) { // Close the socket. if (sendingClient.WorkerSocket != null) { sendingClient.WorkerSocket.Close(); sendingClient.WorkerSocket = null; } // Remove it from the hash table. connectedClients.Remove(sendingClient.PlayerId); if (ClientDisconnectedEvent != null) { ClientDisconnectedEvent(this, new ClientDisconnectedEventArgs(sendingClient.PlayerId)); } } } catch (Exception e) { // Begin receiving data from this client sendingClient.WorkerSocket.BeginReceive(sendingClient.Buffer, 0, BasePacket.GetMaxPacketSize(), SocketFlags.None, new AsyncCallback(WaitForData), sendingClient); } } /// /// Broadcasts the input message to all the connected clients /// /// public void BroadcastMessage(BasePacket message) { byte[] bytes = message.ConvertToBytes(); foreach (ClientInfo client in connectedClients.Values) { client.WorkerSocket.BeginSend(bytes, 0, bytes.Length, SocketFlags.None, SendAsync, client); } } /// /// Sends the input message to the client specified by his ID. /// /// /// The message to be sent. /// The id of the client to receive the message. public void SendToClient(BasePacket message, int id) { byte[] bytes = message.ConvertToBytes(); (connectedClients[id] as ClientInfo).WorkerSocket.BeginSend(bytes, 0, bytes.Length, SocketFlags.None, SendAsync, connectedClients[id]); } private void SendAsync(IAsyncResult asyncResult) { ClientInfo currentClient = (ClientInfo)asyncResult.AsyncState; currentClient.WorkerSocket.EndSend(asyncResult); } /// Fires the event depending on the type of received packet /// /// The received packet. void FireMessageTypeEvent(BasePacket packet) { switch (packet.MessageType) { case MessageType.PositionUpdateMessage: if (PositionUpdateMessageEvent != null) { PositionUpdateMessageEvent(this, new PositionUpdateEventArgs(packet as PositionUpdatePacket)); } break; } } } The events fired are handled in a different class, here are the event handling code for the PositionUpdateMessage (Other handlers are irrelevant): private readonly Hashtable onlinePlayers = new Hashtable(); /// /// Constructor that creates a new instance of the GameController class. /// private GameController() { //Start the server server = new Thread(networkManager.StartServer); server.Start(); //Create an event handler for the NewClientEvent of networkManager networkManager.PositionUpdateMessageEvent += OnPositionUpdateMessageReceived; } /// /// this event handler is called when a client asks for movement. /// private void OnPositionUpdateMessageReceived(object sender, PositionUpdateEventArgs e) { Point currentLocation = ((PlayerData)onlinePlayers[e.PositionUpdatePacket.PlayerId]).Position; Point locationRequested = e.PositionUpdatePacket.Position; ((PlayerData)onlinePlayers[e.PositionUpdatePacket.PlayerId]).Position = locationRequested; // Broadcast the new position networkManager.BroadcastMessage(new PositionUpdatePacket { Position = locationRequested, PlayerId = e.PositionUpdatePacket.PlayerId }); }

    Read the article

  • Why does C qicksort function implementation works much slower (tape comparations, tape swapping) than bobble sort function?

    - by Artur Mustafin
    I'm going to implement a toy tape "mainframe" for a students, showing the quickness of "quicksort" class functions (recursive or not, does not really matters, due to the slow hardware, and well known stack reversal techniques) comparatively to the "bubblesort" function class, so, while I'm clear about the hardware implementation ans controllers, i guessed that quicksort function is much faster that other ones in terms of sequence, order and comparation distance (it is much faster to rewind the tape from the middle than from the very end, because of different speed of rewind). Unfortunately, this is not the true, this simple "bubble" code shows great improvements comparatively to the "quicksort" functions in terms of comparison distances, direction and number of comparisons and writes. So I have 3 questions: Does I have mistaken in my implememtation of quicksort function? Does I have mistaken in my implememtation of bubblesoft function? If not, why the "bubblesort" function is works much faster in (comparison and write operations) than "quicksort" function? I already have a "quicksort" function: void quicksort(float *a, long l, long r, const compare_function& compare) { long i=l, j=r, temp, m=(l+r)/2; if (l == r) return; if (l == r-1) { if (compare(a, l, r)) { swap(a, l, r); } return; } if (l < r-1) { while (1) { i = l; j = r; while (i < m && !compare(a, i, m)) i++; while (m < j && !compare(a, m, j)) j--; if (i >= j) { break; } swap(a, i, j); } if (l < m) quicksort(a, l, m, compare); if (m < r) quicksort(a, m, r, compare); return; } } and the kind of my own implememtation of the "bubblesort" function: void bubblesort(float *a, long l, long r, const compare_function& compare) { long i, j, k; if (l == r) { return; } if (l == r-1) { if (compare(a, l, r)) { swap(a, l, r); } return; } if (l < r-1) { while(l < r) { i = l; j = l; while (i < r) { i++; if (!compare(a, j, i)) { continue; } j = i; } if (l < j) { swap(a, l, j); } l++; i = r; k = r; while(l < i) { i--; if (!compare(a, i, k)) { continue; } k = i; } if (k < r) { swap(a, k, r); } r--; } return; } } I have used this sort functions in a test sample code, like this: #include <stdio.h> #include <stdlib.h> #include <math.h> #include <conio.h> long swap_count; long compare_count; typedef long (*compare_function)(float *, long, long ); typedef void (*sort_function)(float *, long , long , const compare_function& ); void init(float *, long ); void print(float *, long ); void sort(float *, long, const sort_function& ); void swap(float *a, long l, long r); long less(float *a, long l, long r); long greater(float *a, long l, long r); void bubblesort(float *, long , long , const compare_function& ); void quicksort(float *, long , long , const compare_function& ); void main() { int n; printf("n="); scanf("%d",&n); printf("\r\n"); long i; float *a = (float *)malloc(n*n*sizeof(float)); sort(a, n, &bubblesort); print(a, n); sort(a, n, &quicksort); print(a, n); free(a); } long less(float *a, long l, long r) { compare_count++; return *(a+l) < *(a+r) ? 1 : 0; } long greater(float *a, long l, long r) { compare_count++; return *(a+l) > *(a+r) ? 1 : 0; } void swap(float *a, long l, long r) { swap_count++; float temp; temp = *(a+l); *(a+l) = *(a+r); *(a+r) = temp; } float tg(float x) { return tan(x); } float ctg(float x) { return 1.0/tan(x); } void init(float *m,long n) { long i,j; for (i = 0; i < n; i++) { for (j=0; j< n; j++) { m[i + j*n] = tg(0.2*(i+1)) + ctg(0.3*(j+1)); } } } void print(float *m, long n) { long i, j; for(i = 0; i < n; i++) { for(j = 0; j < n; j++) { printf(" %5.1f", m[i + j*n]); } printf("\r\n"); } printf("\r\n"); } void sort(float *a, long n, const sort_function& sort) { long i, sort_compare = 0, sort_swap = 0; init(a,n); for(i = 0; i < n*n; i+=n) { if (fmod (i / n, 2) == 0) { compare_count = 0; swap_count = 0; sort(a, i, i+n-1, &less); if (swap_count == 0) { compare_count = 0; sort(a, i, i+n-1, &greater); } sort_compare += compare_count; sort_swap += swap_count; } } printf("compare=%ld\r\n", sort_compare); printf("swap=%ld\r\n", sort_swap); printf("\r\n"); }

    Read the article

  • Reading in a 5000 line text file on the Iphone

    - by howsyourface
    Gday, I am trying to create a tiled map for my game, i have had this previously working using other xml methods but i had memory leaks and all sorts of errors. However i had a map load time of about 2.5 - 3 seconds. So i rewrote all of the code using NSMutableStrings and NSStrings. After my best attempt at optomizing it i had a map load time of 10 - 11 seconds, which is far too slow. So i have now rewritten the code using char* arrays, only to now have a load time of 18 seconds -_-. Here is the latest code, i don't know much c so i could have easily botched the whole thing up. FILE* file = fopen(a, "r"); fseek(file, 0L, SEEK_END); length = ftell(file); fseek(file,0L, SEEK_SET); char fileText[length +1]; char buffer[1024];// = malloc(1024); while(fgets(buffer, 1024, file) != NULL) { strncat(fileText, buffer, strlen(buffer)); } fclose(file); [self parseMapFile:fileText]; - (void)parseMapFile:(char*)tiledXML { currentLayerID = 0; currentTileSetID = 0; tileX = 0; tileY = 0; int tmpGid; NSString* tmpName; int tmpTileWidth; int tmpTileHeight; int tilesetCounter = 0; NSString* tmpLayerName; int tmpLayerHeight; int tmpLayerWidth; int layerCounter = 0; tileX = 0; tileY = 0; int tmpFirstGid = 0; int x; int index; char* r; int counter = 0; while ((x = [self findSubstring:tiledXML substring:"\n"]) != 0) { counter ++; char result[x + 1]; r = &result[0]; [self substringIndex:tiledXML index:x newArray:result]; tiledXML += x+2; index = 0; if (counter == 1) { continue; } else if (counter == 2) { char result1[5]; index = [self getStringBetweenStrings:r substring1:"th=\"" substring2:"\"" newArray:result1]; if (r != 0); mapWidth = atoi(result1); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"ht=\"" substring2:"\"" newArray:result1]; if (r != 0); mapHeight = atoi(result1); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"th=\"" substring2:"\"" newArray:result1]; if (r != 0); tileWidth = atoi(result1); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"ht=\"" substring2:"\"" newArray:result1]; if (r != 0); tileHeight = atoi(result1); continue; } char result2[50]; char result3[3]; if ((index = [self getStringBetweenStrings:r substring1:" gid=\"" substring2:"\"" newArray:result3]) != 0) { tmpGid = atoi(result3); free(result2); if(tmpGid == 0) { [currentLayer addTileAtX:tileX y:tileY tileSetID:-1 tileID:0 globalID:0]; } else { [currentLayer addTileAtX:tileX y:tileY tileSetID:[currentTileSet tileSetID] tileID:tmpGid - [currentTileSet firstGID] globalID:tmpGid]; } tileX ++; if (tileX > [currentLayer layerWidth]-1) { tileY ++; tileX = 0; } } else if ((index = [self getStringBetweenStrings:r substring1:"tgid=\"" substring2:"\"" newArray:result2]) != 0) { tmpFirstGid = atoi(result2); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"me=\"" substring2:"\"" newArray:result2]; if (r != 0); tmpName = [NSString stringWithUTF8String:result2]; r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"th=\"" substring2:"\"" newArray:result2]; if (r != 0); tmpTileWidth = atoi(result2); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"ht=\"" substring2:"\"" newArray:result2]; if (r != 0); tmpTileHeight = atoi(result2); } else if ((index = [self getStringBetweenStrings:r substring1:"rce=\"" substring2:"\"" newArray:result2]) != 0) { currentTileSet = [[TileSet alloc] initWithImageNamed:[NSString stringWithUTF8String:result2] name:tmpName tileSetID:tilesetCounter firstGID:tmpFirstGid tileWidth:tmpTileWidth tileHeight:tmpTileHeight spacing:0]; [tileSets addObject:currentTileSet]; [currentTileSet release]; tilesetCounter ++; } else if ((index = [self getStringBetweenStrings:r substring1:"r name=\"" substring2:"\"" newArray:result2]) != 0) { tileX = 0; tileY = 0; tmpLayerName = [NSString stringWithUTF8String:result2]; r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"th=\"" substring2:"\"" newArray:result2]; if (r != 0); tmpLayerWidth = atoi(result2); r += index +1; index = 0; index = [self getStringBetweenStrings:r substring1:"ht=\"" substring2:"\"" newArray:result2]; if (r != 0); tmpLayerHeight = atoi(result2); currentLayer = [[Layer alloc] initWithName:tmpLayerName layerID:layerCounter layerWidth:tmpLayerWidth layerHeight:tmpLayerHeight]; [layers addObject:currentLayer]; [currentLayer release]; layerCounter ++; } } } -(void)substringIndex:(char*)c index:(int)x newArray:(char*)result { result[0] = 0; for (int i = 0; i < strlen(c); i++) { result[i] = c[i]; if (i == x) { result[i+1] = '\0'; break; } } } -(int)findSubstring:(char*)c substring:(char*)s { int sCounter = 0; int index = 0; int d; for (int i = 0; i < strlen(c); i ++) { if (i > 500)//max line size break; if (c[i] == s[sCounter]) { d = strlen(s); sCounter ++; if (d > sCounter) { } else { index = i - (d); break; } } else sCounter = 0; } return index; } -(int)getStringBetweenStrings:(char*)c substring1:(char*)s substring2:(char*)s2 newArray:(char*)result { int sCounter = 0; int sCounter2 = 0; int index = 0; int index2 = 0; int d; for (int i = 0; i < strlen(c); i ++) { if (index != 0) { if (c[i] == s2[sCounter2]) { d = strlen(s2); sCounter2 ++; if (d > sCounter2) { } else { index2 = i - (d); break; } } else sCounter2 = 0; } else { if (c[i] == s[sCounter]) { d = strlen(s); sCounter ++; if (d > sCounter) { } else { index = i; } } else sCounter = 0; } } if (index != 0 && index2 != 0) [self substringIndex:(c + index+1) index:index2-index-1 newArray:result]; return index; } (I know it's a lot of code to be putting in here) I thought the by using basic char arrays i could drastically increase the performance, at least over the initial node based code that i was replacing. Thanks for all your efforts.

    Read the article

  • Handling inheritance with overriding efficiently

    - by Fyodor Soikin
    I have the following two data structures. First, a list of properties applied to object triples: Object1 Object2 Object3 Property Value O1 O2 O3 P1 "abc" O1 O2 O3 P2 "xyz" O1 O3 O4 P1 "123" O2 O4 O5 P1 "098" Second, an inheritance tree: O1 O2 O4 O3 O5 Or viewed as a relation: Object Parent O2 O1 O4 O2 O3 O1 O5 O3 O1 null The semantics of this being that O2 inherits properties from O1; O4 - from O2 and O1; O3 - from O1; and O5 - from O3 and O1, in that order of precedence. NOTE 1: I have an efficient way to select all children or all parents of a given object. This is currently implemented with left and right indexes, but hierarchyid could also work. This does not seem important right now. NOTE 2: I have tiggers in place that make sure that the "Object" column always contains all possible objects, even when they do not really have to be there (i.e. have no parent or children defined). This makes it possible to use inner joins rather than severely less effiecient outer joins. The objective is: Given a pair of (Property, Value), return all object triples that have that property with that value either defined explicitly or inherited from a parent. NOTE 1: An object triple (X,Y,Z) is considered a "parent" of triple (A,B,C) when it is true that either X = A or X is a parent of A, and the same is true for (Y,B) and (Z,C). NOTE 2: A property defined on a closer parent "overrides" the same property defined on a more distant parent. NOTE 3: When (A,B,C) has two parents - (X1,Y1,Z1) and (X2,Y2,Z2), then (X1,Y1,Z1) is considered a "closer" parent when: (a) X2 is a parent of X1, or (b) X2 = X1 and Y2 is a parent of Y1, or (c) X2 = X1 and Y2 = Y1 and Z2 is a parent of Z1 In other words, the "closeness" in ancestry for triples is defined based on the first components of the triples first, then on the second components, then on the third components. This rule establishes an unambigous partial order for triples in terms of ancestry. For example, given the pair of (P1, "abc"), the result set of triples will be: O1, O2, O3 -- Defined explicitly O1, O2, O5 -- Because O5 inherits from O3 O1, O4, O3 -- Because O4 inherits from O2 O1, O4, O5 -- Because O4 inherits from O2 and O5 inherits from O3 O2, O2, O3 -- Because O2 inherits from O1 O2, O2, O5 -- Because O2 inherits from O1 and O5 inherits from O3 O2, O4, O3 -- Because O2 inherits from O1 and O4 inherits from O2 O3, O2, O3 -- Because O3 inherits from O1 O3, O2, O5 -- Because O3 inherits from O1 and O5 inherits from O3 O3, O4, O3 -- Because O3 inherits from O1 and O4 inherits from O2 O3, O4, O5 -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3 O4, O2, O3 -- Because O4 inherits from O1 O4, O2, O5 -- Because O4 inherits from O1 and O5 inherits from O3 O4, O4, O3 -- Because O4 inherits from O1 and O4 inherits from O2 O5, O2, O3 -- Because O5 inherits from O1 O5, O2, O5 -- Because O5 inherits from O1 and O5 inherits from O3 O5, O4, O3 -- Because O5 inherits from O1 and O4 inherits from O2 O5, O4, O5 -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3 Note that the triple (O2, O4, O5) is absent from this list. This is because property P1 is defined explicitly for the triple (O2, O4, O5) and this prevents that triple from inheriting that property from (O1, O2, O3). Also note that the triple (O4, O4, O5) is also absent. This is because that triple inherits its value of P1="098" from (O2, O4, O5), because it is a closer parent than (O1, O2, O3). The straightforward way to do it is the following. First, for every triple that a property is defined on, select all possible child triples: select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value from TriplesAndProperties tp -- Select corresponding objects of the triple inner join Objects as Objects1 on Objects1.Id = tp.O1 inner join Objects as Objects2 on Objects2.Id = tp.O2 inner join Objects as Objects3 on Objects3.Id = tp.O3 -- Then add all possible children of all those objects inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id But this is not the whole story: if some triple inherits the same property from several parents, this query will yield conflicting results. Therefore, second step is to select just one of those conflicting results: select * from ( select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value, row_number() over( partition by Children1.Id, Children2.Id, Children3.Id, tp.Property order by Objects1.[depthInTheTree] descending, Objects2.[depthInTheTree] descending, Objects3.[depthInTheTree] descending ) as InheritancePriority from ... (see above) ) where InheritancePriority = 1 The window function row_number() over( ... ) does the following: for every unique combination of objects triple and property, it sorts all values by the ancestral distance from the triple to the parents that the value is inherited from, and then I only select the very first of the resulting list of values. A similar effect can be achieved with a GROUP BY and ORDER BY statements, but I just find the window function semantically cleaner (the execution plans they yield are identical). The point is, I need to select the closest of contributing ancestors, and for that I need to group and then sort within the group. And finally, now I can simply filter the result set by Property and Value. This scheme works. Very reliably and predictably. It has proven to be very powerful for the business task it implements. The only trouble is, it is awfuly slow. One might point out the join of seven tables might be slowing things down, but that is actually not the bottleneck. According to the actual execution plan I'm getting from the SQL Management Studio (as well as SQL Profiler), the bottleneck is the sorting. The problem is, in order to satisfy my window function, the server has to sort by Children1.Id, Children2.Id, Children3.Id, tp.Property, Parents1.[depthInTheTree] descending, Parents2.[depthInTheTree] descending, Parents3.[depthInTheTree] descending, and there can be no indexes it can use, because the values come from a cross join of several tables. EDIT: Per Michael Buen's suggestion (thank you, Michael), I have posted the whole puzzle to sqlfiddle here. One can see in the execution plan that the Sort operation accounts for 32% of the whole query, and that is going to grow with the number of total rows, because all the other operations use indexes. Usually in such cases I would use an indexed view, but not in this case, because indexed views cannot contain self-joins, of which there are six. The only way that I can think of so far is to create six copies of the Objects table and then use them for the joins, thus enabling an indexed view. Did the time come that I shall be reduced to that kind of hacks? The despair sets in.

    Read the article

< Previous Page | 302 303 304 305 306 307  | Next Page >