Search Results

Search found 5756 results on 231 pages for 'cpu utilization'.

Page 75/231 | < Previous Page | 71 72 73 74 75 76 77 78 79 80 81 82  | Next Page >

  • How to get faster graphics in KVM? VNC is painfully slow with Haiku OS guest, Spice won't install and SDL doesn't work

    - by Don Quixote
    I've been coming up to speed on the Haiku operating system, an Open Source clone of BeOS 5 Pro. I'm using an Apple MacBook Pro as my development machine. Apple's BootCamp BIOS does not support more than four partitions on the internal hard drive. While I can set up extended and logical partitions, doing so will prevent any of the installed operating systems from booting. To run Haiku directly on the iron, I boot it off a USB stick. Using external storage is also helpful because I am perpetually out of filesystem space. While VirtualBox is documented to allow access to physical drives, I could not actually get it to work. Also VirtualBox can only use one of the host CPU's cores. While VB guests can be configured for more than one CPU, they are only emulated. A full build of the Haiku OS takes 4.5 under VB. I had the hope of reducing build times by using KVM instead, but it's not working nearly as well as VirtualBox did. The Linux Kernel Virtual Machine is broken in all manner of fundamental ways as seen from Haiku. But I'm a coder; maybe I could contribute to fixing some of those problems. The first problem I've got is that Haiku's video in virt-manager is quite painfully slow. When I drag Haiku windows around the desktop, they lag quite far behind where my mouse is. It's quite difficult to move a window to a precise position on the screen. Just imagine that the mouse was connected to the window title bar with a really stretchy spring. Also Haiku's mouse lags quite far behind where I have moved it. I found lots of Personal Package Archives that enable Spice from QEMU / KVM at the Ubuntu Personal Package Arhives. I tried a few of the PPAs but none of them worked; with one of them, the command "add-apt-repository" crashed with a traceback. There is a Wiki page about Spice, but it says that it only works on 64-bit. My Early 2006 MacBook Pro is 32-bit. Its Apple Model Identifier is MacBookPro1,1; these use Core Duos NOT Core 2 Duos. I don't mind building a source deb for 32-bit if I can expect it to work. Is there some reason that Spice should be 64-bit only? Does it need features of the x86_64 Instruction Set Architecture that x86 does not have? When I try using SDL from virt-manager, the configuration for Local SDL Window says "Xauth: /home/mike/.Xauthority". When I try to start my guest, virt-manager emits an error. When I Googled the error message, the usual solution was to make ~/.Xauthority readible. However, .Xauthorty does not exist in my home directory. Instead I have a $XAUTHORITY environment variable. There is no way to configure SDL in virt-manager to use $XAUTHORITY instead of ~/.Xauthority. Neither does it work to copy the value of $XAUTHORITY into the file. I am ready to scream, because I've been five fscking days trying to make KVM work for Haiku development. There is a whole lot more that is broken than the slow video. All I really want to do for now is speed up my full builds of Haiku by using "jam -j2" to use both cores in my CPU. I may try Xen next, but the last time I monkeyed with Xen it was far, far more broken than I am finding KVM to be. Just for now, I would be satisfied if there were some way to use my USB stick as a drive in VirtualBox. VB does allow me to configure /dev/sdb as a drive, but it always causes a fatal error when I try to launch the guest. Thank You For Any Advice You Can Give Me. -

    Read the article

  • Using Appendbuffers in unity for terrain generation

    - by Wardy
    Like many others I figured I would try and make the most of the monster processing power of the GPU but I'm having trouble getting the basics in place. CPU code: using UnityEngine; using System.Collections; public class Test : MonoBehaviour { public ComputeShader Generator; public MeshTopology Topology; void OnEnable() { var computedMeshPoints = ComputeMesh(); CreateMeshFrom(computedMeshPoints); } private Vector3[] ComputeMesh() { var size = (32*32) * 4; // 4 points added for each x,z pos var buffer = new ComputeBuffer(size, 12, ComputeBufferType.Append); Generator.SetBuffer(0, "vertexBuffer", buffer); Generator.Dispatch(0, 1, 1, 1); var results = new Vector3[size]; buffer.GetData(results); buffer.Dispose(); return results; } private void CreateMeshFrom(Vector3[] generatedPoints) { var filter = GetComponent<MeshFilter>(); var renderer = GetComponent<MeshRenderer>(); if (generatedPoints.Length > 0) { var mesh = new Mesh { vertices = generatedPoints }; var colors = new Color[generatedPoints.Length]; var indices = new int[generatedPoints.Length]; //TODO: build this different based on topology of the mesh being generated for (int i = 0; i < indices.Length; i++) { indices[i] = i; colors[i] = Color.blue; } mesh.SetIndices(indices, Topology, 0); mesh.colors = colors; mesh.RecalculateNormals(); mesh.Optimize(); mesh.RecalculateBounds(); filter.sharedMesh = mesh; } else { filter.sharedMesh = null; } } } GPU code: #pragma kernel Generate AppendStructuredBuffer<float3> vertexBuffer : register(u0); void genVertsAt(uint2 xzPos) { //TODO: put some height generation code here. // could even run marching cubes / dual contouring code. float3 corner1 = float3( xzPos[0], 0, xzPos[1] ); float3 corner2 = float3( xzPos[0] + 1, 0, xzPos[1] ); float3 corner3 = float3( xzPos[0], 0, xzPos[1] + 1); float3 corner4 = float3( xzPos[0] + 1, 0, xzPos[1] + 1 ); vertexBuffer.Append(corner1); vertexBuffer.Append(corner2); vertexBuffer.Append(corner3); vertexBuffer.Append(corner4); } [numthreads(32, 1, 32)] void Generate (uint3 threadId : SV_GroupThreadID, uint3 groupId : SV_GroupID) { uint2 currentXZ = unint2( groupId.x * 32 + threadId.x, groupId.z * 32 + threadId.z); genVertsAt(currentXZ); } Can anyone explain why when I call "buffer.GetData(results);" on the CPU after the compute dispatch call my buffer is full of Vector3(0,0,0), I'm not expecting any y values yet but I would expect a bunch of thread indexes in the x,z values for the Vector3 array. I'm not getting any errors in any of this code which suggests it's correct syntax-wise but maybe the issue is a logical bug. Also: Yes, I know I'm generating 4,000 Vector3's and then basically round tripping them. However, the purpose of this code is purely to learn how round tripping works between CPU and GPU in Unity.

    Read the article

  • Columnstore Case Study #1: MSIT SONAR Aggregations

    - by aspiringgeek
    Preamble This is the first in a series of posts documenting big wins encountered using columnstore indexes in SQL Server 2012 & 2014.  Many of these can be found in this deck along with details such as internals, best practices, caveats, etc.  The purpose of sharing the case studies in this context is to provide an easy-to-consume quick-reference alternative. Why Columnstore? If we’re looking for a subset of columns from one or a few rows, given the right indexes, SQL Server can do a superlative job of providing an answer. If we’re asking a question which by design needs to hit lots of rows—DW, reporting, aggregations, grouping, scans, etc., SQL Server has never had a good mechanism—until columnstore. Columnstore indexes were introduced in SQL Server 2012. However, they're still largely unknown. Some adoption blockers existed; yet columnstore was nonetheless a game changer for many apps.  In SQL Server 2014, potential blockers have been largely removed & they're going to profoundly change the way we interact with our data.  The purpose of this series is to share the performance benefits of columnstore & documenting columnstore is a compelling reason to upgrade to SQL Server 2014. App: MSIT SONAR Aggregations At MSIT, performance & configuration data is captured by SCOM. We archive much of the data in a partitioned data warehouse table in SQL Server 2012 for reporting via an application called SONAR.  By definition, this is a primary use case for columnstore—report queries requiring aggregation over large numbers of rows.  New data is refreshed each night by an automated table partitioning mechanism—a best practices scenario for columnstore. The Win Compared to performance using classic indexing which resulted in the expected query plan selection including partition elimination vs. SQL Server 2012 nonclustered columnstore, query performance increased significantly.  Logical reads were reduced by over a factor of 50; both CPU & duration improved by factors of 20 or more.  Other than creating the columnstore index, no special modifications or tweaks to the app or databases schema were necessary to achieve the performance improvements.  Existing nonclustered indexes were rendered superfluous & were deleted, thus mitigating maintenance challenges such as defragging as well as conserving disk capacity. Details The table provides the raw data & summarizes the performance deltas. Logical Reads (8K pages) CPU (ms) Durn (ms) Columnstore 160,323 20,360 9,786 Conventional Table & Indexes 9,053,423 549,608 193,903 ? x56 x27 x20 The charts provide additional perspective of this data.  "Conventional vs. Columnstore Metrics" document the raw data.  Note on this linear display the magnitude of the conventional index performance vs. columnstore.  The “Metrics (?)” chart expresses these values as a ratio. Summary For DW, reports, & other BI workloads, columnstore often provides significant performance enhancements relative to conventional indexing.  I have documented here, the first in a series of reports on columnstore implementations, results from an initial implementation at MSIT in which logical reads were reduced by over a factor of 50; both CPU & duration improved by factors of 20 or more.  Subsequent features in this series document performance enhancements that are even more significant. 

    Read the article

  • Selling Federal Enterprise Architecture (EA)

    - by TedMcLaughlan
    Selling Federal Enterprise Architecture A taxonomy of subject areas, from which to develop a prioritized marketing and communications plan to evangelize EA activities within and among US Federal Government organizations and constituents. Any and all feedback is appreciated, particularly in developing and extending this discussion as a tool for use – more information and details are also available. "Selling" the discipline of Enterprise Architecture (EA) in the Federal Government (particularly in non-DoD agencies) is difficult, notwithstanding the general availability and use of the Federal Enterprise Architecture Framework (FEAF) for some time now, and the relatively mature use of the reference models in the OMB Capital Planning and Investment (CPIC) cycles. EA in the Federal Government also tends to be a very esoteric and hard to decipher conversation – early apologies to those who agree to continue reading this somewhat lengthy article. Alignment to the FEAF and OMB compliance mandates is long underway across the Federal Departments and Agencies (and visible via tools like PortfolioStat and ITDashboard.gov – but there is still a gap between the top-down compliance directives and enablement programs, and the bottom-up awareness and effective use of EA for either IT investment management or actual mission effectiveness. "EA isn't getting deep enough penetration into programs, components, sub-agencies, etc.", verified a panelist at the most recent EA Government Conference in DC. Newer guidance from OMB may be especially difficult to handle, where bottom-up input can't be accurately aligned, analyzed and reported via standardized EA discipline at the Agency level – for example in addressing the new (for FY13) Exhibit 53D "Agency IT Reductions and Reinvestments" and the information required for "Cloud Computing Alternatives Evaluation" (supporting the new Exhibit 53C, "Agency Cloud Computing Portfolio"). Therefore, EA must be "sold" directly to the communities that matter, from a coordinated, proactive messaging perspective that takes BOTH the Program-level value drivers AND the broader Agency mission and IT maturity context into consideration. Selling EA means persuading others to take additional time and possibly assign additional resources, for a mix of direct and indirect benefits – many of which aren't likely to be realized in the short-term. This means there's probably little current, allocated budget to work with; ergo the challenge of trying to sell an "unfunded mandate". Also, the concept of "Enterprise" in large Departments like Homeland Security tends to cross all kinds of organizational boundaries – as Richard Spires recently indicated by commenting that "...organizational boundaries still trump functional similarities. Most people understand what we're trying to do internally, and at a high level they get it. The problem, of course, is when you get down to them and their system and the fact that you're going to be touching them...there's always that fear factor," Spires said. It is quite clear to the Federal IT Investment community that for EA to meet its objective, understandable, relevant value must be measured and reported using a repeatable method – as described by GAO's recent report "Enterprise Architecture Value Needs To Be Measured and Reported". What's not clear is the method or guidance to sell this value. In fact, the current GAO "Framework for Assessing and Improving Enterprise Architecture Management (Version 2.0)", a.k.a. the "EAMMF", does not include words like "sell", "persuade", "market", etc., except in reference ("within Core Element 19: Organization business owner and CXO representatives are actively engaged in architecture development") to a brief section in the CIO Council's 2001 "Practical Guide to Federal Enterprise Architecture", entitled "3.3.1. Develop an EA Marketing Strategy and Communications Plan." Furthermore, Core Element 19 of the EAMMF is advised to be applied in "Stage 3: Developing Initial EA Versions". This kind of EA sales campaign truly should start much earlier in the maturity progress, i.e. in Stages 0 or 1. So, what are the understandable, relevant benefits (or value) to sell, that can find an agreeable, participatory audience, and can pave the way towards success of a longer-term, funded set of EA mechanisms that can be methodically measured and reported? Pragmatic benefits from a useful EA that can help overcome the fear of change? And how should they be sold? Following is a brief taxonomy (it's a taxonomy, to help organize SME support) of benefit-related subjects that might make the most sense, in creating the messages and organizing an initial "engagement plan" for evangelizing EA "from within". An EA "Sales Taxonomy" of sorts. We're not boiling the ocean here; the subjects that are included are ones that currently appear to be urgently relevant to the current Federal IT Investment landscape. Note that successful dialogue in these topics is directly usable as input or guidance for actually developing early-stage, "Fit-for-Purpose" (a DoDAF term) Enterprise Architecture artifacts, as prescribed by common methods found in most EA methodologies, including FEAF, TOGAF, DoDAF and our own Oracle Enterprise Architecture Framework (OEAF). The taxonomy below is organized by (1) Target Community, (2) Benefit or Value, and (3) EA Program Facet - as in: "Let's talk to (1: Community Member) about how and why (3: EA Facet) the EA program can help with (2: Benefit/Value)". Once the initial discussion targets and subjects are approved (that can be measured and reported), a "marketing and communications plan" can be created. A working example follows the Taxonomy. Enterprise Architecture Sales Taxonomy Draft, Summary Version 1. Community 1.1. Budgeted Programs or Portfolios Communities of Purpose (CoPR) 1.1.1. Program/System Owners (Senior Execs) Creating or Executing Acquisition Plans 1.1.2. Program/System Owners Facing Strategic Change 1.1.2.1. Mandated 1.1.2.2. Expected/Anticipated 1.1.3. Program Managers - Creating Employee Performance Plans 1.1.4. CO/COTRs – Creating Contractor Performance Plans, or evaluating Value Engineering Change Proposals (VECP) 1.2. Governance & Communications Communities of Practice (CoP) 1.2.1. Policy Owners 1.2.1.1. OCFO 1.2.1.1.1. Budget/Procurement Office 1.2.1.1.2. Strategic Planning 1.2.1.2. OCIO 1.2.1.2.1. IT Management 1.2.1.2.2. IT Operations 1.2.1.2.3. Information Assurance (Cyber Security) 1.2.1.2.4. IT Innovation 1.2.1.3. Information-Sharing/ Process Collaboration (i.e. policies and procedures regarding Partners, Agreements) 1.2.2. Governing IT Council/SME Peers (i.e. an "Architects Council") 1.2.2.1. Enterprise Architects (assumes others exist; also assumes EA participants aren't buried solely within the CIO shop) 1.2.2.2. Domain, Enclave, Segment Architects – i.e. the right affinity group for a "shared services" EA structure (per the EAMMF), which may be classified as Federated, Segmented, Service-Oriented, or Extended 1.2.2.3. External Oversight/Constraints 1.2.2.3.1. GAO/OIG & Legal 1.2.2.3.2. Industry Standards 1.2.2.3.3. Official public notification, response 1.2.3. Mission Constituents Participant & Analyst Community of Interest (CoI) 1.2.3.1. Mission Operators/Users 1.2.3.2. Public Constituents 1.2.3.3. Industry Advisory Groups, Stakeholders 1.2.3.4. Media 2. Benefit/Value (Note the actual benefits may not be discretely attributable to EA alone; EA is a very collaborative, cross-cutting discipline.) 2.1. Program Costs – EA enables sound decisions regarding... 2.1.1. Cost Avoidance – a TCO theme 2.1.2. Sequencing – alignment of capability delivery 2.1.3. Budget Instability – a Federal reality 2.2. Investment Capital – EA illuminates new investment resources via... 2.2.1. Value Engineering – contractor-driven cost savings on existing budgets, direct or collateral 2.2.2. Reuse – reuse of investments between programs can result in savings, chargeback models; avoiding duplication 2.2.3. License Refactoring – IT license & support models may not reflect actual or intended usage 2.3. Contextual Knowledge – EA enables informed decisions by revealing... 2.3.1. Common Operating Picture (COP) – i.e. cross-program impacts and synergy, relative to context 2.3.2. Expertise & Skill – who truly should be involved in architectural decisions, both business and IT 2.3.3. Influence – the impact of politics and relationships can be examined 2.3.4. Disruptive Technologies – new technologies may reduce costs or mitigate risk in unanticipated ways 2.3.5. What-If Scenarios – can become much more refined, current, verifiable; basis for Target Architectures 2.4. Mission Performance – EA enables beneficial decision results regarding... 2.4.1. IT Performance and Optimization – towards 100% effective, available resource utilization 2.4.2. IT Stability – towards 100%, real-time uptime 2.4.3. Agility – responding to rapid changes in mission 2.4.4. Outcomes –measures of mission success, KPIs – vs. only "Outputs" 2.4.5. Constraints – appropriate response to constraints 2.4.6. Personnel Performance – better line-of-sight through performance plans to mission outcome 2.5. Mission Risk Mitigation – EA mitigates decision risks in terms of... 2.5.1. Compliance – all the right boxes are checked 2.5.2. Dependencies –cross-agency, segment, government 2.5.3. Transparency – risks, impact and resource utilization are illuminated quickly, comprehensively 2.5.4. Threats and Vulnerabilities – current, realistic awareness and profiles 2.5.5. Consequences – realization of risk can be mapped as a series of consequences, from earlier decisions or new decisions required for current issues 2.5.5.1. Unanticipated – illuminating signals of future or non-symmetric risk; helping to "future-proof" 2.5.5.2. Anticipated – discovering the level of impact that matters 3. EA Program Facet (What parts of the EA can and should be communicated, using business or mission terms?) 3.1. Architecture Models – the visual tools to be created and used 3.1.1. Operating Architecture – the Business Operating Model/Architecture elements of the EA truly drive all other elements, plus expose communication channels 3.1.2. Use Of – how can the EA models be used, and how are they populated, from a reasonable, pragmatic yet compliant perspective? What are the core/minimal models required? What's the relationship of these models, with existing system models? 3.1.3. Scope – what level of granularity within the models, and what level of abstraction across the models, is likely to be most effective and useful? 3.2. Traceability – the maturity, status, completeness of the tools 3.2.1. Status – what in fact is the degree of maturity across the integrated EA model and other relevant governance models, and who may already be benefiting from it? 3.2.2. Visibility – how does the EA visibly and effectively prove IT investment performance goals are being reached, with positive mission outcome? 3.3. Governance – what's the interaction, participation method; how are the tools used? 3.3.1. Contributions – how is the EA program informed, accept submissions, collect data? Who are the experts? 3.3.2. Review – how is the EA validated, against what criteria?  Taxonomy Usage Example:   1. To speak with: a. ...a particular set of System Owners Facing Strategic Change, via mandate (like the "Cloud First" mandate); about... b. ...how the EA program's visible and easily accessible Infrastructure Reference Model (i.e. "IRM" or "TRM"), if updated more completely with current system data, can... c. ...help shed light on ways to mitigate risks and avoid future costs associated with NOT leveraging potentially-available shared services across the enterprise... 2. ....the following Marketing & Communications (Sales) Plan can be constructed: a. Create an easy-to-read "Consequence Model" that illustrates how adoption of a cloud capability (like elastic operational storage) can enable rapid and durable compliance with the mandate – using EA traceability. Traceability might be from the IRM to the ARM (that identifies reusable services invoking the elastic storage), and then to the PRM with performance measures (such as % utilization of purchased storage allocation) included in the OMB Exhibits; and b. Schedule a meeting with the Program Owners, timed during their Acquisition Strategy meetings in response to the mandate, to use the "Consequence Model" for advising them to organize a rapid and relevant RFI solicitation for this cloud capability (regarding alternatives for sourcing elastic operational storage); and c. Schedule a series of short "Discovery" meetings with the system architecture leads (as agreed by the Program Owners), to further populate/validate the "As-Is" models and frame the "To Be" models (via scenarios), to better inform the RFI, obtain the best feedback from the vendor community, and provide potential value for and avoid impact to all other programs and systems. --end example -- Note that communications with the intended audience should take a page out of the standard "Search Engine Optimization" (SEO) playbook, using keywords and phrases relating to "value" and "outcome" vs. "compliance" and "output". Searches in email boxes, internal and external search engines for phrases like "cost avoidance strategies", "mission performance metrics" and "innovation funding" should yield messages and content from the EA team. This targeted, informed, practical sales approach should result in additional buy-in and participation, additional EA information contribution and model validation, development of more SMEs and quick "proof points" (with real-life testing) to bolster the case for EA. The proof point here is a successful, timely procurement that satisfies not only the external mandate and external oversight review, but also meets internal EA compliance/conformance goals and therefore is more transparently useful across the community. In short, if sold effectively, the EA will perform and be recognized. EA won’t therefore be used only for compliance, but also (according to a validated, stated purpose) to directly influence decisions and outcomes. The opinions, views and analysis expressed in this document are those of the author and do not necessarily reflect the views of Oracle.

    Read the article

  • The best computer ever

    - by Jeff
    (This is a repost from my personal blog… wow… I need to write more technical stuff!) About three years and three months ago, I bought a 17" MacBook Pro, and it turned out to be the best computer I've ever owned. You might think that every computer with better specs is automatically better than the last, but that hasn't been my experience. My first one was a Sony, back in the Pentium III days, and it cost an astonishing $2,500. That was even more ridiculous in 1999 dollars. It had a dial-up modem, and a CD-ROM, built-in! It may have even played DVD's. A few years later I bought an HP, and it ended up being a pile of shit. The power connector inside came loose from the board, and on occasion would even short. In 2005, I bought a Dell, and it wasn't bad. It had a really high resolution screen (complete with dead pixels, a problem in those days), and it was the first laptop I felt I could do real work on. When 2006 rolled around, Apple started making computers with Intel CPU's, and I bought the very first one the week it came out. I used Boot Camp to run Windows. I still have it in its box somewhere, and I used it for three years. The current 17" was new in 2009. The goodness was largely rooted in having a big screen with lots of dots. This computer has been the source of hundreds of blog posts, tens of thousands of lines of code, video and photo editing, and of course, a whole lot of Web surfing. It connected to corpnet at Microsoft, WiFi in Hawaii and has presented many a deck. It has traveled with me tens of thousands of miles. Last year, I put a solid state drive in it, and it was like getting a new computer. I can boot up a Windows 7 VM in about 19 seconds. Having 8 gigs of RAM has always been fantastic. Everything about it has been fast and fun. When new, the battery (when not using VM's) could get as much as 10 hours. I can still do 7 without much trouble. After 460 charge cycles, the battery health is still between 85 and 90%. The only real negative has been the size and weight. It's only an inch thick, but naturally it's pretty big with a 17" screen. You don't get battery life like that without a huge battery, either, so it's heavy. It was never a deal breaker, but sometimes a long haul across a large airport, you know you're carrying it. Today, Apple announced a new, thinner and lighter 15" laptop, with twice the RAM and CPU cores, and four times the screen resolution. It basically handles my size and weight issues while retaining the resolution, and it still costs less than my 17" did. So I ordered one. Three years is an excellent run, but I kind of budgeted for a new workhorse this year anyway. So if you're interested in a 17" MacBook Pro with a Core 2 Duo 2.66 GHz CPU, 8 gigs of RAM and a 320 gig hard drive (sorry, I'm keeping the SSD), I have one to sell. They've apparently discontinued the 17", which is going to piss off the video community. It's in excellent condition, with a few minor scratches, but I take care of my stuff.

    Read the article

  • Faster Memory Allocation Using vmtasks

    - by Steve Sistare
    You may have noticed a new system process called "vmtasks" on Solaris 11 systems: % pgrep vmtasks 8 % prstat -p 8 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 8 root 0K 0K sleep 99 -20 9:10:59 0.0% vmtasks/32 What is vmtasks, and why should you care? In a nutshell, vmtasks accelerates creation, locking, and destruction of pages in shared memory segments. This is particularly helpful for locked memory, as creating a page of physical memory is much more expensive than creating a page of virtual memory. For example, an ISM segment (shmflag & SHM_SHARE_MMU) is locked in memory on the first shmat() call, and a DISM segment (shmflg & SHM_PAGEABLE) is locked using mlock() or memcntl(). Segment operations such as creation and locking are typically single threaded, performed by the thread making the system call. In many applications, the size of a shared memory segment is a large fraction of total physical memory, and the single-threaded initialization is a scalability bottleneck which increases application startup time. To break the bottleneck, we apply parallel processing, harnessing the power of the additional CPUs that are always present on modern platforms. For sufficiently large segments, as many of 16 threads of vmtasks are employed to assist an application thread during creation, locking, and destruction operations. The segment is implicitly divided at page boundaries, and each thread is given a chunk of pages to process. The per-page processing time can vary, so for dynamic load balancing, the number of chunks is greater than the number of threads, and threads grab chunks dynamically as they finish their work. Because the threads modify a single application address space in compressed time interval, contention on locks protecting VM data structures locks was a problem, and we had to re-scale a number of VM locks to get good parallel efficiency. The vmtasks process has 1 thread per CPU and may accelerate multiple segment operations simultaneously, but each operation gets at most 16 helper threads to avoid monopolizing CPU resources. We may reconsider this limit in the future. Acceleration using vmtasks is enabled out of the box, with no tuning required, and works for all Solaris platform architectures (SPARC sun4u, SPARC sun4v, x86). The following tables show the time to create + lock + destroy a large segment, normalized as milliseconds per gigabyte, before and after the introduction of vmtasks: ISM system ncpu before after speedup ------ ---- ------ ----- ------- x4600 32 1386 245 6X X7560 64 1016 153 7X M9000 512 1196 206 6X T5240 128 2506 234 11X T4-2 128 1197 107 11x DISM system ncpu before after speedup ------ ---- ------ ----- ------- x4600 32 1582 265 6X X7560 64 1116 158 7X M9000 512 1165 152 8X T5240 128 2796 198 14X (I am missing the data for T4 DISM, for no good reason; it works fine). The following table separates the creation and destruction times: ISM, T4-2 before after ------ ----- create 702 64 destroy 495 43 To put this in perspective, consider creating a 512 GB ISM segment on T4-2. Creating the segment would take 6 minutes with the old code, and only 33 seconds with the new. If this is your Oracle SGA, you save over 5 minutes when starting the database, and you also save when shutting it down prior to a restart. Those minutes go directly to your bottom line for service availability.

    Read the article

  • Using WKA in Large Coherence Clusters (Disabling Multicast)

    - by jpurdy
    Disabling hardware multicast (by configuring well-known addresses aka WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a packet to each of the other servers. While hardware varies significantly, consider that a server with a single gigabit connection can send at most ~70,000 packets per second. To continue with some concrete numbers, in a cluster with 500 members, that means that each server can send at most 140 cluster-wide messages per second. And if there are 10 cluster members on each physical machine, that number shrinks to 14 cluster-wide messages per second (or with only mild hyperbole, roughly zero). It is also important to keep in mind that network I/O is not only expensive in terms of the network itself, but also the consumption of CPU required to send (or receive) a message (due to things like copying the packet bytes, processing a interrupt, etc). Fortunately, Coherence is designed to rely primarily on point-to-point messages, but there are some features that are inherently one-to-many: Announcing the arrival or departure of a member Updating partition assignment maps across the cluster Creating or destroying a NamedCache Invalidating a cache entry from a large number of client-side near caches Distributing a filter-based request across the full set of cache servers (e.g. queries, aggregators and entry processors) Invoking clear() on a NamedCache The first few of these are operations that are primarily routed through a single senior member, and also occur infrequently, so they usually are not a primary consideration. There are cases, however, where the load from introducing new members can be substantial (to the point of destabilizing the cluster). Consider the case where cluster in the first paragraph grows from 500 members to 1000 members (holding the number of physical machines constant). During this period, there will be 500 new member introductions, each of which may consist of several cluster-wide operations (for the cluster membership itself as well as the partitioned cache services, replicated cache services, invocation services, management services, etc). Note that all of these introductions will route through that one senior member, which is sharing its network bandwidth with several other members (which will be communicating to a lesser degree with other members throughout this process). While each service may have a distinct senior member, there's a good chance during initial startup that a single member will be the senior for all services (if those services start on the senior before the second member joins the cluster). It's obvious that this could cause CPU and/or network starvation. In the current release of Coherence (3.7.1.3 as of this writing), the pure unicast code path also has less sophisticated flow-control for cluster-wide messages (compared to the multicast-enabled code path), which may also result in significant heap consumption on the senior member's JVM (from the message backlog). This is almost never a problem in practice, but with sufficient CPU or network starvation, it could become critical. For the non-operational concerns (near caches, queries, etc), the application itself will determine how much load is placed on the cluster. Applications intended for deployment in a pure unicast environment should be careful to avoid excessive dependence on these features. Even in an environment with multicast support, these operations may scale poorly since even with a constant request rate, the underlying workload will increase at roughly the same rate as the underlying resources are added. Unless there is an infrastructural requirement to the contrary, multicast should be enabled. If it can't be enabled, care should be taken to ensure the added overhead doesn't lead to performance or stability issues. This is particularly crucial in large clusters.

    Read the article

  • installer can't find partition, but fdisk can find them

    - by pxd
    I'm installing ubuntu 12.04, my system had install 2 system -- winxp and ubuntu 10.10. Now, I want to update ubuntu to 12.04. I use usb disk to install 12.04. But, the installer can't not find my partition in my harddisk. But, the fdisk can find them. Can you help me? How to do? ubuntu@ubuntu:~$ sudo lshw -short H/W path Device Class Description system HP 2230s (NN868PA#AB2) /0 bus 3037 /0/9 memory 64KiB BIOS /0/0 processor Intel(R) Core(TM)2 Duo CPU T6570 @ 2.10GHz /0/0/1 memory 2MiB L2 cache /0/0/3 memory 32KiB L1 cache /0/0/0.1 processor Logical CPU /0/0/0.2 processor Logical CPU /0/2 memory 32KiB L1 cache /0/4 memory 2GiB System Memory /0/4/0 memory SODIMM [empty] /0/4/1 memory 2GiB SODIMM DDR2 Synchronous 800 MHz (1.2 ns) /0/100 bridge Mobile 4 Series Chipset Memory Controller Hub /0/100/2 display Mobile 4 Series Chipset Integrated Graphics Controller /0/100/2.1 display Mobile 4 Series Chipset Integrated Graphics Controller /0/100/1a bus 82801I (ICH9 Family) USB UHCI Controller #4 /0/100/1a.1 bus 82801I (ICH9 Family) USB UHCI Controller #5 /0/100/1a.2 bus 82801I (ICH9 Family) USB UHCI Controller #6 /0/100/1a.7 bus 82801I (ICH9 Family) USB2 EHCI Controller #2 /0/100/1b multimedia 82801I (ICH9 Family) HD Audio Controller /0/100/1c bridge 82801I (ICH9 Family) PCI Express Port 1 /0/100/1c.1 bridge 82801I (ICH9 Family) PCI Express Port 2 /0/100/1c.1/0 wlan1 network PRO/Wireless 5100 AGN [Shiloh] Network Connection /0/100/1c.2 bridge 82801I (ICH9 Family) PCI Express Port 3 /0/100/1c.4 bridge 82801I (ICH9 Family) PCI Express Port 5 /0/100/1c.5 bridge 82801I (ICH9 Family) PCI Express Port 6 /0/100/1c.5/0 eth1 network 88E8072 PCI-E Gigabit Ethernet Controller /0/100/1d bus 82801I (ICH9 Family) USB UHCI Controller #1 /0/100/1d.1 bus 82801I (ICH9 Family) USB UHCI Controller #2 /0/100/1d.2 bus 82801I (ICH9 Family) USB UHCI Controller #3 /0/100/1d.7 bus 82801I (ICH9 Family) USB2 EHCI Controller #1 /0/100/1e bridge 82801 Mobile PCI Bridge /0/100/1f bridge ICH9M LPC Interface Controller /0/100/1f.2 scsi0 storage 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] /0/100/1f.2/0 /dev/sda disk 500GB WDC WD5000BEVT-0 /0/100/1f.2/0/1 /dev/sda1 volume 48GiB Windows NTFS volume /0/100/1f.2/0/2 /dev/sda2 volume 416GiB Extended partition /0/100/1f.2/0/2/5 /dev/sda5 volume 97GiB HPFS/NTFS partition /0/100/1f.2/0/2/6 /dev/sda6 volume 198GiB HPFS/NTFS partition /0/100/1f.2/0/2/7 /dev/sda7 volume 27GiB Linux filesystem partition /0/100/1f.2/0/2/8 /dev/sda8 volume 93GiB Linux filesystem partition /0/100/1f.2/1 /dev/cdrom disk CDDVDW TS-L633M /0/1 scsi6 storage /0/1/0.0.0 /dev/sdb disk 15GB STORAGE DEVICE /0/1/0.0.0/0 /dev/sdb disk 15GB /0/1/0.0.0/0/1 /dev/sdb1 volume 14GiB Windows FAT volume /1 power HZ04037 ubuntu@ubuntu:~$ ubuntu@ubuntu:~$ sudo fdisk -l Disk /dev/sda: 500.1 GB, 500107862016 bytes 255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x31263125 Device Boot Start End Blocks Id System /dev/sda1 * 63 102277727 51138832+ 7 HPFS/NTFS/exFAT /dev/sda2 102277728 976784129 437253201 f W95 Ext'd (LBA) /dev/sda5 102277791 307078127 102400168+ 7 HPFS/NTFS/exFAT /dev/sda6 307078191 724141151 208531480+ 7 HPFS/NTFS/exFAT /dev/sda7 724142080 781459455 28658688 83 Linux /dev/sda8 781461504 976771071 97654784 83 Linux Disk /dev/sdb: 15.9 GB, 15931539456 bytes 64 heads, 32 sectors/track, 15193 cylinders, total 31116288 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0009eb92 Device Boot Start End Blocks Id Systemfile:///home/ubuntu/Pictures/Screenshot%20from%202012-07-07%2010:25:40.png /dev/sdb1 * 32 31115263 15557616 c W95 FAT32 (LBA) ubuntu 12.04 installer can't find the partition in my hard disk, only find device - /dev/sda.(sorry, I'm new user, so can't send image.)

    Read the article

  • High Resolution Timeouts

    - by user12607257
    The default resolution of application timers and timeouts is now 1 msec in Solaris 11.1, down from 10 msec in previous releases. This improves out-of-the-box performance of polling and event based applications, such as ticker applications, and even the Oracle rdbms log writer. More on that in a moment. As a simple example, the poll() system call takes a timeout argument in units of msec: System Calls poll(2) NAME poll - input/output multiplexing SYNOPSIS int poll(struct pollfd fds[], nfds_t nfds, int timeout); In Solaris 11, a call to poll(NULL,0,1) returns in 10 msec, because even though a 1 msec interval is requested, the implementation rounds to the system clock resolution of 10 msec. In Solaris 11.1, this call returns in 1 msec. In specification lawyer terms, the resolution of CLOCK_REALTIME, introduced by POSIX.1b real time extensions, is now 1 msec. The function clock_getres(CLOCK_REALTIME,&res) returns 1 msec, and any library calls whose man page explicitly mention CLOCK_REALTIME, such as nanosleep(), are subject to the new resolution. Additionally, many legacy functions that pre-date POSIX.1b and do not explicitly mention a clock domain, such as poll(), are subject to the new resolution. Here is a fairly comprehensive list: nanosleep pthread_mutex_timedlock pthread_mutex_reltimedlock_np pthread_rwlock_timedrdlock pthread_rwlock_reltimedrdlock_np pthread_rwlock_timedwrlock pthread_rwlock_reltimedwrlock_np mq_timedreceive mq_reltimedreceive_np mq_timedsend mq_reltimedsend_np sem_timedwait sem_reltimedwait_np poll select pselect _lwp_cond_timedwait _lwp_cond_reltimedwait semtimedop sigtimedwait aiowait aio_waitn aio_suspend port_get port_getn cond_timedwait cond_reltimedwait setitimer (ITIMER_REAL) misc rpc calls, misc ldap calls This change in resolution was made feasible because we made the implementation of timeouts more efficient a few years back when we re-architected the callout subsystem of Solaris. Previously, timeouts were tested and expired by the kernel's clock thread which ran 100 times per second, yielding a resolution of 10 msec. This did not scale, as timeouts could be posted by every CPU, but were expired by only a single thread. The resolution could be changed by setting hires_tick=1 in /etc/system, but this caused the clock thread to run at 1000 Hz, which made the potential scalability problem worse. Given enough CPUs posting enough timeouts, the clock thread could be a performance bottleneck. We fixed that by re-implementing the timeout as a per-CPU timer interrupt (using the cyclic subsystem, for those familiar with Solaris internals). This decoupled the clock thread frequency from timeout resolution, and allowed us to improve default timeout resolution without adding CPU overhead in the clock thread. Here are some exceptions for which the default resolution is still 10 msec. The thread scheduler's time quantum is 10 msec by default, because preemption is driven by the clock thread (plus helper threads for scalability). See for example dispadmin, priocntl, fx_dptbl, rt_dptbl, and ts_dptbl. This may be changed using hires_tick. The resolution of the clock_t data type, primarily used in DDI functions, is 10 msec. It may be changed using hires_tick. These functions are only used by developers writing kernel modules. A few functions that pre-date POSIX CLOCK_REALTIME mention _SC_CLK_TCK, CLK_TCK, "system clock", or no clock domain. These functions are still driven by the clock thread, and their resolution is 10 msec. They include alarm, pcsample, times, clock, and setitimer for ITIMER_VIRTUAL and ITIMER_PROF. Their resolution may be changed using hires_tick. Now back to the database. How does this help the Oracle log writer? Foreground processes post a redo record to the log writer, which releases them after the redo has committed. When a large number of foregrounds are waiting, the release step can slow down the log writer, so under heavy load, the foregrounds switch to a mode where they poll for completion. This scales better because every foreground can poll independently, but at the cost of waiting the minimum polling interval. That was 10 msec, but is now 1 msec in Solaris 11.1, so the foregrounds process transactions faster under load. Pretty cool.

    Read the article

  • ?SPARC T4?????????????·???? : Netra SPARC T4-1

    - by user13138700
    ?SPARC T4???????????????·??????? Netra SPARC T4-1 ???? Netra SPARC T4-2 ?2012?1?10??????????3?15??????????????(????) ?????????? Netra SPARC T4-1 ? 4core ???( T4 ???????? 4core ???)(*)???????????????????????????(*)( Netra SPARC T4-1 ?????? 4core ???? 8core ????????) ??? prtdiag ????? pginfo ??????????????? 8????/1core ???? prtdiag ????????4core=32???????????????pginfo ?????????????????core ???????????????????? # ./prtdiag -v System Configuration: Oracle Corporation sun4v Netra SPARC T4-1 ???????: 130560 M ??? ================================ ?? CPU ================================ CPU ID Frequency Implementation Status ------ --------- ---------------------- ------- 0 2848 MHz SPARC-T4 on-line 1 2848 MHz SPARC-T4 on-line 2 2848 MHz SPARC-T4 on-line 3 2848 MHz SPARC-T4 on-line 4 2848 MHz SPARC-T4 on-line 5 2848 MHz SPARC-T4 on-line 6 2848 MHz SPARC-T4 on-line 7 2848 MHz SPARC-T4 on-line 8 2848 MHz SPARC-T4 on-line 9 2848 MHz SPARC-T4 on-line 10 2848 MHz SPARC-T4 on-line 11 2848 MHz SPARC-T4 on-line 12 2848 MHz SPARC-T4 on-line 13 2848 MHz SPARC-T4 on-line 14 2848 MHz SPARC-T4 on-line 15 2848 MHz SPARC-T4 on-line 16 2848 MHz SPARC-T4 on-line 17 2848 MHz SPARC-T4 on-line 18 2848 MHz SPARC-T4 on-line 19 2848 MHz SPARC-T4 on-line 20 2848 MHz SPARC-T4 on-line 21 2848 MHz SPARC-T4 on-line 22 2848 MHz SPARC-T4 on-line 23 2848 MHz SPARC-T4 on-line 24 2848 MHz SPARC-T4 on-line 25 2848 MHz SPARC-T4 on-line 26 2848 MHz SPARC-T4 on-line 27 2848 MHz SPARC-T4 on-line 28 2848 MHz SPARC-T4 on-line 29 2848 MHz SPARC-T4 on-line 30 2848 MHz SPARC-T4 on-line 31 2848 MHz SPARC-T4 on-line ======================= Physical Memory Configuration ======================== ???? # pginfo -p -T 0 (System [system,chip]) CPUs: 0-31 `-- 3 (Data_Pipe_to_memory [system,chip]) CPUs: 0-31 |-- 2 (Floating_Point_Unit [core]) CPUs: 0-7 | `-- 1 (Integer_Pipeline [core]) CPUs: 0-7 |-- 5 (Floating_Point_Unit [core]) CPUs: 8-15 | `-- 4 (Integer_Pipeline [core]) CPUs: 8-15 |-- 7 (Floating_Point_Unit [core]) CPUs: 16-23 | `-- 6 (Integer_Pipeline [core]) CPUs: 16-23 `-- 9 (Floating_Point_Unit [core]) CPUs: 24-31 `-- 8 (Integer_Pipeline [core]) CPUs: 24-31 T4 ????????????????????????????????????????????????? T3 ?????(S2 core)?????T4 ?????(S3 core)?????????????5???????????? T3 ?????(S2 core)?????????????????????????(????????)?????????????????????????????????????????????·???????????????????????????????????????? ????T4 ????????????????????????????T4 ??????????·??????? Netra SPARC T4-1 4core ????????????????????????????????????T3 ???????????????????????????? ?????????Netra SPARC T4-1 ??????????????? Netra SPARC T4-1 ?? Computing 1 x SPARC T4 4?? 32???? or 8 ?? 64 ???? 2.85GHz CPU (1?????8????) 16 x DDR3 DIMM (?? 256GB ?????16GB DIMM ???) I/O and Storage 3 x Low Profile PCI-Express Gen2 ???? (2 x 10Gb Ethernet XAUI ???????) 2 x Full-height Half-length PCI-Express Gen2 ???? 4 x 10/100/1000 Ethernet ???????? 4 x 2.5” SAS2 HDD 4 x USB ??? (?? 2, ?? 2) RAS and Management and Power Supply ???? (RAID????), ????PSU ?????????? ILOM?????????????? 2N (1+1) , AC ???? DC ?? Support OS Oracle Solaris 10 10/9, 9/10, 8/11, Oracle Solaris 11 11/11 Oracle VM Server for SPARC 2.1 (LDoms) ???? ??? NEBS Level3?? ??????21” 19”(EIA-310D),23”,24”,600mm????? ?????(?????)????????? ????SPARC T4 ????????SPARC T4 ?????????????????????????(4???)???????????? Oracle OpenWorld Tokyo 2012 ?3??(4/4(?)?4/5(?)?4/6(?))?????????????????????&?????????????????SPARC T4 ?????????????????????????????????·?????????????????SPARC T4 ???????????????????!? Oracle OpenWorld Tokyo 2012 http://www.oracle.com/openworld/jp-ja/index.html ????·???????????? 4/6(?) Develop D3-13 (14:00 - 14:45) ???????????49 ??? ?????? 7264 ???????????????

    Read the article

  • MSAcpi_ThermalZoneTemperature class not showing actual temperature

    - by jchoudhury
    i want to fetch CPU Performance data in real time including temperature. i used the following code to get CPU Temperature: try { ManagementObjectSearcher searcher = new ManagementObjectSearcher("root\\WMI", "SELECT * FROM MSAcpi_ThermalZoneTemperature"); foreach (ManagementObject queryObj in searcher.Get()) { double temp = Convert.ToDouble(queryObj["CurrentTemperature"].ToString()); double temp_critical = Convert.ToDouble(queryObj["CriticalTripPoint"].ToString()); double temp_cel = (temp/10 - 273.15); double temp_critical_cel = temp_critical / 10 - 273.15; lblCurrentTemp.Text = temp_cel.ToString(); lblCriticalTemp.Text = temp_critical_cel.ToString(); } } catch (ManagementException e) { MessageBox.Show("An error occurred while querying for WMI data: " + e.Message); } but this code shows the temperature that is not the correct temperature. It ususally shows 49.5-50.5 degrees centigrade. But I used "OpenHardwareMonitor" that report CPU temperature over 71 degree centigrade and changing fractions along with time fractions. is there anything I am missing in the code? I used the above code in timer_click event for every 500ms interval to refresh the temperature reading but it's always showing the same temperature from the beginning of execution. That implies if you run this application and if it shows 49 degree then after 1 hour session, it'll constantly show 49 degree. Where is the problem? please help. Thanks in advance.

    Read the article

  • Context migration in CUDA.NET

    - by Vyacheslav
    I'm currently using CUDA.NET library by GASS. I need to initialize cuda arrays (actually cublas vectors, but it doesn't matters) in one CPU thread and use them in other CPU thread. But CUDA context which holding all initialized arrays and loaded functions, can be attached to only one CPU thread. There is mechanism called context migration API to detach context from one thread and attach it to another. But i don't how to properly use it in CUDA.NET. I tried something like this: class Program { private static float[] vector1, vector2; private static CUDA cuda; private static CUBLAS cublas; private static CUdeviceptr ptr; static void Main(string[] args) { cuda = new CUDA(false); cublas = new CUBLAS(cuda); cuda.Init(); cuda.CreateContext(0); AllocateVectors(); cuda.DetachContext(); CUcontext context = cuda.PopCurrentContext(); GetVectorFromDeviceAsync(context); } private static void AllocateVectors() { vector1 = new float[]{1f, 2f, 3f, 4f, 5f}; ptr = cublas.Allocate(vector1.Length, sizeof (float)); cublas.SetVector(vector1, ptr); vector2 = new float[5]; } private static void GetVectorFromDevice(object objContext) { CUcontext localContext = (CUcontext) objContext; cuda.PushCurrentContext(localContext); cuda.AttachContext(localContext); //change vector somehow vector1[0] = -1; //copy changed vector to device cublas.SetVector(vector1, ptr); cublas.GetVector(ptr, vector2); CUDADriver.cuCtxPopCurrent(ref localContext); } private static void GetVectorFromDeviceAsync(CUcontext cUcontext) { Thread thread = new Thread(GetVectorFromDevice); thread.IsBackground = false; thread.Start(cUcontext); } } But execution fails on attempt to copy changed vector to device because context is not attached? Any ideas how i can get it work?

    Read the article

  • How to get Processor and Motherboard Id ?

    - by Frank
    I use the code from http://www.rgagnon.com/javadetails/java-0580.html to get Motherboard Id, but the result is "null", <1 How can that be ? <2 Also I modified the code a bit to look like this to get processor Id : "Set objWMIService = GetObject(\"winmgmts:\\\\.\\root\\cimv2\")\n"+ "Set colItems = objWMIService.ExecQuery _ \n"+ " (\"Select * from Win32_Processor\") \n"+ "For Each objItem in colItems \n"+ " Wscript.Echo objItem.ProcessorId \n"+ " exit for ' do the first cpu only! \n"+ "Next \n"; The result is something like : ProcessorId = BFEBFBFF00010676 On http://msdn.microsoft.com/en-us/library/aa389273%28VS.85%29.aspx it says : ProcessorId : Processor information that describes the processor features. For an x86 class CPU, the field format depends on the processor support of the CPUID instruction. If the instruction is supported, the property contains 2 (two) DWORD formatted values. The first is an offset of 08h-0Bh, which is the EAX value that a CPUID instruction returns with input EAX set to 1. The second is an offset of 0Ch-0Fh, which is the EDX value that the instruction returns. Only the first two bytes of the property are significant and contain the contents of the DX register at CPU reset—all others are set to 0 (zero), and the contents are in DWORD format. I don't quite understand it, in plain English, is it unique or just a number for this class of processors, for instance all Intel Core2 Duo P8400 will have this number ? Frank

    Read the article

  • Winforms: How to speed up Invalidate()?

    - by Pedery
    I'm developing a retained mode drawing application in GDI+. The application can draw simple shapes to a canvas and perform basic editing. The math that does this is optimized to the last byte and is not an issue. I'm drawing on a panel that is using the built-in Controlstyles.DoubleBuffer. Now, my problem arises if I run my app maximized on a big monitor (HD in my case). If I try to draw a line from one corner of the (big) canvas to the diagonally oposite other, it will start to lag and the CPU goes high up. Each graphical object in my app has a boundingbox. Thus, when I invalidate the boundingbox of a line that goes from one corner of the maximized app to the oposite diagonal one, that boundingbox is virtually as big as the canvas. When a user is drawing a line, this invalidation of the boundingbox thus happens on the mousemove event, and there is a clear lag visible. This lag also exists if the line is the only object on the canvas. I've tried to optimize this in many ways. If I draw a shorter line, the CPU and the lag goes down. If I remove the Invalidate() and keep all other code, the app is quick. If I use a Region (that only spans the figure) to invalidate instead of the boundingbox, it is just as slow. If I split the boundingbox into a range of smaller boxes that lie back to back, thus reducing the invalidation area, no visible performance gain can be seen. Thus I'm at a loss here. How can I speed up the invalidation? On a side note, both Paint.Net and Mspaint suffers from the same shortcommings. Word and PowerPoint however, seem to be able to paint a line as described above with no lag and no CPU load at all. Thus it's possible to achieve the desired results, the question is how?

    Read the article

  • Surprising results with .NET multi-theading algorithm

    - by Myles J
    Hi, I've recently wrote a C# console time tabling algorithm that is based on a combination of a genetic algorithm with a few brute force routines thrown in. The initial results were promising but I figured I could improve the performance by splitting the brute force routines up to run in parallel on multi processor architectures. To do this I used the well documented Producer/Consumer model (as documented in this fantastic article http://www.albahari.com/threading/part2.aspx#_ProducerConsumerQWaitHandle). I changed my code to create one thread per logical processor during the brute force routines. The performance gains on my work station were very pleasing. I am running Windows XP on the following hardware: Intel Core 2 Quad CPU 2.33 GHz 3.49 GB RAM Initial tests indicated average performance gains of approx 40% when using 4 threads. The next step was to deploy the new multi-threading version of the algorithm to our higher spec UAT server. Here is the spec of our UAT server: Windows 2003 Server R2 Enterprise x64 8 cpu (Quad-Core) AMD Opteron 2.70 GHz 255 GB RAM After running the first round of tests we were all extremely surprised to find that the algorithm actually runs slower on the high spec W2003 server than on my local XP work station! In fact the tests seem to indicate that it doesn't matter how many threads are generated (tests were ran with the app spawning between 2 to 32 threads). The algorithm always runs significantly slower on the UAT W2003 server? How could this be? Surely the app should run faster on a 8 cpu (Quad-Core) than my 2 Quad work station? Why are we seeing no performance gains with the multi-threading on the W2003 server whilst the XP workstation tests show gains of up to 40%? Any help or pointers would be appreciated. Regards Myles

    Read the article

  • Removing a platform from Configuration Manager

    - by demoncodemonkey
    I have a solution containing C# and C++/CLI projects. There are 3 platforms in my solution: Any CPU Win32 Mixed Platforms I never want to "just build the C# ones" or "just build the C++ ones", I always want to build all projects. So the platforms metaphor is meaningless to me, I'll leave it on Mixed Platforms or whatever as long as they all build. Now VS sometimes automatically switches the current platform to Any CPU (I'm not sure when or why). This means that pressing F7 will only try to build the C# projects, which is obviously no good. So I have to switch back to Mixed Platforms and try again. So how to workaround this irritating problem? I have tried 2 ways: In Configuration Manager, remove Any CPU and Win32 platforms. This worked until I added a new project and Visual Studio very kindly added them back in... :/ In Configuration Manager, check all checkboxes for all projects in all configurations in all platforms. This becomes a nightmare to manage with many projects in the solution. Any other ideas?

    Read the article

  • SQL Server Blocking Issue

    - by Robin Weston
    We currently have an issue that occurs roughly once a day on SQL 2005 database server, although the time it happens is not consistent. Basically, the database grinds to a halt, and starts refusing connections with the following error message. This includes logging into SSMS: A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 - The specified network name is no longer available.) Our CPU usage for SQL is usually around 15%, but when the DB is in it's broken state it's around 70%, so it's clearly doing something, even if no-one can connect. Even if I disable the web app that uses the database the CPU still doesn't go down. I am unable to restart the SQLSERVER process as it is unresponsive, so I have to end up killing the process manually, which then puts the DB into Suspect/Recovery mode (which I can fix but it's a pain). Below are some PerfMon stats I gathered when the DB was in it's broken state which might help. I have a bunch more if people want to request them: Active Transactions: 2 (Never Changes) Logical Connections: 34 (NC) Process Blocked: 16 (NC) User Connections: 30 (NC) Batch Request: 0 (NC) Active Jobs: 2 (NC) Log Truncations: 596 (NC) Log Shrinks: 24 (NC) Longest Running Transaction Time: 99 (NC) I guess they key is finding out what the DB is using it's CPU on, but as I can't even log into SSMS this isn't possible with the standard methods. Disturbingly, I can't even use the dedicated admin connection to get into SSMS. I get the same timout as with all other requests. Any advice, reccomendations, or even sympathy, is much appreciated!

    Read the article

  • Simple description of worker and I/O threads in .NET

    - by Konstantin
    It's very hard to find detailed but simple description of worker and I/O threads in .NET What's clear to me regarding this topic (but may not be technically precise): Worker threads are threads that should employ CPU for their work; I/O threads (also called "completion port threads") should employ device drivers for their work and essentially "do nothing", only monitor the completion of non-CPU operations. What is not clear: Although method ThreadPool.GetAvailableThreads returns number of available threads of both types, it seems there is no public API to schedule work for I/O thread. You can only manually create worker thread in .NET? It seems that single I/O thread can monitor multiple I/O operations. Is it true? If so, why ThreadPool has so many available I/O threads by default? In some texts I read that callback, triggered after I/O operation completion is performed by I/O thread. Is it true? Isn’t this a job for worker thread, considering that this callback is CPU operation? To be more specific – do ASP.NET asynchronous pages user I/O threads? What exactly is performance benefit in switching I/O work to separate thread instead of increasing maximum number of worker threads? Is it because single I/O thread does monitor multiple operations? Or Windows does more efficient context switching when using I/O threads?

    Read the article

  • Java Random Slowdowns on Mac OS cont'd

    - by javajustice
    I asked this question a few weeks ago, but I'm still having the problem and I have some new hints. The original question is here: http://stackoverflow.com/questions/1651887/java-random-slowdowns-on-mac-os Basically, I have a java application that splits a job into independent pieces and runs them in separate threads. The threads have no synchronization or shared memory items. The only resources they do share are data files on the hard disk, with each thread having an open file channel. Most of the time it runs very fast, but occasionally it will run very slow for no apparent reason. If I attach a CPU profiler to it, then it will start running quickly again. If I take a CPU snapshot, it says its spending most of its time in "self time" in a function that doesn't do anything except check a few (unshared unsynchronized) booleans. I don't know how this could be accurate because 1, it makes no sense, and 2, attaching the profiler seems to knock the threads out of whatever mode they're in and fix the problem. Also, regardless of whether it runs fast or slow, it always finishes and gives the same output, and it never dips in total cpu usage (in this case ~1500%), implying that the threads aren't getting blocked. I have tried different garbage collectors, different sizings the parts of the memory space, writing data output to non-raid drives, and putting all data output in threads separate the main worker threads. Does anyone have any idea what kind of problem this could be? Could it be the operating system (OS X 10.6.2) ? I have not been able to duplicate it on a windows machine, but I don't have one with a similar hardware configuration.

    Read the article

  • java GC periodically enters into several full GC cycles

    - by Peter
    Environment: sun JDK 1.6.0_16 vm settings: -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -Xms1024 -Xmx1024M -XX:MaxNewSize=448m -XX:NewSize=448m -XX:SurvivorRatio=4(6 also checked) -XX:MaxPermSize=128M OS: windows server 2003 processor: 4 cores of INTEL XEON 5130, 2000 Hz my application description: high intensity of concurrent(java 5 concurrency used) operations completed each time by commit to oracle. it's about 20-30 threads run non stop, doing tasks. application runs in JBOSS web container. My GC starts work normally, I see a lot of small GCs and all that time CPU shows good load, like all 4 cores loaded to 40-50%, CPU graph is stable. Then , after 1 min of good work, CPU starts drop to 0% on 2 cores from 4, it's graph becomes unstable, goes up and down("teeth"). I see, that my threads work slower(I have monitoring), I see that GC starts produce a lot of FULL GC during that time and next 4-5 minutes this situation remains as is, then for short period of time, like 1 minute, it gets back to normal situation, but shortly after that all bad thing repeats. Question: Why I have so frequent full GC??? How to prevent that? I played with SurvivorRatio - does not help. I noticed, that application behaves normally until first FULL GC occurs, while I have enough memory. Then it runs badly. my GC LOG: starts good then long period of FULL GCs(many of them) 1027.861: [GC 942200K-623526K(991232K), 0.0887588 secs] 1029.333: [GC 803279K(991232K), 0.0927470 secs] 1030.551: [GC 967485K-625549K(991232K), 0.0823024 secs] 1030.634: [GC 625957K(991232K), 0.0763656 secs] 1033.126: [GC 969613K-632963K(991232K), 0.0850611 secs] 1033.281: [GC 649899K(991232K), 0.0378358 secs] 1035.910: [GC 813948K(991232K), 0.3540375 secs] 1037.994: [GC 967729K-637198K(991232K), 0.0826042 secs] 1038.435: [GC 710309K(991232K), 0.1370703 secs] 1039.665: [GC 980494K-972462K(991232K), 0.6398589 secs] 1040.306: [Full GC 972462K-619643K(991232K), 3.7780597 secs] 1044.093: [GC 620103K(991232K), 0.0695221 secs] 1047.870: [Full GC 991231K-626514K(991232K), 3.8732457 secs] 1053.739: [GC 942140K(991232K), 0.5410483 secs] 1056.343: [Full GC 991232K-634157K(991232K), 3.9071443 secs] 1061.257: [GC 786274K(991232K), 0.3106603 secs] 1065.229: [Full GC 991232K-641617K(991232K), 3.9565638 secs] 1071.192: [GC 945999K(991232K), 0.5401515 secs] 1073.793: [Full GC 991231K-648045K(991232K), 3.9627814 secs] 1079.754: [GC 936641K(991232K), 0.5321197 secs]

    Read the article

  • New projects not built when target platform is set explicitly

    - by stiank81
    I create a new solution with one project, and then change the target platform from "Any CPU" to "x86". After this new projects added doesn't get built by default, and their target platform doesn't follow the global settings. Why?! Looking at the configuration manager new projects added are not checked to "Build", and they get target platform "Any CPU" instead of the globally set x86. Why is this happening? I expect new projects too to get the globally set and defined x86 target platform.. Some things I've tried: Toggle global platform back to Any CPU, and then to x86 again. No change.. Choosing platform explicitly for the new project. x86 is not available in the list, and when I say <New..> and try adding it I'm not allowed as ".. a solution platform with the same name already exists.". On the build properties for the new project I can't change the platform in the Configuration section, but I can set "Platform target" to x86 in the General section. It is however not clear whether this actually makes a difference, and it wouldn't respond if I change the target platform globally later. Initially I thought this was a problem from converting my solution from VS2008 to VS2010, but the problem applies both places. I.e. when I create a solution in VS2008 and just stay in VS2008 I still get the problem.

    Read the article

  • Did I implement clock drift properly?

    - by David Titarenco
    I couldn't find any clock drift RNG code for Windows anywhere so I attempted to implement it myself. I haven't run the numbers through ent or DIEHARD yet, and I'm just wondering if this is even remotely correct... void QueryRDTSC(__int64* tick) { __asm { xor eax, eax cpuid rdtsc mov edi, dword ptr tick mov dword ptr [edi], eax mov dword ptr [edi+4], edx } } __int64 clockDriftRNG() { __int64 CPU_start, CPU_end, OS_start, OS_end; // get CPU ticks -- uses RDTSC on the Processor QueryRDTSC(&CPU_start); Sleep(1); QueryRDTSC(&CPU_end); // get OS ticks -- uses the Motherboard clock QueryPerformanceCounter((LARGE_INTEGER*)&OS_start); Sleep(1); QueryPerformanceCounter((LARGE_INTEGER*)&OS_end); // CPU clock is ~1000x faster than mobo clock // return raw return ((CPU_end - CPU_start)/(OS_end - OS_start)); // or // return a random number from 0 to 9 // return ((CPU_end - CPU_start)/(OS_end - OS_start)%10); } If you're wondering why I Sleep(1), it's because if I don't, OS_end - OS_start returns 0 consistently (because of the bad timer resolution, I presume). Basically, (CPU_end - CPU_start)/(OS_end - OS_start) always returns around 1000 with a slight variation based on the entropy of CPU load, maybe temperature, quartz crystal vibration imperfections, etc. Anyway, the numbers have a pretty decent distribution, but this could be totally wrong. I have no idea.

    Read the article

  • How long is the time frame between context switches on Windows?

    - by mattcodes
    Reading CLR via C# 2.0 (I dont have 3.0 with me at the moment) Is this still the case: If there is only one CPU in a computer, only one thread can run at any one time. Windows has to keep track of the thread objects, and every so often, Windows has to decide which thread to schedule next to go to the CPU. This is additional code that has to execute once every 20 milliseconds or so. When Windows makes a CPU stop executing one thread's code and start executing another thread's code, we call this a context switch. A context switch is fairly expensive because the operating system has to: So circa CLR via C# 2.0 lets say we are on Pentium 4 2.4ghz 1 core non-HT, XP. Every 20 milliseconds? Where a CLR thread or Java thread is mapped to an OS thread only a maximum of 50 threads per second may get a chance to to run? I've read that context switching is very fast in mircoseconds here on SO, but how often roughly (magnitude style guesses) will say a modest 5 year old server Windows 2003 Pentium Xeon single core give the OS the opportunity to context switch? 20ms in the right area? I dont need exact figures I just want to be sure that's in the right area, seems rather long to me.

    Read the article

  • My program is spending most of its time in objc_msgSend. Does that mean that Objective-C has bad per

    - by Paperflyer
    Hello Stackoverflow. I have written an application that has a number of custom views and generally draws a lot of lines and bitmaps. Since performance is somewhat critical for the application, I spent a good amount of time optimizing draw performance. Now, activity monitor tells me that my application is usually using about 12% CPU and Instrument (the profiler) says that a whopping 10% CPU is spent in objc_msgSend (mostly in drawing related system calls). On the one hand, I am glad about this since it means that my drawing is about as fast as it gets and my optimizations where a huge success. On the other hand, it seems to imply that the only thing that is still using my CPU is the Objective-C overhead for messages (objc_msgSend). Hence, that if I had written the application in, say, Carbon, its performance would be drastically better. Now I am tempted to conclude that Objective-C is a language with bad performance, even though Cocoa seems to be awfully efficient since it can apparently draw faster than Objective-C can send messages. So, is Objective-C really a language with bad performance? What do you think about that?

    Read the article

  • How expensive is a context switch? Is it better to implement a manual task switch than to rely on OS

    - by Vilx-
    The title says it all. Imagine I have two (three, four, whatever) tasks that have to run in parallel. Now, the easy way to do this would be to create separate threads and forget about it. But on a plain old single-core CPU that would mean a lot of context switching - and we all know that context switching is big, bad, slow, and generally simply Evil. It should be avoided, right? On that note, if I'm writing the software from ground up anyway, I could go the extra mile and implement my own task-switching. Split each task in parts, save the state inbetween, and then switch among them within a single thread. Or, if I detect that there are multiple CPU cores, I could just give each task to a separate thread and all would be well. The second solution does have the advantage of adapting to the number of available CPU cores, but will the manual task-switch really be faster than the one in the OS core? Especially if I'm trying to make the whole thing generic with a TaskManager and an ITask, etc?

    Read the article

< Previous Page | 71 72 73 74 75 76 77 78 79 80 81 82  | Next Page >