Search Results

Search found 20275 results on 811 pages for 'general performance'.

Page 59/811 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >

Adding complexity by generalising: how far should you go?

- by marcog

Reference question: http://stackoverflow.com/questions/4303813/help-with-interview-question The above question asked to solve a problem for an NxN matrix. While there was an easy solution, I gave a more general solution to solve the more general problem for an NxM matrix. A handful of people commented that this generalisation was bad because it made the solution more complex. One such comment is voted +8. Putting aside the hard-to-explain voting effects on SO, there are two types of complexity to be considered here: Runtime complexity, i.e. how fast does the code run Code complexity, i.e. how difficult is the code to read and understand The question of runtime complexity is something that requires a better understanding of the input data today and what it might look like in the future, taking the various growth factors into account where necessary. The question of code complexity is the one I'm interested in here. By generalising the solution, we avoid having to rewrite it in the event that the constraints change. However, at the same time it can often result in complicating the code. In the reference question, the code for NxN is easy to understand for any competent programmer, but the NxM case (unless documented well) could easily confuse someone coming across the code for the first time. So, my question is this: Where should you draw the line between generalising and keeping the code easy to understand?

Read the article
What calls trigger a new batch?

- by sebf

I am finding my project is starting to show performance degradation and I need to optimize it. The answer to my previous question and this presentation from NVidia have helped greatly in understanding the performance characteristics of code using the GPU but there are a couple of things that aren't clear that I need to know to optimize my drawing. Specifically, what calls make the distinction between batches. I know that any state changes cause a new batch, so that includes: Render State Changes Buffer Changes Shader Changes Render Target Changes Correct? What else counts as a 'state change'? Does each Draw**Primitive() call constitute a new batch? Even if I were to issue the same call twice, with no state changes, or call it once on on part of the buffer, then again on another? If I were to update a buffer, but not change the bindings, would that be a new batch? That presentation and a DX9 page suggest using all of the texture slots available, which I take to mean loading multiple objects in 'parallel' by mapping their buffers/shaders/textures to slots 1-16. But I am not sure how this works - surely to do this you would need to change the buffer binding and that would count as a state change? (or is it a case of you do but it saves 16 calls so its OK?)

Read the article
How to store bitmaps in memory?

- by Geotarget

I'm working with general purpose image rendering, and high-performance image processing, and so I need to know how to store bitmaps in-memory. (24bpp/32bpp, compressed/raw, etc) I'm not working with 3D graphics or DirectX / OpenGL rendering and so I don't need to use graphics card compatible bitmap formats. My questions: What is the "usual" or "normal" way to store bitmaps in memory? (in C++ engines/projects?) How to store bitmaps for high-performance algorithms, such that read/write times are the fastest? (fixed array? with/without padding? 24-bpp or 32-bpp?) How to store bitmaps for applications handling a lot of bitmap data, to minimize memory usage? (JPEG? or a faster [de]compression algorithm?) Some possible methods: Use a fixed packed 24-bpp or 32-bpp int[] array and simply access pixels using pointer access, all pixels are allocated in one continuous memory chunk (could be 1-10 MB) Use a form of "sparse" data storage so each line of the bitmap is allocated separately, reusing more memory and requiring smaller contiguous memory segments Store bitmaps in its compressed form (PNG, JPG, GIF, etc) and unpack only when its needed, reducing the amount of memory used. Delete the unpacked data if its not used for 10 secs.

Read the article
Recommended formats to store bitmaps in memory?

- by Geotarget

I'm working with general purpose image rendering, and high-performance image processing, and so I need to know how to store bitmaps in-memory. (24bpp/32bpp, compressed/raw, etc) I'm not working with 3D graphics or DirectX / OpenGL rendering and so I don't need to use graphics card compatible bitmap formats. My questions: What is the "usual" or "normal" way to store bitmaps in memory? (in C++ engines/projects?) How to store bitmaps for high-performance algorithms, such that read/write times are the fastest? (fixed array? with/without padding? 24-bpp or 32-bpp?) How to store bitmaps for applications handling a lot of bitmap data, to minimize memory usage? (JPEG? or a faster [de]compression algorithm?) Some possible methods: Use a fixed packed 24-bpp or 32-bpp int[] array and simply access pixels using pointer access, all pixels are allocated in one continuous memory chunk (could be 1-10 MB) Use a form of "sparse" data storage so each line of the bitmap is allocated separately, reusing more memory and requiring smaller contiguous memory segments Store bitmaps in its compressed form (PNG, JPG, GIF, etc) and unpack only when its needed, reducing the amount of memory used. Delete the unpacked data if its not used for 10 secs.

Read the article
How should I group these variables?

- by stariz77

I have a shape that will be defined by: char s_type; char color; double height; double width; These variables are scanned in from a request string sent to my server and passed into my printing function, which then prints out the shape. Currently they are just local variables sitting in my main(); however, I was wondering if there would be any advantage in creating a struct containing these variables, and then passing the struct to my printing function? or how else might I improve my program's structure/style, would passing a struct by reference have any kind of performance benefit if there were many requests and therefore many printing function calls? printer(char st, char cr, double ht, double wd); int main() { // Other main functionality. char s_type; char color; double height; double width; sscanf (serv_req, "GET /%c/%c/%lf/%lf", &s_type, &color, &height, &width); printer(s_type, color, height, width); // Other main functionality. return 0; } It seemed "neater" if I had a struct or something that didn't leave me with declarations in the middle of everything else going on in main. I'm interested in structure/style as well as performance. EDIT: didn't mean to put printer declaration inside main.

Read the article
Product Development Investment: A Measure of Vendor Performance

- by Jim Mcglothlin

The relationship between a large, complex organization and its key suppliers of information technology is normally more than just "strategic". Expectations about the duration of the relationship typically exceed 20 years. Enterprise applications and technology infrastructure are not expected to be changed out like petunias. So how would you rate the due diligence processes as performed in Higher Education when selecting critical, transformational information technology? My observation: I see a lot of effort put into elaborate demonstration of basic software functionality. I see a lot of attention paid to the cost element of technology acquisition, including the contracted cost of implementation consulting services. But the factor that receives only cursory analysis and due diligence is long-term performance--the ability of a vendor to grow, expand, and develop, and bring its customers along with it. So what should you look for in a long-term IT supplier? Oracle has a public track record for product development. The annual investment has been on a run rate of almost $3 Billion organic product development. Oracle's well-publicized acquisitions and mergers have been supplemental to its R&D. This is important for Higher Education. Another meaningful way to evaluate a company is to look at the tangible track record of enhancement. Consider the Oracle-PeopleSoft enterprise business platform since acquired by Oracle 6 years ago: Product or Technology Enhancement Customer or User Impact Service Oriented Architecture (SOA) 300+ new web services delivered in versions 9.0 & 9.1 provide flexibility, so that customers can integrate PeopleSoft with other applications. Campus Solutions has added Admissions and Constituent Web Services. Constituent Relationship Management PeopleSoft CRM 9.1 for Higher Education introduced new process flows for student recruiting and retention to support "Student Success" initiatives. A 360 view of the constituent is now delivered, and the concept of a single-stop Student Services Center is now in CRM 9.1 with tight integration to PeopleSoft Campus Solutions. Human Capital Management Contract Pay for Education, with flexibility for configuration and calculation, has been extended in HCM 9.1. New chartfield integration among Project Costing - Time & Labor - Payroll to serve the labor distribution requirements for Grants / Sponsored Research. Talent Management PeopleSoft 9.0 and 9.1 feature an integrated talent management approach centered on definitions in "Profile Manager", with all new usability improvements. Internal and external candidate pools, and the entire recruitment process, are driven by delivered configurable selection and on-boarding processes. Interview scheduling, and online job offers are newly delivered processes. Performance Management PeopleSoft HCM ePerformance 9.1 will include significant new functionality designed to help organizations more effectively align business objectives with employee goals. Using an Organization Chart view, your business goals can flow down to become tangible objectives per employee. Succession Planning / Workforce Development New in HCM 9.0, enhanced in 9.1, is a planning capability for regular or unusual (major organizational change) succession of internal or external candidates. PeopleSoft supports employee-based career planning, which ultimately increases the integrity of the succession planning process (identify their career needs, plans, preferences, and interests). Dashboards / Oracle Business Intelligence Application Suite Oracle Human Resources Analytics provides the workforce information foundation that integrates data from HR functional areas and Finance. Oracle Human Resources Analytics delivers 9 dashboards and over 200 reports. Provide your HR professionals and front-line managers the tools to analyze workforce staffing, retention, productivity, to better source high-quality applicants, and to reduce absence costs. Multi-year Planning and Commitment Control External funding sources, especially Grants, require a multi-year encumbrance business process. PeopleSoft HCM 9.1 adds multi-year funding and commitment control, including budget checking. The newly designed Real Time Budget Checking will provide the customer with an updated snapshot of their budget and encumbrances at any given time. Position Budgeting with Hyperion Hyperion Planning world-class products now include delivered integration to PeopleSoft HCM. Position Budgeting is available in the new Public Sector Planning module of Hyperion. Web 2.0 features for the latest in usability PeopleSoft 9.1 features a contemporary internet user experience: Partial-page refreshing Drag and drop pagelets New menu structure Navigation pagelets Modal popup message windows Favorites & recently used links Type-ahead Drag and drop grid columns, pop-out grids Portal Workspaces Enterprise 2.0 for your collaborative web communities, using new content management, along with Wikis, blogs, and discussion forums in PeopleSoft Portal 9.1. PeopleTools enhanced by Oracle Fusion Middleware Standards-based tools have been added to the PeopleTools application infrastructure: BI (XML) Publisher, Java tools. Certified for use with PeopleSoft: Oracle Business Intelligence (OBIEE), Oracle Enterprise Manager, Oracle Weblogic Server, Oracle SOA Suite. Hosting for PeopleSoft applications A solid new deployment option: Oracle On Demand remote hosting center for high scalability, security, and continuity of operations. Business Process Outsourcing (BPO) for HCM / Payroll functions Partnership with AT&T provides hosting of HR/Payroll application along with payroll business process operations, and subscription-based service fees (SaaS). AT&T BPO full service includes pay sheet processing, bank and 3rd party file transfer, payroll tax handling, etc. Continuous Delivery Model Feature Packs provide faster time-to-benefit; new features become available in PeopleSoft 9.1 (or Campus Solutions 9.0) without need to perform upgrade. Golden person data model across all campus applications Oracle Higher Education Constituent Hub provides synchronization and data governance of person data across any application, e.g. HR/ Payroll, Student Information System, Housing, Emergency Contact, LMS, CRM. Oracle's aggressive enhancement plans within the "Applications Unlimited" program continue, as new functionality is under development for a new version of a PeopleSoft release planned for 2012. Meanwhile, new capabilities are planned on an annual basis in Feature Packs. PeopleSoft just delivered the HCM 2010 Feature Pack and another is planned for 2011. In February we plan to have over 100 customers from our Customer Advisory Boards at our PeopleSoft Development Center in California to review designs for all of these releases. For those of you near New York City The investment and progressive development story described above is the subject of an Oracle road show event on February 9, 2011. Charting Your Course with Oracle Applications is a global event series designed to help business and IT executives assess the impact of new inflection points on their business and applications roadmap: changing workforces, shifting customer and constituent bases, and increased volatility. Learn how innovations ranging from new deployment models like cloud computing to the introduction of social applications and smart devices are delivering results across all areas of business and industry. THIS DOCUMENT IS FOR INFORMATIONAL PURPOSES ONLY AND MAY NOT BE INCORPORATED INTO A CONTRACT OR AGREEMENT.

Read the article
T4 Performance Counters explained

- by user13346607

Now that T4 is out for a few month some people might have wondered what details of the new pipeline you can monitor. A "cpustat -h" lists a lot of events that can be monitored, and only very few are self-explanatory. I will try to give some insight on all of them, some of these "PIC events" require an in-depth knowledge of T4 pipeline. Over time I will try to explain these, for the time being these events should simply be ignored. (Side note: some counters changed from tape-out 1.1 (*only* used in the T4 beta program) to tape-out 1.2 (used in the systems shipping today) The table only lists the tape-out 1.2 counters) 0 0 1 1058 6033 Oracle Microelectronics 50 14 7077 14.0 Normal 0 false false false EN-US JA X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:12.0pt; font-family:Cambria; mso-ascii-font-family:Cambria; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Cambria; mso-hansi-theme-font:minor-latin;} pic name (cpustat) Prose Comment Sel-pipe-drain-cycles, Sel-0-[wait|ready], Sel-[1,2] Sel-0-wait counts cycles a strand waits to be selected. Some reasons can be counted in detail; these are: Sel-0-ready: Cycles a strand was ready but not selected, that can signal pipeline oversubscription Sel-1: Cycles only one instruction or µop was selected Sel-2: Cycles two instructions or µops were selected Sel-pipe-drain-cycles: cf. PRM footnote 8 to table 10.2 Pick-any, Pick-[0|1|2|3] Cycles one, two, three, no or at least one instruction or µop is picked Instr_FGU_crypto Number of FGU or crypto instructions executed on that vcpu Instr_ld dto. for load Instr_st dto. for store SPR_ring_ops dto. for SPR ring ops Instr_other dto. for all other instructions not listed above, PRM footnote 7 to table 10.2 lists the instructions Instr_all total number of instructions executed on that vcpu Sw_count_intr Nr of S/W count instructions on that vcpu (sethi %hi(fc000),%g0 (whatever that is)) Atomics nr of atomic ops, which are LDSTUB/a, CASA/XA, and SWAP/A SW_prefetch Nr of PREFETCH or PREFETCHA instructions Block_ld_st Block loads or store on that vcpu IC_miss_nospec, IC_miss_[L2_or_L3|local|remote]\ _hit_nospec Various I$ misses, distinguished by where they hit. All of these count per thread, but only primary events: T4 counts only the first occurence of an I$ miss on a core for a certain instruction. If one strand misses in I$ this miss is counted, but if a second strand on the same core misses while the first miss is being resolved, that second miss is not counted This flavour of I$ misses counts only misses that are caused by instruction that really commit (note the "_nospec") BTC_miss Branch target cache miss ITLB_miss ITLB misses (synchronously counted) ITLB_miss_asynch dto. but asynchronously [I|D]TLB_fill_\ [8KB|64KB|4MB|256MB|2GB|trap] H/W tablewalk events that fill ITLB or DTLB with translation for the corresponding page size. The “_trap” event occurs if the HWTW was not able to fill the corresponding TLB IC_mtag_miss, IC_mtag_miss_\ [ptag_hit|ptag_miss|\ ptag_hit_way_mismatch] I$ micro tag misses, with some options for drill down Fetch-0, Fetch-0-all fetch-0 counts nr of cycles nothing was fetched for this particular strand, fetch-0-all counts cycles nothing was fetched for all strands on a core Instr_buffer_full Cycles the instruction buffer for a strand was full, thereby preventing any fetch BTC_targ_incorrect Counts all occurences of wrongly predicted branch targets from the BTC [PQ|ROB|LB|ROB_LB|SB|\ ROB_SB|LB_SB|RB_LB_SB|\ DTLB_miss]\ _tag_wait ST_q_tag_wait is listed under sl=20. These counters monitor pipeline behaviour therefore they are not strand specific: PQ_...: cycles Rename stage waits for a Pick Queue tag (might signal memory bound workload for single thread mode, cf. Mail from Richard Smith) ROB_...: cycles Select stage waits for a ROB (ReOrderBuffer) tag LB_...: cycles Select stage waits for a Load Buffer tag SB_...: cycles Select stage waits for Store Buffer tag combinations of the above are allowed, although some of these events can overlap, the counter will only be incremented once per cycle if any of these occur DTLB_...: cycles load or store instructions wait at Pick stage for a DTLB miss tag [ID]TLB_HWTW_\ [L2_hit|L3_hit|L3_miss|all] Counters for HWTW accesses caused by either DTLB or ITLB misses. Canbe further detailed by where they hit IC_miss_L2_L3_hit, IC_miss_local_remote_remL3_hit, IC_miss I$ prefetches that were dropped because they either miss in L2$ or L3$ This variant counts misses regardless if the causing instruction commits or not DC_miss_nospec, DC_miss_[L2_L3|local|remote_L3]\ _hit_nospec D$ misses either in general or detailed by where they hit cf. the explanation for the IC_miss in two flavours for an explanation of _nospec and the reasoning for two DC_miss counters DTLB_miss_asynch counts all DTLB misses asynchronously, there is no way to count them synchronously DC_pref_drop_DC_hit, SW_pref_drop_[DC_hit|buffer_full] L1-D$ h/w prefetches that were dropped because of a D$ hit, counted per core. The others count software prefetches per strand [Full|Partial]_RAW_hit_st_[buf|q] Count events where a load wants to get data that has not yet been stored, i. e. it is still inside the pipeline. The data might be either still in the store buffer or in the store queue. If the load's data matches in the SB and in the store queue the data in buffer takes precedence of course since it is younger [IC|DC]_evict_invalid, [IC|DC|L1]_snoop_invalid, [IC|DC|L1]_invalid_all Counter for invalidated cache evictions per core St_q_tag_wait Number of cycles pipeline waits for a store queue tag, of course counted per core Data_pref_[drop_L2|drop_L3|\ hit_L2|hit_L3|\ hit_local|hit_remote] Data prefetches that can be further detailed by either why they were dropped or where they did hit St_hit_[L2|L3], St_L2_[local|remote]_C2C, St_local, St_remote Store events distinguished by where they hit or where they cause a L2 cache-to-cache transfer, i.e. either a transfer from another L2$ on the same die or from a different die DC_miss, DC_miss_\ [L2_L3|local|remote]_hit D$ misses either in general or detailed by where they hit cf. the explanation for the IC_miss in two flavours for an explanation of _nospec and the reasoning for two DC_miss counters L2_[clean|dirty]_evict Per core clean or dirty L2$ evictions L2_fill_buf_full, L2_wb_buf_full, L2_miss_buf_full Per core L2$ buffer events, all count number of cycles that this state was present L2_pipe_stall Per core cycles pipeline stalled because of L2$ Branches Count branches (Tcc, DONE, RETRY, and SIT are not counted as branches) Br_taken Counts taken branches (Tcc, DONE, RETRY, and SIT are not counted as branches) Br_mispred, Br_dir_mispred, Br_trg_mispred, Br_trg_mispred_\ [far_tbl|indir_tbl|ret_stk] Counter for various branch misprediction events. Cycles_user counts cycles, attribute setting hpriv, nouser, sys controls addess space to count in Commit-[0|1|2], Commit-0-all, Commit-1-or-2 Number of times either no, one, or two µops commit for a strand. Commit-0-all counts number of times no µop commits for the whole core, cf. footnote 11 to table 10.2 in PRM for a more detailed explanation on how this counters interacts with the privilege levels

Read the article
Code Metrics: Number of IL Instructions

- by DigiMortal

In my previous posting about code metrics I introduced how to measure LoC (Lines of Code) in .NET applications. Now let’s take a step further and let’s take a look how to measure compiled code. This way we can somehow have a picture about what compiler produces. In this posting I will introduce you code metric called number of IL instructions. NB! Number of IL instructions is not something you can use to measure productivity of your team. If you want to get better idea about the context of this metric and LoC then please read my first posting about LoC. What are IL instructions? When code written in some .NET Framework language is compiled then compiler produces assemblies that contain byte code. These assemblies are executed later by Common Language Runtime (CLR) that is code execution engine of .NET Framework. The byte code is called Intermediate Language (IL) – this is more common language than C# and VB.NET by example. You can use ILDasm tool to convert assemblies to IL assembler so you can read them. As IL instructions are building blocks of all .NET Framework binary code these instructions are smaller and highly general – we don’t want very rich low level language because it executes slower than more general language. For every method or property call in some .NET Framework language corresponds set of IL instructions. There is no 1:1 relationship between line in high level language and line in IL assembler. There are more IL instructions than lines in C# code by example. How much instructions there are? I have no common answer because it really depends on your code. Here you can see some metrics from my current community project that is developed on SharePoint Server 2007. As average I have about 7 IL instructions per line of code. This is not metric you should use, it is just illustrative example so you can see the differences between numbers of lines and IL instructions. Why should I measure the number of IL instructions? Just take a look at chart above. Compiler does something that you cannot see – it compiles your code to IL. This is not intuitive process because you usually cannot say what is exactly the end result. You know it at greater plain but you don’t know it exactly. Therefore we can expect some surprises and that’s why we should measure the number of IL instructions. By example, you may find better solution for some method in your source code. It looks nice, it works nice and everything seems to be okay. But on server under load your fix may be way slower than previous code. Although you minimized the number of lines of code it ended up with increasing the number of IL instructions. How to measure the number of IL instructions? My choice is NDepend because Visual Studio is not able to measure this metric. Steps to make are easy. Open your NDepend project or create new and add all your application assemblies to project (you can also add Visual Studio solution to project). Run project analysis and wait until it is done. You can see over-all stats form global summary window. This is the same window I used to read the LoC and the number of IL instructions metrics for my chart. Meanwhile I made some changes to my code (enabled advanced caching for events and event registrations module) and then I ran code analysis again to get results for this section of this posting. NDepend is also able to tell you exactly what parts of code have problematically much IL instructions. The code quality section of CQL Query Explorer shows you how much problems there are with members in analyzed code. If you click on the line Methods too big (NbILInstructions) you can see all the problematic members of classes in CQL Explorer shown in image on right. In my case if have 10 methods that are too big and two of them have horrible number of IL instructions – just take a look at first two methods in this TOP10. Also note the query box. NDepend has easy and SQL-like query language to query code analysis results. You can modify these queries if you like and also you can define your own ones if default set is not enough for you. What is good result? As you can see from query window then the number of IL instructions per member should have maximally 200 IL instructions. Of course, like always, the less instructions you have, the better performing code you have. I don’t mean here little differences but big ones. By example, take a look at my first method in warnings list. The number of IL instructions it has is huge. And believe me – this method looks awful. Conclusion The number of IL instructions is useful metric when optimizing your code. For analyzing code at general level to find out too long methods you can use the number of LoC metric because it is more intuitive for you and you can therefore handle the situation more easily. Also you can use NDepend as code metrics tool because it has a lot of metrics to offer.

Read the article
Polite busy-waiting with WRPAUSE on SPARC

- by Dave

Unbounded busy-waiting is an poor idea for user-space code, so we typically use spin-then-block strategies when, say, waiting for a lock to be released or some other event. If we're going to spin, even briefly, then we'd prefer to do so in a manner that minimizes performance degradation for other sibling logical processors ("strands") that share compute resources. We want to spin politely and refrain from impeding the progress and performance of other threads — ostensibly doing useful work and making progress — that run on the same core. On a SPARC T4, for instance, 8 strands will share a core, and that core has its own L1 cache and 2 pipelines. On x86 we have the PAUSE instruction, which, naively, can be thought of as a hardware "yield" operator which temporarily surrenders compute resources to threads on sibling strands. Of course this helps avoid intra-core performance interference. On the SPARC T2 our preferred busy-waiting idiom was "RD %CCR,%G0" which is a high-latency no-nop. The T4 provides a dedicated and extremely useful WRPAUSE instruction. The processor architecture manuals are the authoritative source, but briefly, WRPAUSE writes a cycle count into the the PAUSE register, which is ASR27. Barring interrupts, the processor then delays for the requested period. There's no need for the operating system to save the PAUSE register over context switches as it always resets to 0 on traps. Digressing briefly, if you use unbounded spinning then ultimately the kernel will preempt and deschedule your thread if there are other ready threads than are starving. But by using a spin-then-block strategy we can allow other ready threads to run without resorting to involuntary time-slicing, which operates on a long-ish time scale. Generally, that makes your application more responsive. In addition, by blocking voluntarily we give the operating system far more latitude regarding power management. Finally, I should note that while we have OS-level facilities like sched_yield() at our disposal, yielding almost never does what you'd want or naively expect. Returning to WRPAUSE, it's natural to ask how well it works. To help answer that question I wrote a very simple C/pthreads benchmark that launches 8 concurrent threads and binds those threads to processors 0..7. The processors are numbered geographically on the T4, so those threads will all be running on just one core. Unlike the SPARC T2, where logical CPUs 0,1,2 and 3 were assigned to the first pipeline, and CPUs 4,5,6 and 7 were assigned to the 2nd, there's no fixed mapping between CPUs and pipelines in the T4. And in some circumstances when the other 7 logical processors are idling quietly, it's possible for the remaining logical processor to leverage both pipelines. Some number T of the threads will iterate in a tight loop advancing a simple Marsaglia xor-shift pseudo-random number generator. T is a command-line argument. The main thread loops, reporting the aggregate number of PRNG steps performed collectively by those T threads in the last 10 second measurement interval. The other threads (there are 8-T of these) run in a loop busy-waiting concurrently with the T threads. We vary T between 1 and 8 threads, and report on various busy-waiting idioms. The values in the table are the aggregate number of PRNG steps completed by the set of T threads. The unit is millions of iterations per 10 seconds. For the "PRNG step" busy-waiting mode, the busy-waiting threads execute exactly the same code as the T worker threads. We can easily compute the average rate of progress for individual worker threads by dividing the aggregate score by the number of worker threads T. I should note that the PRNG steps are extremely cycle-heavy and access almost no memory, so arguably this microbenchmark is not as representative of "normal" code as it could be. And for the purposes of comparison I included a row in the table that reflects a waiting policy where the waiting threads call poll(NULL,0,1000) and block in the kernel. Obviously this isn't busy-waiting, but the data is interesting for reference. _table { border:2px black dotted; margin: auto; width: auto; } _tr { border: 2px red dashed; } _td { border: 1px green solid; } _table { border:2px black dotted; margin: auto; width: auto; } _tr { border: 2px red dashed; } td { background-color : #E0E0E0 ; text-align : right ; } th { text-align : left ; } td { background-color : #E0E0E0 ; text-align : right ; } th { text-align : left ; } Aggregate progress T = #worker threads Wait Mechanism for 8-T threadsT=1T=2T=3T=4T=5T=6T=7T=8 Park thread in poll() 32653347334833483348334833483348 no-op 415 831 124316482060249729303349 RD %ccr,%g0 "pause" 14262429269228623013316232553349 PRNG step 412 829 124616702092251029303348 WRPause(8000) 32443361333133483349334833483348 WRPause(4000) 32153308331533223347334833473348 WRPause(1000) 30853199322432513310334833483348 WRPause(500) 29173070315032223270330933483348 WRPause(250) 26942864294930773205338833483348 WRPause(100) 21552469262227902911321433303348

Read the article
Blazing fast performance with RadGridView for Silverlight 4, RadDataPager and WCF RIA Services

In my previous post I’ve used almost 2 million records to the check the grid performance in WPF and I’ve decided to do the same for Silverlight 4 using WCF RIA Services. The grid again is bound completely codelessly using DomainDataSource and RadDataPager: <Grid x:Name="LayoutRoot"> <Grid.RowDefinitions> <RowDefinition /> <RowDefinition Height="Auto" /> </Grid.RowDefinitions> <riaControls:DomainDataSource Name="orderDomainDataSource" QueryName="GetOrdersAndOrderDetails"> <riaControls:DomainDataSource.DomainContext> <my:NorthwindDomainContext /> </riaControls:DomainDataSource.DomainContext> </riaControls:DomainDataSource> <telerik:RadGridView Name="RadGridView1" IsReadOnly="True" AutoExpandGroups="True" ItemsSource="{Binding Data, ElementName=orderDomainDataSource}" /> <telerik:RadDataPager Grid.Row="1" PageSize="10" Source="{Binding Data, ElementName=orderDomainDataSource}" DisplayMode="All" /> </Grid> And the query again will return join between Northwind Orders and Order_Details: … public IQueryable<OrdersAndOrderDetails> GetOrdersAndOrderDetails() ...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
F# performance vs Erlang performance, is there proof the Erlang's VM is faster?

- by afuzzyllama

I've been putting time into learning functional programming and I've come to the part where I want to start writing a project instead of just dabbling in tutorials/examples. While doing my research, I've found that Erlang seems to be a pretty powerful when it comes to writing concurrent software (which is my goal), but resources and tools for development aren't as mature as Microsoft development products. F# can run on linux (Mono) so that requirement is met, but while looking around on the internet I cannot find any comparisons of F# vs Erlang. Right now, I am leaning towards Erlang just because it seems to have the most press, but I am curious if there is really any performance difference between the two languages. Since I am use to developing in .NET, I can probably get up to speed with F# a lot faster than Erlang, but I cannot find any resource to convince me that F# is just as scalable as Erlang. I am most interested in simulation, which is going to be firing a lot of quickly processed messages to persistant nodes. If I have not done a good job with what I am trying to ask, please ask for more verification.

Read the article
Increase Performance and Agility with Oracle’s New Data Center Fabric Solutions

- by Cinzia Mascanzoni

Join this Webcas on Tues., December 11, 2012 10 a.m. PT / 1 p.m. ET and hear from S.K. Vinod, Senior Director of Product Management, Oracle Virtual Networking products. He’ll show you how the fast, simple, and agile architecture of Oracle Fabric Interconnect provides dynamic network and storage connectivity to thousands of servers. You will see how to use Oracle Software Defined Network (SDN) to connect any resource on the data center fabric quickly—without incurring downtime or requiring network reconfiguration. With Oracle Virtual Networking products, you can: Streamline your data center connectivity Reduce complexity by 70% Cut infrastructure expenses by up to 50% Increase application performance up to 30x Provision new services and reconfigure resources in minutes Simplify deployments with wire-once infrastructure During the Webcast, you’ll also have the opportunity to chat directly with Oracle experts. Visit OPN's Server & Storage Systems Knowledge Zones anytime to learn about partner engagement, training, resources, and replays of other webcasts to jump start business. You can also email us your questions. Unable to attend live? Register anyway – we'll send you the on-demand link to the Webcast!

Read the article
Declarative View Objects (VOs) for better ADF performance

- by Shay Shmeltzer

Just got back from ODTUG's kscope13 conference which had a lot of good deep ADF content. In one of my session I ran out of time to do one of my demos, so I wanted to share it here instead. This is a demo of how Declarative View Objects can increase your application's performance. For those who are not familiar with declarative VOs, those are VOs that don't actually specify a hard coded query. Instead ADF creates their query at runtime, and it does it based on the data that is requested in your UI layer. This can be a huge saver of both DB resources and network resources. More in the documentation. Here is a quick example that shows you how using such a VO can automatically switch to a simpler SQL instead of a complex join when needed. (note while I demo with 11.1.2.* the feature is there in 11.1.1.* versions also). The demo also shows you how you can monitor the SQL that ADF BC issues to the database using the WebLogic logging feature in JDeveloper. As a side note, I would have loved to see more ADF developers attending Kscope. This demo was part of the "ADF intro" track at Kscope, In the advanced ADF track you would have been treated to a full tuning session about ADF with lots of other tips. Consider attending Kscope next year - it is going to be in Seattle this time.

Read the article
New database profiling support in ANTS Performance Profiler

- by Ben Emmett

In May last year, the ANTS Performance Profiler team added the ability to profile database requests your application makes to SQL Server or Oracle. The really cool thing is that you’re shown those requests in the application’s call tree, so you can see what .NET code caused those queries to run. It’s particularly helpful if you’re using an ORM which automagically generates and runs queries for you, but which doesn’t necessarily do it in the most efficient way possible. Now by popular demand, we’ve added support for profiling MySQL (or MariaDB) and PostgreSQL, so you can see queries run against those databases too. Some of you have also said that you’re using the Devart dotConnect data providers instead of the native .NET ones, so we’ve added support for those drivers too. Hope it helps! For the record, here’s a list of supported connectors (ones in bold are new): SQL Server .NET Framework Data Provider Devart dotConnect for SQL Server Oracle .NET Framework Data Provider Oracle Data Provider for .NET Devart dotConnect for Oracle MySQL / MariaDB MySQL Connector/Net Devart dotConnect for MySQL PostgreSQL Npgsql .NET Data Provider for PostgreSQL Devart dotConnect for PostgreSQL SQL Server Compact Edition .NET Framework Data Provider for SQL Server Compact Edition Devart dotConnect for SQL Server Pro Have we missed a connector or database which you’d find useful? Tell us about it in the comments or by emailing [email protected]. Ben

Read the article
Issue in understanding how to compare performance of classifier using ROC

- by user1214586

I am trying to demystify pattern recognition techniques and understood few of them. I am trying to design a classifier M. A gesture is classified based on the hamming distance between the sample time series y and the training time series x. The result of the classifier are probabilistic values. There are 3 classes/categories with labels A,B,C which classifies hand gestures where there are 100 samples for each class which are to be classified (single feature and data length=100). The data are different time series (x coordinate vs time). The training set is used to assign probabilities indicating which gesture has occured how many times. So,out of 10 training samples if gesture A appeared 6 times then probability that a gesture falls under category A is P(A)=0.6 similarly P(B)=0.3 and P(C)=0.1 Now, I am trying to compare the performance of this classifier with Bayes classifier, K-NN, Principal component analysis (PCA) and Neural Network. On what basis,parameter and method should I do it if I consider ROC or cross validate since the features for my classifier are the probabilistic values for the ROC plot hence what shall be the features for k-nn,bayes classification and PCA? Is there a code for it which will be useful. What should be the value of k is there are 3 classes of gestures? Please help. I am in a fix.

Read the article
How to evaluate a user against optimal performance?

- by Alex K

I have trouble coming up with a system of assigning a rating to player's performance. Well, technically there is is a trivial rating system, but I don't like it because it would mean assigning negative scores, which I think most players will be discouraged by. The problem is that I only know the ideal number of actions to get the desired result. The worst case is infinite number of actions, so there is no obvious scale. The trivial way I referred to above is to take score = (#optimal-moves - #players-moves), with ideal score being zero. However, psychologically people like big numbers. No one wants to win by getting a mark of 0. I wonder if there is a system that someone else has come up with before to solve this problem? Essentially I wish to score the players based on: How close they've come to the ideal solution. Different challenges will have different optimal number of actions, so the scoring system needs to take that into account, e.g. Challenge 1 - max 10 points, Challenge 2 - max 20 points. I don't mind giving the players negative scores if they've performed exceptionally badly, I just don't want all scores to be <=0

Read the article
Optimizing MySQL, Improving Performance of Database Servers

- by Antoinette O'Sullivan

Optimization involves improving the performance of a database server and queries that run against it. Optimization reduces query execution time and optimized queries benefit everyone that uses the server. When the server runs more smoothly and processes more queries with less, it performs better as a whole. To learn more about how a MySQL developer can make a difference with optimization, take the MySQL Developers training course. This 5-day instructor-led course is available as: Live-Virtual Event: Attend a live class from your own desk - no travel required. Choose from a selection of events on the schedule to suit different timezones. In-Class Event: Travel to an education center to attend an event. Below is a selection of the events on the schedule. Location Date Delivery Language Vienna, Austria 17 November 2014 German Brussels, Belgium 8 December 2014 English Sao Paulo, Brazil 14 July 2014 Brazilian Portuguese London, English 29 September 2014 English Belfast, Ireland 6 October 2014 English Dublin, Ireland 27 October 2014 English Milan, Italy 10 November 2014 Italian Rome, Italy 21 July 2014 Italian Nairobi, Kenya 14 July 2014 English Petaling Jaya, Malaysia 25 August 2014 English Utrecht, Netherlands 21 July 2014 English Makati City, Philippines 29 September 2014 English Warsaw, Poland 25 August 2014 Polish Lisbon, Portugal 13 October 2014 European Portuguese Porto, Portugal 13 October 2014 European Portuguese Barcelona, Spain 7 July 2014 Spanish Madrid, Spain 3 November 2014 Spanish Valencia, Spain 24 November 2014 Spanish Basel, Switzerland 4 August 2014 German Bern, Switzerland 4 August 2014 German Zurich, Switzerland 4 August 2014 German The MySQL for Developers course helps prepare you for the MySQL 5.6 Developers OCP certification exam. To register for an event, request an additional event or learn more about the authentic MySQL curriculum, go to http://education.oracle.com/mysql.

Read the article
Workshop CUOA-Oracle Hyperion "Pianificazione economico-finanziaria, reporting e performance management" - Altavilla Vicentina, 25/10/2012

- by Edilio Rossi

Più di 100 professsionisti - manager della funzione Amministrazione, Finanza e Controllo in azienda e consulenti del settore - hanno partecipato al Workshop, organizzato da CUOA e Oracle Hyperion, in collaborazione con Adacta Studio Associato. E' stata un'occasione unica per approfondire i temi della pianificazione "estesa" e del controllo di gestione nelle imprese italiane - piccole, medie e grandi - alternando chiavi di lettura diverse (accademica, consulenziale, tecnologico-applicativa e utenti) ma tutte legate dal filo conduttore dell'evoluzione dei modelli, degli strumenti e dell'utilizzo dei sistemi evoluti di planning e budgeting economico-finanziario e patrimoniale. Una particolare attenzione è stata posta sul rapporto banca-impresa alla luce dell'attuale crisi e di come i sistemi innovativi di performance management e business intelligence possono aiutare il management nel ridisegno del sistema di finanziamento delle aziende e nella negoziazione con i diversi stakeholders. Grazie alle testimonianze dei casi aziendali GIV (Gruppo Italiano Vini) e Datalogic si è potuto "toccare con mano" l'utlizzo dei modelli e degli strumenti di pianificazione e controllo in realtà aziendali diverse ma che affrontano entrambe alcune delle sfide che i mercati oggi pongono alle imprese italiane. Le presentazioni sono disponibili su richiesta inviando una mail a: paolo.leveghi-AT-oracle.com

Read the article
High Performance SQL Views Using WITH(NOLOCK)

- by gt0084e1

Every now and then you find a simple way to make everything much faster. We often find customers creating data warehouses or OLAP cubes even though they have a relatively small amount of data (a few gigs) compared to their server memory. If you have more server memory than the size of your database or working set, nearly any aggregate query should run in a second or less. In some situations there may be high traffic on from the transactional application and SQL server may wait for several other queries to run before giving you your results. The purpose of this is make sure you don’t get two versions of the truth. In an ATM system, you want to give the bank balance after the withdrawal, not before or you may get a very unhappy customer. So by default databases are rightly very conservative about this kind of thing. Unfortunately this split-second precision comes at a cost. The performance of the query may not be acceptable by today’s standards because the database has to maintain locks on the server. Fortunately, SQL Server gives you a simple way to ask for the current version of the data without the pending transactions. To better facilitate reporting, you can create a view that includes these directives. CREATE VIEW CategoriesAndProducts AS SELECT * FROM dbo.Categories WITH(NOLOCK) INNER JOIN dbo.Products WITH(NOLOCK) ON dbo.Categories.CategoryID = dbo.Products.CategoryID In some cases quires that are taking minutes end up taking seconds. Much easier than moving the data to a separate database and it’s still pretty much real time give or take a few milliseconds. You’ve been warned not to use this for bank balances though. More from Data Stream

Read the article
Honing Performance Tuning Skills on MySQL

- by Antoinette O'Sullivan

Get hands-on experience with techniques for tuning a MySQL Server with the Authorized MySQL Performance Tuning course. This course is designed for database administrators, database developers and system administrators who are responsible for managing, optimizing, and tuning a MySQL Server. You can follow this live instructor led training: From your desk. Choose from among the 800+ events on the live-virtual training schedule. In a classroom. A selection of events/locations listed below Location Date Delivery Language Prague, Czech Republic 1 October 2012 Czech Warsaw, Poland 9 July 2012 Polish London, UK 19 November 2012 English Rome, Italy 23 October 2012 Italian Lisbon, Portugal 17 September 2012 European Portugese Aix-en-Provence, France 4 September 2012 French Strasbourg, France 16 October 2012 French Nieuwegein, Netherlands 3 September 2012 Dutch Madrid, Spain 6 August 2012 Spanish Mechelen, Belgium 1 October 2012 English Riga, Latvia 10 December 2012 Latvian Petaling Jaya, Malaysia 10 September 2012 English Edmonton, Canada 27 August 2012 English Vancouver, Canada 27 August 2012 English Ottawa, Canada 26 November 2012 English Toronto, Canada 26 November 2012 English Montreal, Canada 26 November 2012 English Mexico City, Mexico 9 July 2012 Spanish Sao Paulo, Brazil 2 July 2012 Brazilian Portugese To find a virtual or in-class event that suits you, go or http://oracle.com/education and choose a course and delivery type in your location.

Read the article
How to improve Minecraft-esque voxel world performance?

- by SomeXnaChump

After playing Minecraft I marveled a bit at its large worlds but at the same time I found them extremely slow to navigate, even with a quad core and meaty graphics card. Now I assume Minecraft is fairly slow because: A) It's written in Java, and as most of the spatial partitioning and memory management activities happen in there, it would naturally be slower than a native C++ version. B) It doesn't partition its world very well. I could be wrong on both assumptions; however it got me thinking about the best way to manage large voxel worlds. As it is a true 3D world, where a block can exist in any part of the world, it is basically a big 3D array [x][y][z], where each block in the world has a type (i.e BlockType.Empty = 0, BlockType.Dirt = 1 etc.) Now, I am assuming to make this sort of world perform well you would need to: A) Use a tree of some variety (oct/kd/bsp) to split all the cubes out; it seems like an oct/kd would be the better option as you can just partition on a per cube level not a per triangle level. B) Use some algorithm to work out which blocks can currently be seen, as blocks closer to the user could obfuscate the blocks behind, making it pointless to render them. C) Keep the block object themselves lightweight, so it is quick to add and remove them from the trees. I guess there is no right answer to this, but I would be interested to see peoples' opinions on the subject. How would you improve performance in a large voxel-based world?

Read the article
Minimum percentage of free physical memory that Linux require for optimal performance

- by csoto

Recently, we have been getting questions about this percentage of free physical memory that OS require for optimal performance, mainly applicable to physical compute nodes. Under normal conditions you may see that at the nodes without any application running the OS take (for example) between 24 and 25 GB of memory. The Linux system reports the free memory in a different way, and most of those 25gbs (of the example) are available for user processes. IE: Mem: 99191652k total, 23785732k used, 75405920k free, 173320k buffers The MOS Doc Id. 233753.1 - "Analyzing Data Provided by '/proc/meminfo'" - explains it (section 4 - "Final Remarks"): Free Memory and Used Memory Estimating the resource usage, especially the memory consumption of processes is by far more complicated than it looks like at a first glance. The philosophy is an unused resource is a wasted resource.The kernel therefore will use as much RAM as it can to cache information from your local and remote filesystems/disks. This builds up over time as reads and writes are done on the system trying to keep the data stored in RAM as relevant as possible to the processes that have been running on your system. If there is free RAM available, more caching will be performed and thus more memory 'consumed'. However this doesn't really count as resource usage, since this cached memory is available in case some other process needs it. The cache is reclaimed, not at the time of process exit (you might start up another process soon that needs the same data), but upon demand. That said, focusing more specifically on the percentage question, apart from this memory that OS takes, how much should be the minimum free memory that must be available every node so that they operate normally? The answer is: As a rule of thumb 80% memory utilization is a good threshold, anything bigger than that should be investigated and remedied.

Read the article
What's the best way to measure and track performance over various calls at runtime?

- by bitcruncher

Hello. I'm trying to optimize the performance of my code, but I'm not familiar with xcode's debuggers or debuggers in general. Is it possible to track the execution time and frequency of calls being made at runtime? Imagine a chain of events with some recursive calls over a fraction of a second. What's the best way to track where the CPU spends most of its time? Many thanks. Edit: Maybe this is better asked by saying, how do I use the xcode debug tools to do a stack trace?

Read the article
Are there ways to improve NHibernate's performance regarding entity instantiation?

- by denny_ch

Hi folks, while profiling NHibernate with NHProf I noticed that a lot of time is spend for entity building or at least spend outside the query duration (database roundtrip). The project I'm currently working on prefetches some static data (which goes into the 2nd level cache) at application start. There are about 3000 rows in the result set (and maybe 30 columns) that is queried in 75 ms. The overall duration observed by NHProf is about 13 SECONDS! Is this typical beheviour? I know that NHibernate shouldn't be used for bulk operations, but I didn't thought that entity instantiation would be so expensive. Are there ways to improve performance in such situations or do I have to live with it? Thx, denny_ch

Read the article
Low cost way to host a large table yet keep the performance scalable?

- by Leo Liang

I have a growing table storing time series data, 500M entries now, and 200K new records every day. The total size is around 15GB for now. My clients are querying the table via a PHP script mostly, and the size of the result set is around 10K records (not very large). select * from T where timestamp > X and timestamp < Y and additionFilters And I want this operation cheap. Currently my table is hosting in Postgres 7, on a single 16G memory Box, and I would love to see some good suggestion for me to host this in low cost and also allow me to scale up for performance if needed. The table serves: 1. Query: 90% 2. Insert: 9.9% 2. Update: 0.1% <-- very rare.

Read the article

< Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >