Search Results

Search found 14282 results on 572 pages for 'performance counter'.

Page 71/572 | < Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78 | Next Page >

SQL SERVER – #TechEdIn – Presenting Tomorrow on Speed Up! – Parallel Processes and Unparalleled Performance at TechEd India 2012

- by pinaldave

Performance tuning is always a very hot topic when it is about SQL Server. SQL Server Performance Tuning is a very challenging subject that requires expertise in Database Administration and Database Development. I always have enjoyed talking about SQL Server Performance tuning subject. However, in India, it’s actually the very first time someone is presenting on this interesting subject, so this time I had the biggest challenge to present this session. Frequently enough, we get these two kind of questions: How to turn off parallelism as it is reducing performance? How to turn on parallelism as I want more performance? The reality is that not everyone knows what exactly is needed by their system. In this session, I have attempted to answer this very question. I’ve decided to provide a balanced view but stay away from theory, which leads us to say “It depends”. The session will have a clear message about this towards its end. Deck Details Slides: 45+ Demos: 7+ Bonus Quiz: 5 Images: 10+ Session delivery time: 52 Mins + 8 Mins of Q & A I have presented this session a couple of times to my friends and so far have received good feedback. Oftentimes, when people hear that I am going to present 45 slides, they all say it is too much to cover. However, when I am done with the session the usual reaction is that I truly gave justice to those slides. Action Item Here are a few of the action items for all of those who are going to attend this session: If you want to attend the session, just come early. There’s a good chance that you may not get a seat because right before me, there is a session from SQL Guru Vinod Kumar. He performs a powerful delivery of million concepts in just a little time. Quiz. I will be asking few questions during the session as well as before the session starts. If you get the correct answer, I will give unique learning material for you. You may not want to miss this learning opportunity at any cosst. Session Details Title: Speed Up! – Parallel Processes and Unparalleled Performance (Add to Calendar) Abstract: “More CPU, More Performance” – A very common understanding is that usage of multiple CPUs can improve the performance of the query. To get a maximum performance out of any query, one has to master various aspects of the parallel processes. In this deep-dive session, we will explore this complex subject with a very simple interactive demo. Attendees will walk away with proper understanding of CX_PACKET wait types, MAXDOP, parallelism threshold and various other concepts. Date and Time: March 23, 2012, 12:15 to 13:15 Location: Hotel Lalit Ashok - Kumara Krupa High Grounds, Bengaluru – 560001, Karnataka, India. Add to Calendar Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Interview Questions and Answers, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
Skype performance in IPSEC VPN

- by dunxd

I've been challenged to "improve Skype performance" for calls within my organisation. Having read the Skype IT Administrators Guide I am wondering whether we might have a performance issue where the Skype Clients in a call are all on our WAN. The call is initiated by a Skype Client at our head office, and terminated on a Skype Client in a remote office connected via IPSEC VPN. Where this happens, I assume the trafficfrom Client A (encrypted by Skype) goes to our ASA 5510, where it is furtehr encrypted, sent to the remote ASA 5505 decrypted, then passed to Client B which decrypts the Skype encryption. Would the call quality benefit if the traffic didn't go over the VPN, but instead only relied on Skype's encryption? I imagine I could achieve this by setting up a SOCKS5 proxy in our HQ DMZ for Skype traffic. Then the traffic goes from Client A to Proxy, over the Skype relay network, then arrives at Cisco ASA 5505 as any other internet traffic, and then to Client B. Is there likely to be any performance benefit in doing this? If so, is there a way to do it that doesn't require a proxy? Has anyone else tackled this?

Read the article
How do I improve my incremental-backup performance?

- by Alistair Bell

I'm currently using the traditional rsync+cp -al method to create incremental/snapshot backups of our server tree. The backups are going onto a pair of eight-disk towers connected to the backup machine (a Sandy Bridge machine with 16 GB of RAM, running CentOS 5.5) via four eSATA connections (four disks per connection). Each disk is a regular 2 TB disk, so we have 32 TB of disk space connected to the backup machine. We're backing up about 20 TB of data on the servers with this. The problem is that each daily backup is taking more than 24 hours, and the real time-killer isn't the actual rsync, but the time it takes to perform a cp -al of the tree locally on the backup machine. It's taking more than 12 hours just to make the shadow copy of the tree, and as far as I can tell the performance backlog is at the disk (top shows the cp using a lot of RAM but not a lot of CPU and mostly in uninterruptible-sleep state) We have the server data split into four major volumes (and a few minor ones), and each of these backups runs in parallel (with some offsets in the cron to try to get some disks' cp done first). There are two volumes on the backup drive, both striped LVM volumes of 16 TB each. So obviously I need to improve the performance because it's unusable as it stands. The first question is: when CentOS 6 comes out, with support for btrfs, will making snapshots of subvolumes with btrfs substantially increase this performance? The second is: is there a way, with ext3 or something else supported in CentOS 5 or 6, to 'encourage' it to put the directories/inodes in one part of a volume (which could happen to be the part that's on an SSD, via LVM) and the files in another? That would presumably solve the problem, but I don't know of ways to hint ext3 like that.

Read the article
Poor SSL performance with vsftpd

- by petrus

I'm trying to tweak vsftpd to achieve maximum performance for my usage: I have only one or two clients that connect to the server. File size is between ~15MB and 1GB. Typical transfer batch represent between 1 and 2GB of data. For testing purposes, I'm using a tmpfs on both sides (thus eliminating any disks bottleneck) with a single 1GB file. When SSL is disabled, performance is good, with a transfer rate at ~120MB/s (reaching the limits of gigabit networking). With SSL enabled only for control traffic (and not data traffic), performance drops at about 112MB/s, which is still within the acceptable limits. However, when SSL is enabled for data flows, the transfer speed drops dramatically: 6.7MB/s using 3DES & SHA (ssl_ciphers=DES-CBC3-SHA in vsftpd.conf) 16MB/s using DES & SHA (ssl_ciphers=DES-CBC-SHA) I didn't tested other ciphers, but from what I can see from the CPU usage during the transfer, it seems that vsftpd is only using a single cpu/core per client. While this can fit for large ftp sites with hundreds of clients, I'd like to avoid this behavior and use more ressources on the server. On a side note, if you have any ideas regarding other openssl ciphers...

Read the article
Performance degrades for more than 2 threads on Xeon X5355

- by zoolii

Hi All, I am writing an application using boost threads and using boost barriers to synchronize the threads. I have two machines to test the application. Machine 1 is a core2 duo (T8300) cpu machine (windows XP professional - 4GB RAM) where I am getting following performance figures : Number of threads :1 , TPS :21 Number of threads :2 , TPS :35 (66 % improvement) further increase in number of threads decreases the TPS but that is understandable as the machine has only two cores. Machine 2 is a 2 quad core ( Xeon X5355) cpu machine (windows 2003 server with 4GB RAM) and has 8 effective cores. Number of threads :1 , TPS :21 Number of threads :2 , TPS :27 (28 % improvement) Number of threads :4 , TPS :25 Number of threads :8 , TPS :24 As you can see, performance is degrading after 2 threads (though it has 8 cores). If the program has some bottle neck , then for 2 thread also it should have degraded. Any idea? , Explanations ? , Does the OS has some role in performance ? - It seems like the Core2duo (2.4GHz) scales better than Xeon X5355 (2.66GHz) though it has better clock speed. Thank you -Zoolii

Read the article
Improve wireless performance

- by djechelon

Hello, I have a Trust Speedshare Turbo Pro router, which is running on channel 6. I found that the wireless signal (and network performance) dramatically drops from my PDA (I can barely attach to the network, even if I set the PDA's energy settings to maximum wireless performance) when I even exit my room, and I don't have shielded walls or something like that. I can't even stream a SD video from my desktop (connected via LAN) to my laptop using WiFi, while via LAN it works fine. I read that changing router's channel could improve performance due to interference reducing. I found that almost all wireless networks around here run on channels 6 and 11. I tried to go to my router's settings page to change channel, but I found that the combo box only allows me to select 6!! I'm not sure, but I may have been able in the past to change channel, though not to all of the available channels. A few minutes ago I tried a firmware upgrade, but it didn't solve my problem. My question is Is it possible that my router is someway locked to its channel? I bought it on my own, I didn't receive it from my ISP Apart from boosting the antenna power to the maximum (which, by the way, increases the EM radiation my and my family's bodies absorb 24/7 and is little more environment-unfriendly), do you have any tips on getting high quality transmission up to 5 metres from the antenna? Thank you

Read the article
Developing high-performance and scalable zend framework website [on hold]

- by Daniel

We are going to develop an ads website like http://www.gumtree.com/ (it will not be like this one but just to give you an ideea) and we are having some issues regarding performance and scalability. We are planning on using Zend Framework for this project but this is all that I'm sure off at this point. I don't think a classic approch like Zend Framework (PHP) + MySQL + Memcache + jQuery (and I would throw Doctrine 2 in there to) will fix result in a high-performance application. I was thinking on making this a RESTful application (with Zend Framework) + NGINX (or maybe MongoDB) + Memcache (or eAccelerator -- I understand this will create problems with scalability on multiple servers) + jQuery or maybe throw Backbone.js in there, a CDN for static content, a server for images and a scalable server for the requests and the rest. My questions are: - What do you think about my approch? - What solutions would you recommand for developing an high performance, scalable application expected to have a lot of traffic using PHP(Zend Framework 2)...I would be interested in your approch. I should note that I'm a Zend developer, I'm working with Zend for over 3 years, this is why I'm choosing it.

Read the article
WSUS performance for unneeded updates

- by mhouston100

We have a WSUS server serving around 300 PC's and a couple of dozen servers and a discussion came up at work as to what products to include. We have a single SQL 2005 instance on one of the servers and it has NEVER been updated. My first thought was to just tick the box for SQL 2005 and let WSUS do it's thing to upgrade to the latest service pack at least. One of the other guys here has the opinion that having updates that are relevant to only a small selection of hosts would effect the performance of WSUS as a whole, claiming that each update does a 'check' against all the hosts or something similar. My argument is that manually updating these servers is obviously not working as the admins are not paying attention to what is needed. So my question is: Do updates that only effect a sub-set of the hosts effect the overall performance of the WSUS server in relation to ALL the hosts? (disk space is not an issue at this point) Is there any performance justification for or against manually updating small amounts of products? Basically I'm needing a rebuttal against his argument and I'm unable to find any concrete documentation to prove him wrong.

Read the article
Hyperthreading vs. SQL Server & PostgreSQL

- by IanC

I have read that hyperthreading is a "performance killer" when it comes to DBs. However, what I read didn't state which CPUs. Further, it mostly indicated that I/O was "cut to < 10% performance". That logically doesn't make sense since I/O is primarily a function of controllers and disks, not CPUs. But then no one ever said bugs made sense. What I read also stated that SQL Server could put two parallel query ops onto 1 logical core (2 threads), thereby degrading performance. I have a hard time believing SQL Server's architects would have made such an obvious miscalculation. Does anyone have and data on how hyperthreading on current generation CPUs affects either of the RDBMSs I mentioned?

Read the article
aligning truecrypt partition on 1.5TB 4kB sector drive

- by pQd

hi, aligning partitions to start at real physical sector of ssds / stripped raids / 4kB drives is a 'good thing to do'. but i've run into a problems when trying to do it for a truecrypt partition that will contain ext3 on it. or so it seems. when drive is question is partitioned properly and formatted with ext3 i get very reasonable write speeds around 70-80MB/s, but when i put truecrypt and ext3 on the top of it write performance becomes very unstable and goes between 1-25MB/s with very high io-wait. on the same server i dont have any performance issues with ext3 on the top of truecrypt on regular 512B-sector 500GB sata disks. so my best guess is that iowaits are caused by misalignment but i cannot really find reliable information on how to calculate optimal partition beginning. i've tried to start it at 128 logical sector, i've also tried 8132 sector as suggested here but both gave me very bad and unstable performance. do you have any experience with similar setup? thanks!

Read the article
What is Peformance Monitor telling me when my page faults / second are high?

- by David Robison

I have a Windows 7 x64 computer that is having performance issues. After some investigation, I have discovered that the page faults / second on it, as reported by Performance Monitor, are really high. Everything else seems to be normal. Resource Monitor reports no hard faults and lots of available memory. Is this a potential cause for problems, or is it a red herring? If it is something that could be causing problems, what should I do next to figure out what is causing it? Here is a screen shot of the Performance Monitor. Notice that the average page faults / second is 75,887. On another computer that does not have problems, this number is closer to 3,000. Here is a screen shot of the Resource Monitor, sorted by hard faults / second, which is currently 0 for all processes.

Read the article
When to use Nginx PHP Fast CGI with a TCP socket instead of a UNIX socket?

- by user64204

I've followed this guide to setup PHP in FastCGI mode with Nginx. This guide describes 2 ways of doing it: TCP socket and UNIX socket. I've ran some Apache Benchmark on my locale machine and here are the results: Below tests ran multiple times to get better average statistics: $ ab -c 200 -n 100000 http://.... APACHE: 1800 req/sec NGINX (TCP socket): 2500 req/sec NGINX (UNIX socket): 15000 req/sec As far as I understand, there is overhead with using a TCP socket rather than a UNIX socket, hence the better performance with the latter. However I was not expecting such a performance difference given that the TCP socket is on the localhost, and therefore would like to ask the following question: Q: Given the huge performance gain with using a UNIX socket, what are the configuration scenarios where it would make sense to use a TCP socket instead?

Read the article
Processor upgrade on a laptop vs. Ram upgrade. Also does ram always matter?

- by Evan

I have a Dell Inspiron 14r (N4110) with an Intel Core i3 and 4gb of ram. It runs very smoothly, however gaming on this laptop is very limited. This is mostly because of integrated graphics but i have seen a computer with a Core i5, and very similar specs otherwise, run games that the N4110 cannot. This other computer has integrated graphics and 6gb of ram. I am wondering whether upgrading ram or upgrading the processor make the most difference in performance. Which setup would get better performance, an i5 with 4gb of ram or an i3 with 8 gb of ram? (Both with integrated graphics) Also, is there a certain point at which you have too much ram for the computer to ever possibly use? For instance is there really any difference in performance between 8 gb and 16 gb of ram?

Read the article
Xen Disk Performence Issues

- by user98651

I'm currently using Xen PV on CentOS 5 with my domU's as flat files running on a hardware RAID controlled (write cache enabled) formatted with XFS. On the dom0 I can get about 500MB/s in a 2GB dd write from /dev/zero however on the domU's I'm lucky if I get 10MB/s (it is usually around half that). I've tried changing the disk scheduling to NOOP on the domU's, changed some mount parameters and tweaked the performance allocations of both the dom0 (prioritize CPU) and domU's (increase RAM and VCPU allocations). None of these steps have produced any noticeable change in performance. My instinct here is that it is not a hardware problem, due to the solid performance of the dom0. Any ideas on what might be causing this problem? I'm considering moving to LVM based domU's.

Read the article
Powerpoint: a slide that counts to 2010

- by flybywire

I want a slide in powerpoint that shows the number 0 and when I click a button the number increases by one one until 2010. It should take 4 to 6 seconds. How can I do that?

Read the article
Perfmon quick rundown

- by anon

I've known quite along while about performance monitor on Windows. I have decided now to create a scheduled performance monitoring of my entire system so I can find bottlenecks for future improvements. So as you can imagine this is going to run 24/7 so I can identify peak utilization. With performance monitor on Windows 7 for example where are the logs stored (c:\perfmon)? Is there a log size? Even better is the a website that can get me up to speed with scheduling and best practices of perfmon? (I don't need an explanation of what I can monitor)

Read the article
SQL SERVER – Concurrency Basics – Guest Post by Vinod Kumar

- by pinaldave

This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions from SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. Learning is always fun when it comes to SQL Server and learning the basics again can be more fun. I did write about Transaction Logs and recovery over my blogs and the concept of simplifying the basics is a challenge. In the real world we always see checks and queues for a process – say railway reservation, banks, customer supports etc there is a process of line and queue to facilitate everyone. Shorter the queue higher is the efficiency of system (a.k.a higher is the concurrency). Every database does implement this using checks like locking, blocking mechanisms and they implement the standards in a way to facilitate higher concurrency. In this post, let us talk about the topic of Concurrency and what are the various aspects that one needs to know about concurrency inside SQL Server. Let us learn the concepts as one-liners: Concurrency can be defined as the ability of multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can be active without interfering with each other, the greater the concurrency of the database system. Concurrency is reduced when a process that is changing data prevents other processes from reading that data or when a process that is reading data prevents other processes from changing that data. Concurrency is also affected when multiple processes are attempting to change the same data simultaneously. Two approaches to managing concurrent data access: Optimistic Concurrency Model Pessimistic Concurrency Model Concurrency Models Pessimistic Concurrency Default behavior: acquire locks to block access to data that another process is using. Assumes that enough data modification operations are in the system that any given read operation is likely affected by a data modification made by another user (assumes conflicts will occur). Avoids conflicts by acquiring a lock on data being read so no other processes can modify that data. Also acquires locks on data being modified so no other processes can access the data for either reading or modifying. Readers block writer, writers block readers and writers. Optimistic Concurrency Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying. Default behavior of optimistic concurrency is to use row versioning to allow data readers to see the state of the data before the modification occurs. Older versions of the data are saved so a process reading data can see the data as it was when the process started reading and not affected by any changes being made to that data. Processes modifying the data is unaffected by processes reading the data because the reader is accessing a saved version of the data rows. Readers do not block writers and writers do not block readers, but, writers can and will block writers. Transaction Processing A transaction is the basic unit of work in SQL Server. Transaction consists of SQL commands that read and update the database but the update is not considered final until a COMMIT command is issued (at least for an explicit transaction: marked with a BEGIN TRAN and the end is marked by a COMMIT TRAN or ROLLBACK TRAN). Transactions must exhibit all the ACID properties of a transaction. ACID Properties Transaction processing must guarantee the consistency and recoverability of SQL Server databases. Ensures all transactions are performed as a single unit of work regardless of hardware or system failure. A – Atomicity C – Consistency I – Isolation D- Durability Atomicity: Each transaction is treated as all or nothing – it either commits or aborts. Consistency: ensures that a transaction won’t allow the system to arrive at an incorrect logical state – the data must always be logically correct. Consistency is honored even in the event of a system failure. Isolation: separates concurrent transactions from the updates of other incomplete transactions. SQL Server accomplishes isolation among transactions by locking data or creating row versions. Durability: After a transaction commits, the durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on data. Transaction Dependencies In addition to supporting all four ACID properties, a transaction might exhibit few other behaviors (known as dependency problems or consistency problems). Lost Updates: Occur when two processes read the same data and both manipulate the data, changing its value and then both try to update the original data to the new value. The second process might overwrite the first update completely. Dirty Reads: Occurs when a process reads uncommitted data. If one process has changed data but not yet committed the change, another process reading the data will read it in an inconsistent state. Non-repeatable Reads: A read is non-repeatable if a process might get different values when reading the same data in two reads within the same transaction. This can happen when another process changes the data in between the reads that the first process is doing. Phantoms: Occurs when membership in a set changes. It occurs if two SELECT operations using the same predicate in the same transaction return a different number of rows. Isolation Levels SQL Server supports 5 isolation levels that control the behavior of read operations. Read Uncommitted All behaviors except for lost updates are possible. Implemented by allowing the read operations to not take any locks, and because of this, it won’t be blocked by conflicting locks acquired by other processes. The process can read data that another process has modified but not yet committed. When using the read uncommitted isolation level and scanning an entire table, SQL Server can decide to do an allocation order scan (in page-number order) instead of a logical order scan (following page pointers). If another process doing concurrent operations changes data and move rows to a new location in the table, the allocation order scan can end up reading the same row twice. Also can happen if you have read a row before it is updated and then an update moves the row to a higher page number than your scan encounters later. Performing an allocation order scan under Read Uncommitted can cause you to miss a row completely – can happen when a row on a high page number that hasn’t been read yet is updated and moved to a lower page number that has already been read. Read Committed Two varieties of read committed isolation: optimistic and pessimistic (default). Ensures that a read never reads data that another application hasn’t committed. If another transaction is updating data and has exclusive locks on data, your transaction will have to wait for the locks to be released. Your transaction must put share locks on data that are visited, which means that data might be unavailable for others to use. A share lock doesn’t prevent others from reading but prevents them from updating. Read committed (snapshot) ensures that an operation never reads uncommitted data, but not by forcing other processes to wait. SQL Server generates a version of the changed row with its previous committed values. Data being changed is still locked but other processes can see the previous versions of the data as it was before the update operation began. Repeatable Read This is a Pessimistic isolation level. Ensures that if a transaction revisits data or a query is reissued the data doesn’t change. That is, issuing the same query twice within a transaction cannot pickup any changes to data values made by another user’s transaction because no changes can be made by other transactions. However, this does allow phantom rows to appear. Preventing non-repeatable read is a desirable safeguard but cost is that all shared locks in a transaction must be held until the completion of the transaction. Snapshot Snapshot Isolation (SI) is an optimistic isolation level. Allows for processes to read older versions of committed data if the current version is locked. Difference between snapshot and read committed has to do with how old the older versions have to be. It’s possible to have two transactions executing simultaneously that give us a result that is not possible in any serial execution. Serializable This is the strongest of the pessimistic isolation level. Adds to repeatable read isolation level by ensuring that if a query is reissued rows were not added in the interim, i.e, phantoms do not appear. Preventing phantoms is another desirable safeguard, but cost of this extra safeguard is similar to that of repeatable read – all shared locks in a transaction must be held until the transaction completes. In addition serializable isolation level requires that you lock data that has been read but also data that doesn’t exist. Ex: if a SELECT returned no rows, you want it to return no. rows when the query is reissued. This is implemented in SQL Server by a special kind of lock called the key-range lock. Key-range locks require that there be an index on the column that defines the range of values. If there is no index on the column, serializable isolation requires a table lock. Gets its name from the fact that running multiple serializable transactions at the same time is equivalent of running them one at a time. Now that we understand the basics of what concurrency is, the subsequent blog posts will try to bring out the basics around locking, blocking, deadlocks because they are the fundamental blocks that make concurrency possible. Now if you are with me – let us continue learning for SQL Server Locking Basics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Concurrency

Read the article
Demantra 7.3.1.3 Controlling MDP_MATRIX Combinations Assigned to Forecasting Tasks Using TargetTaskSize

- by user702295

New 7.3.1.3 parameter: TargetTaskSize Old parameter: BranchID Multiple, deprecated 7.3.1.3 onwards Parameter Location: Parameters > System Parameters > Engine > Proport Default: 0 Engine Mode: Both Details: Specifies how many MDP_MATRIX combinations the analytical engine attempts to assign to each forecasting task. Allocation will be affected by forecsat tree branch size. TaskTargetSize is automcatically calculated. It holds the perferred branch size, in number of combinations in the lowest level. This parameter is adjusted to a lower value for smaller schemas, depending on the number of available engines. - As the forecast is generated the engine goes up the tree using max_fore_level and not top_level -1. Max_fore_level has to be less than or equal to top_level -1. Due to this requirement, combinations falling under the same top level -1 member must be in the same task. A member of the top level -1 of the forecast tree is known as a branch. An engine task is therefore comprised of one or more branches. - Reveal current task size go to Engine Administrator --> View --> Branch Information and run the application on your Demantra schema. This will be deprecated in 7.3.1.3 since there is no longer a means of adjusting the brach size directly. The focus is now on proper hierarchy / forecast design. - Control of tasks The number of tasks created is the lowest of number of branches, as defined by top level -1 members in forecast tree, and engine sessions and the value of TargetTaskSize. You are used to using the branch multiplier in this calculation. As of 7.3.1.3, the branch ID multiple is deprecated. - Discovery of current branch size To resolve this you must review the 2nd highest level in the forecast tree (below highest/highest) as this is the level which determines the size of the branches. If a few resulting tasks are too large it is recommended that the forecast tree level driving branches be revised or at times completely removed from the forecast tree. - Control of foreacast tree branch size - Run the following sql to determine how even the branches are being split by the engine: select count(*),branch_id from mdp_matrix where prediction_status = 1 and do_fore = 1 group by branch_id; This will give you an understanding if some of the individual branches have an unusually large number of rows and thus might indicate that the engine is not efficiently dividing up the parallel tasks. - Based on the results of this sql, we may want to adjust the branch id multiplier and/or the number of engines (both of these settings are found in the Engine Administrator) select count(*), level_id from mdp_matrix where prediction_status = 1 and do_fore = 1 group by level_id; This will give us an understanding at which level of the Forecast tree where the forecast is being generated. Having a majority of combinations higher on the forecast tree might indicate either a poorly designed forecast tree and/or engine parameters that are too strict Based on the results of this we would adjust the Forecast Tree to see if choosing a different hierarchy might produce a forecast, with more combinations, at a lower level. For example: - Review the 2nd highest level in the forecast tree, below highest/highest, as this is the level which determines the size of the branches. - If a few resulting tasks are too large it is recommended that the forecast tree level driving branches be revised or at times completely removed from the forecast tree. - For example, if the highest level of the forecast tree is set to Brand/All Locations. - You have 10 brands but 2 of the brands account for 67% and 29% of all combinations. - There is a distinct possibility that the tasks resulting from these 2 branches will be too large for a single engine to process. Some possible solutions could be to remove the Brand level and instead use a different product grouping which has a more even distribution, possibly Product Group. - It is also possible to add a location dimension to this forecast tree level, for example Customer. This will also reduce forecast tree branch size and will deliver a balanced task allocation. - A correctly configured Forecast Tree is something that is done by the Implementation team and is not the responsibility of Oracle Support. Allocation will be affected by forecast tree branch size. When TargetTaskSize is set to 0, the default value, the system automatically calculates a value for 'TargetTaskSize' depending on the number of engines. - QUESTION: Does this mean that if TargetTaskSize is 1, we use tree branch size to allocate branches to tasks instead of automatically calculating the size? ANSWER: DEV Strongly recommends that the setting of TargetTaskSize remain at the DEFAULT of ZERO (0). - How to control the number of engines? Determine how many CPUs are on the machine(s) that is (are) running the engine. As mentioned earlier, the general rule is that you should designate 2 engines per each CPU that is available. So for example, if you are running the engine on a machine that has 4 CPU then you can have up to 8 engines designated in the Engine Administrator. In this type of architecture then instead of having one 'localhost' in your Engine Settings Screen, you would have 'localhost' repeated eight times in this field. Where do I set the number of engines? To add multiples computers where engine will run, please do a back-up of Settings.xml file under Analytical Engines\bin\ folder, then edit it and add there the selected machines. Example, this will allow 3 engines to start: - <Entry> <Key argument="ComputerNames" /> <Value type="string" argument="localhost,localhost,localhost" /> </Entry Otherwise, if there are no additional engines defined, the calculated value of 'TargetTaskSize' is used. (Oracle does not recommend changing the default value.) The TargetTaskSize holds the engines prefered branch size, in number of level 1 combinations. - Level 1 combinations, known as group size The engine manager will use this parameter to attempt creating branches with similar size. * The engine manager will not create engines that do not have a branch. The engine divider algorithm uses the value of 'TargetTaskSize' as a system-preferred branch size to create branches that are more equal in size which improves engine performance. The engine divider will try to add as many tasks as possible to an existing branch, up to the limit of 'TargetTaskSize' level 1 combinations, before adding new branches. Coming up next: - The engine divider - Group size - Level 1 combinations - MAX_FORE_LEVEL - Engine Parameters

Read the article
Are SharePoint site templates really less performant than site definitions?

- by Jim

So, it seems in the SharePoint blogosphere that everybody just copies and pastes the same bullet points from other blogs. One bullet point I've seen is that SharePoint site templates are less performant than site definitions because site definitions are stored on the file system. Is that true? It seems odd that site templates would be less performant. It's my understanding that all site content lives in a database, whether you use a site template or a site definition. A site template is applied once to the database, and from then on the site should not care if the content was created using a site template or not. So, does anybody have an architectural reason why a site template would be less performant than a site definition? Edit: Links to the blogs that say there is a performance difference: From MSDN: Because it is slow to store templates in and retrieve them from the database, site templates can result in slower performance. From DevX: However, user templates in SharePoint can lead to performance problems and may not be the best approach if you're trying to create a set of reusable templates for an entire organization. From IT Footprint: Because it is slow to store templates in and retrieve them from the database, site templates can result in slower performance. Templates in the database are compiled and executed every time a page is rendered. From Branding SharePoint:Custom site definitions hold the following advantages over custom templates: Data is stored directly on the Web servers, so performance is typically better. At a minimum, I think the above articles are incomplete, and I think several are misleading based on what I know of SharePoints architecture. I read another blog post that argued against the performance differences, but I can't find the link.

Read the article
.NET Code Evolution

- by Alois Kraus

Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/07/24/153504.aspxAt my day job I do look at a lot of code written by other people. Most of the code is quite good and some is even a masterpiece. And there is also code which makes you think WTF… oh it was written by me. Hm not so bad after all. There are many excuses reasons for bad code. Most often it is time pressure followed by not enough ambition (who cares) or insufficient training. Normally I do care about code quality quite a lot which makes me a (perceived) slow worker who does write many tests and refines the code quite a lot because of the design deficiencies. Most of the deficiencies I do find by putting my design under stress while checking for invariants. It does also help a lot to step into the code with a debugger (sometimes also Windbg). I do this much more often when my tests are red. That way I do get a much better understanding what my code really does and not what I think it should be doing. This time I do want to show you how code can evolve over the years with different .NET Framework versions. Once there was time where .NET 1.1 was new and many C++ programmers did switch over to get rid of not initialized pointers and memory leaks. There were also nice new data structures available such as the Hashtable which is fast lookup table with O(1) time complexity. All was good and much code was written since then. At 2005 a new version of the .NET Framework did arrive which did bring many new things like generics and new data structures. The “old” fashioned way of Hashtable were coming to an end and everyone used the new Dictionary<xx,xx> type instead which was type safe and faster because the object to type conversion (aka boxing) was no longer necessary. I think 95% of all Hashtables and dictionaries use string as key. Often it is convenient to ignore casing to make it easy to look up values which the user did enter. An often followed route is to convert the string to upper case before putting it into the Hashtable. Hashtable Table = new Hashtable(); void Add(string key, string value) { Table.Add(key.ToUpper(), value); } This is valid and working code but it has problems. First we can pass to the Hashtable a custom IEqualityComparer to do the string matching case insensitive. Second we can switch over to the now also old Dictionary type to become a little faster and we can keep the the original keys (not upper cased) in the dictionary. Dictionary<string, string> DictTable = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); void AddDict(string key, string value) { DictTable.Add(key, value); } Many people do not user the other ctors of Dictionary because they do shy away from the overhead of writing their own comparer. They do not know that .NET has for strings already predefined comparers at hand which you can directly use. Today in the many core area we do use threads all over the place. Sometimes things break in subtle ways but most of the time it is sufficient to place a lock around the offender. Threading has become so mainstream that it may sound weird that in the year 2000 some guy got a huge incentive for the idea to reduce the time to process calibration data from 12 hours to 6 hours by using two threads on a dual core machine. Threading does make it easy to become faster at the expense of correctness. Correct and scalable multithreading can be arbitrarily hard to achieve depending on the problem you are trying to solve. Lets suppose we want to process millions of items with two threads and count the processed items processed by all threads. A typical beginners code might look like this: int Counter; void IJustLearnedToUseThreads() { var t1 = new Thread(ThreadWorkMethod); t1.Start(); var t2 = new Thread(ThreadWorkMethod); t2.Start(); t1.Join(); t2.Join(); if (Counter != 2 * Increments) throw new Exception("Hmm " + Counter + " != " + 2 * Increments); } const int Increments = 10 * 1000 * 1000; void ThreadWorkMethod() { for (int i = 0; i < Increments; i++) { Counter++; } } It does throw an exception with the message e.g. “Hmm 10.222.287 != 20.000.000” and does never finish. The code does fail because the assumption that Counter++ is an atomic operation is wrong. The ++ operator is just a shortcut for Counter = Counter + 1 This does involve reading the counter from a memory location into the CPU, incrementing value on the CPU and writing the new value back to the memory location. When we do look at the generated assembly code we will see only inc dword ptr [ecx+10h] which is only one instruction. Yes it is one instruction but it is not atomic. All modern CPUs have several layers of caches (L1,L2,L3) which try to hide the fact how slow actual main memory accesses are. Since cache is just another word for redundant copy it can happen that one CPU does read a value from main memory into the cache, modifies it and write it back to the main memory. The problem is that at least the L1 cache is not shared between CPUs so it can happen that one CPU does make changes to values which did change in meantime in the main memory. From the exception you can see we did increment the value 20 million times but half of the changes were lost because we did overwrite the already changed value from the other thread. This is a very common case and people do learn to protect their data with proper locking. void Intermediate() { var time = Stopwatch.StartNew(); Action acc = ThreadWorkMethod_Intermediate; var ar1 = acc.BeginInvoke(null, null); var ar2 = acc.BeginInvoke(null, null); ar1.AsyncWaitHandle.WaitOne(); ar2.AsyncWaitHandle.WaitOne(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Intermediate did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Intermediate() { for (int i = 0; i < Increments; i++) { lock (this) { Counter++; } } } This is better and does use the .NET Threadpool to get rid of manual thread management. It does give the expected result but it can result in deadlocks because you do lock on this. This is in general a bad idea since it can lead to deadlocks when other threads use your class instance as lock object. It is therefore recommended to create a private object as lock object to ensure that nobody else can lock your lock object. When you read more about threading you will read about lock free algorithms. They are nice and can improve performance quite a lot but you need to pay close attention to the CLR memory model. It does make quite weak guarantees in general but it can still work because your CPU architecture does give you more invariants than the CLR memory model. For a simple counter there is an easy lock free alternative present with the Interlocked class in .NET. As a general rule you should not try to write lock free algos since most likely you will fail to get it right on all CPU architectures. void Experienced() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Experienced); t1.Wait(); t2.Wait(); if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Experienced did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Experienced() { for (int i = 0; i < Increments; i++) { Interlocked.Increment(ref Counter); } } Since time does move forward we do not use threads explicitly anymore but the much nicer Task abstraction which was introduced with .NET 4 at 2010. It is educational to look at the generated assembly code. The Interlocked.Increment method must be called which does wondrous things right? Lets see: lock inc dword ptr [eax] The first thing to note that there is no method call at all. Why? Because the JIT compiler does know very well about CPU intrinsic functions. Atomic operations which do lock the memory bus to prevent other processors to read stale values are such things. Second: This is the same increment call prefixed with a lock instruction. The only reason for the existence of the Interlocked class is that the JIT compiler can compile it to the matching CPU intrinsic functions which can not only increment by one but can also do an add, exchange and a combined compare and exchange operation. But be warned that the correct usage of its methods can be tricky. If you try to be clever and look a the generated IL code and try to reason about its efficiency you will fail. Only the generated machine code counts. Is this the best code we can write? Perhaps. It is nice and clean. But can we make it any faster? Lets see how good we are doing currently. Level Time in s IJustLearnedToUseThreads Flawed Code Intermediate 1,5 (lock) Experienced 0,3 (Interlocked.Increment) Master 0,1 (1,0 for int[2]) That lock free thing is really a nice thing. But if you read more about CPU cache, cache coherency, false sharing you can do even better. int[] Counters = new int[12]; // Cache line size is 64 bytes on my machine with an 8 way associative cache try for yourself e.g. 64 on more modern CPUs void Master() { var time = Stopwatch.StartNew(); Task t1 = Task.Factory.StartNew(ThreadWorkMethod_Master, 0); Task t2 = Task.Factory.StartNew(ThreadWorkMethod_Master, Counters.Length - 1); t1.Wait(); t2.Wait(); Counter = Counters[0] + Counters[Counters.Length - 1]; if (Counter != 2 * Increments) throw new Exception(String.Format("Hmm {0:N0} != {1:N0}", Counter, 2 * Increments)); Console.WriteLine("Master did take: {0:F1}s", time.Elapsed.TotalSeconds); } void ThreadWorkMethod_Master(object number) { int index = (int) number; for (int i = 0; i < Increments; i++) { Counters[index]++; } } The key insight here is to use for each core its own value. But if you simply use simply an integer array of two items, one for each core and add the items at the end you will be much slower than the lock free version (factor 3). Each CPU core has its own cache line size which is something in the range of 16-256 bytes. When you do access a value from one location the CPU does not only fetch one value from main memory but a complete cache line (e.g. 16 bytes). This means that you do not pay for the next 15 bytes when you access them. This can lead to dramatic performance improvements and non obvious code which is faster although it does have many more memory reads than another algorithm. So what have we done here? We have started with correct code but it was lacking knowledge how to use the .NET Base Class Libraries optimally. Then we did try to get fancy and used threads for the first time and failed. Our next try was better but it still had non obvious issues (lock object exposed to the outside). Knowledge has increased further and we have found a lock free version of our counter which is a nice and clean way which is a perfectly valid solution. The last example is only here to show you how you can get most out of threading by paying close attention to your used data structures and CPU cache coherency. Although we are working in a virtual execution environment in a high level language with automatic memory management it does pay off to know the details down to the assembly level. Only if you continue to learn and to dig deeper you can come up with solutions no one else was even considering. I have studied particle physics which does help at the digging deeper part. Have you ever tried to solve Quantum Chromodynamics equations? Compared to that the rest must be easy ;-). Although I am no longer working in the Science field I take pride in discovering non obvious things. This can be a very hard to find bug or a new way to restructure data to make something 10 times faster. Now I need to get some sleep ….

Read the article
How significant is the bazaar performance factor?

- by memodda

I hear all this stuff about bazaar being slower than git. I haven't used too much distributed version control yet, but in Bazaar vs. Git on the bazaar site, they say that most complaints about performance aren't true anymore. Have you found this to be true? Is performance pretty much on par now? I've heard that speed can affect workflow (people are more likely to do good thing X if X is fast). What specific cases does performance currently affect workflow in bazaar vs other systems (especially git), and how? I'm just trying to get at why performance is of particular importance. Usually when I check something in or update it, I expect it to take a little while, but it doesn't matter. I commit/update when I have a second, so it doesn't interfere with my productivity. But then I haven't used DVCS yet, so maybe that has something to do with it?

Read the article
How to improve Java performance on Informix for Windows

- by Michal Niklas

I have problem with performance of Java UDR functions on Informix on Windows. On this server I already have some functions in C and SPL. I chose one function to write it in those 3 languages and I measured performance of this function on test table. Function calculates some kind of checksum so it does not use any db libraries etc. only string and math operations. I observed performance on 30k records with SQL like: select function(txt) from _tmp_perf_test and I changed function to 'function_c, function_spl or function_java. My performance tests showed that C function is the fastest, SPL function is about 5 times slower, where Java is 100 (one hundred!) times slower than C. I checked it few times and 1:100 ratio didn't improve. I changed Java function to simply return length of the string but even this do not help so it looks, that there is general problem with Java function invocation, because there was no difference in time between Java function that calculate checksum and Java function that returns length of the string. I increased JVM_MAX_HEAP_SIZE to 128 and it not helped too. I use IBM Informix Dynamic Server Version 11.50.TC6DE. The same test on Linux server: IBM Informix Dynamic Server Version 11.50.FC6 show more "normal" results, i.e. Java is slower from C and SPL but only 2 to 5 times. What can I do to improve Java performance on Informix server on Windows? More info about Java on servers: c:\Informix\extend\krakatoa\jre\bin>java -version java version "1.5.0" Java(TM) 2 Runtime Environment, Standard Edition (build pwi32dev-20081129a (SR9-0 )) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows Server 2003 x86-32 j9vmwi3223-20081129 (JIT enabled) J9VM - 20081126_26240_lHdSMr JIT - 20081112_1511ifx1_r8 GC - 200811_07) JCL - 20081129 [root@informix11 bin]# ./java -version java version "1.5.0" Java(TM) 2 Runtime Environment, Standard Edition (build pxa64devifx-20071025 (SR6b)) IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20071005 (JIT enabled) J9VM - 20071004_14218_LHdSMr JIT - 20070820_1846ifx1_r8 GC - 200708_10) JCL - 20071025

Read the article
How can serialisation/deserialisation improve performance?

- by dotnetdev

Hi, I heard an example at work of using serialization to serialise some values for a webpart (which come from class properties), as this improves performance (than I assume getting values/etting values from/to database). I know it is not possible to get an explanation of how performance in the scenario I speak of at work can be improved as there is not enough information, but is there any explanation of how serialization can generally improve performance? Thanks

Read the article
Performance Overhead of Perf Event Subsystem in Linux Kernel

- by Bo Xiao

Performance counters for Linux are a new kernel-based subsystem that provide a framework for all things performance analysis. It covers hardware level (CPU/PMU, Performance Monitoring Unit) features and software features (software counters, tracepoints) as well. Since 2.6.33, the kernel provide 'perf_event_create_kernel_counter' kernel api for developers to create kernel counter to collect system runtime information. What I concern most is the performance impact on overall system when tracepoint/ftrace is enabled. There are no docs I can find about them. I was once told that ftrace was implemented by dynamically patching code, will it slow the system dramatically?

Read the article
Java 1.4 Class performance on 1.5 JVM

- by user222164

Switching from JVM 1.4 to 1.5 has performance benefits as per release notes. http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#performance We have Java 1.4 compiled classes which are run on 1.5 JVM, will these classes suffer in performance because they were compiled using 1.4 ?

Read the article

Search Results

Search found 14282 results on 572 pages for 'performance counter'.

Page 71/572 | < Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78 | Next Page >

- by pinaldave

- by dunxd

- by Alistair Bell

- by petrus

- by zoolii

- by djechelon

- by Daniel

- by mhouston100

- by IanC

- by pQd

- by David Robison

- by user64204

- by Evan

- by user98651

- by flybywire

- by anon

- by pinaldave

- by user702295

- by Jim

- by Alois Kraus

- by memodda

- by Michal Niklas

- by dotnetdev

- by Bo Xiao

- by user222164

< Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78 | Next Page >