data load performance - Page 11

SQL SERVER – Speed Up! – Parallel Processes and Unparalleled Performance – TechEd 2012 India

- by pinaldave

TechEd India 2012 is just around the corner, and I will be presenting two different sessions there. SQL Server performance tuning is a very challenging subject that requires expertise in both database administration and database development. I have always enjoyed talking about SQL Server performance tuning. Just like doctors, I like to call my every attempt to improve the performance of SQL Server queries and database servers a practice too. I have been working with SQL Server for more than 8 years, and I believe I have mastered many performance tuning concepts. However, performance tuning is not a simple subject. There are occasions when I feel stumped and am not sure what the next step should be. When I face a situation that I cannot figure out easily, it makes me happiest, because I clearly see it as a learning opportunity.

I have been presenting at TechEd India for the last three years. This is my fourth opportunity to present a technical session on SQL Server. Just like every other year, I decided to present something different, something I have spent years learning. This time, I am going to present on parallel processes. It is widely believed that more CPUs will improve the performance of the server. That is true in many cases. However, there are cases where limiting CPU usage has improved the overall health of the server. I will be presenting on the subject of parallel processes and their effects. I have spent more than a year working on this subject alone. After working on various queries on multi-CPU systems, I have personally learned a few things. In the coming TechEd session, I am going to share my experience with parallel processes and performance tuning.

Session Details
Title: Speed Up! – Parallel Processes and Unparalleled Performance
Abstract: “More CPU More Performance” – A very common understanding is that using multiple CPUs can improve the performance of a query. To get maximum performance out of any query, one has to master various aspects of parallel processes. In this deep dive session, we will explore this complex subject with a very simple interactive demo. Attendees will walk away with a proper understanding of the CXPACKET wait type, MAXDOP, the cost threshold for parallelism and various other concepts.
Date and Time: March 23, 2012, 12:15 to 13:15
Location: Hotel Lalit Ashok - Kumara Krupa High Grounds, Bengaluru – 560001, Karnataka, India.

Please submit your questions in the comments area and I will be sure to discuss them during my session. If I pick your question to discuss during my session, here is the gift I commit to right now – the SQL Server Interview Questions and Answers book.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Read the article

SQL server virtual memory usage and performance

- by user365035

Hello, I have a very large DB used mostly for analytics. Overall performance is very sluggish. I just noticed that when running the query below, the amount of virtual memory reported greatly exceeds the amount of physical memory available. Currently, physical memory is 10 GB (mem_MB returns 10238), whereas virtual_mem_MB returns significantly more: 8388607. That seems really wrong, but I'm at a bit of a loss on how to proceed.

    USE [master];
    GO
    select cpu_count
         , hyperthread_ratio
         , physical_memory_in_bytes / 1048576 as 'mem_MB'
         , virtual_memory_in_bytes / 1048576 as 'virtual_mem_MB'
         , max_workers_count
         , os_error_mode
         , os_priority_class
    from sys.dm_os_sys_info
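The query divides both byte counts by 1048576, so the two figures in the question are megabytes. As a quick arithmetic sketch (the helper name `describe` is made up for illustration), converting them shows about 10 GB of physical memory and a virtual figure one MB short of 8 TB. A round value like that looks more like the size of an address space than memory actually in use, but that reading is an assumption that should be checked against the documentation for sys.dm_os_sys_info.

```python
MB_PER_GB = 1024
MB_PER_TB = 1024 * 1024

def describe(mem_mb):
    """Return a human-readable size for a value expressed in MB."""
    if mem_mb >= MB_PER_TB:
        return "%.2f TB" % (mem_mb / float(MB_PER_TB))
    return "%.2f GB" % (mem_mb / float(MB_PER_GB))

physical_mb = 10238    # mem_MB value reported in the question
virtual_mb = 8388607   # virtual_mem_MB value reported in the question

print(describe(physical_mb))
print(describe(virtual_mb))
```

Neither number says how much memory SQL Server is really using, so the sluggishness probably needs to be investigated with other measurements.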

Read the article

Fixing predicated NSFetchedResultsController/NSFetchRequest performance with SQLite backend?

- by Jaanus

I have a series of NSFetchedResultsControllers powering some table views, and their performance on device was abysmal, on the order of seconds. Since it all runs on the main thread, it's blocking my app at startup, which is not great. I investigated, and it turns out the predicate is the problem:

    NSPredicate *somePredicate = [NSPredicate predicateWithFormat:@"ANY somethings == %@", something];
    [fetchRequest setPredicate:somePredicate];

I.e. the fetch entity, call it "things", has a many-to-many relation with the entity "something". This predicate is a filter that limits the results to only things that have a relation with a particular "something". When I removed the predicate for testing, fetch time (the initial performFetch: call) dropped (for some extreme cases) from 4 seconds to around 100ms or less, which is acceptable. I am troubled by this, though, as it negates a lot of the benefit I was hoping to gain with Core Data and NSFRC, which otherwise seems like a powerful tool.

So, my question is, how can I optimize this performance? Am I using the predicate wrong? Should I modify the model/schema somehow? And what other ways are there to fix this? Is this kind of degraded performance to be expected? (There are on the order of hundreds of <1KB objects.)

EDIT WITH DETAILS: Here's the code:

    [fetchRequest setFetchLimit:200];
    NSLog(@"before fetch");
    BOOL success = [frc performFetch:&error];
    if (!success) {
        NSLog(@"Fetch request error: %@", error);
    }
    NSLog(@"after fetch");

Updated logs (previously, I had some application inefficiencies degrading the performance here. These are the updated logs that should be as close to optimal as you can get under my current environment):

    2010-02-05 12:45:22.138 Special Ppl[429:207] before fetch
    2010-02-05 12:45:22.144 Special Ppl[429:207] CoreData: sql: SELECT DISTINCT 0, t0.Z_PK, t0.Z_OPT, <model fields> FROM ZTHING t0 LEFT OUTER JOIN Z_1THINGS t1 ON t0.Z_PK = t1.Z_2THINGS WHERE t1.Z_1SOMETHINGS = ? ORDER BY t0.ZID DESC LIMIT 200
    2010-02-05 12:45:22.663 Special Ppl[429:207] CoreData: annotation: sql connection fetch time: 0.5094s
    2010-02-05 12:45:22.668 Special Ppl[429:207] CoreData: annotation: total fetch execution time: 0.5240s for 198 rows.
    2010-02-05 12:45:22.706 Special Ppl[429:207] after fetch

If I do the same fetch without the predicate (by commenting out the two lines at the beginning of the question):

    2010-02-05 12:44:10.398 Special Ppl[414:207] before fetch
    2010-02-05 12:44:10.405 Special Ppl[414:207] CoreData: sql: SELECT 0, t0.Z_PK, t0.Z_OPT, <model fields> FROM ZTHING t0 ORDER BY t0.ZID DESC LIMIT 200
    2010-02-05 12:44:10.426 Special Ppl[414:207] CoreData: annotation: sql connection fetch time: 0.0125s
    2010-02-05 12:44:10.431 Special Ppl[414:207] CoreData: annotation: total fetch execution time: 0.0262s for 200 rows.
    2010-02-05 12:44:10.457 Special Ppl[414:207] after fetch

That is a 20-fold difference in total fetch time. 500ms is not that great, and there does not seem to be a way to do it in a background thread or otherwise optimize it that I can think of. (Apart from going to a binary store where this becomes a non-issue, so I might do that. Binary store performance is consistently ~100ms for the above 200-object predicated query.)

(I nested another question here previously, which I have now moved away.)

Read the article

Books and resources for Java Performance tuning - when working with databases, huge lists

- by Arvind

Hi All, I am relatively new to working on huge applications in Java. I am working on a Java web service which is pretty heavily used by various clients. The service basically queries the database (Hibernate) and then works with a lot of Lists (there are adapters to convert the list returned from the DB to the interface which the service publishes), and I am seeing a lot of issues with the service, like high CPU usage or high heap usage. While I can troubleshoot the performance issues using a profiler, I want to actually learn what I need to take care of when I write code, like what kind of List to use, or things like using StringBuilder instead of String, etc. Are there any books or blogs I can refer to which will help me while I write new services? Also, my application is multithreaded (each service call from a client is a new thread), and I want to know some best practices around that area as well. I did search the web, but I found many tips which are not relevant to the latest Java 6 releases, so I wanted to know what kind of resources would help a developer starting out now on Java for heavily used applications. Arvind

Read the article

SQL SERVER – Plan Recompilation and Reduce Recompilation – Performance Tuning

- by pinaldave

The recompilation process is the same as compilation and degrades server performance. In SQL Server 2000 and earlier versions, this was a serious issue, but in SQL Server 2005 the severity of this issue has been significantly reduced by the introduction of a new feature called statement-level recompilation. When SQL Server 2005 recompiles stored procedures, only the statement that [...]

Read the article

SQL SERVER – Improve Performance by Reducing IO – Creating Covered Index

- by pinaldave

This blog post is in response to T-SQL Tuesday #004: IO by Mike Walsh. The subject of this month is IO. Here is my quick blog post on how a covering index can improve performance by reducing IO. Let us kick off this post with disclaimers about indexes. Indexing is a very complex subject and [...]

Read the article

.NET Reflector 7.2 Early Access Build 2 Released: Performance Critical

- by Bart Read

I've just posted a write-up of some of the performance tuning I've done to improve .NET Reflector 7.2's start-up time here: http://www.reflector.net/2011/05/net-reflector-7-start-up-time-running-out-of-gas-or-pedal-to-the-metal/ You can get the new build from the .NET Reflector homepage at http://www.reflector.net/. Please remember to give us your feedback in the forum, at http://forums.reflector.net/, using the tags #7.2 and #eap.

Read the article

OBIEE 11.1.1 - Built-in BI Metrics for Performance Monitoring

- by Ahmed Awan

You can use Fusion Middleware Control metrics to monitor System Components (BI processes) and WebLogic Server processes.
Tip:
· Use the Oracle Enterprise Manager (EM) URL to monitor end-to-end OBIEE real-time performance: http://<server>:7001/em
· In Oracle Business Intelligence 11g, the perfmon URL is still valid to use, i.e. http://<server>:9704/analytics/saw.dll?Perfmon

Read the article

High Performance Storage Systems for SQL Server

Rod Colledge turns his pessimistic mindset to storage systems, and describes the best way to configure the storage systems of SQL Servers for both performance and reliability. Even Rod gets a glint in his eye when he then goes on to describe the dazzling speed of solid-state storage, though he is quick to identify the risks.

Read the article

Compute Scalars, Expressions and Execution Plan Performance

- by Paul White

The humble Compute Scalar is one of the least well-understood of the execution plan operators, and usually the last place people look for query performance problems. It often appears in execution plans with a very low (or even zero) cost, which goes some way to explaining why people ignore it. Some readers will already know that a Compute Scalar can contain a call to a user-defined function, and that any T-SQL function with a BEGIN…END block in its definition can have truly disastrous consequences...

Read the article

Can frequent state changes decrease rendering performance?

- by Miro

Can frequent texture and shader binding decrease rendering performance?

"Frequent" binding example:

    for object
        for material in object
            render part of object using that material

"Low count" binding example:

    for material
        for object in material
            render part of object using that material

I'm planning to use an octree later, and with this "low count" method of rendering it can drastically increase memory consumption. So is it a good idea?
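To make the difference between the two loop orders concrete, here is a minimal sketch that only counts how often the bound material changes; it does no rendering. The scene contents and the helper `count_binds` are hypothetical and chosen purely for illustration.

```python
def count_binds(draw_order):
    """Count how many times the bound material changes in a draw sequence."""
    binds = 0
    current = None
    for _obj, material in draw_order:
        if material != current:
            binds += 1
            current = material
    return binds

# Hypothetical scene: each object lists the materials its parts use.
scene = {
    "crate": ["wood", "metal"],
    "barrel": ["wood", "metal"],
    "fence": ["wood"],
}

# "Frequent" order: for object, for material in object.
by_object = [(obj, mat) for obj, mats in scene.items() for mat in mats]

# "Low count" order: the same draws, grouped by material.
by_material = sorted(by_object, key=lambda draw: draw[1])

print(count_binds(by_object), count_binds(by_material))
```

Sorting a flat list of draw records by material, as done here, is one possible way to get the lower bind count without storing a separate object list per material. Whether the saved binds outweigh the cost of sorting depends on the renderer and has to be measured.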

Read the article

High load (and high temp) with idle processes

- by Nanne

I've got a semi-old laptop (Toshiba Satellite A110-228) that has been appointed 'laptop for the kids' by my sister. I've installed Ubuntu Netbook (10.10) on it because of the lack of memory, and it seems to work fine, except for some heat issues. These were never a problem under Windows. It looks like I've got something similar to this problem: Load is generally 1 or higher, sometimes it's stuck at 0.80, but it's way too high. top/htop only show a couple of percent CPU use (which isn't too shocking, as I'm not doing anything). At this point all the software is stock, and I'd like to keep it that way because it's supposed to be the easy-to-maintain kids' computer.

Now I'd like to find out:
· What could be the cause of the high load? Could it be, as this thread implies, some driver, or are there other options to check?
· How could I see what is really keeping the system hot and bothered? How to check what runs, etc.? I'd like to pinpoint the culprit.
· Further steps to take for debugging?

The big bad internet leads me to believe that it might be the graphics drivers. The laptop has an Intel 945M chipset, but that doesn't seem to be one of the problem children in this matter (I read a lot about ATI drivers that need a special install). I'd not only welcome hints to directly solve this (duh) but also help in starting to debug what is going on. I am really hesitant about installing an older kernel, as I want it to be stock and easily upgradeable (because I don't live near it, it should run without me ;) )

As an afterthought: to keep the whole thing cooler, can I 'amp up' the fan control? It only goes into "airplane" mode when the computer is at 95 Celsius, which is a tad late for my taste.

Read the article

Let the RAM improves performance

- by user1717079

I have a low-profile machine, but with a lot of fast RAM: 4 GB, which is really an amount of memory that I will probably never use, not even half of it, since I just use this machine for coding and browsing the web. The HDD is really slow, so overall performance is bad when booting, caching or starting a new program. I'm wondering if Ubuntu can provide some setting or utility to solve this situation and let my system rely more on RAM.

Read the article

Big Data – Final Wrap and What Next – Day 21 of 21

- by Pinal Dave

In yesterday’s blog post we explored various resources related to learning Big Data, and in this blog post we will wrap up this 21-day series on Big Data. I have been exploring various terms and technologies related to Big Data this entire month. It was indeed fun to write about Big Data in 21 days, but the subject of Big Data is much bigger than anyone can cover in 21 days. My first goal was to write about the basics, and I think we have got that one covered pretty well. During these 21 days I have received many questions and answers related to Big Data. I have covered a few of the questions in this series, and I will be covering a few more in the coming months.

Now that I understand the Big Data basics, I am personally going to do the following list of things next. I thought I would share it with you, as it will give you a good idea of how to continue the Big Data journey.
· Build a schedule to read various Apache documentation
· Watch all Pluralsight courses
· Explore the HortonWorks Sandbox
· Start building a presentation about Big Data – this is a great way to learn something new
· Present at user group meetings on Big Data topics
· Write more blog posts about Big Data

I am going to continue learning about Big Data, and I want you to continue learning about Big Data. Please leave a comment on how you are going to continue learning about Big Data. I will publish all the informative comments on this blog with due credit. I want to end this series with the infographic by UMUC.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Read the article

Brendan Gregg's "Systems Performance: Enterprise and the Cloud"

- by user12608550

Long ago, the prerequisite UNIX performance book was Adrian Cockcroft's 1994 classic, Sun Performance and Tuning: Sparc & Solaris, later updated in 1998 as Java and the Internet. As Solaris evolved to include the invaluable DTrace observability features, new essential performance references have been published, such as Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris (2006) by McDougall, Mauro, and Gregg, and DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD (2011), also by Mauro and Gregg. Much has occurred in Solaris Land since those books appeared, notably Oracle's acquisition of Sun Microsystems in 2010 and the demise of the OpenSolaris community. But operating system technologies have continued to improve markedly in recent years, driven by stunning advances in multicore processor architecture, virtualization, and the massive scalability requirements of cloud computing. A new performance reference was needed, and I eagerly waited for something that thoroughly covered modern, distributed computing performance issues from the ground up. Well, there's a new classic now, authored yet again by Brendan Gregg, former Solaris kernel engineer at Sun and now Lead Performance Engineer at Joyent. Systems Performance: Enterprise and the Cloud is a modern, very comprehensive guide to general system performance principles and practices, as well as a highly detailed reference for specific UNIX and Linux observability tools used to examine and diagnose operating system behaviour. It provides thorough definitions of terms, explains performance diagnostic Best Practices and "Worst Practices" (called "anti-methods"), and covers key observability tools including DTrace, SystemTap, and all the traditional UNIX utilities like vmstat, ps, iostat, and many others.
The book focuses on operating system performance principles and expands on these with respect to Linux (Ubuntu, Fedora, and CentOS are cited), and to Solaris and its derivatives [1]; it is not directed at any one OS so it is extremely useful as a broad performance reference. The author goes beyond the intricacies of performance analysis and shows how to interpret and visualize statistical information gathered from the observability tools. It's often difficult to extract understanding from voluminous rows of text output, and techniques are provided to assist with summarizing, visualizing, and interpreting the performance data. Gregg includes myriad useful references from the system performance literature, including a "Who's Who" of contributors to this great body of diagnostic tools and methods. This outstanding book should be required reading for UNIX and Linux system administrators as well as anyone charged with diagnosing OS performance issues. Moreover, the book can easily serve as a textbook for a graduate level course in operating systems [2]. [1] Solaris 11, of course, and Joyent's SmartOS (developed from OpenSolaris) [2] Gregg has taught system performance seminars for many years; I have also taught such courses...this book would be perfect for the OS component of an advanced CS curriculum.

Read the article

Using DB_PARAMS to Tune the EP_LOAD_SALES Performance

- by user702295

The DB_PARAMS table can be used to tune the EP_LOAD_SALES performance. The AWR report supplied shows 16 CPUs, so I imagine that you can run with 8 or more parallel threads. This can be done by setting the following DB_PARAMS parameters. Note that most of the parameter changes are just changing a 2 or 4 into an 8:

    DBHintEp_Load_SalesUseParallel = TRUE
    DBHintEp_Load_SalesUseParallelDML = TRUE
    DBHintEp_Load_SalesInsertErr = + parallel(@T_SRC_SALES@ 8) full(@T_SRC_SALES@)
    DBHintEp_Load_SalesInsertLd = + parallel(@T_SRC_SALES@ 8)
    DBHintEp_Load_SalesMergeSALES_DATA = + parallel(@T_SRC_SALES_LD@ 8) full(@T_SRC_SALES_LD@)
    DBHintMdp_AddUpdateIs_Fictive0SD = + parallel(s 8 )
    DBHintMdp_AddUpdateIs_Fictive2SD = + parallel(s 8 )
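Since the only thing that varies between these values is the degree of parallelism, the list can be generated for any degree. The following is a small sketch with a hypothetical helper, `build_hints`; it only builds the parameter values as strings and does not touch the database. How the values are written into DB_PARAMS is not shown in the note above and is left out here.

```python
def build_hints(degree):
    """Return the DB_PARAMS values listed above for a given parallel degree."""
    d = int(degree)
    src = "@T_SRC_SALES@"
    ld = "@T_SRC_SALES_LD@"
    return {
        "DBHintEp_Load_SalesUseParallel": "TRUE",
        "DBHintEp_Load_SalesUseParallelDML": "TRUE",
        "DBHintEp_Load_SalesInsertErr": "+ parallel(%s %d) full(%s)" % (src, d, src),
        "DBHintEp_Load_SalesInsertLd": "+ parallel(%s %d)" % (src, d),
        "DBHintEp_Load_SalesMergeSALES_DATA": "+ parallel(%s %d) full(%s)" % (ld, d, ld),
        "DBHintMdp_AddUpdateIs_Fictive0SD": "+ parallel(s %d )" % d,
        "DBHintMdp_AddUpdateIs_Fictive2SD": "+ parallel(s %d )" % d,
    }

# Print the settings for 8 parallel threads, the degree suggested for 16 CPUs.
for name, value in build_hints(8).items():
    print("%s = %s" % (name, value))
```

Calling `build_hints(4)` would reproduce the same list with a lower degree, which may be useful when the load shares the machine with other work.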

Read the article

Partner Webcast - Focus on Oracle Data Profiling and Data Quality 11g

- by lukasz.romaszewski(at)oracle.com

Partner Webcast
Focus on Oracle Data Profiling and Data Quality 11g
February 24th, 12am CET

Oracle offers an integrated suite of Data Quality software architected to discover and correct today's data quality problems and establish a platform prepared for tomorrow's yet unknown data challenges. Oracle Data Profiling provides data investigation, discovery, and profiling in support of quality, migration, integration, stewardship, and governance initiatives. It includes a broad range of features that expand upon basic profiling, including automated monitoring, business-rule validation, and trend analysis. Oracle Data Quality for Data Integrator provides cleansing, standardization, matching, address validation, location enrichment, and linking functions for global customer data and operational business data. It ensures that data adheres to established standards that are adaptable to fit each organization's specific needs. Both single- and double-byte data are processed in local languages to provide a unique and centralized view of customers, products and services. During this in-person briefing, Data Integration Solution Specialists will be providing a technical overview and a walkthrough.

Agenda
· Oracle Data Integration Strategy overview
· A focus on Oracle Data Profiling and Oracle Data Quality for Data Integrator:
  o Oracle Data Profiling
  o Oracle Data Quality for Data Integrator
  o Live demo
  o Q&A

Delivery Format
This FREE online LIVE eSeminar will be delivered over the Web and Conference Call. Registrations received less than 24 hours prior to start time may not receive confirmation to attend. To register, click here. For any questions please contact [email protected].

Read the article

Parallel downloading of JavaScript files on page load

- by user359650

Below is a quote from one of the Yahoo performance pages:

    While a script is downloading, however, the browser won't start any other downloads, even on different hostnames.

When I look at the page load of our website, I can see that many scripts are being downloaded at the same time. Am I mistaken, or should the quote instead read like this?

    While scripts are downloading (there can be several scripts downloading at the same time), the browser won't start any other downloads, even on different hostnames.

Read the article

Python performance: iteration and operations on nested lists

- by J.J.

Problem

Hey folks. I'm looking for some advice on Python performance. Some background on my problem:

Given:
· A mesh of nodes of size (x,y), each with a value (0...255) starting at 0
· A list of N input coordinates, each at a specified location within the range (0...x, 0...y)

Increment the value of the node at the input coordinate and the node's neighbors within range Z, up to a maximum of 255. Neighbors beyond the mesh edge are ignored. (No wrapping)

BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes. Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.

Current results

I have 2 current implementations: f1, f2

Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.9s
f2: 1.8s

f1 is the initial naive implementation: three nested for loops. f2 replaces the inner for loop with a list comprehension. Code is included below for your perusal.

Question

How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters. Please keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third-party packages. Also, I've generated random input coordinates and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question. Thanks much!

f1 is the initial naive implementation: three nested for loops. 2.9s

    import random

    def f1(x, y, n, z):
        # Build an x-by-y mesh of zeros.
        rows = []
        for i in range(x):
            rows.append([0 for i in xrange(y)])
        for i in range(n):
            # Random input coordinate inside the mesh.
            inputX, inputY = (int(x*random.random()), int(y*random.random()))
            topleft = (inputX - z, inputY - z)
            # Visit the window around the coordinate, clipped to the mesh edges.
            for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
                for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
                    if rows[i][j] < 255:  # never exceed the 255 cap
                        rows[i][j] += 1

f2 replaces the inner for loop with a list comprehension. 1.8s

    def f2(x, y, n, z):
        # Build an x-by-y mesh of zeros.
        rows = []
        for i in range(x):
            rows.append([0 for i in xrange(y)])
        for i in range(n):
            inputX, inputY = (int(x*random.random()), int(y*random.random()))
            topleft = (inputX - z, inputY - z)
            for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
                l = max(0, topleft[1])
                r = min(topleft[1]+(z*2), y)
                # Increment the slice in one go; nodes already at 255 keep their value.
                rows[i][l:r] = [j+1 if j < 255 else j for j in rows[i][l:r]]
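One pure-Python direction that avoids touching every node of every window is a 2D difference array. The sketch below is not from the question; it assumes the simplified update described there (every input adds exactly 1 to its window, and the 255 cap can be applied once at the end), and it is written for Python 3 (`range` instead of `xrange`). The name `f3` and the optional `coords` argument are made up for illustration. Each input marks only the four corners of its window, and a single pass of prefix sums over the mesh then recovers the counts, so the work is roughly N + x*y steps instead of one step per node per window. No timing is claimed here.

```python
import random

def f3(x, y, n, z, coords=None):
    """Count window hits with a 2D difference array, then cap at 255."""
    if coords is None:
        # Same kind of random inputs as in the question.
        coords = [(int(x * random.random()), int(y * random.random()))
                  for _ in range(n)]

    # One extra row and column so the "end of window" markers always fit.
    diff = [[0] * (y + 1) for _ in range(x + 1)]
    for cx, cy in coords:
        # Same window as f1/f2: [c - z, c + z), clipped to the mesh.
        i0, i1 = max(0, cx - z), min(cx + z, x)
        j0, j1 = max(0, cy - z), min(cy + z, y)
        diff[i0][j0] += 1
        diff[i0][j1] -= 1
        diff[i1][j0] -= 1
        diff[i1][j1] += 1

    rows = []
    above = [0] * y  # uncapped counts of the previous row
    for i in range(x):
        run = 0
        counts = []
        for j in range(y):
            run += diff[i][j]              # prefix sum along this row
            counts.append(run + above[j])  # plus everything from the rows above
        above = counts
        rows.append([min(c, 255) for c in counts])
    return rows

# Small demonstration with fixed coordinates instead of random ones.
mesh = f3(8, 8, 3, 2, coords=[(0, 0), (4, 4), (4, 5)])
print(mesh[4])
```

If the real update rule is not a uniform increment, or if the cap has to be observed between inputs, this shortcut no longer applies directly and the window loops are needed again.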

Read the article

How to set up a load/stress test for a web site?

- by Ryan

I've been tasked with stress/load testing our company web site out of the blue and know nothing about doing so. Every search I make on Google for "how to load test a web site" just comes back with various companies and software to physically do the load testing. For now I'm more interested in how to actually go about setting up a load test: what I should take into account prior to load testing, which pages within my site I should be testing load against, and what things I'm going to want to monitor when doing the test. Our web site is on a multi-tier system complete with a separate database server (IIS 7 web server, SQL Server 2000 db). I imagine I'd want to monitor both the web server and the database server for load. However, when setting up scenarios to load test the web server, I'd have to use pages that query the database to see any load on the database server at the same time. Are web servers and database servers generally tested simultaneously, or are they done as separate tests? As you can see, I'm pretty clueless as to the whole operation, so any insight as to how to go about this would be very helpful. FYI, I have been tinkering with Pylot and was able to create and run a scenario against our site, but I'm not sure what I should be looking for in the results or whether the scenario I created is even worth measuring for our site. Thanks in advance.

Read the article

High Load average threshold in linux

- by user2481010

A friend of mine said that his server's load average sometimes goes above 500-1000. To me that is a strange value, because I have never seen a load average above 10. I asked him for a snapshot of top and memory usage, and he gave me the following details:

TOP USAGES

    top - 06:06:03 up 117 days, 23:02, 2 users, load average: 147.37, 44.57, 15.95
    Tasks: 116 total, 2 running, 113 sleeping, 0 stopped, 1 zombie
    Cpu(s): 16.6%us, 6.9%sy, 0.0%ni, 9.2%id, 66.5%wa, 0.0%hi, 0.8%si, 0.0%st
    Mem: 8161648k total, 7779528k used, 382120k free, 3296k buffers
    Swap: 5242872k total, 1293072k used, 3949800k free, 168660k cached

Free

    $ free -gt
                       total  used  free  shared  buffers  cached
    Mem:                   7     6     1       0        0       4
    -/+ buffers/cache:           1     5
    Swap:                  4     0     4
    Total:                12     6     6

Total cpu

    $ nproc
    8

My question: is a load average of more than 100 possible on an 8-core, 12 GB server? I have read many tutorials and articles on load average, and they say the rule of thumb is "number of cores = max load". According to that rule the max load average here would be 16, so how is his server running with a load average of 147.37? He said that 147.37 is the lowest value; sometimes it goes above 500.
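As a rough aid for reading these numbers, the load average can be divided by the core count. On Linux the load average counts tasks that are runnable as well as tasks in uninterruptible sleep (typically waiting on disk I/O), so it is not capped by the number of cores; the 66.5%wa in the top output would be consistent with many tasks waiting on I/O, although that is only an inference from one snapshot. The sketch below uses a made-up helper, `load_per_core`, with the values from the question, and then reads the same figures on the current host where the platform supports it.

```python
import os

def load_per_core(load_avg, cores):
    """Runnable-or-blocked tasks per core implied by a load average."""
    return load_avg / float(cores)

# Values from the top output above (1-, 5- and 15-minute averages) and nproc.
for label, value in (("1 min", 147.37), ("5 min", 44.57), ("15 min", 15.95)):
    print("%s: %.2f per core" % (label, load_per_core(value, 8)))

# On a live Unix system the same figures can be read directly.
try:
    one, five, fifteen = os.getloadavg()
    print("this host: %.2f per core" % load_per_core(one, os.cpu_count() or 1))
except (AttributeError, OSError):
    print("load average not available on this platform")
```

A value far above 1 per core means tasks are queuing, but it does not say whether they wait for CPU or for I/O; that distinction needs the CPU and iowait figures alongside it.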

Read the article

Performance Enhancement in Full-Text Search Query

- by Calvin Sun

Ever since its first release, we have been consolidating and developing the InnoDB Full-Text Search feature. One recent improvement is worth blogging about. It is a joint effort with the MySQL Optimizer team that simplifies the query plans of some common queries and dramatically shortens the query time. I will describe the issue, our solution and the end result, with some performance numbers to demonstrate our continuing efforts to enhance the Full-Text Search capability.

The Issue:

As we discussed in previous blogs, InnoDB implements the Full-Text index as an inverted index stored in auxiliary tables. The query, once parsed, is reinterpreted as several queries against the related auxiliary tables, and the results are then merged and consolidated into the final result. So at the end of the query, we have all matching records on hand, sorted by their ranking or by their Doc IDs. Unfortunately, MySQL’s optimizer and query processing were initially designed for the MyISAM Full-Text index, and sometimes did not fully utilize the complete result package from InnoDB. Here are a couple of examples:

Case 1: Query result ordered by rank with only the top N results:

    mysql> SELECT FTS_DOC_ID, MATCH (title, body) AGAINST ('database') AS SCORE FROM articles ORDER BY score DESC LIMIT 1;

In this query, the user tries to retrieve the single record with the highest ranking. It should have a quick answer once we have all the matching documents on hand, especially if they are ranked. However, before this change, MySQL would retrieve rankings for almost every row in the table, sort them and then come up with the top-ranked result. This whole retrieve-and-sort is quite unnecessary given that InnoDB already has the answer. In a real-life case, the user could have millions of rows, so in the old scheme MySQL would retrieve millions of rows' rankings and sort them, even if our FTS had already found that there are only 3 matching rows. Clearly, retrieving a million rankings is done in vain.
In the above case, it should just ask for the 3 matching rows' rankings; all other rows' rankings are 0. If it wants the top ranking, it can just take the first record from our already sorted result.

Case 2: Select Count(*) on matching records:

    mysql> SELECT COUNT(*) FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE);

In this case, InnoDB search can find the matching rows quickly and will have all of them on hand. However, before our change, in the old scheme, MySQL requested every row in the table one by one, just to check whether its ranking was larger than 0, and only then came up with a count. In fact, there is no need for MySQL to fetch all rows, because InnoDB already has all the matching records. The only thing needed is to call an InnoDB API to retrieve the count.

The difference can be huge. The following query output shows how big the difference can be:

    mysql> select count(*) from searchindex_inno where match(si_title, si_text) against ('people')
    +----------+
    | count(*) |
    +----------+
    |   666877 |
    +----------+
    1 row in set (16 min 17.37 sec)

So the query took more than 16 minutes. Let’s see how long InnoDB takes to come up with the result. In InnoDB, you can obtain extra diagnostic printout by turning on “innodb_ft_enable_diag_print”; this will print out extra query info:

Error log:

    keynr=2, 'people'
    NL search
    Total docs: 10954826 Total words: 0
    UNION: Searching: 'people'
    Processing time: 2 secs: row(s) 666877: error: 10
    ft_init()
    ft_init_ext()
    keynr=2, 'people'
    NL search
    Total docs: 10954826 Total words: 0
    UNION: Searching: 'people'
    Processing time: 3 secs: row(s) 666877: error: 10

The output shows that it took InnoDB only 3 seconds to get the result, while the whole query took 16 minutes to finish. So a large amount of time was wasted on unneeded row fetching.

The Solution:

The solution is obvious. MySQL can skip some of its steps, optimize its plan and obtain useful information directly from InnoDB.
Some of the savings from doing this include:

1) Avoid redundant sorting. Since InnoDB has already sorted the result by ranking, the MySQL query processing layer does not need to sort again to get the top matching results.

2) Avoid row-by-row fetching to get the matching count. InnoDB provides all the matching records. All rows not in the result list have a ranking of 0 and do not need to be retrieved. InnoDB also has the count of total matching records on hand, so there is no need to recount.

3) Covered index scan. InnoDB results always contain the matching records' Document IDs and their rankings. So if only the Document ID and ranking are needed, there is no need to go to the user table to fetch the record itself.

4) Narrow the search result early and reduce user table access. If the user wants the top N matching records, we do not need to fetch all matching records from the user table. We can first select the top N matching Doc IDs, and then fetch only the records with those Doc IDs.

Performance Results and comparison with MyISAM

The effect of this change is very obvious. I include six test results, from tests performed by Alexander Rubin, to demonstrate how fast InnoDB queries now are compared with MyISAM Full-Text Search.

These tests are based on the English Wikipedia data: 5.4 million rows in a table of approximately 16G. The tests were performed on a machine with 1 dual-core CPU, an SSD drive, and 8G of RAM, with InnoDB_buffer_pool set to 8 GB.

Table 1: SELECT with LIMIT CLAUSE

mysql> SELECT si_title, match(si_title, si_text) against('family') as rel
       FROM si WHERE match(si_title, si_text) against('family')
       ORDER BY rel desc LIMIT 10;

                      InnoDB      MyISAM             Times Faster
Time for the query    1.63 sec    3 min 26.31 sec    127

You can see that for this particular query (retrieve the top 10 records), InnoDB Full-Text Search is now approximately 127 times faster than MyISAM.
Table 2: SELECT COUNT QUERY

mysql> select count(*) from si where match(si_title, si_text) against('family');
+----------+
| count(*) |
+----------+
|   293955 |
+----------+

                      InnoDB      MyISAM              Times Faster
Time for the query    1.35 sec    28 min 59.59 sec    1289

In this particular case, where there are 293k matching results, InnoDB took only 1.35 seconds to get all of them, while MyISAM took almost half an hour. That makes InnoDB about 1289 times faster!

Table 3: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel
       FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>)
       ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.5 sec                    5.05 sec                   10.1
family film                                 0.95 sec                   25.39 sec                  26.7
Pizza restaurant orange county California   0.93 sec                   32.03 sec                  34.4
President united states of America          2.5 sec                    36.98 sec                  14.8

Table 4: SELECT title and text with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, si_title, si_text, ... as rel
       FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>)
       ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.61 sec                   41.65 sec                  68.3
family film                                 1.15 sec                   47.17 sec                  41.0
Pizza restaurant orange county california   1.03 sec                   48.2 sec                   46.8
President united states of america          2.49 sec                   44.61 sec                  17.9

Table 5: SELECT ID with ORDER BY and LIMIT CLAUSE for selected terms

mysql> SELECT <ID>, match(si_title, si_text) against(<TERM>) as rel
       FROM si_<TB> WHERE match(si_title, si_text) against (<TERM>)
       ORDER BY rel desc LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.5 sec                    5.05 sec                   10.1
family film                                 0.95 sec                   25.39 sec                  26.7
Pizza restaurant orange county california   0.93 sec                   32.03 sec                  34.4
President united states of america          2.5 sec                    36.98 sec                  14.8

Table 6: SELECT COUNT(*)

mysql> SELECT count(*) FROM si_<TB>
       WHERE match(si_title, si_text) against (<TERM>) LIMIT 10;

Term                                        InnoDB (time to execute)   MyISAM (time to execute)   Times Faster
family                                      0.47 sec                   82 sec                     174.5
family film                                 0.83 sec                   131 sec                    157.8
Pizza restaurant orange county california   0.74 sec                   106 sec                    143.2
President united states of america          1.96 sec                   220 sec                    112.2

Again, Tables 3 to 6 all show InnoDB consistently outperforming MyISAM in these queries by a large margin. It becomes obvious that InnoDB has a great advantage over MyISAM in handling large data searches.

Summary:

These results demonstrate the performance we can achieve by coupling the MySQL optimizer and InnoDB Full-Text Search more tightly. I think there are still many cases in which InnoDB's result information is not fully taken advantage of, which means we still have a lot of room to improve. We will continue to explore this area and aim for more dramatic results for InnoDB full-text searches.

Jimmy Yang, September 29, 2012


Performance profiler for a java application

- by Nitin Garg

I need to optimize a Java application. It makes some third-party calls, and I need a good tool to accurately measure the time taken by individual API calls. To give an idea of the complexity: the application takes a data source file containing 10 lakh (1 million) rows and takes around one hour to complete the processing. As part of the processing, it makes some third-party calls (including some network calls). I need to identify which calls take more time than others and, based on that, find a way to optimize the application. Any suggestions would be appreciated.


OBIEE 11.1.1 - User Interface (UI) Performance Is Slow With Internet Explorer 8

- by Ahmed A

The OBIEE 11g UI is slow in Internet Explorer 8 and faster in Firefox. For VPN or WAN users, it takes a long time to display links on Dashboards in IE 8. The cause is that IE 8 generates many requests that return HTTP 304, which makes the 11g UI slower than in the Mozilla Firefox browser. To resolve this issue, you can implement HTTP compression and caching. This is a best practice.

Why use Web Server Compression / Caching for OBIEE?

Bandwidth Savings: Enabling HTTP compression can dramatically improve the latency of responses. Compressing static files and dynamic application responses significantly reduces the response time for remote (high latency) users.

Improves request/response latency: Caching makes it possible to suppress the payload of the HTTP reply using the 304 status code. Minimizing round trips over the Web to re-validate cached items can make a huge difference in browser page load times.

[Screenshot: the request flow, showing where compression and decompression occur]

Solution:

a. How to Enable HTTP Caching / Compression in Oracle HTTP Server (OHS) 11.1.1.x

1. To implement HTTP compression / caching, install and configure Oracle HTTP Server (OHS) 11.1.1.x for the bi_serverN Managed Servers (refer to the "OBIEE Enterprise Deployment Guide for Oracle Business Intelligence" document for details).

2. On the OHS machine, open the HTTP Server configuration file (httpd.conf) for editing. This file is located in the OHS installation directory. For example: ORACLE_HOME/Oracle_WT1/instances/instance1/config/OHS/ohs1

3. In the httpd.conf file, verify that the following directives are included and not commented out:

    LoadModule expires_module "${ORACLE_HOME}/ohs/modules/mod_expires.so"
    LoadModule deflate_module "${ORACLE_HOME}/ohs/modules/mod_deflate.so"

4. Add the following lines to the httpd.conf file below the LoadModule directive section and restart the OHS:

Note: For the Windows platform, you will need to enclose any paths in double quotes ("), for example:

    Alias /analytics "ORACLE_HOME/bifoundation/web/app"
    <Directory "ORACLE_HOME/bifoundation/web/app">

    Alias /analytics ORACLE_HOME/bifoundation/web/app
    # Please replace ORACLE_HOME with your actual BI ORACLE_HOME path
    <Directory ORACLE_HOME/bifoundation/web/app>
    # We don't generate proper cross server ETags so disable them
    FileETag none
    SetOutputFilter DEFLATE
    # Don't compress images
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
    <FilesMatch "\.(gif|jpeg|png|js|x-javascript|javascript|css)$">
    # Enable future expiry of static files
    ExpiresActive on
    # 1 week; this stops the HTTP 304 calls generated by IE 8
    ExpiresDefault "access plus 1 week"
    Header set Cache-Control "max-age=604800"
    </FilesMatch>
    DirectoryIndex default.jsp
    </Directory>
    # Restrict access to WEB-INF
    <Location /analytics/WEB-INF>
    Order Allow,Deny
    Deny from all
    </Location>

Note: Make sure you replace the placeholder "ORACLE_HOME" above with the correct path for your BI ORACLE_HOME. For example, my BI Oracle Home path is /Oracle/BIEE11g/Oracle_BI1/bifoundation/web/app

Important Notes:

The caching rules above are restricted to static files found inside the /analytics directory (/web/app). This approach is safer than setting static file caching globally.

In some customer environments you may not see the full performance gain in the IE 8.0 browser. In that case you need to extend the caching rules to other directories with static file content.

If OHS is installed on a separate dedicated machine, make sure the static files in your BI ORACLE_HOME (../Oracle_BI1/bifoundation/web/app) are accessible to the OHS instance.

[Screenshot: summary of the before and after results, showing the improvements after enabling compression and caching]
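To get a rough feel for why compressing static text files saves bandwidth, and to check the Cache-Control value used above, here is a minimal sketch. It rests on assumptions: Python's gzip module stands in for the gzip-encoded output of the DEFLATE filter, and the sample stylesheet content is made up rather than taken from OBIEE.

```python
import gzip

# Hypothetical static asset: repetitive text, similar in spirit to a CSS file.
css = ".dashboard-link { color: #003366; text-decoration: none; }\n" * 200
raw = css.encode("utf-8")

# Compress the payload the way a gzip-encoded HTTP response body would be.
compressed = gzip.compress(raw)

# Decompressing gives back the original bytes, so no content is lost.
assert gzip.decompress(compressed) == raw

print("original bytes:  ", len(raw))
print("compressed bytes:", len(compressed))

# "access plus 1 week" expressed in seconds, matching max-age=604800.
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60
print("one week in seconds:", ONE_WEEK_SECONDS)
```

Real savings depend on the actual files; already compressed formats such as GIF, JPEG, and PNG gain little, which is why the configuration above excludes them from compression.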


Big Data – Buzz Words: What is HDFS – Day 8 of 21

- by Pinal Dave

In yesterday's blog post we learned what MapReduce is. In this article we will take a quick look at one of the four most important buzz words that go around Big Data: HDFS.

What is HDFS?

HDFS stands for Hadoop Distributed File System, and it is the primary storage system used by Hadoop. It provides high-performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware, where server failures are very common. For that reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to a reduced risk of failure.

HDFS splits big data into smaller pieces and distributes them across different nodes. It also copies each smaller piece multiple times onto different nodes. Hence, when any node holding the data crashes, the system can automatically use the data from a different node and continue the process. This is the key feature of the HDFS system.

Architecture of HDFS

The architecture of HDFS is a master/slave architecture. An HDFS cluster always consists of a single NameNode. This single NameNode is a master server; it manages the file system and regulates access to the various files. In addition to the NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks, and those blocks are stored in a set of DataNodes.

The primary task of the NameNode is to open, close, or rename files and directories and to regulate access to the file system, whereas the primary task of a DataNode is to read from and write to the file system. A DataNode is also responsible for the creation, deletion, or replication of data based on instructions from the NameNode.

In reality, the NameNode and DataNode are software written in Java and designed to run on commodity machines.

Visual Representation of HDFS Architecture

Let us understand how HDFS works with the help of the diagram. The Client App, or HDFS Client, connects to the NameNode as well as to the DataNodes. The Client App's access to a DataNode is regulated by the NameNode, which grants access by allowing the Client App to connect to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A, B, C and D). The Client App will later write the data blocks directly to a DataNode. The Client App does not have to write directly to all the nodes. It just has to write to any one of the nodes, and the NameNode will decide which other DataNodes the data has to be replicated to. In our example the Client App writes directly to DataNode 1 and DataNode 3, and the data chunks are automatically replicated to other nodes. All the information about which data block is placed on which DataNode is written back to the NameNode.

High Availability During Disaster

Because multiple DataNodes hold the same data blocks, if any DataNode faces a disaster, the entire process continues: another DataNode takes over the role of serving the specific data blocks that were on the failed node. This system provides very high tolerance to disaster and provides high availability.

Notice that there is only a single NameNode in our architecture. If that node fails, our entire Hadoop application stops working, because it is the single node where all the metadata is stored. As this node is very critical, it is usually replicated on another cluster as well as on another data rack. Though the replicated node is not operational in the architecture, it has all the data necessary to perform the task of the NameNode if the NameNode fails.

The entire Hadoop architecture is built to function smoothly even when there are node failures or hardware malfunctions. It is built on the simple concept that the data is so big that no single piece of hardware can manage it properly. We need lots of commodity (cheap) hardware to manage our big data, and hardware failure is part of life with commodity servers. The Hadoop architecture is built to reduce the impact of such failures and to overcome the limitations of non-functioning hardware.

Tomorrow

In tomorrow's blog post we will discuss the importance of the relational database in Big Data.

Reference: Pinal Dave (http://blog.sqlauthority.com)
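The block splitting, replication, and failover described in this post can be sketched with a small toy model. This is only a sketch under stated assumptions: the class names, the tiny block size, and the round-robin placement are invented for illustration and are not how Hadoop itself is implemented.

```python
# Toy model only: names, block size, and placement policy are assumptions,
# not Hadoop's actual implementation.

def split_into_blocks(data, block_size):
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

class ToyNameNode:
    """Keeps only metadata: which DataNodes hold which block."""

    def __init__(self, datanode_ids, replication=2):
        self.datanode_ids = list(datanode_ids)
        self.replication = replication
        self.block_locations = {}  # block_id -> list of DataNode ids

    def place(self, block_id):
        # Simple round-robin placement onto `replication` distinct nodes.
        count = len(self.datanode_ids)
        start = block_id % count
        nodes = [self.datanode_ids[(start + k) % count]
                 for k in range(self.replication)]
        self.block_locations[block_id] = nodes
        return nodes

class ToyCluster:
    def __init__(self, datanode_ids, replication=2):
        self.namenode = ToyNameNode(datanode_ids, replication)
        self.storage = {node: {} for node in datanode_ids}  # node -> blocks
        self.failed = set()

    def write(self, data, block_size):
        for block_id, block in enumerate(split_into_blocks(data, block_size)):
            for node in self.namenode.place(block_id):
                self.storage[node][block_id] = block

    def read(self):
        parts = []
        for block_id in sorted(self.namenode.block_locations):
            holders = self.namenode.block_locations[block_id]
            live = [node for node in holders if node not in self.failed]
            if not live:
                raise IOError(f"block {block_id} has no live replica")
            parts.append(self.storage[live[0]][block_id])
        return b"".join(parts)

cluster = ToyCluster(["dn1", "dn2", "dn3"], replication=2)
payload = b"ABCDEFGHIJKL"
cluster.write(payload, block_size=3)

cluster.failed.add("dn1")      # simulate one DataNode crashing
recovered = cluster.read()     # every block still has a live replica
print(recovered == payload)    # True
```

With two replicas per block, losing any single DataNode in this model still leaves one live copy of every block, which is the behaviour the post describes. The toy NameNode remains a single point of failure, just as in the architecture above.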
