django queries - Page 156

SSAS DMVs: useful links

- by Davide Mauri

From time to time happens that I need to extract metadata informations from Analysis Services DMVS in order to quickly get an overview of the entire situation and/or drill down to detail level. As a memo I post the link I use most when need to get documentation on SSAS Objects Data DMVs: SSAS: Using DMV Queries to get Cube Metadata http://bennyaustin.wordpress.com/2011/03/01/ssas-dmv-queries-cube-metadata/ SSAS DMV (Dynamic Management View) http://dwbi1.wordpress.com/2010/01/01/ssas-dmv-dynamic-management-view/ Use Dynamic Management Views (DMVs) to Monitor Analysis Services http://msdn.microsoft.com/en-us/library/hh230820.aspx

Read the article

Advanced TSQL Tuning: Why Internals Knowledge Matters

- by Paul White

There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes. Query tuning is not complete as soon as the query returns results quickly in the development or test environments. In production, your query will compete for memory, CPU, locks, I/O and other resources on the server. Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data. In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row. The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type. A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL, CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL, CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table. The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results. This seems slow, considering that the table only has 50,000 rows. Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time. Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring. The script below monitors STATISTICS IO output and the amount of tempdb used by the test query. We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB. The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row. With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first). This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts. In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted. SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed. Sorts typically require a very large number of comparisons, so this is usually a very effective optimization. One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself. SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use. The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory. Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant. The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources. More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1). Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB. The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file. Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case. In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms. If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much. Why is the data size so much smaller? The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored. By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output. You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead. SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows. The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes. The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page. There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option. By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row. SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage. You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now. This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query). SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant. No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers. Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’. Why? Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed. SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages. This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this? Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need. 245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory. Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily? That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column. That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal. I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator. Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted. We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need. The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb. The small remaining inefficiency is in reading the id column values from the clustered primary key index. As a clustered index, it contains all the in-row data at its leaf. The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead. The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure. This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only. This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data. The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings. It isn’t about the internal differences between TEXT and MAX data types either. It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load. No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance. Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows. 5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is! If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb. Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough. Tuning for logical reads and adding covering indexes is not enough. If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on. The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008. They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator. Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

Read the article

MAXDOP in SQL Azure

- by Herve Roggero

In my search of better understanding the scalability options of SQL Azure I stumbled on an interesting aspect: Query Hints in SQL Azure. More specifically, the MAXDOP hint. A few years ago I did a lot of analysis on this query hint (see article on SQL Server Central: http://www.sqlservercentral.com/articles/Configuring/managingmaxdegreeofparallelism/1029/). Here is a quick synopsis of MAXDOP: It is a query hint you use when issuing a SQL statement that provides you control with how many processors SQL Server will use to execute the query. For complex queries with lots of I/O requirements, more CPUs can mean faster parallel searches. However the impact can be drastic on other running threads/processes. If your query takes all available processors at 100% for 5 minutes... guess what... nothing else works. The bottom line is that more is not always better. The use of MAXDOP is more art than science... and a whole lot of testing; it depends on two things: the underlying hardware architecture and the application design. So there isn't a magic number that will work for everyone... except 1... :) Let me explain. The rules of engagements are different. SQL Azure is about sharing. Yep... you are forced to nice with your neighbors. To achieve this goal SQL Azure sets the MAXDOP to 1 by default, and ignores the use of the MAXDOP hint altogether. That means that all you queries will use one and only one processor. It really isn't such a bad thing however. Keep in mind that in some of the largest SQL Server implementations MAXDOP is usually also set to 1. It is a well known configuration setting for large scale implementations. The reason is precisely to prevent rogue statements (like a SELECT * FROM HISTORY) from bringing down your systems (like a report that should have been running on a different in the first place) and to avoid the overhead generated by executing too many parallel queries that could cause internal memory management nightmares to the host Operating System. Is summary, forcing the MAXDOP to 1 in SQL Azure makes sense; it ensures that your database will continue to function normally even if one of the other tenants on the same server is running massive queries that would otherwise bring you down. Last but not least, keep in mind as well that when you test your database code for performance on-premise, make sure to set the DOP to 1 on your SQL Server databases to simulate SQL Azure conditions.

Read the article

Is inline SQL still classed as bad practice now that we have Micro ORMs?

- by Grofit

This is a bit of an open ended question but I wanted some opinions, as I grew up in a world where inline SQL scripts were the norm, then we were all made very aware of SQL injection based issues, and how fragile the sql was when doing string manipulations all over the place. Then came the dawn of the ORM where you were explaining the query to the ORM and letting it generate its own SQL, which in a lot of cases was not optimal but was safe and easy. Another good thing about ORMs or database abstraction layers were that the SQL was generated with its database engine in mind, so I could use Hibernate/Nhibernate with MSSQL, MYSQL and my code never changed it was just a configuration detail. Now fast forward to current day, where Micro ORMs seem to be winning over more developers I was wondering why we have seemingly taken a U-Turn on the whole in-line sql subject. I must admit I do like the idea of no ORM config files and being able to write my query in a more optimal manner but it feels like I am opening myself back up to the old vulnerabilities such as SQL injection and I am also tying myself to one database engine so if I want my software to support multiple database engines I would need to do some more string hackery which seems to then start to make code unreadable and more fragile. (Just before someone mentions it I know you can use parameter based arguments with most micro orms which offers protection in most cases from sql injection) So what are peoples opinions on this sort of thing? I am using Dapper as my Micro ORM in this instance and NHibernate as my regular ORM in this scenario, however most in each field are quite similar. What I term as inline sql is SQL strings within source code. There used to be design debates over SQL strings in source code detracting from the fundamental intent of the logic, which is why statically typed linq style queries became so popular its still just 1 language, but with lets say C# and Sql in one page you have 2 languages intermingled in your raw source code now. Just to clarify, the SQL injection is just one of the known issues with using sql strings, I already mention you can stop this from happening with parameter based queries, however I highlight other issues with having SQL queries ingrained in your source code, such as the lack of DB Vendor abstraction as well as losing any level of compile time error capturing on string based queries, these are all issues which we managed to side step with the dawn of ORMs with their higher level querying functionality, such as HQL or LINQ etc (not all of the issues but most of them). So I am less focused on the individual highlighted issues and more the bigger picture of is it now becoming more acceptable to have SQL strings directly in your source code again, as most Micro ORMs use this mechanism. Here is a similar question which has a few different view points, although is more about the inline sql without the micro orm context: http://stackoverflow.com/questions/5303746/is-inline-sql-hard-coding

Read the article

Creating Stored Procedures in MySQL Using HeidiSQL 4&#146;s Stored Routine Editor

In the "Write MySQL Queries Using HeidiSQL 4" article, we learned how to connect to a MySQL database and execute queries against it using the free HeidiSQL GUI client. Today, we'll learn how to create a stored procedure using HeidiSQL's Stored Procedure Editor.

Read the article

Fetching Data from Multiple Tables using Joins

Applying normalization to relational databases tends to promote better accuracy of queries, but it also leads to queries that take a little more work to develop, as the data may be spread amongst several tables. In today's article, we'll learn how to fetch data from multiple tables by using joins.

Read the article

SSAS DMVs: useful links

- by Davide Mauri

From time to time happens that I need to extract metadata informations from Analysis Services DMVS in order to quickly get an overview of the entire situation and/or drill down to detail level. As a memo I post the link I use most when need to get documentation on SSAS Objects Data DMVs: SSAS: Using DMV Queries to get Cube Metadata http://bennyaustin.wordpress.com/2011/03/01/ssas-dmv-queries-cube-metadata/ SSAS DMV (Dynamic Management View) http://dwbi1.wordpress.com/2010/01/01/ssas-dmv-dynamic-management-view/ Use Dynamic Management Views (DMVs) to Monitor Analysis Services http://msdn.microsoft.com/en-us/library/hh230820.aspx

Read the article

Designing Mobile SMS text advertising system

- by Ramraj Edagutti

Currently, I am working on a product where we have an SMS text advertising system, and using this, we setup advertising campaigns for clients, and later these campaigns are sent to the end users. This is very similar to Google Adwords, but targeted to Mobile users via SMS. Just to give an overview of the system Each Campaign is mapped to an advertiser Campaign has start date and end date Campaign has a filter condition(s) or query to select the target user base from our database (to whom we send Campaigns) Target user base can be fixed, for e.g send campaign to 10000 users Target user base can also be dynamic based on query condition, for e.g send campaign to users who are active and from a particular state, district, town etc. (this way user base will be keep changing on daily basis) Campaign can have multiple campaign messages Each campaign message has start date and end date Each campaign message can have multiple message texts for different locales, for e.g English,Hindi,Telugu etc After creating an advertisement campaign, we run daily night job to provision the target user base for that a particular campaign in a separate table, and another daily job runs on morning times and checks provisioned table for campaigns and targeted users and sends the campaign to users via SMS. Problem is, current UI for creating advertising campaigns is designed in a very technical manner, I mean, normal user or business owner or clients can not use the UI to create a campaign. Below are reasons why the UI is very technical in nature Filter condition(s) or query input filed, takes user ids or mobile numbers or SQL queries. Most of times or almost every time, we use big SQL queries So we end up storing SQL queries in a database for a campaign, later we use this SQL query to fetch targeted user base. For scheduling these campaigns, we have input filed on UI which takes quartz cron expression(s) ( for e.g. send campaign on "0 0 9 1-10 MAR 2012" ), again very technical in nature Normal user or business owner, can not use the UI for creating campaigns for reasons mentioned above, Currently, we ourself (developers) helping clients to setup/create campaigns. we are trying to re-design the UI to make it more user friendly so that any user can go to UI and create an advertisement campaign by himself. I am thinking of re-designing the current UI similar to Google Adwords interface, especially for selecting target users based on user geography like country, state, city etc. I also need to select users based user subscription(s), which might make system even more complex. And also, for campaign scheduling, I am thinking of using weekdays with hours. For example, I will shows Monday to Sunday on UI, and user can select the from hours, to hours etc. Any better ideas or suggestion on how to design UI in very user friendly manner and what design should be followed on server side code (we write backend code on java/jpa/spring/quartz)? And I am looking for ideas or design patterns on how to build SQL queries (using JPA/Hinernate) programmatically on server side, based on varies conditions like based on country, state, town, village, and user subscriptions.

Read the article

Proper Method name for XML builder

- by Wesley

I think this is the right stack for this. I have a helper class which builds CAML queries (SharePoint XML for getting list items from SQL) There is one method that is flexibly used to build the queries that get all related votes and comments for a social item. I don't want to call it BuildVoteorCommentXML or something long winded like that. Is there a good naming convention for getting all Join/Foreign Key objects from a core object?

Read the article

Google Analytics show zero for "Search Engine Optimizations" graph

- by Saeed Neamati

In Google Analytics new design, there is an area related to the queries and impressions related to your site. You can get there by following Traffic Sources = Search Engine Optimization = Queries. However, it now shows zero for the "Site Usage" graph, at the top section, while other areas of Google Analytics definitely show that site has visitors and has been used. No matter how much I search, I can't find the source of the problem. Does anyone know where the problem might be?

Read the article

Is MongoDB a good choice or not for my application?

- by shubham

I have a Reporting application which stores the reports in xml format as recieved from source (XML schema is not defined, it can be any format) and those reports contain some keys and values. Like jobid, setid be keys for 1 type of report and userid, groupId for another type of report etc. The type of keys that can be referred from the document is determined by the namespaces used in the xml doc. These keys are stored on the basis of namespace used in the xml document. For e.g. If a tag in xml fragment uses namespace= "myspace1", then I have keys A and B for myspace1 stored in another table. It will fetch those keys from that table for this namespace, look for their values in xml doc and store it in another table along with the pointer to this xml document (Id of a record storing complete xml document in a cell). Use cases: When the user comes and queries for that key and value, I return the document or a set of documents that are having those key/value pairs. When the user comes and queries for a certain key and provide a name for xslt (pre stored), I fetch the set of documents fulfilling that criteria and convert that xml to html with the specified xslt. When the user comes and asks for a particular fragment of a doc then it can fetch a subset from a particular document also. When the user comes and queries for top x values of a certain key, I return the set of documents that are having top 10 values of that key. I am using DB2 database for its support of xml along with relational capabilities. That makes easier for me to run xpath expressions and fetch values of keys and also aggregate a set of documents fullfilling a criteria, all on the database side. Problems: DB2 stores XML doc of upto 2GB in size. Retrieval is very slow. If some thing involves many documents, then it takes significant time for things to show up in browser, and the user has to wait. Can MongoDb help in this case, as it is document oriented? can I do xml related xpath queries and document transformations on db side? Or is it ok to use both in such a case?

Read the article

Is there any free host which supports php and mySQL in utf-8? [closed]

- by Maria Konnou

Possible Duplicate: How to find web hosting that meets my requirements? Is there any free host which supports php and mySQL queries in utf-8? I've already tried to use x10hosting and 000webhosting, but they don't support utf8 mysql queries (got mojibake). The default encoding of mysql in both sites is latin-1, and you're not able to change that. Is there any other free host that fully supports utf-8?

Read the article

Profiling Database Activity in the Entity Framework

It’s important to profile your database queries to see what happens in response to Entity Framework queries and other data access activities, says Julie Lerman, who gives you the details on several profiling options to improve you coding. Join SQL Backup’s 35,000+ customers to compress and strengthen your backups "SQL Backup will be a REAL boost to any DBA lucky enough to use it." Jonathan Allen. Download a free trial now.

Read the article

Read Committed Snapshot Isolation– Two Considerations

- by GavinPayneUK

The Read Committed Snapshot database option in SQL Server, known perhaps more accurately as Read Committed Snapshot Isolation or RCSI, can be enabled to help readers from blocking writers and writers from blocking readers. However, enabling it can cause two issues with the tempdb database which are often overlooked. One can slow down queries, the other can cause queries to fail . Overview of RCSI Enabling the option changes the behaviour of the default SQL Server isolation level, read...(read more)

Read the article

Should I have a separate method for Update(), Insert(), etc., or have a generic Query() that would be able to handle all of these?

- by Prayos

I'm currently trying to write a class library for a connection to a database. Looking over it, there are several different types of queries: Select From, Update, Insert, etc. My question is, what is the best practice for writing these queries in a C# application? Should I have a separate method for each of them(i.e. Update(), Insert()), or have a generic Query() that would be able to handle all of these? Thanks for any and all help!

Read the article

Improving 2D Range Query Performance in SQL Server

When using the BETWEEN operator on multiple columns, you are likely using a 2D range query. Such queries perform very poorly in SQL Server. This article examines rewriting these queries for improved performance. Join SQL Backup’s 35,000+ customers to compress and strengthen your backups "SQL Backup will be a REAL boost to any DBA lucky enough to use it." Jonathan Allen. Download a free trial now.

Read the article

Reporting what's not there

It's easy to write queries that will show data in the database that matches a criteria. However, if no data in the database matches the criteria, it becomes more difficult. This article examines two different scenarios where it's necessary to create data in order to be able to report zero values in queries.

Read the article

Getting Started With XML Indexes

XML Indexes make a huge difference to the speed of XML queries, as Seth Delconte explains; and demonstrates by running queries against half a million XML employee records. The execution time of a query is reduced from two seconds to being too quick to measure, purely by creating the right type of secondary index for the query. Schedule Azure backupsRed Gate’s Cloud Services makes it simple to create and schedule backups of your SQL Azure databases to Azure blob storage or Amazon S3. Try it for free today.

Read the article

11g ???:Active Data Guard

- by JaneZhang(???)

?Oracle 11g??,????(physical Standby)???redo???,???????,???mount??11g??,???redo???,????????read-only??,????Active Data Guard ???Active Data Guard,?????????????????,?????????????? Active Data Guard???????????,??,????????????,????????,????redo??,????????????,??????????? Oracle Active Data Guard ?Oracle Database Enterprise Edition?????,?????????????? ????Active Data Guard, ??????? read-only ????,???? ALTER DATABASE RECOVER MANAGED STANDBY DATABASE????????????:??????COMPATIBLE ????????11.0.0? ???????Active Data Guard,???V$DATABASE????"READ ONLY WITH APPLY': SQL> SELECT open_mode FROM V$DATABASE; OPEN_MODE -------------------- READ ONLY WITH APPLY ????????????,???????real-time apply: SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE; ?????????read-only????????: • Issue SELECT statements, including queries that require multiple sorts that leverage TEMP segments • Use ALTER SESSION and ALTER SYSTEM statements • Use SET ROLE • Call stored procedures • Use database links (dblinks) to write to remote databases • Use stored procedures to call remote procedures via dblinks • Use SET TRANSACTION READ ONLY for transaction level read consistency • Issue complex queries (such as grouping SET queries and WITH CLAUSE queries) ??????????read-only????????: • Any DMLs (excluding simple SELECT statements) or DDLs • Query accessing local sequences • DMLs to local temporary tables ?????Active Data Guard ??: • ????????????????? • ???Oracle Real Application Clusters (Oracle RAC) ,?????? • RAC???RAC?? Oracle Data Guard ?????,,????????: * ?????????????????: http://docs.oracle.com/cd/B28359_01/server.111/b28294/create_ps.htm * ???Oracle Real Application Clusters (Oracle RAC) ,??????: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10g-racprimarysingleinstance-131970.pdf * RAC ???RAC ??: http://www.oracle.com/technetwork/database/features/availability/maa-wp-10g-racprimaryracphysicalsta-131940.pdf ??Active Data Guard???????,?????: http://www.oracle.com/technetwork/database/features/availability/maa-wp-11gr1-activedataguard-1-128199.pdf ??Oracle Maximum Availability Architecture Best Practices?????,???: http://www.oracle.com/goto/maa

Read the article

Apache console accesses network drives, service does not?

- by danspants

I have an apache 2.2 server running Django. We have a network drive T: which we need constant access to within our Django app. When running Apache as a service, we cannot access this drive, as far as any django code is concerned the drive does not exist. If I add... <Directory "t:/"> Options Indexes FollowSymLinks MultiViews AllowOverride None Order allow,deny allow from all </Directory> to the httpd.conf file the service no longer runs, but I can start apache as a console and it works fine, Django can find the network drive and all is well. Why is there a difference between the console and the service? Should there be a difference? I have the service using my own log on so in theory it should have the same access as I do. I'm keen to keep it running as a service as it's far less obtrusive when I'm working on the server (unless there's a way to hide the console?). Any help would be most appreciated.

Read the article

Network connection to Firebird 2.1 became slow after upgrading to Ubuntu 10.04

- by lyle

We've got a setup that we're using for different clients : a program connecting to a Firebird server on a local network. So far we mostly used 32bit processors running Ubuntu LTS (recently upgraded to 10.04). Now we introduced servers running on 64bit processors, running Ubuntu 10.04 64bit. Suddenly some queries run slower than they used to. In short: running the query locally works fine on both 64bit and 32bit servers, but when running the same queries over the network the 64bit server is suddenly much slower. We did a few checks with both local and remote connections to both 64bit and 32bit servers, using identical databases and identical queries, running in Flamerobin. Running the query locally takes a negligible amount of time: 0.008s on the 64bit server, 0.014s on the 32bit servers. So the servers themselves are running fine. Running the queries over the network, the 64bit server suddenly needs up to 0.160s to respond, while the 32bit server responds in 0.055s. So the older servers are twice as fast over the network, in spite of the newer servers being twice as fast if run locally. Apart from that the setup is identical. All servers are running the same installation of Ubuntu 10.04, same version of Firebird and so on, the only difference is that some are 64 and some 32bit. Any idea?? I tried to google it, but I couldn't find any complains that Firebird 64bit is slower than Firebird 32bit, except that the Firebird 2.1 change log mentions that there's a new network API which is twice as fast, as soon as the drivers are updated to use it. So I could imagine that the 64bit driver is still using the old API, but that's a bit of a stretch, I guess. Thanx in advance for any replies! :)

Read the article

Computer specs for a large database

- by SpeksETC

What sort of computer specs (CPU, RAM, disk speed) should I use for running queries on a database of 200+ million records? The queries are for a research project, so there is only one "user" and only one query will be running at a time. I tried it on my own laptop with SQL Server with an i3 processor, 2GB RAM, 5400 RPM disk and a simple query didn't finish even after 8+ hours. I have an option to connect a SSD via eSata and upgrade to 4GB RAM, but not sure if this will be enough... Thanks! Edit: The database is about 25 GB and the indexes are not setup properly. When I tried to add an index, I let it run for about 8 hours and it still hadn't finished so I gave up. Should I have more patience :)? In general, the queries will run once in a while and its ok even if it takes a couple hours to complete.... Also, the queries will produce probably about 10 million records which I need to process using Stata/Matlab and I'm concerned that my current laptop is not strong enough, but unsure of the bottleneck....

Read the article

Logging hurts MySQL performance - but, why?

- by jimbo

I'm quite surprised that I can't see an answer to this anywhere on the site already, nor in the MySQL documentation (section 5.2 seems to have logging otherwise well covered!) If I enable binlogs, I see a small performance hit (subjectively), which is to be expected with a little extra IO -- but when I enable a general query log, I see an enormous performance hit (double the time to run queries, or worse), way in excess of what I see with binlogs. Of course I'm now logging every SELECT as well as every UPDATE/INSERT, but, other daemons record their every request (Apache, Exim) without grinding to a halt. Am I just seeing the effects of being close to a performance "tipping point" when it comes to IO, or is there something fundamentally difficult about logging queries that causes this to happen? I'd love to be able to log all queries to make development easier, but I can't justify the kind of hardware it feels like we'd need to get performance back up with general query logging on. I do, of course, log slow queries, and there's negligible improvement in general usage if I disable this. (All of this is on Ubuntu 10.04 LTS, MySQLd 5.1.49, but research suggests this is a fairly universal issue)

Read the article

High Load mysql on Debian server stops every day. Why?

- by Oleg Abrazhaev

I have Debian server with 32 gb memory. And there is apache2, memcached and nginx on this server. Memory load always on maximum. Only 500m free. Most memory leak do MySql. Apache only 70 clients configured, other services small memory usage. When mysql use all memory it stops. And nothing works, need mysql reboot. Mysql configured use maximum 24 gb memory. I have hight weight InnoDB bases. (400000 rows, 30 gb). And on server multithread daemon, that makes many inserts in this tables, thats why InnoDB. There is my mysql config. [mysqld] # # * Basic Settings # default-time-zone = "+04:00" user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking default-time-zone='Europe/Moscow' # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. # # * Fine Tuning # #low_priority_updates = 1 concurrent_insert = ALWAYS wait_timeout = 600 interactive_timeout = 600 #normal key_buffer_size = 2024M #key_buffer_size = 1512M #70% hot cache key_cache_division_limit= 70 #16-32 max_allowed_packet = 32M #1-16M thread_stack = 8M #40-50 thread_cache_size = 50 #orderby groupby sort sort_buffer_size = 64M #same myisam_sort_buffer_size = 400M #temp table creates when group_by tmp_table_size = 3000M #tables in memory max_heap_table_size = 3000M #on disk open_files_limit = 10000 table_cache = 10000 join_buffer_size = 5M # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #myisam_use_mmap = 1 max_connections = 200 thread_concurrency = 8 # # * Query Cache Configuration # #more ignored query_cache_limit = 50M query_cache_size = 210M #on query cache query_cache_type = 1 # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 1 log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log server-id = 1 log-bin = /var/lib/mysql/mysql-bin #replicate-do-db = gate log-bin-index = /var/lib/mysql/mysql-bin.index log-error = /var/lib/mysql/mysql-bin.err relay-log = /var/lib/mysql/relay-bin relay-log-info-file = /var/lib/mysql/relay-bin.info relay-log-index = /var/lib/mysql/relay-bin.index binlog_do_db = 24avia expire_logs_days = 10 max_binlog_size = 100M read_buffer_size = 4024288 innodb_buffer_pool_size = 5000M innodb_flush_log_at_trx_commit = 2 innodb_thread_concurrency = 8 table_definition_cache = 2000 group_concat_max_len = 16M #binlog_do_db = gate #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. #skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 500M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 32M key_buffer_size = 512M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ Please, help me make it stable. Memory used /etc/mysql # free total used free shared buffers cached Mem: 32930800 32766424 164376 0 139208 23829196 -/+ buffers/cache: 8798020 24132780 Swap: 33553328 44660 33508668 Maybe my problem not in memory, but MySQL stops every day. As you can see, cache memory free 24 gb. Thank to Michael Hampton? for correction. Load overage on server 3.5. Maybe hdd or another problem? Maybe my config not optimal for 30gb InnoDB ? I'm already try mysqltuner and tunung-primer.sh , but they marked all green. Mysqltuner output mysqltuner >> MySQLTuner 1.0.1 - Major Hayden <[email protected]> >> Bug reports, feature requests, and downloads at http://mysqltuner.com/ >> Run with '--help' for additional options and output filtering -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.5.24-9-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 112G (Tables: 1528) [--] Data in InnoDB tables: 39G (Tables: 340) [--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17) [!!] Total fragmented tables: 344 -------- Performance Metrics ------------------------------------------------- [--] Up for: 8h 18m 33s (14M q [478.333 qps], 259K conn, TX: 9B, RX: 5B) [--] Reads / Writes: 84% / 16% [--] Total buffers: 10.5G global + 81.1M per thread (200 max threads) [OK] Maximum possible memory usage: 26.3G (83% of installed RAM) [OK] Slow queries: 1% (259K/14M) [!!] Highest connection usage: 100% (201/200) [OK] Key buffer size / total MyISAM indexes: 1.5G/5.6G [OK] Key buffer hit rate: 100.0% (6B cached / 1M reads) [OK] Query cache efficiency: 74.3% (8M cached / 11M selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 247K sorts) [!!] Joins performed without indexes: 106025 [!!] Temporary tables created on disk: 49% (351K on disk / 715K total) [OK] Thread cache hit rate: 99% (249 created / 259K connections) [!!] Table cache hit rate: 15% (2K open / 13K opened) [OK] Open file limit used: 15% (3K/20K) [OK] Table locks acquired immediately: 99% (4M immediate / 4M locks) [!!] InnoDB data size / buffer pool: 39.4G/5.9G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Reduce or eliminate persistent connections to reduce connection usage Adjust your join queries to always utilize indexes Temporary table size is already large - reduce result set size Reduce your SELECT DISTINCT queries without LIMIT clauses Increase table_cache gradually to avoid file descriptor limits Variables to adjust: max_connections (> 200) wait_timeout (< 600) interactive_timeout (< 600) join_buffer_size (> 5.0M, or always use indexes with joins) table_cache (> 10000) innodb_buffer_pool_size (>= 39G) Mysql primer output -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.5.24-9-log x86_64 Uptime = 0 days 8 hrs 20 min 50 sec Avg. qps = 478 Total Questions = 14369568 Threads Connected = 16 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1.000000 sec. You have 260626 out of 14369701 that take longer than 1.000000 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is enabled Binlog sync is not enabled, you could loose binlog records during a server crash WORKER THREADS Current thread_cache_size = 50 Current threads_cached = 45 Current threads_per_sec = 0 Historic threads_per_sec = 0 Your thread_cache_size is fine MAX CONNECTIONS Current max_connections = 200 Current threads_connected = 11 Historic max_used_connections = 201 The number of used connections is 100% of the configured maximum. You should raise max_connections INNODB STATUS Current InnoDB index space = 214 M Current InnoDB data space = 39.40 G Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 5.85 G Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 23.46 G Configured Max Per-thread Buffers : 15.84 G Configured Max Global Buffers : 7.54 G Configured Max Memory Limit : 23.39 G Physical Memory : 31.40 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 5.61 G Current key_buffer_size = 1.47 G Key cache miss rate is 1 : 5578 Key buffer free ratio = 77 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is enabled Current query_cache_size = 200 M Current query_cache_used = 101 M Current query_cache_limit = 50 M Current Query cache Memory fill ratio = 50.59 % Current query_cache_min_res_unit = 4 K MySQL won't cache query results that are larger than query_cache_limit in size SORT OPERATIONS Current sort_buffer_size = 64 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 5.00 M You have had 106606 queries where a join could not use an index properly You have had 8 joins without keys that check for key usage after each row join_buffer_size >= 4 M This is not advised You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. OPEN FILES LIMIT Current open_files_limit = 20210 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_open_cache = 10000 tables Current table_definition_cache = 2000 tables You have a total of 1910 tables You have 2151 open tables. The table_cache value seems to be fine TEMP TABLES Current max_heap_table_size = 2.92 G Current tmp_table_size = 2.92 G Of 366426 temp tables, 49% were created on disk Perhaps you should increase your tmp_table_size and/or max_heap_table_size to reduce the number of disk-based temporary tables Note! BLOB and TEXT columns are not allow in memory tables. If you are using these columns raising these values might not impact your ratio of on disk temp tables. TABLE SCANS Current read_buffer_size = 3 M Current table scan ratio = 2846 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 185 You may benefit from selective use of InnoDB. If you have long running SELECT's against MyISAM tables and perform frequent updates consider setting 'low_priority_updates=1'

Read the article

Replicated MongoDB server slower than simple shards

- by displayName

I tried to compare the performance of a sharded configuration against a sharded and replicated configuration. The sharded configuration consists of 8 shards each running on three different machines thereby constituting a total of 24 shards. All 8 of these shards run in the same partition on each machine. The sharded and replicated version is 8 shards again just like plain sharding, and all 8 mongods run on the same partition in each machine. But apart from this, each of these three machine now run additional 16 threads on another partition which serve as the secondary for the 8 mongods running on other machines. This is the way I prepared a sharded and replicated configuration with data chunks having replication factor of 3. Important point to note is that once the data has been loaded, it is not modified. So after primary and secondaries have synchronized then it doesn't matter which one i read from. To run the queries, I use an entirely different machine (let's call it config) which runs mongos and this machine's only purpose is to receive queries and run them on the cluster. Contrary to my expectations, plain sharding of 8 threads on each machine (total = 3 * 8 = 24) is performing better for queries than the sharded + replicated configuration. I have a script written to perform the query. So in order to time the scripts, I use time ./testScript and see the result. I tried changing the reading preference for replicated cluster by logging to mongo of config and run db.getMongo().setReadPref('secondary') and then exit the shell and run the queries like time ./testScript. The questions are: Where am i going wrong in the replication? Why is it slower than its plain sharding version? Does the db.getMongo().ReadPref('secondary') persist when i leave the shell and try to perform the query? All the four machines are running Linux and i have already increased the ulimit -n to 2048 from initial value of 1024 to allow more connections. The collections are properly distributed and all the mongods have equal number of chunks. Goes without saying that indices in both configurations are the same.

Search Results

Search found 8161 results on 327 pages for 'django queries'.

Page 156/327 | < Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >

- by Davide Mauri

- by Paul White

- by Herve Roggero

- by Grofit

- by Davide Mauri

- by Ramraj Edagutti

- by Wesley

- by Saeed Neamati

- by shubham

- by Maria Konnou

- by GavinPayneUK

- by Prayos

- by JaneZhang(???)

- by danspants

- by lyle

- by SpeksETC

- by jimbo

- by Oleg Abrazhaev

- by displayName

< Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >