Search Results

Search found 40567 results on 1623 pages for 'database performance'.

Page 74/1623 | < Previous Page | 70 71 72 73 74 75 76 77 78 79 80 81  | Next Page >

  • Performance boost for MacBook: Hybrid hard drive or 4GB RAM?

    - by user13572
    I have an aluminium 13" MacBook with 2GB or RAM and 5400RPM 500GB hard drive. The main tasks I perform are developing iPhone and Mac apps in Xcode and websites in Coda. I want to improve the performance so I am considering buying 4GB of RAM or a 500GB Seagate solid-state hybrid drive. What is likely to provide the biggest performance boost?

    Read the article

  • Does chunk size affect the read performance of a Linux md software RAID1 array?

    - by OldWolf
    This came up in relation to this question on determining chunk size of an existing RAID array. The general consensus seems to be that chunk size does not apply to RAID1 as it is not striped. On the other hand, the Linux RAID Wiki claims that it will have an affect on read performance. However, I cannot find any benchmarks testing/proving that. Can anyone point to conclusive documentation that it either does or does not affect read performance?

    Read the article

  • Performance boast for MacBook: Hybrid hard drive or 4GB RAM?

    - by user13572
    I have an aluminium 13" MacBook with 2GB or RAM and 5400RPM 500GB hard drive. The main tasks I perform are developing iPhone and Mac apps in Xcode and websites in Coda. I want to improve the performance so I am considering buying 4GB of RAM or a 500GB Seagate solid-state hybrid drive. What is likely to provide the biggest performance boast?

    Read the article

  • Custom Database integration with MOSS 2007

    - by Bob
    Hopefully someone has been down this road before and can offer some sound advice as far as which direction I should take. I am currently involved in a project in which we will be utilizing a custom database to store data extracted from excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C#.Net 2008) that can extract the necessary data from the spreadsheets and import it into our custom database. What I am primarily interested in is figuring out the best method for integrating that process with our portal. What I would like to do is let SharePoint keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets from SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I need tried and true way of ensuring that the data remains synchronized between SharePoint and the custom database. I am also interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.

    Read the article

  • IList<Item> Collection Class accessing database

    - by Mike
    Hi, I have a database with Users. Users have Items. These Items can change actively. How do you access the items in a collection type format? For the user, I fill all the user properties at the time of instantiation. If I load the user's items at the time of the instantiation, and the items change, they will have old data. I was thinking, maybe I need an ItemCollection class and have that a field/property apart of the user class, that way to traverse all the user's items I could use a foreach loop. So, my question is, what is the best practice/best way of accessing the items from a database using some sort of collection? On accessing the particular Item, it needs to get the latest database information, and when the user does do a foreach loop, the latest item information must be available. I.e. What I'm trying to do Console.WriteLine(User.Items[3].ID); returns 5. //this updates the item information and saves it to the database. User.Items[3].ID = 13; //Add a new item to the database. User.Items.Add(new Item { id = 17}); foreach (Item item in User.Items) { //this would traverse all items in the database. //not some cached copy at the time of instantiation of the user. }

    Read the article

  • Connecting to database on web host in Visual Studio

    - by Anders Svensson
    I have a web site developed locally with a local Sql Server database. I also have a web host that provides one Sql Server database for my site. Now I want to deploy the application, and I would like to be able to manage the remote database from the Server Explorer in Visual Studio. I have the connection string used in the application, which works fine for adding, say, a datasource to a control etc. But I don't know if there's any way to use it to connect the database inside the Server Explorer so that I can add tables etc. I have read that you're supposed to be able to this instead of using the Sql Server Management Studio, but I have'nt read anything about how to connect to the remote database in it. What I have tried so far is this: I have selected Add database in Server Explorer. This brings up first a dialog where I choose Sql Server. And then I get a dialog where I can set Server name (which I tried using the ip address in the connection string below), and Authentication (where I chose Sql Server Authentication, with the user id and password from below). But when I test the connection it fails. Here's the connection string, which works fine when used for datasources in the application (obviously with different user name and password): Any help appreciated!

    Read the article

  • How to choose light version of database system

    - by adopilot
    I am starting one POS (Point of sale) project. Targeting system is going to be written in C# .NET 2 WinForms and as main database server We are going to use MS-SQL Server. As we have a lot of POS devices in chain for one store I will love to have backend local data base system on each POS device. Scenario are following: When main server goes down!! POS application should continue working "off-line" with local database, until connection to main server come up again. Now I am in dilemma which local database is going to be most adoptable for me. Here is some notes for helping me point me in right direction: To be Light "My POS devices art usually old and suffering with performances" To be Free "I have a lot of devices and I do not wont additional cost beside main SQL serer" One day Ill love to try all that port on Mono and Linux OS. Here is what I've researched so far: Simple XML "Light but I am afraid of performance, My main table of items is average of 10K records" SQL-Express "I am afraid that my POS devices is poor with hardware for SQLExpress, and also hard to install on each device and configure" Less known Advantage Database Server have free distribution of offline ADT system. DBF with extended Library,"Respect for good old DBFs but that era is behind Me with clipper and DBFs" MS Access Sqlite "Mostly like for now, but I am afraid how it is going to pair with MS SQL do they have same Data types". I know that in this SO is a lot of subjective data, but at least can someone recommended some others lite database system, or things that I shod most take attention before I choice database.

    Read the article

  • Looking for combinations of server and embedded database engines

    - by codeelegance
    I'm redesigning an application that will be run as both a single user and multiuser application. It is a .NET 2.0 application. I'm looking for server and embedded databases that work well together. I want to deploy the embedded database in the single user setup and of course, the server in the multiuser setup. Past releases have been based on MSDE but in the past year we've been having a lot of install issues: new installs hanging and leaving the system in an unknown state, upgrades disconnecting the database, etc. I migrated the application to SQL Server 2005 and the install is more reliable (as long as a user doesn't try to install over a broken MSDE installation). Since next year's release will be a complete redesign I figured now's the best time to address the database issue as well. The database has been abstracted from the rest of the application so I just need to choose which database(s) to use and write an implementation for each one. So far I've considered: SQL Server/ SQL Server Compact Edition Firebird (same DB engine is available in two different server modes and an embedded dll) Each has its own merits but I'm also interested in any other suggestions. This is a fairly simple program and its data requirements are simple as well. I don't expect it to strain whatever database I eventually choose. So easy configuration and deployment hold more weight than performance.

    Read the article

  • connecting to secure database from website host

    - by jim
    Hello all, I've got a requirement to both read and write data via a .net webservice to a sqlserver database that's on a private network. this database is currently accessed via a vpn connection by remote client software (on standard desktop machines) to get latest product prices and to upload product stock sales. I've been tasked with finding a way to centralise this access from a webservice that the clients then access, rather than them using the vpn route to connect directly to the database. My question is related to my .net service's relationship to the sqlserver database. What are the options for connecting to a private network vpn from a domain host in order to achive the functionality of allowing the webservice to both read and write data to the database. For now, I'm not too concerned about the client connectivity and security (tho i appreciate that this will have to be worked out too), I'm really just interested in discovering the options available in order to allow my .net webservice to connect to the private network in as painless and transparent a way as posible. The option of switching the database onto public hosting is not an option, so I have to work with the sdcenario as described above for now, unless there's a compelling rationale presented to do otherwise. thanks all... jim

    Read the article

  • connecting to secure database on private network from website host

    - by jim
    Hello all, I've got a requirement to both read and write data via a .net webservice to a sqlserver database that's on a private network. this database is currently accessed via a vpn connection by remote client software (on standard desktop machines) to get latest product prices and to upload product stock sales. I've been tasked with finding a way to centralise this access from a webservice that the clients then access, rather than them using the vpn route to connect directly to the database. My question is related to my .net service's relationship to the sqlserver database. What are the options for connecting to a private network vpn from a domain host in order to achive the functionality of allowing the webservice to both read and write data to the database. For now, I'm not too concerned about the client connectivity and security (tho i appreciate that this will have to be worked out too), I'm really just interested in discovering the options available in order to allow my .net webservice to connect to the private network in as painless and transparent a way as posible. [edit] the webservice will also be available to the retail website in order for it to lookup product info as well as allocate stock transfers to the same sqlserver db. it will therefore be located on the same domain as the retail site The option of switching the database onto public hosting is not feasible, so I have to work with the scenario as described above for now, unless there's a compelling rationale presented to do otherwise. thanks all... jim

    Read the article

  • Is it better to use a relational database or document-based database for an app like Wufoo?

    - by mboyle
    I'm working on an application that's similar to Wufoo in that it allows our users to create their own databases and collect/present records with auto generated forms and views. Since every user is creating a different schema (one user might have a database of their baseball card collection, another might have a database of their recipes) our current approach is using MySQL to create separate databases for every user with its own tables. So in other words, the databases our MySQL server contains look like: main-web-app-db (our web app containing tables for users account info, billing, etc) user_1_db (baseball_cards_table) user_2_db (recipes_table) .... And so on. If a user wants to set up a new database to keep track of their DVD collection, we'd do a "create database ..." with "create table ...". If they enter some data in and then decide they want to change a column we'd do an "alter table ....". Now, the further along I get with building this out the more it seems like MySQL is poorly suited to handling this. 1) My first concern is that switching databases every request, first to our main app's database for authentication etc, and then to the user's personal database, is going to be inefficient. 2) The second concern I have is that there's going to be a limit to the number of databases a single MySQL server can host. Pretending for a moment this application had 500,000 user databases, is MySQL designed to operate this way? What if it were a million, or more? 3) Lastly, is this method going to be a nightmare to support and scale? I've never heard of MySQL being used in this way so I do worry about how this affects things like replication and other methods of scaling. To me, it seems like MySQL wasn't built to be used in this way but what do I know. I've been looking at document-based databases like MongoDB, CouchDB, and Redis as alternatives because it seems like a schema-less approach to this particular problem makes a lot of sense. Can anyone offer some advice on this?

    Read the article

  • Database choices

    - by flobadob
    I have a prickly design issue regarding the choice of database technologies to use for a group of new applications. The final suite of applications would have the following database requirements... Central databases (more than one database) using mysql (myst be mysql due to justhost.com). An application to be written which accesses the multiple mysql databases on the web host. This application will also write to local serverless database (sqlite/firebird/vistadb/whatever). Different flavors of this application will be created for windows (.NET), windows mobile, android if possible, iphone if possible. So, the design task is to minimise the quantity of code to achieve this. This is going to be tricky since the languages used are already c# / java (android) and objc (iphone). Not too worried about that, but can the work required to implement the various database access layers be minimised? The serverless database will hold similar data to the mysql server, so some kind of inheritance in the DAL would be useful. Looking at hibernate/nhibernate and there is linq to whatever. So many choices!

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • Fragmented Log files could be slowing down your database

    - by Fatherjack
    Something that is sometimes forgotten by a lot of DBAs is the fact that database log files get fragmented in the same way that you get fragmentation in a data file. The cause is very different but the effect is the same – too much effort reading and writing data. Data files get fragmented as data is changed through normal system activity, INSERTs, UPDATEs and DELETEs cause fragmentation and most experienced DBAs are monitoring their indexes for fragmentation and dealing with it accordingly. However, you don’t hear about so many working on their log files. How can a log file get fragmented? I’m glad you asked. When you create a database there are at least two files created on the disk storage; an mdf for the data and an ldf for the log file (you can also have ndf files for extra data storage but that’s off topic for now). It is wholly possible to have more than one log file but in most cases there is little point in creating more than one as the log file is written to in a ‘wrap-around’ method (more on that later). When a log file is created at the time that a database is created the file is actually sub divided into a number of virtual log files (VLFs). The number and size of these VLFs depends on the size chosen for the log file. VLFs are also created in the space added to a log file when a log file growth event takes place. Do you have your log files set to auto grow? Then you have potentially been introducing many VLFs into your log file. Let’s get to see how many VLFs we have in a brand new database. USE master GO CREATE DATABASE VLF_Test ON ( NAME = VLF_Test, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test.mdf', SIZE = 100, MAXSIZE = 500, FILEGROWTH = 50 ) LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 5MB, MAXSIZE = 250MB, FILEGROWTH = 5MB ); go USE VLF_Test go DBCC LOGINFO; The results of this are firstly a new database is created with specified files sizes and the the DBCC LOGINFO results are returned to the script editor. The DBCC LOGINFO results have plenty of interesting information in them but lets first note there are 4 rows of information, this relates to the fact that 4 VLFs have been created in the log file. The values in the FileSize column are the sizes of each VLF in bytes, you will see that the last one to be created is slightly larger than the others. So, a 5MB log file has 4 VLFs of roughly 1.25 MB. Lets alter the CREATE DATABASE script to create a log file that’s a bit bigger and see what happens. Alter the code above so that the log file details are replaced by LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 1GB, MAXSIZE = 25GB, FILEGROWTH = 1GB ); With a bigger log file specified we get more VLFs What if we make it bigger again? LOG ON ( NAME = VLF_Test_Log, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.ROCK_2008\MSSQL\DATA\VLF_Test_log.ldf', SIZE = 5GB, MAXSIZE = 250GB, FILEGROWTH = 5GB ); This time we see more VLFs are created within our log file. We now have our 5GB log file comprised of 16 files of 320MB each. In fact these sizes fall into all the ranges that control the VLF creation criteria – what a coincidence! The rules that are followed when a log file is created or has it’s size increased are pretty basic. If the file growth is lower than 64MB then 4 VLFs are created If the growth is between 64MB and 1GB then 8 VLFs are created If the growth is greater than 1GB then 16 VLFs are created. Now the potential for chaos comes if the default values and settings for log file growth are used. By default a database log file gets a 1MB log file with unlimited growth in steps of 10%. The database we just created is 6 MB, let’s add some data and see what happens. USE vlf_test go -- we need somewhere to put the data so, a table is in order IF OBJECT_ID('A_Table') IS NOT NULL DROP TABLE A_Table go CREATE TABLE A_Table ( Col_A int IDENTITY, Col_B CHAR(8000) ) GO -- Let's check the state of the log file -- 4 VLFs found EXECUTE ('DBCC LOGINFO'); go -- We can go ahead and insert some data and then check the state of the log file again INSERT A_Table (col_b) SELECT TOP 500 REPLICATE('a',2000) FROM sys.columns AS sc, sys.columns AS sc2 GO -- insert 500 rows and we get 22 VLFs EXECUTE ('DBCC LOGINFO'); go -- Let's insert more rows INSERT A_Table (col_b) SELECT TOP 2000 REPLICATE('a',2000) FROM sys.columns AS sc, sys.columns AS sc2 GO 10 -- insert 2000 rows, in 10 batches and we suddenly have 107 VLFs EXECUTE ('DBCC LOGINFO'); Well, that escalated quickly! Our log file is split, internally, into 107 fragments after a few thousand inserts. The same happens with any logged transactions, I just chose to illustrate this with INSERTs. Having too many VLFs can cause performance degradation at times of database start up, log backup and log restore operations so it’s well worth keeping a check on this property. How do we prevent excessive VLF creation? Creating the database with larger files and also with larger growth steps and actively choosing to grow your databases rather than leaving it to the Auto Grow event can make sure that the growths are made with a size that is optimal. How do we resolve a situation of a database with too many VLFs? This process needs to be done when the database is under little or no stress so that you don’t affect system users. The steps are: BACKUP LOG YourDBName TO YourBackupDestinationOfChoice Shrink the log file to its smallest possible size DBCC SHRINKFILE(FileNameOfTLogHere, TRUNCATEONLY) * Re-size the log file to the size you want it to, taking in to account your expected needs for the coming months or year. ALTER DATABASE YourDBName MODIFY FILE ( NAME = FileNameOfTLogHere, SIZE = TheSizeYouWantItToBeIn_MB) * – If you don’t know the file name of your log file then run sp_helpfile while you are connected to the database that you want to work on and you will get the details you need. The resize step can take quite a while This is already detailed far better than I can explain it by Kimberley Tripp in her blog 8-Steps-to-better-Transaction-Log-throughput.aspx. The result of this will be a log file with a VLF count according to the bullet list above. Knowing when VLFs are being created By complete coincidence while I have been writing this blog (it’s been quite some time from it’s inception to going live) Jonathan Kehayias from SQLSkills.com has written a great article on how to track database file growth using Event Notifications and Service Broker. I strongly recommend taking a look at it as this is going to catch any sneaky auto grows that take place and let you know about them right away. Hassle free monitoring of VLFs If you are lucky or wise enough to be using SQL Monitor or another monitoring tool that let’s you write your own custom metrics then you can keep an eye on this very easily. There is a custom metric for VLFs (written by Stuart Ainsworth) already on the site and there are some others there are very useful so take a moment or two to look around while you are there. Resources MSDN – http://msdn.microsoft.com/en-us/library/ms179355(v=sql.105).aspx Kimberly Tripp from SQLSkills.com – http://www.sqlskills.com/BLOGS/KIMBERLY/post/8-Steps-to-better-Transaction-Log-throughput.aspx Thomas LaRock at Simple-Talk.com – http://www.simple-talk.com/sql/database-administration/monitoring-sql-server-virtual-log-file-fragmentation/ Disclosure I am a Friend of Red Gate. This means that I am more than likely to say good things about Red Gate DBA and Developer tools. No matter how awesome I make them sound, take the time to compare them with other products before you contact the Red Gate sales team to make your order.

    Read the article

  • SQL University: Database testing and refactoring tools and examples

    - by Mladen Prajdic
    This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 3 - Database testing and refactoring tools and examples This is the third and last part of the series and in it we’ll take a look at tools we can test and refactor with plus some an example of the both. Tools of the trade First a few thoughts about how to go about testing a database. I'm firmily against any testing tools that go into the database itself or need an extra database. Unit tests for the database and applications using the database should all be in one place using the same technology. By using database specific frameworks we fragment our tests into many places and increase test system complexity. Let’s take a look at some testing tools. 1. NUnit, xUnit, MbUnit All three are .Net testing frameworks meant to unit test .Net application. But we can test databases with them just fine. I use NUnit because I’ve always used it for work and personal projects. One day this might change. So the thing to remember is to be flexible if something better comes along. All three are quite similar and you should be able to switch between them without much problem. 2. TSQLUnit As much as this framework is helpful for the non-C# savvy folks I don’t like it for the reason I stated above. It lives in the database and thus fragments the testing infrastructure. Also it appears that it’s not being actively developed anymore. 3. DbFit I haven’t had the pleasure of trying this tool just yet but it’s on my to-do list. From what I’ve read and heard Gojko Adzic (@gojkoadzic on Twitter) has done a remarkable job with it. 4. Redgate SQL Refactor and Apex SQL Refactor Neither of these refactoring tools are free, however if you have hardcore refactoring planned they are worth while looking into. I’ve only used the Red Gate’s Refactor and was quite impressed with it. 5. Reverting the database state I’ve talked before about ways to revert a database to pre-test state after unit testing. This still holds and I haven’t changed my mind. Also make sure to read the comments as they are quite informative. I especially like the idea of setting up and tearing down the schema for each test group with NHibernate. Testing and refactoring example We’ll take a look at the simple schema and data test for a view and refactoring the SELECT * in that view. We’ll use a single table PhoneNumbers with ID and Phone columns. Then we’ll refactor the Phone column into 3 columns Prefix, Number and Suffix. Lastly we’ll remove the original Phone column. Then we’ll check how the view behaves with tests in NUnit. The comments in code explain the problem so be sure to read them. I’m assuming you know NUnit and C#. T-SQL Code C# test code USE tempdbGOCREATE TABLE PhoneNumbers( ID INT IDENTITY(1,1), Phone VARCHAR(20))GOINSERT INTO PhoneNumbers(Phone)SELECT '111 222333 444' UNION ALLSELECT '555 666777 888'GO-- notice we don't have WITH SCHEMABINDINGCREATE VIEW vPhoneNumbersAS SELECT * FROM PhoneNumbersGO-- Let's take a look at what the view returns -- If we add a new columns and rows both tests will failSELECT *FROM vPhoneNumbers GO -- DoesViewReturnCorrectColumns test will SUCCEED -- DoesViewReturnCorrectData test will SUCCEED -- refactor to split Phone column into 3 partsALTER TABLE PhoneNumbers ADD Prefix VARCHAR(3)ALTER TABLE PhoneNumbers ADD Number VARCHAR(6)ALTER TABLE PhoneNumbers ADD Suffix VARCHAR(3)GO-- update the new columnsUPDATE PhoneNumbers SET Prefix = LEFT(Phone, 3), Number = SUBSTRING(Phone, 5, 6), Suffix = RIGHT(Phone, 3)GO-- remove the old columnALTER TABLE PhoneNumbers DROP COLUMN PhoneGO-- This returns unexpected results!-- it returns 2 columns ID and Phone even though -- we don't have a Phone column anymore.-- Notice that the data is from the Prefix column-- This is a danger of SELECT *SELECT *FROM vPhoneNumbers -- DoesViewReturnCorrectColumns test will SUCCEED -- DoesViewReturnCorrectData test will FAIL -- for a fix we have to call sp_refreshview -- to refresh the view definitionEXEC sp_refreshview 'vPhoneNumbers'-- after the refresh the view returns 4 columns-- this breaks the input/output behavior of the database-- which refactoring MUST NOT doSELECT *FROM vPhoneNumbers -- DoesViewReturnCorrectColumns test will FAIL -- DoesViewReturnCorrectData test will FAIL -- to fix the input/output behavior change problem -- we have to concat the 3 columns into one named PhoneALTER VIEW vPhoneNumbersASSELECT ID, Prefix + ' ' + Number + ' ' + Suffix AS PhoneFROM PhoneNumbersGO-- now it works as expectedSELECT *FROM vPhoneNumbers -- DoesViewReturnCorrectColumns test will SUCCEED -- DoesViewReturnCorrectData test will SUCCEED -- clean upDROP VIEW vPhoneNumbersDROP TABLE PhoneNumbers [Test]public void DoesViewReturnCoorectColumns(){ // conn is a valid SqlConnection to the server's tempdb // note the SET FMTONLY ON with which we return only schema and no data using (SqlCommand cmd = new SqlCommand("SET FMTONLY ON; SELECT * FROM vPhoneNumbers", conn)) { DataTable dt = new DataTable(); dt.Load(cmd.ExecuteReader(CommandBehavior.CloseConnection)); // test returned schema: number of columns, column names and data types Assert.AreEqual(dt.Columns.Count, 2); Assert.AreEqual(dt.Columns[0].Caption, "ID"); Assert.AreEqual(dt.Columns[0].DataType, typeof(int)); Assert.AreEqual(dt.Columns[1].Caption, "Phone"); Assert.AreEqual(dt.Columns[1].DataType, typeof(string)); }} [Test]public void DoesViewReturnCorrectData(){ // conn is a valid SqlConnection to the server's tempdb using (SqlCommand cmd = new SqlCommand("SELECT * FROM vPhoneNumbers", conn)) { DataTable dt = new DataTable(); dt.Load(cmd.ExecuteReader(CommandBehavior.CloseConnection)); // test returned data: number of rows and their values Assert.AreEqual(dt.Rows.Count, 2); Assert.AreEqual(dt.Rows[0]["ID"], 1); Assert.AreEqual(dt.Rows[0]["Phone"], "111 222333 444"); Assert.AreEqual(dt.Rows[1]["ID"], 2); Assert.AreEqual(dt.Rows[1]["Phone"], "555 666777 888"); }}   With this simple example we’ve seen how a very simple schema can cause a lot of problems in the whole application/database system if it doesn’t have tests. Imagine what would happen if some outside process would depend on that view. It would get wrong data and propagate it silently throughout the system. And that is not good. So have tests at least for the crucial parts of your systems. And with that we conclude the Database Testing and Refactoring week at SQL University. Hope you learned something new and enjoy the learning weeks to come. Have fun!

    Read the article

  • SQL SERVER – SSMS: Database Consistency History Report

    - by Pinal Dave
    Doctor and Database The last place I like to visit is always a hospital. With the monsoon season starting, intermittent rains, it has become sort of a routine to get a cycle of fever every other year (seriously I hate it). So when I visit my doctor, it is always interesting in the way he quizzes me. The routine question of – “How many days have you had this?”, “Is there any pattern?”, “Did you drench in rain?”, “Do you have any other symptom?” and so on. The idea here is that the doctor wants to find any anomaly or a pattern that will guide him to a viral or bacterial type. Most of the time they get it based on experience and sometimes after a battery of tests. So if there is consistent behavior to your problem, there is always a solution out. SQL Server has its way to find if the server data / files are in consistent state using the DBCC commands. Back to SQL Server In real life, Database consistency check is one of the critical operations a DBA generally doesn’t give much priority. Many readers of my blogs have asked many times, how do we know if the database is consistent? How do I read output of DBCC CHECKDB and find if everything is right or not? My common answer to all of them is – look at the bottom of checkdb (or checktable) output and look for below line. CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DatabaseName’. Above is a “good sign” because we are seeing zero allocation and zero consistency error. If you are seeing non-zero errors then there is some problem with the database. Sample output is shown as below: CHECKDB found 0 allocation errors and 2 consistency errors in database ‘DatabaseName’. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DatabaseName). If we see non-zero error then most of the time (not always) we get repair options depending on the level of corruption. There is risk involved with above option (repair_allow_data_loss), that is – we would lose the data. Sometimes the option would be repair_rebuild which is little safer. Though these options are available, it is important to find the root cause to the problem. In standard report, there is a report which can show the history of checkdb executed for the selected database. Since this is a database level report, we need to right click on database, click Reports, click Standard Reports and then choose “Database Consistency History” report. The information in this report is picked from default trace. If default trace is disabled or there is no checkdb run or information is not there in default trace (because it’s rolled over), we would get report like below. As we can see report says it very clearly: Currently, no execution history of CHECKDB is available or default trace is not enabled. To demonstrate, I have caused corruption in one of the database and did below steps. Run CheckDB so that errors are reported. Fix the corruption by losing the data using repair option Run CheckDB again to check if corruption is cleared. After that I have launched the report and below is what we would see. If you are lazy like me and don’t want to run the report manually for each database then below query would be handy to provide same report for all database. This query is runs behind the scenes by the report. All I have done is remove the filter for database name (at the last – highlighted). DECLARE @curr_tracefilename VARCHAR(500); DECLARE @base_tracefilename VARCHAR(500); DECLARE @indx INT; SELECT @curr_tracefilename = path FROM sys.traces WHERE is_default = 1; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SELECT @indx  = PATINDEX('%\%', @curr_tracefilename) ; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SET @base_tracefilename = LEFT( @curr_tracefilename,LEN(@curr_tracefilename) - @indx) + '\log.trc'; SELECT  SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),36, PATINDEX('%executed%',TEXTData)-36) AS command ,       LoginName ,       StartTime ,       CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%found%',TEXTData) +6,PATINDEX('%errors %',TEXTData)-PATINDEX('%found%',TEXTData)-6)) AS errors ,       CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%repaired%',TEXTData) +9,PATINDEX('%errors.%',TEXTData)-PATINDEX('%repaired%',TEXTData)-9)) repaired ,       SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%time:%',TEXTData)+6,PATINDEX('%hours%',TEXTData)-PATINDEX('%time:%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%hours%',TEXTData) +6,PATINDEX('%minutes%',TEXTData)-PATINDEX('%hours%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%minutes%',TEXTData) +8,PATINDEX('%seconds.%',TEXTData)-PATINDEX('%minutes%',TEXTData)-8) AS time FROM::fn_trace_gettable( @base_tracefilename, DEFAULT) WHERE EventClass = 22 AND SUBSTRING(TEXTData,36,12) = 'DBCC CHECKDB' -- AND DatabaseName = @DatabaseName; Don’t get worried about the logic above. All it is doing is reading the trace files, parsing below entry and getting out information for underlined words. DBCC CHECKDB (CorruptedDatabase) executed by sa found 2 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds.  Internal database snapshot has split point LSN = 00000029:00000030:0001 and first LSN = 00000029:00000020:0001. Hopefully now onwards you would run checkdb and understand the importance of it. As responsible DBAs I am sure you are already doing it, let me know how often do you actually run them on you production environment? Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

    Read the article

  • Local LINQtoSQL Database For Your Windows Phone 7 Application

    - by Tim Murphy
    There aren’t many applications that are of value without having some for of data store.  In Windows Phone development we have a few options.  You can store text directly to isolated storage.  You can also use a number of third party libraries to create or mimic databases in isolated storage.  With Mango we gained the ability to have a native .NET database approach which uses LINQ to SQL.  In this article I will try to bring together the components needed to implement this last type of data store and fill in some of the blanks that I think other articles have left out. Defining A Database The first things you are going to need to do is define classes that represent your tables and a data context class that is used as the overall database definition.  The table class consists of column definitions as you would expect.  They can have relationships and constraints as with any relational DBMS.  Below is an example of a table definition. First you will need to add some assembly references to the code file. using System.ComponentModel;using System.Data.Linq;using System.Data.Linq.Mapping; You can then add the table class and its associated columns.  It needs to implement INotifyPropertyChanged and INotifyPropertyChanging.  Each level of the class needs to be decorated with the attribute appropriate for that part of the definition.  Where the class represents the table the properties represent the columns.  In this example you will see that the column is marked as a primary key and not nullable with a an auto generated value. You will also notice that the in the column property’s set method It uses the NotifyPropertyChanging and NotifyPropertyChanged methods in order to make sure that the proper events are fired. [Table]public class MyTable: INotifyPropertyChanged, INotifyPropertyChanging{ public event PropertyChangedEventHandler PropertyChanged; private void NotifyPropertyChanged(string propertyName) { if(PropertyChanged != null) { PropertyChanged(this, new PropertyChangedEventArgs(propertyName)); } } public event PropertyChangingEventHandler PropertyChanging; private void NotifyPropertyChanging(string propertyName) { if(PropertyChanging != null) { PropertyChanging(this, new PropertyChangingEventArgs(propertyName)); } } private int _TableKey; [Column(IsPrimaryKey = true, IsDbGenerated = true, DbType = "INT NOT NULL Identity", CanBeNull = false, AutoSync = AutoSync.OnInsert)] public int TableKey { get { return _TableKey; } set { NotifyPropertyChanging("TableKey"); _TableKey = value; NotifyPropertyChanged("TableKey"); } } The last part of the database definition that needs to be created is the data context.  This is a simple class that takes an isolated storage location connection string its constructor and then instantiates tables as public properties. public class MyDataContext: DataContext{ public MyDataContext(string connectionString): base(connectionString) { MyRecords = this.GetTable<MyTable>(); } public Table<MyTable> MyRecords;} Creating A New Database Instance Now that we have a database definition it is time to create an instance of the data context within our Windows Phone app.  When your app fires up it should check if the database already exists and create an instance if it does not.  I would suggest that this be part of the constructor of your ViewModel. db = new MyDataContext(connectionString);if(!db.DatabaseExists()){ db.CreateDatabase();} The next thing you have to know is how the connection string for isolated storage should be constructed.  The main sticking point I have found is that the database cannot be created unless the file mode is read/write.  You may have different connection strings but the initial one needs to be similar to the following. string connString = "Data Source = 'isostore:/MyApp.sdf'; File Mode = read write"; Using you database Now that you have done all the up front work it is time to put the database to use.  To make your life a little easier and keep proper separation between your view and your viewmodel you should add a couple of methods to the viewmodel.  These will do the CRUD work of your application.  What you will notice is that the SubmitChanges method is the secret sauce in all of the methods that change data. private myDataContext myDb;private ObservableCollection<MyTable> _viewRecords;public ObservableCollection<MyTable> ViewRecords{ get { return _viewRecords; } set { _viewRecords = value; NotifyPropertyChanged("ViewRecords"); }}public void LoadMedstarDbData(){ var tempItems = from MyTable myRecord in myDb.LocalScans select myRecord; ViewRecords = new ObservableCollection<MyTable>(tempItems);}public void SaveChangesToDb(){ myDb.SubmitChanges();}public void AddMyTableItem(MyTable newScan){ myDb.LocalScans.InsertOnSubmit(newScan); myDb.SubmitChanges();}public void DeleteMyTableItem(MyTable newScan){ myDb.LocalScans.DeleteOnSubmit(newScan); myDb.SubmitChanges();} Updating existing database What happens when you need to change the structure of your database?  Unfortunately you have to add code to your application that checks the version of the database which over time will create some pollution in your codes base.  On the other hand it does give you control of the update.  In this example you will see the DatabaseSchemaUpdater in action.  Assuming we added a “Notes” field to the MyTable structure, the following code will check if the database is the latest version and add the field if it isn’t. if(!myDb.DatabaseExists()){ myDb.CreateDatabase();}else{ DatabaseSchemaUpdater dbUdater = myDb.CreateDatabaseSchemaUpdater(); if(dbUdater.DatabaseSchemaVersion < 2) { dbUdater.AddColumn<MyTable>("Notes"); dbUdater.DatabaseSchemaVersion = 2; dbUdater.Execute(); }} Summary This approach does take a fairly large amount of work, but I think the end product is robust and very native for .NET developers.  It turns out to be worth the investment. del.icio.us Tags: Windows Phone,Windows Phone 7,LINQ to SQL,LINQ,Database,Isolated Storage

    Read the article

  • Restoring a Publisher Database in SQL Server

    Introduction Restoring any database is a critical task which will be  complicated by the database to be  restored being a publisher database. For the purposes of this article, I will assume familiarity with the different types ... [Read Full Article]

    Read the article

  • Advice on software / database design to avoid using cursors when updating database

    - by Remnant
    I have a database that logs when an employee has attended a course and when they are next due to attend the course (courses tend to be annual). As an example, the following employee attended course '1' on 1st Jan 2010 and, as the course is annual, is due to attend next on the 1st Jan 2011. As today is 20th May 2010 the course status reads as 'Complete' i.e. they have done the course and do not need to do it again until next year: EmployeeID CourseID AttendanceDate DueDate Status 123456 1 01/01/2010 01/01/2011 Complete In terms of the DueDate I calculate this in SQL when I update the employee's record e.g. DueDate = AttendanceDate + CourseFrequency (I pull course frequency this from a separate table). In my web based app (asp.net mvc) I pull back this data for all employees and display it in a grid like format for HR managers to review. This allows HR to work out who needs to go on courses. The issue I have is as follows. Taking the example above, suppose today is 2nd Jan 2011. In this case, employee 123456 is now overdue for the course and I would like to set the Status to Incomplete so that the HR manager can see that they need to action this i.e. get employee on the course. I could build a trigger in the database to run overnight to update the Status field for all employees based on the current date. From what I have read I would need to use cursors to loop over each row to amend the status and this is considered bad practice / inefficient or at least something to avoid if you can??? Alternatively, I could compute the Status in my C# code after I have pulled back the data from the database and before I display it on screen. The issue with this is that the Status in the database would not necessarily match what is shown on screen which just feels plain wrong to me. Does anybody have any advice on the best practice approach to such an issue? It helps, if I did use a cursor I doubt I would be looping over more than 1000 records at any given time. Maybe this is such small volume that using cursors is okay?

    Read the article

  • Persistent (purely functional) Red-Black trees on disk performance

    - by Waneck
    I'm studying the best data structures to implement a simple open-source object temporal database, and currently I'm very fond of using Persistent Red-Black trees to do it. My main reasons for using persistent data structures is first of all to minimize the use of locks, so the database can be as parallel as possible. Also it will be easier to implement ACID transactions and even being able to abstract the database to work in parallel on a cluster of some kind. The great thing of this approach is that it makes possible implementing temporal databases almost for free. And this is something quite nice to have, specially for web and for data analysis (e.g. trends). All of this is very cool, but I'm a little suspicious about the overall performance of using a persistent data structure on disk. Even though there are some very fast disks available today, and all writes can be done asynchronously, so a response is always immediate, I don't want to build all application under a false premise, only to realize it isn't really a good way to do it. Here's my line of thought: - Since all writes are done asynchronously, and using a persistent data structure will enable not to invalidate the previous - and currently valid - structure, the write time isn't really a bottleneck. - There are some literature on structures like this that are exactly for disk usage. But it seems to me that these techniques will add more read overhead to achieve faster writes. But I think that exactly the opposite is preferable. Also many of these techniques really do end up with a multi-versioned trees, but they aren't strictly immutable, which is something very crucial to justify the persistent overhead. - I know there still will have to be some kind of locking when appending values to the database, and I also know there should be a good garbage collecting logic if not all versions are to be maintained (otherwise the file size will surely rise dramatically). Also a delta compression system could be thought about. - Of all search trees structures, I really think Red-Blacks are the most close to what I need, since they offer the least number of rotations. But there are some possible pitfalls along the way: - Asynchronous writes -could- affect applications that need the data in real time. But I don't think that is the case with web applications, most of the time. Also when real-time data is needed, another solutions could be devised, like a check-in/check-out system of specific data that will need to be worked on a more real-time manner. - Also they could lead to some commit conflicts, though I fail to think of a good example of when it could happen. Also commit conflicts can occur in normal RDBMS, if two threads are working with the same data, right? - The overhead of having an immutable interface like this will grow exponentially and everything is doomed to fail soon, so this all is a bad idea. Any thoughts? Thanks! edit: There seems to be a misunderstanding of what a persistent data structure is: http://en.wikipedia.org/wiki/Persistent_data_structure

    Read the article

< Previous Page | 70 71 72 73 74 75 76 77 78 79 80 81  | Next Page >