Search Results

Search found 679 results on 28 pages for 'consistency dbcc'.

Page 14/28 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

Customer Experience Management for Retail 2.0 - part 2 / 2

- by Sanjeev Sharma

In the previous post, i discussed some of the key trends shaping up in the retail industry, their implications and the challenges facing retailers seeking to regain control of the buyer-seller relationship. Is Customer Experience Management the panacea for the ailing retailers who are now awakening to the power of the consumer? Quite honestly, customer acquisition, retention and satisfaction have been top of mind for retailers for quite some time now. The missing piece of this puzzle is bringing all those countless hours of strategy and planning to fruition. This is more of an execution gap than anything else. Although technology has made consumers more informed, more mobile and more social, customer experience is still largely defined by delivering on the following: Consistent experiences, whether shopping online or offline Personalize-able interaction ("mass market" sounds good as an internal strategy but not when you are a buyer!) Timely order fulfillment, if not pro-active notification of delays Below is a concept architecture for streamlining front-end, mid-office and back-end interfaces through shared process to achieve consistency and efficiency in managing the customer experience from order capture to order provisioning.

Read the article
Reformatting and version control

- by l0b0

Code formatting matters. Even indentation matters. And consistency is more important than minor improvements. But projects usually don't have a clear, complete, verifiable and enforced style guide from day 1, and major improvements may arrive any day. Maybe you find that SELECT id, name, address FROM persons JOIN addresses ON persons.id = addresses.person_id; could be better written as / is better written than SELECT persons.id, persons.name, addresses.address FROM persons JOIN addresses ON persons.id = addresses.person_id; while working on adding more columns to the query. Maybe this is the most complex of all four queries in your code, or a trivial query among thousands. No matter how difficult the transition, you decide it's worth it. But how do you track code changes across major formatting changes? You could just give up and say "this is the point where we start again", or you could reformat all queries in the entire repository history. If you're using a distributed version control system like Git you can revert to the first commit ever, and reformat your way from there to the current state. But it's a lot of work, and everyone else would have to pause work (or be prepared for the mother of all merges) while it's going on. Is there a better way to change history which gives the best of all results: Same style in all commits Minimal merge work ? To clarify, this is not about best practices when starting the project, but rather what should be done when a large refactoring has been deemed a Good Thing™ but you still want a traceable history? Never rewriting history is great if it's the only way to ensure that your versions always work the same, but what about the developer benefits of a clean rewrite? Especially if you have ways (tests, syntax definitions or an identical binary after compilation) to ensure that the rewritten version works exactly the same way as the original?

Read the article
Impact of Service Oriented Architecture (SOA) on Business and IT Operations

The impact of Service Oriented Architecture (SOA) on business and IT operations varies from company to company. I think more and more companies are starting to view SOA as just another technology that they can incorporate in an existing or new system. One of the driving factors in using SOA is the reduction in maintenance costs and decrease in the time needed to bring products to market. The reductions in costs, and reduced turnaround time can be directly converted in to increased profitability due to less expenditures that are needed in order to maintain or create new systems. My personal perspective on SOA is that it is great for what it is actually intended to do. SOA allows systems to be distributed across networks or even the world while ensuring enterprise processing consistency, data integrity and preventing code duplication. This being said a lot of preparation and work goes into properly designing and implementing an SOA especially if an enterprise wants to take full advantage of its benefits. Even though SOA has recently gotten a lot of hype about its benefits it does not a perfect fit for all situations. At the end of the day SOA is just another tool in my tool belt that I can pull from to create solutions that meet the business’s needs. Based on current industry trends SOA appears to be a very solid technology to use moving forward, especially as more and more companies shift towards cloud based computing. It is important to remember that SOA is one of many technologies that can be used in creating business solutions and I think more time will be spent in the future evaluating if SOA is the right technology for a solution once the initial hype of SOA has calmed down.

Read the article
Is it normal for a SAS drive to have a few bad blocks, or should I replace my drive ASAP?

- by Nate

I have a drive—part of a RAID 1 mirror—that has two bad blocks. Adaptec Storage Manger e-mailed me when it detected the blocks. It shows 4 medium errors for that drive, but state is still “optimal”. This is my first time using Adaptec RAID controllers. I don’t know if an occasional bad block is normal, or if I should immediately replace that drive. Update: The drive failed later the same day! The disk subsystem is: Adaptec 6405 with ZMM (2) Seagate near-line SAS drives (ST31000424SS) The other drive hasn’t reported any bad blocks yet. I am running a consistency check.

Read the article
Lots of goodies

- by wcoekaer

We just issued a press release with a number of very good updates for everyone There are a few things of importance : 1) As of right now, Oracle Linux 6 with the Unbreakable Kernel is certified with a number of Oracle products such as Oracle Database 11gR2 and Oracle Fusion Middleware. The certification pages in the Oracle Support portal will be updated with the latest certification status for the various products. As always we have gone through a long period of very comprehensive testing and validation to ensure that the whole stack works really well together, with very large database workloads, middleware application workloads etc. 2) Standard certification efforts for Oracle Linux 6 with the Red Hat Compatible Kernel are in progress and we expect that to be completed in the next few months. Because of the compatibility between OL6 and RHEL6 we can then also state certification for RHEL6. 3) Oracle Linux binaries (and of course source code) have been free for download -and- use (including production, not just trial periods) since day one. You can freely redistribute the binaries, unlike many other Linux vendors where you need to pay a support subscription to even get access to the binaries. We offered both the base distribution release DVDs (OL4, OL5, OL6) and the update releases, such as 5.1, 5.2 etc. this way. Today, in this announcement, we also started to make available the bugfix and security updates released in between these update releases. So the errata streams (both binary and source code) for OL4, 5 and 6 are now free for download and use from http://public-yum.oracle.com. This includes uek and uek2. The nice thing is, if you want a complete up to date system without support, use this, if you then need support, get a support subscription. Simple, convenient, effective. We have great SLA's in producing our update streams, consistency in release timing and testing of all the components. Have at it!

Read the article
If you favor "T *var", do you ever write "T*"?

- by Roger Pate

Thinking about where we place our asterisks; how do those that prefer to keep the "pointerness" away from the type and with the identifier (int *i) write code when the identifier is missing? void f(int*); // 1 void f(int *); // 2 The former seems much more common, no matter what your preference when with the identifier. Is this a special case? What makes it an exception? However, the first still isn't universal, because I have seen the latter style. Besides consistency along the lines of "there's always a space with the identifier, so we have one without", are there any other reasons to prefer it? What about casts or array and function types? How would you re-write these: (void*)var /*or*/ (void *)var int[3] /*or*/ int [3] // more relevant in C++ than C: Example<int[3]> void(int) /*or*/ void (int) // more relevant in C++ than C: std::function<void(int)> The latter two would rarely, if ever, be used in C, but are seen with C++ templates.

Read the article
Template syntax for users - is there a right way to do it?

- by RickM

Ok, I'm in the middle of building a saas system, and as part of that, the hosted clients need to be able to edit certain layout templates, baqsically just html, css and javascript files. I'm obviously going to be wanting to use a template syntax here as it would be dumb to let people execute PHP code, so in this instance template syntax does need to be used. I know that in the grand scale of things, this is a very minor thing, but what template syntax do you use, and why? Is there one that's considered better than others? I've seen all sorts being used with no real consistency, for example: Smarty Style: {$someVar} {foreach from="foo" item="bar"} {$bar.food} {/foreach} ASP Style: {% someVar %} {% foreach foo as bar %} {% bar.food %} {% endforeach %} HTML Style: <someVar> <foreach from="foo" item="bar"> <bar:food> </foreach> PyroCMS/FuelPHP "LEX" Style: {{ someVar }} {{ foreach from="foo" item="bar" }} {{ bar:food }} {{ endforeach }} Obviously these arent 100% accurate (for example, LEX is used alongside PHP for loops), and are only to give you an example of what I mean. What, in your opinion would be the best one (if any) to go with. I ask this bearing in mind that people using this are likely to be novice users. I did look around at a bunch of hosted CMS and E-Commerce systems as these seem to make use of user-editable templates, and most seem to be using some form of their own syntax. I should note that whatever style I end up going with, it will be with a custom template handler due to the complexity of the system and how template files are stored. Plus I'd not want to touch the likes of Smarty with a barge pole!

Read the article
Information I need to know as a Java Developer [on hold]

- by Woy

I'm a java developer. I'm trying to get more knowledge to become a better programmer. I've listed a number of technologies to learn. Instead of what I've listed, what technologies would you suggest to learn as well for a Junior Java Developer? I realize, there's a lot of things to study. Java: - how a garbage collector works - resource management - network programming - TCP/IP HTTP - transactions, - consistency: interfaces, classes collections, hash codes, algorithms, comp. complexity concurrent programming: synchronizing, semafores steam management metability: thread-safety byte code manipulations, reflections, Aspect-Oriented Programming as base to understand frameworks such as Spring etc. Web stack: servlets, filters, socket programming Libraries: JDK, GWT, Apache Commons, Joda-Time, Dependency Injections: Spring, Nano Tools: IDE: very good knowledge - debugger - profiler - web analyzers: Wireshark, firebugs - unit testing SQL/Databases: Basics SELECTing columns from a table Aggregates Part 1: COUNT, SUM, MAX/MIN Aggregates Part 2: DISTINCT, GROUP BY, HAVING + Intermediate JOINs, ANSI-89 and ANSI-92 syntax + UNION vs UNION ALL x NULL handling: COALESCE & Native NULL handling Subqueries: IN, EXISTS, and inline views Subqueries: Correlated ITH syntax: Subquery Factoring/CTE Views Advanced Topics Functions, Stored Procedures, Packages Pivoting data: CASE & PIVOT syntax Hierarchical Queries Cursors: Implicit and Explicit Triggers Dynamic SQL Materialized Views Query Optimization: Indexes Query Optimization: Explain Plans Query Optimization: Profiling Data Modelling: Normal Forms, 1 through 3 Data Modelling: Primary & Foreign Keys Data Modelling: Table Constraints Data Modelling: Link/Corrollary Tables Full Text Searching XML Isolation Levels Entity Relationship Diagrams (ERDs), Logical and Physical Transactions: COMMIT, ROLLBACK, Error Handling

Read the article
Mirroring of Apps across servers

- by user1038814

We wish to host multiple apps across multiple servers. What we are looking for (ideally) is an existing solution which will work. For example, normally to do it we'd follow a route (for failover) like: App is installed on one server along with mysql database App is also installed on a second server. Rsync is used to mirror the files over to the second server and ensure consistency MySQL is installed with a Master-Slave setup. We use a service such as DNS Made Easy which has a DNS failover. If one server goes down it automatically routes traffic to the backup server We have done the above a few times and generally its fine. The issue I have here is that the above is for one app. What I would like to look at is how we can manage for multiple apps and if there is a layer (such as VMWare) that has complete mirroring built in at the OS level? For example how do web hosts currently do it when they ensure that more than one machine is running a bunch of hosted websites. If you were running hosting and you had 200 clients on a server you would want the same clients across 2 or more servers and want everything mirrored. Any advice would be much appreciated.

Read the article
How to choose NoSQL database engine?

- by Poma

We have a database with following specs: 30k records, 7mb in size 20 inserts/second 1000 updates/second 1000 range selects/second, by secondary index, approx 10 rows each needs at least one secondary index needs some mechanism to expire keys if they are not updated for 75 secs (can be done via programmatic garbage collector but will require additional 'last_update' index and will add some load) consistency is not required durability is not required db should be stored in memory For now we use Redis, but it does not have secondary index and it's keys index:foo:* is too slow. Membase also does not have secondary index (as far as I know). MongoDB and MySQL memory engine have table-level locks. What engine will fit our use case?

Read the article
Should sanity be a property of a programmer or a program?

- by toplel32

I design and implement languages, that can range from object notations to markup languages. In many cases I have considered restrictions in favor of sanity (common knowledge), like in the case of control characters in identifiers. There are two consequences to consider before doing this: It takes extra computation It narrows liberty I'm interested to learn how developers think of decisions like this. As you may know Microsoft C# is very open on the contrary. If you really want to prefix your integer as Long with 'l' instead of 'L' and so risk other developers of confusing '1' and 'l', no problem. If you want to name your variables in non-latin script so they will contrast with C#'s latin keywords, no problem. Or if you want to distribute a string over multiple lines and so break a series of indentation, no problem. It is cheap to ensure consistency with restrictions and this makes it tempting to implement. But in the case of disallowing non-latin characters (concerning the second example), it means a discredit to Unicode, because one would not take full advantage of its capacity.

Read the article
share code between check and process methods

- by undu

My job is to refactor an old library for GIS vector data processing. The main class encapsulates a collection of building outlines, and offers different methods for checking data consistency. Those checking functions have an optional parameter that allows to perform some process. For instance: std::vector<Point> checkIntersections(int process_mode = 0); This method tests if some building outlines are intersecting, and return the intersection points. But if you pass a non null argument, the method will modify the outlines to remove the intersection. I think it's pretty bad (at call site, a reader not familiar with the code base will assume that a method called checkSomething only performs a check and doesn't modifiy data) and I want to change this. I also want to avoid code duplication as check and process methods are mostly similar. So I was thinking to something like this: // a private worker std::vector<Point> workerIntersections(int process_mode = 0) { // it's the equivalent of the current checkIntersections, it may perform // a process depending on process_mode } // public interfaces for check and process std::vector<Point> checkIntersections() /* const */ { workerIntersections(0); } std::vector<Point> processIntersections(int process_mode /*I have different process modes*/) { workerIntersections(process_mode); } But that forces me to break const correctness as workerIntersections is a non-const method. How can I separate check and process, avoiding code duplication and keeping const-correctness?

Read the article
How do I shrink my SQL Server Database?

- by Rory Becker

I have a Database nearly 1.9Gb Database in size MSDE2000 does not allow DBs that exceed 2.0Gb I need to shrink this DB (and many like it at various client locations) I have found and deleted many 100's of 1000's of records which are considered unneeded. these records account for a large percentage of some of the main (largest) tables in the Database. Therefore it's reasonable to assume much space should now be retrievable. So now I need to shrink the DB to account for the missing records I execute "DBCC ShrinkDatabase('MyDB')"......No effect. I have tried the various shrink facilities provided in MSSMS.... Still no effect. I have backed up the database and restored it... Still no effect. Still 1.9Gb Why? Whatever procedure I eventually find needs to be replayable on a client machine with access to nothing other than OSql or similar.

Read the article
Failed to Kill Process in SQL 2008

- by Andrea.Ko

I have a process with the following information, and i execute the kill process to kill this id, and it return me "Only user processes can be killed." SPID:11 Status:BACKGROUND Login:sa HostName: . BlkBy: . DBName: SAFEMIG Command:CHECKPOINT Normally, all the session to login to this server, it should have a HostName which display our PC name, but this connection is with a dot, so not sure who is executing what process that have this connection. I execute "dbcc inputbuffer(11)" It return me"EventType= No Event, Parameters = 0 and EventInfo=Null" Appreciate for any help\advice on this problem!

Read the article
Unaccounted for database size

- by Nazadus

I currently have a database that is 20GB in size. I've run a few scripts which show on each tables size (and other incredibly useful information such as index stuff) and the biggest table is 1.1 million records which takes up 150MB of data. We have less than 50 tables most of which take up less than 1MB of data. After looking at the size of each table I don't understand why the database shouldn't be 1GB in size after a shrink. The amount of available free space that SqlServer (2005) reports is 0%. The log mode is set to simple. At this point my main concern is I feel like I have 19GB of unaccounted for used space. Is there something else I should look at? Normally I wouldn't care and would make this a passive research project except this particular situation calls for us to do a backup and restore on a weekly basis to put a copy on a satellite (which has no internet, so it must be done manually). I'd much rather copy 1GB (or even if it were down to 5GB!) than 20GB of data each week. sp_spaceused reports the following: Navigator-Production 19184.56 MB 3.02 MB And the second part of it: 19640872 KB 19512112 KB 108184 KB 20576 KB while I've found a few other scripts (such as the one from two of the server database size questions here, they all report the same information either found above or below). The script I am using is from SqlTeam. Here is the header info: * BigTables.sql * Bill Graziano (SQLTeam.com) * graz@<email removed> * v1.11 The top few tables show this (table, rows, reserved space, data, index, unused, etc): Activity 1143639 131 MB 89 MB 41768 KB 1648 KB 46% 1% EventAttendance 883261 90 MB 58 MB 32264 KB 328 KB 54% 0% Person 113437 31 MB 15 MB 15752 KB 912 KB 103% 3% HouseholdMember 113443 12 MB 6 MB 5224 KB 432 KB 82% 4% PostalAddress 48870 8 MB 6 MB 2200 KB 280 KB 36% 3% The rest of the tables are either the same in size or smaller. No more than 50 tables. Update 1: - All tables use unique identifiers. Usually an int incremented by 1 per row. I've also re-indexed everything. I ran the dbcc shrink command as well as updating the usage before and after. And over and over. An interesting thing I found is that when I restarted the server and confirmed no one was using it (and no maintenance procs are running, this is a very new application -- under a week old) and when I went to run the shrink, every now and then it would say something about data changed. Googling yielded too few useful answers with the obvious not applying (it was 1am and I disconnected everyone, so it seems impossible that was really the case). The data was migrated via C# code which basically looked at another server and brought things over. The quantity of deletes, at this point in time, are probably under 50k in rows. Even if those rows were the biggest rows, that wouldn't be more than 100M I would imagine. When I go to shrink via the GUI it reports 0% available to shrink, indicating that I've already gotten it as small as it thinks it can go. Update 2: sp_spaceused 'Activity' yields this (which seems right on the money): Activity 1143639 134488 KB 91072 KB 41768 KB 1648 KB Fill factor was 90. All primary keys are ints. Here is the command I used to 'updateusage': DBCC UPDATEUSAGE(0); Update 3: Per Edosoft's request: Image 111975 2407773 19262184 It appears as though the image table believes it's the 19GB portion. I don't understand what this means though. Is it really 19GB or is it misrepresented? Update 4: Talking to a co-worker and I found out that it's because of the pages, as someone else here has also state the potential for that. The only index on the image table is a clustered PK. Is this something I can fix or do I just have to deal with it? The regular script shows the Image table to be 6MB in size. Update 5: I think I'm just going to have to deal with it after further research. The images have been resized to be roughly 2-5KB each and on a normal file system doesn't consume much space but on SqlServer it seems to consume considerably more. The real answer, in the long run, will likely be separating that table in to another partition or something similar.

Read the article
Backing Up Transaction Logs to Tape?

- by David Stein

I'm about to put my database in Full Recovery Model and start taking transaction log backups. I am taking a full nightly backup to another server and later in the evening this file and many others are backed up to tape. My question is this. I will take hourly (or more if necessary) t-log backups and store them on the other server as well. However, if my full backups are passing DBCC and integrity checks, do I need to put my T-Logs on tape? If someone wants point in time recovery to yesterday at 2pm, I would need the previous full backup and the transaction logs. However, other than that case, if I know my full back ups are good, is there value in keeping the previous day's transaction log backups?

Read the article
How to consolidate multiple LOG files into one .LDF file in SQL2000

- by John Galt

Here is what sp_helpfile says about my current database (recovery model is Simple) in SQL2000: name fileid filename size maxsize growth usage MasterScratchPad_Data 1 C:\SQLDATA\MasterScratchPad_Data.MDF 6041600 KB Unlimited 5120000 KB data only MasterScratchPad_Log 2 C:\SQLDATA\MasterScratchPad_Log.LDF 2111304 KB Unlimited 10% log only MasterScratchPad_X1_Log 3 E:\SQLDATA\MasterScratchPad_X1_Log.LDF 191944 KB Unlimited 10% log only I'm trying to prepare this for a detach then an attach to a sql2008 instance but I don't want to have the 2nd .LDF file (I'd like to have just one file for the log). I have backed up the database. I have issued: BACKUP LOG MasterScratchPad WITH TRUNCATE_ONLY. I have run multiple DBCC SHRINKFILE commands on both of the LOG files. How can I accomplish this goal of having just one .LDF? I cannot find anything on how to delete the one with fileid of 3 and/or how to consolidate multiple files into one log file.

Read the article
SQL Server 2008 log size management problems

- by b0x0rz

I'm trying to shrink the log of a database AND set the recovery to simple, but always there is an error, whatever i try. USE 4_o5; GO ALTER DATABASE 4_o5 SET RECOVERY SIMPLE; GO DBCC SHRINKFILE (4_o5_log, 10); GO the output of sp_helpfile says that log file is located under (hosted solution): I:\dataroot\4_o5_log.LDF please help me perform this operation as the log file got large when importing a lot of data and now this info is no longer needed, have multiple (lots of) backups since then. the exact error message when performing the query above is: incorrect syntax near '4'. RECOVERY is not a recognized SET option. incorrect syntax near _5_log'. i am using visual studio 2010 (also have SQL Server Express installed locally, SQL Server 2008 proper installed at provider (shared)) thnx a lot

Read the article
Using current database name in T-SQL has Using statement

- by AmRoSH

Hello everybody. I have application runs T-SQL statements to update more than one database the problem is i'm using the following t-sql USE [msdb] GO DECLARE @jobId BINARY(16) EXEC msdb.dbo.sp_add_job @job_name=N'test2', @enabled=1, @start_step_id=1, @notify_level_eventlog=0, @notify_level_email=2, @notify_level_netsend=2, @notify_level_page=2, @delete_level=0, @description=N'', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'sa', @notify_email_operator_name=N'', @notify_netsend_operator_name=N'', @notify_page_operator_name=N'', @job_id = @jobId OUTPUT select @jobId GO EXEC msdb.dbo.sp_add_jobserver @job_name=N'test2', @server_name = N'AMR-PC\SQL2008' GO USE [msdb] GO EXEC msdb.dbo.sp_add_jobstep @job_name=N'test2', @step_name=N'test', @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_fail_action=2, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'EXEC sp_MSforeachdb '' EXEC sp_MSforeachtable @command1=''''DBCC DBREINDEX (''''''''*'''''''')'''', @replacechar=''''*''''''', @database_name=N'Client5281', @output_file_name=N'C:\Documents and Settings\Amr\Desktop\Scheduel Reports\report', @flags=2 GO USE [msdb] GO DECLARE @schedule_id int EXEC msdb.dbo.sp_add_jobschedule @job_name=N'test2', @name=N'test', @enabled=1, @freq_type=8, @freq_interval=1, @freq_subday_type=1, @freq_subday_interval=0, @freq_relative_interval=0, @freq_recurrence_factor=1, @active_start_date=20100517, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959, @schedule_id = @schedule_id OUTPUT select @schedule_id GO and i'm using (USE [msdb]) before any block and i want to get database name to replace this @database_name=N'**Client5281**', with the current database name instead of ([msdb]) that i'm using. i hope that i explained what i want well.

Read the article
Truncating a table referenced by a foreign key

- by born to hula

Hey, We have two tables in a SQL Server 2005 database, A and B.There is a service which truncates table A every day. Recently, a foreign key constraint was added to table B, referencing table A. As a result, it isn't possible truncating table A anymore, even if table B is empty. Is there any workaround to get the same result as truncating table A? I've already tried the approach below but the identity wasn't reset. DBCC CHECKIDENT (TABLENAME, RESEED, 0) PS. before anyone points this as a duplicate, the different thing here is that I'm not allowed to drop constraints, nor creating any.

Read the article
Where's the rest of the space used in this table?

- by Eric H.

I'm using SQL Server 2005. I have a table whose row size should be 124 bytes. It's all ints or floats, no NULL columns (so everything is fixed width). There is only one index, clustered. The fill factor is 0. After inserting a ton of data, sp_spaceused returns the following name rows reserved data index_size unused OHLC_Bar_Trl 117076054 29807664 KB 29711624 KB 92344 KB 3696 KB which shows a rowsize of approx (29807664*1024)/117076054 = 260 bytes/row. Where's the rest of the space? Is there some DBCC command I need to run to tighten up this table (I could not insert the rows in correct index order, so maybe it's just internal fragmentation)?

Read the article
How do I view the parameters of currently running procs in SQL Server 2008

- by Pez

I am trying to troubleshoot an issue that is popping up on our new SQL Server. While viewing the running processes (sp_who2) I can't tell what parameters a proc was started with. I can find the name of the proc using: DBCC INPUTBUFFER (spid) I can even find some additional info, but I can't see a way to show the parameters. (http://sqlserverpedia.com/blog/sql-server-bloggers/sql-server-%E2%80%93-get-last-running-query-based-on-spid/) I know I can see the parameters if I do a trace, but that doesn't help in this case.

Read the article
SQL SERVER – Summary of Month – Wait Type – Day 28 of 28

- by pinaldave

I am glad to announce that the month of Wait Types and Queues very successful. I am glad that it was very well received and there was great amount of participation from community. I am fortunate to have some of the excellent comments throughout the series. I want to dedicate this series to all the guest blogger – Jonathan, Jacob, Glenn, and Feodor for their kindness to take a participation in this series. Here is the complete list of the blog posts in this series. I enjoyed writing the series and I plan to continue writing similar series. Please offer your opinion. SQL SERVER – Introduction to Wait Stats and Wait Types – Wait Type – Day 1 of 28 SQL SERVER – Signal Wait Time Introduction with Simple Example – Wait Type – Day 2 of 28 SQL SERVER – DMV – sys.dm_os_wait_stats Explanation – Wait Type – Day 3 of 28 SQL SERVER – DMV – sys.dm_os_waiting_tasks and sys.dm_exec_requests – Wait Type – Day 4 of 28 SQL SERVER – Capturing Wait Types and Wait Stats Information at Interval – Wait Type – Day 5 of 28 SQL SERVER – CXPACKET – Parallelism – Usual Solution – Wait Type – Day 6 of 28 SQL SERVER – CXPACKET – Parallelism – Advanced Solution – Wait Type – Day 7 of 28 SQL SERVER – SOS_SCHEDULER_YIELD – Wait Type – Day 8 of 28 SQL SERVER – PAGEIOLATCH_DT, PAGEIOLATCH_EX, PAGEIOLATCH_KP, PAGEIOLATCH_SH, PAGEIOLATCH_UP – Wait Type – Day 9 of 28 SQL SERVER – IO_COMPLETION – Wait Type – Day 10 of 28 SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28 SQL SERVER – PAGELATCH_DT, PAGELATCH_EX, PAGELATCH_KP, PAGELATCH_SH, PAGELATCH_UP – Wait Type – Day 12 of 28 SQL SERVER – FT_IFTS_SCHEDULER_IDLE_WAIT – Full Text – Wait Type – Day 13 of 28 SQL SERVER – BACKUPIO, BACKUPBUFFER – Wait Type – Day 14 of 28 SQL SERVER – LCK_M_XXX – Wait Type – Day 15 of 28 SQL SERVER – Guest Post – Jonathan Kehayias – Wait Type – Day 16 of 28 SQL SERVER – WRITELOG – Wait Type – Day 17 of 28 SQL SERVER – LOGBUFFER – Wait Type – Day 18 of 28 SQL SERVER – PREEMPTIVE and Non-PREEMPTIVE – Wait Type – Day 19 of 28 SQL SERVER – MSQL_XP – Wait Type – Day 20 of 28 SQL SERVER – Guest Posts – Feodor Georgiev – The Context of Our Database Environment – Going Beyond the Internal SQL Server Waits – Wait Type – Day 21 of 28 SQL SERVER – Guest Post – Jacob Sebastian – Filestream – Wait Types – Wait Queues – Day 22 of 28 SQL SERVER – OLEDB – Link Server – Wait Type – Day 23 of 28 SQL SERVER – 2000 – DBCC SQLPERF(waitstats) – Wait Type – Day 24 of 28 SQL SERVER – 2011 – Wait Type – Day 25 of 28 SQL SERVER – Guest Post – Glenn Berry – Wait Type – Day 26 of 28 SQL SERVER – Best Reference – Wait Type – Day 27 of 28 SQL SERVER – Summary of Month – Wait Type – Day 28 of 28 Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Optimization, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, SQLServer, T SQL, Technology

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
SQL SERVER – What the Business Says Is Not What the Business Wants

- by pinaldave

This blog post is written in response to T-SQL Tuesday hosted by Steve Jones. Steve raised a very interesting question; every DBA and Database Developer has already faced this situation. When I read the topic, I felt that I can write several different examples here. Today, I will cover this scenario, which seems quite amusing. Shrinking Database Earlier this year, I was working on SQL Server Performance Tuning consultancy; I had faced very interesting situation. No matter how much I attempt to reduce the fragmentation, I always end up with heavy fragmentation on the server. After careful research, I figured out that one of the jobs was continuously Shrinking the Database – which is a very bad practice. I have blogged about my experience over here SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server. I removed the incorrect shrinking process right away; once it was removed, everything continued working as it should be. After a couple of days, I learned that one of their DBAs had put back the same DBCC process. I requested the Senior DBA to find out what is going on and he came up with the following reason: “Business Requirement.” I cannot believe this! Now, it was time for me to go deep into the subject. Moreover, it had become necessary to understand the need. After talking to the concerned people here, I understood what they needed. Please read the exact business need in their own language. The Shrinking “Business Need” “We shrink the database because if we take backup after shrinking the database, the size of the same is smaller. Once we take backup, we have to send it to our remote location site. Our business requirement is that we need to always make sure that the file is smallest when we transfer it to remote server.” The backup is not affected in any way if you shrink the database or not. The size of backup will be the same. After a couple of the tests, they agreed with me. Shrinking will create performance issues for the same as it will introduce heavy fragmentation in the database. The Real Solution The real business need was that they needed the smallest possible backup file. We finally implemented a quick solution which they are still using to date. The solution was compressed backup. I have written about this subject in detail few years before SQL SERVER – 2008 – Introduction to New Feature of Backup Compression. Compressed backup not only creates a small filesize but also increases the speed of the database as well. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Pinal Dave, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article

Search Results

Search found 679 results on 28 pages for 'consistency dbcc'.

Page 14/28 | < Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >

- by Sanjeev Sharma

- by l0b0

- by Nate

- by wcoekaer

- by Roger Pate

- by RickM

- by Woy

- by user1038814

- by Poma

- by toplel32

- by undu

- by Rory Becker

- by Andrea.Ko

- by Nazadus

- by David Stein

- by John Galt

- by b0x0rz

- by AmRoSH

- by born to hula

- by Eric H.

- by Pez

- by pinaldave

- by Paul White

- by pinaldave

< Previous Page | 10 11 12 13 14 15 16 17 18 19 20 21 | Next Page >