Search Results

Search found 43338 results on 1734 pages for 'table less design'.

Page 15/1734 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >

Normalized class design and code first

- by dc7a9163d9

There are the following two classes. public class Employee { int EmployeeId { get; set; } public string FirstName { get; set; } public string LastName { get; set; } public string Street { get; set; } public string Street2 { get; set; } public string City { get; set; } public string State { get; set; } public string Zip { get; set; } } public class Company { int CompanyId { get; set; } public string Name { get; set; } public string Street { get; set; } public string Street2 { get; set; } public string City { get; set; } public string State { get; set; } public string Zip { get; set; } } In a DDD seminar, the speaker said the better design should be, class PersonName { public string FirstName { get; set; } public string LastName { get; set; } } class Address { public string Street { get; set; } public string Street2 { get; set; } public string City { get; set; } public string State { get; set; } public string Zip { get; set; } } public class Employee { int EmployeeId { get; set; } public PersonName Name { get; set; } [ForeignKey("EmployerAddress")] public int EmployerAddressId { get; set; } public virtual Address EmployerAddress { get; set; } } public class Company { int CompanyId { get; set; } public string Name { get; set; } [ForeignKey("CompanyAddress")] public int CompanyAddressId { get; set; } public virtual Address CompanyAddress { get; set; } } Is it the optimized design? How the code first generate the PersonName table and link it to Employee?

Read the article
postgresql duplicate table names best practice

- by veilig

My company has a handful of apps that we deploy in the websites we build. Recently a very old app needed to be included along side a newer app and there was a conflict w/ a duplicate table name needed to be used by both apps. We are now in the process of updating an old app and there will be some DB updates. I'm curious what people consider best practice (or how do you do it) to help ensure these name collisions don't happen. I've looked at schema's but not sure if thats the right path we want to take. As the documentation prescribes, I don't want to "wire" a particular schema name into an application and if I add schema's to the user search path how would it know which table I was referring to if two schema's have the same table name. although, maybe I'm reading to much into this. Any insights or words of wisdom would be greatly appreciated!

Read the article
drawbacks of storing all ''things' in a central table

- by naiquevin

Hi, I am not sure if there is a term to describe this, but I have observed that content management systems store all kinds of data in a single table with their bare minimum properties while the meta data is stored in another table in form of key value pairs. for eg. everything (blog posts, pages, images, events etc) is stored in one table and considered as a post. I understand that this allows for abstraction and easy extensibility we are considering designing our new project this way. It is not exactly a CMS but we plan to keep adding modules to it in stages. Lets say initially there will be only posts and images on which comments can be posted. Later on we might add videos which will also have the commenting feature. what are the drawbacks of this approach ? and will it work for a requirement like ours ? Thanks

Read the article
Configuration data: single-row table vs. name-value-pair table

- by Heinzi

Let's say you write an application that can be configured by the user. For storing this "configuration data" into a database, two patterns are commonly used. The single-row table CompanyName | StartFullScreen | RefreshSeconds | ... ---------------+-------------------+------------------+-------- ACME Inc. | true | 20 | ... The name-value-pair table ConfigOption | Value -----------------+------------- CompanyName | ACME Inc. StartFullScreen | true (or 1, or Y, ...) RefreshSeconds | 20 ... | ... I've seen both options in the wild, and both have obvious advantages and disadvantages, for example: The single-row tables limits the number of configuration options you can have (since the number of columns in a row is usually limited). Every additional configuration option requires a DB schema change. In a name-value-pair table everything is "stringly typed" (you have to encode/decode your Boolean/Date/etc. parameters). (many more) Is there some consensus within the development community about which option is preferable?

Read the article
Multiple foreign keys in one table to 1 other table in mysql

- by djerry

Hey guys, I got 2 tables in my database: user and call. User exists of 3 fields: id, name, number and call : id, 'source', 'destination', 'referred', date. I need to monitor calls in my app. The 3 ' ' fields above are actually userid numbers. now i'm wondering, can i make those 3 field foreign key elements of the id-field in table user? Thanks in advance...

Read the article
Design Pattern for building a Budget

- by Scott

So I've looked at the Builder Pattern, Abstract Interfaces, other design patterns, etc. - and I think I'm over thinking the simplicity behind what I'm trying to do, so I'm asking you guys for some help with either recommending a design pattern I should use, or an architecture style I'm not familiar with that fits my task. So I have one model that represents a Budget in my code. At a high level, it looks like this: public class Budget { public int Id { get; set; } public List<MonthlySummary> Months { get; set; } public float SavingsPriority { get; set; } public float DebtPriority { get; set; } public List<Savings> SavingsCollection { get; set; } public UserProjectionParameters UserProjectionParameters { get; set; } public List<Debt> DebtCollection { get; set; } public string Name { get; set; } public List<Expense> Expenses { get; set; } public List<Income> IncomeCollection { get; set; } public bool AutoSave { get; set; } public decimal AutoSaveAmount { get; set; } public FundType AutoSaveType { get; set; } public decimal TotalExcess { get; set; } public decimal AccountMinimum { get; set; } } To go into more detail about some of the properties here shouldn't be necessary, but if you have any questions about those I will fill more out for you guys. Now, I'm trying to create code that builds one of these things based on a set of BudgetBuildParameters that the user will create and supply. There are going to be multiple types of these parameters. For example, on the sites homepage, there will be an example section where you can quickly see what your numbers look like, so they would be a much simpler set of SampleBudgetBuildParameters then say after a user registers and wants to create a fully filled out Budget using much more information in the DebtBudgetBuildParameters. Now a lot of these builds are going to be using similar code for certain tasks, but might want to also check the status of a users DebtCollection when formulating a monthly spending report, where as a Budget that only focuses on savings might not want to. I'd like to reduce code duplication (obviously) as much as possible, but in my head, every way I can think to do this would require using a base BudgetBuilderFactory to return the correct builder to the caller, and then creating say a SimpleBudgetBuilder that inherits from a BudgetBuilder, and put all duplicate code in the BudgetBuilder, and let the SimpleBudgetBuilder handle it's own cases. Problem is, a lot of the unique cases are unique to 2/4 builders, so there will be duplicate code somewhere in there obviously if I did that. Can anyone think of a better way to either explain a solution to this that may or may not be similar to mine, or a completely different pattern or way of thinking here? I really appreciate it.

Read the article
ASIHTTPRequest code design

- by nico

I'm using ASIHTTPRequest to communicate with the server asynchronously. It works great, but I'm doing requests in different controllers and now duplicated methods are in all those controllers. What is the best way to abstract that code (requests) in a single class, so I can easily re-use the code, so I can keep the controllers more simple. I can put it in a singleton (or in the app delegate), but I don't think that's a good approach. Or maybe make my own protocol for it with delegate callback. Any advice on a good design approach would be helpful. Thanks.

Read the article
UI Design Help / Advice

- by Greg Andora

Hey everyone, I have a dillema where our client relations department has been brought in for advice on UI and I vehemently disagree with it...even though I don't consider myself a designer at all. While I have been vocal about my disagreement about it, I've been asked to point to design standards to prove that what I'm saying is correct and that the guys in Client Relations are flat out wrong. A mockup is below, I'm trying to argue that the icons of the airplane, boat, and couch (ya, I didn't choose those either) belong in the header of the page (same area as the logo) and not in the content area of the page. Can anybody please help me by pointing me to something that helps prove my point? Thanks a lot, Greg Andora

Read the article
Database design suggestions for a configurable product eshop

- by solomongaby

Hello, I am biulding an e-shop that will have configurable products. The configurable parts will need to have different prices and stocks from the main product. What database design would be best in this case? I started with something like this. Features id name Features Options id id_feature value Products id name price Products Features id id_product id_feature value ( save the value from the feature-options for ease in search ) configurable (yes, no) The problem is that now I am stuck on how to save the configurable product features. I was thinking of saving their value as a json. But that will make saving price modification for a certain option difficult. How would you go about this ? Thank you.

Read the article
Best approach to design a service oriented system

- by Gustavo Paulillo

Thinking about service orientation, our team are involved on new application designs. We consist in a group of 4 developers and a manager (that knows something about programming and distributed systems). Each one, having own opinion on service design. It consists in a distributed system: a user interface (web app) accessing the services in a dedicated server (inside the firewall), to obtain the business logic operations. So we got 2 main approachs that I list above : Modular services Having many modules, each one consisting of a service (WCF). Example: namespaces SystemX.DebtService, SystemX.CreditService, SystemX.SimulatorService Unique service All the business logic is centralized in a unique service. Example: SystemX.OperationService. The web app calls the same service for all operations. In your opinion, whats the best? Or having another approach is better for this scenario?

Read the article
Design Pattern for Server Emulator

- by adisembiring

I wanna build server socket emulator, but I want implement some design pattern there. I will described my case study that I have simplified like these: My Server Socket will always listen client socket. While some request message come from the client socket, the server emulator will response the client through the socket. the response is response code. '00' will describe request message processed successfully, and another response code expect '00' will describe there are some error while processing the message request. IN the server there are some UI, this UI contain check response parameter such as. response code timeout interval While the server want to response the client message, the response code taken from input parameter response form UI check the timeout interval, it will create sleep thread and the interval taken from timeout interval input from UI. I have implement the function, but I create it in one class. I feel it so sucks. Can you suggest me what class / interface that I must create to refactor my code.

Read the article
Project design / FS layout for large django projects

- by rcreswick

What is the best way to layout a large django project? The tutuorials provide simple instructions for setting up apps, models, and views, but there is less information about how apps and projects should be broken down, how much sharing is allowable/necessary between apps in a typical project (obviously that is largely dependent on the project) and how/where general templates should be kept. Does anyone have examples, suggestions, and explanations as to why a certain project layout is better than another? I am particularly interested in the incorporation of large numbers of unit tests (2-5x the size of the actual code base) and string externalization / templates.

Read the article
Server Emulator Design Pattern

- by adisembiring

I wanna build server socket emulator, but I want implement some design pattern there. I will described my case study that I have simplified like these: My Server Socket will always listen client socket. While some request message come from the client socket, the server emulator will response the client through the socket. the response is response code. '00' will describe request message processed successfully, and another response code expect '00' will describe there are some error while processing the message request. IN the server there are some UI, this UI contain check response parameter such as. response code timeout interval While the server want to response the client message, the response code taken from input parameter response form UI check the timeout interval, it will create sleep thread and the interval taken from timeout interval input from UI. I have implement the function, but I create it in one class. I feel it so sucks. Can you suggest me what class / interface that I must create to refactor my code.

Read the article
Java Program Design Layout Recommendations?

- by Leebuntu

I've learned enough to begin writing programs from scratch, but I'm running into the problem of not knowing how to design the layout and implementation of a program. To be more precise, I'm having difficulty finding a good way to come up with an action plan before I dive in to the programming part. I really want to know what classes, methods, and objects I would need beforehand instead of just adding them along the way. My intuition is leading me to using some kind of charting software that gives a hierarchal view of all the classes and methods. I've been using OmniGraffle Pro and while it does seem to work somewhat, I'm still having trouble planning out the program in its entirety. How should I approach this problem? What softwares out there are available to help with this problem? Any good reads out there on this issue? Thanks so much! Edit: Oh yeah, I'm using Eclipse and I code mainly in Java right now.

Read the article
Windows Services -- High availability scenarios and design approach

- by Vadi

Let's say I have a standalone windows service running in a windows server machine. How to make sure it is highly available? 1). What are all the design level guidelines that you can propose? 2). How to make it highly available like primary/secondary, eg., the clustering solutions currently available in the market 3). How to deal with cross-cutting concerns in case any fail-over scenarios If any other you can think of please add it here .. Note: The question is only related to windows and windows services, please try to obey this rule :)

Read the article
design patterns for hierarchical structures

- by JLBarros

Anyone knows some design patterns for hierarchical structures? For example, to manage inventory categories, accounting chart of accounts, divisions of human resources, etc.. Thank you very much in advance EDIT: Thanks for your interest. I am looking for a better way of dealing with hierarchical items to which they should apply operations depending on the level of hierarchy. I have been studying the patterns by Martin Fowler, for example Accounting, but I wonder if there are other more generic. The problem is that operations apply to the items must be possible to change even at run time and may depend on other external variables. I thought of a kind of strategy pattern but would like to combine it with the fact that it is a hierarchical scheme. I would appreciate any reference to hierarchical patterns and you'll take care of them in depth.

Read the article
'is instanceof' Interface bad design

- by peterRit

Say I have a class A class A { Z source; } Now, the context tells me that 'Z' can be an instance of different classes (say, B and C) which doesn't share any common class in their inheritance tree. I guess the naive approach is to make 'Z' an Interface class, and make classes B and C implement it. But something still doesn't convince me because every time an instance of class A is used, I need to know the type of 'source'. So all finishes in multiple 'ifs' making 'is instanceof' which doesn't sound quite nice. Maybe in the future some other class implements Z, and having hardcoded 'ifs' of this type definitely could break something. The escence of the problem is that I cannot resolve the issue by adding functions to Z, because the work done in each instance type of Z is different. I hope someone can give me and advice, maybe about some useful design pattern. Thanks

Read the article
CSS LESS class inheritance

- by Haradzieniec

Here is a css description of properties for my #myform1 .btn1 class: #myform1 .btn1 { ... } #myform1 .btn1:hover { ... } #myform1 .btn1.active { ... } #myform1 .btn1.disabled { ... } Is it possible to add absolutely the same properties for my #myform2 .btn2 class using LESS (any way is OK) without writing #myform1 .btn1 ,#myform2 .btn2 { ... } #myform1 .btn1:hover, #myform2 .btn2:hover { ... } #myform1 .btn1.active, #myform2 .btn2.active { ... } #myform1 .btn1.disabled, #myform2 .btn2.disabled { ... } Is it possible?

Read the article
The Template Method Design Pattern using C# .Net

- by nijhawan.saurabh

First of all I'll just put this pattern in context and describe its intent as in the GOF book: Template Method: Define the skeleton of an algorithm in an operation, deferring some steps to Subclasses. Template Method lets subclasses redefine certain steps of an algorithm without changing the Algorithm's Structure. Usage: When you are certain about the High Level steps involved in an Algorithm/Work flow you can use the Template Pattern which allows the Base Class to define the Sequence of the Steps but permits the Sub classes to alter the implementation of any/all steps. Example in the .Net framework: The most common example is the Asp.Net Page Life Cycle. The Page Life Cycle has a few methods which are called in a sequence but we have the liberty to modify the functionality of any of the methods by overriding them. Sample implementation of Template Method Pattern: Let's see the class diagram first: Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:8.0pt; mso-para-margin-left:0in; line-height:107%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt; mso-ligatures:standard;} And here goes the code:EmailBase.cs 1 using System; 2 using System.Collections.Generic; 3 using System.Linq; 4 using System.Text; 5 using System.Threading.Tasks; 6 7 namespace TemplateMethod 8 { 9 public abstract class EmailBase 10 { 11 12 public bool SendEmail() 13 { 14 if (CheckEmailAddress() == true) // Method1 in the sequence 15 { 16 if (ValidateMessage() == true) // Method2 in the sequence 17 { 18 if (SendMail() == true) // Method3 in the sequence 19 { 20 return true; 21 } 22 else 23 { 24 return false; 25 } 26 27 } 28 else 29 { 30 return false; 31 } 32 33 } 34 else 35 { 36 return false; 37 38 } 39 40 41 } 42 43 protected abstract bool CheckEmailAddress(); 44 protected abstract bool ValidateMessage(); 45 protected abstract bool SendMail(); 46 47 48 } 49 } 50 EmailYahoo.cs 1 using System; 2 using System.Collections.Generic; 3 using System.Linq; 4 using System.Text; 5 using System.Threading.Tasks; 6 7 namespace TemplateMethod 8 { 9 public class EmailYahoo:EmailBase 10 { 11 12 protected override bool CheckEmailAddress() 13 { 14 Console.WriteLine("Checking Email Address : YahooEmail"); 15 return true; 16 } 17 protected override bool ValidateMessage() 18 { 19 Console.WriteLine("Validating Email Message : YahooEmail"); 20 return true; 21 } 22 23 24 protected override bool SendMail() 25 { 26 Console.WriteLine("Semding Email : YahooEmail"); 27 return true; 28 } 29 30 31 } 32 } 33 EmailGoogle.cs 1 using System; 2 using System.Collections.Generic; 3 using System.Linq; 4 using System.Text; 5 using System.Threading.Tasks; 6 7 namespace TemplateMethod 8 { 9 public class EmailGoogle:EmailBase 10 { 11 12 protected override bool CheckEmailAddress() 13 { 14 Console.WriteLine("Checking Email Address : GoogleEmail"); 15 return true; 16 } 17 protected override bool ValidateMessage() 18 { 19 Console.WriteLine("Validating Email Message : GoogleEmail"); 20 return true; 21 } 22 23 24 protected override bool SendMail() 25 { 26 Console.WriteLine("Semding Email : GoogleEmail"); 27 return true; 28 } 29 30 31 } 32 } 33 Program.cs 1 using System; 2 using System.Collections.Generic; 3 using System.Linq; 4 using System.Text; 5 using System.Threading.Tasks; 6 7 namespace TemplateMethod 8 { 9 class Program 10 { 11 static void Main(string[] args) 12 { 13 Console.WriteLine("Please choose an Email Account to send an Email:"); 14 Console.WriteLine("Choose 1 for Google"); 15 Console.WriteLine("Choose 2 for Yahoo"); 16 string choice = Console.ReadLine(); 17 18 if (choice == "1") 19 { 20 EmailBase email = new EmailGoogle(); // Rather than newing it up here, you may use a factory to do so. 21 email.SendEmail(); 22 23 } 24 if (choice == "2") 25 { 26 EmailBase email = new EmailYahoo(); // Rather than newing it up here, you may use a factory to do so. 27 email.SendEmail(); 28 } 29 } 30 } 31 } 32 Final Words: It's very obvious that why the Template Method Pattern is a popular pattern, everything at last revolves around Algorithms and if you are clear with the steps involved it makes real sense to delegate the duty of implementing the step's functionality to the sub classes. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:8.0pt; mso-para-margin-left:0in; line-height:107%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi; mso-font-kerning:1.0pt; mso-ligatures:standard;}

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Fastest way to document software architecture and design

- by Karsten

We are a small team of 5 developers and I'm looking for some great advices about how to document the software architecture and design. I'm going for the sweet spot, where the time invested pays off. I don't want to use more time documenting than necessary. I'll quickly give you my thoughts. What are the diagrams I should made? I'm thinking an overall diagram showing the various applications and services. And then some sequence diagrams showing the most important or complicated processes. About the code it self, I really don't see much value in describing or making diagrams for the code outside the .cs files them self. About text documents, I'm a bit uncertain about when to put down on paper. Most developers don't like to either write or read long documents.

Read the article
SQL In The City Charlotte - Fundamentals of Database Design

- by drsql

Next Monday, October 14, at Red-Gate's SQL In The City conference in Charlotte, NC (one day before PASS), I will be presenting my Fundamentals of Database Design session. It is my big-time chestnut session, the one that I do the most and have the most fun with. This will be the "single" version of the session, weighing in at just under an hour, and it is a lot of material to go over (even with no code samples to go awry to take up time.) In this hour long session (presented in widescreen...(read more)

Read the article
Book recommend: Start learning web design with css with basic HTML knowledge

- by Minh Hieu

I've already known some HTML, tables, link, image,...etc but just at a basic level. Now I want to learn how to build a layout for a website and design also. I want to start building a layout right a way and just learning from it, not really like reading so much theories, explanations. Many books are so verbose, they teach from the beginning of HTML or explain things too much. I don't want to waste my time. So are there any good books for me?

Read the article
Fastest way to document software architecture and design

- by Karsten

We are a small team of 5 developers and I'm looking for some great advices about how to document the software architecture and design. I'm going for the sweet spot, where the time invested pays off. I don't want to use more time documenting than necessary. I'll quickly give you my thoughts. What are the diagrams I should made? I'm thinking an overall diagram showing the various applications and services. And then some sequence diagrams showing the most important or complicated processes. About the code it self, I really don't see much value in describing or making diagrams for the code outside the .cs files them self. About text documents, I'm a bit uncertain about when to put down on paper. Most developers don't like to either write or read long documents.

Read the article
Software Design for Product Verticals and Service Verticals

- by Rachel

In every industry there are two verticals Product Vertical and Service Vertical, so my question is: How does design approach changes while designing Software for Product Vertical as compared to developing Software for Service Vertical ? What are the pros and cons for each case ? Also, in case of Product Vertical, How you go about designing Product or Features and what are steps involved ? Lastly, I was reading How Facebook Ships Code article and it appears that Product Managers have very little influence on how Product is developed and responsibility lies mainly with the Developer for the feature. So is this good practice and why one would go for this approach ? What would be your comment on this kind of approach ?

Read the article

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >