Search Results

Search found 30 results on 2 pages for 'denormalization'.

Page 2/2 | < Previous Page | 1 2 

  • How Do You Actually Model Data?

    Since the 1970’s Developers, Analysts and DBAs have been able to represent concepts and relations in the form of data through the use of generic symbols.  But what is data modeling?  The first time I actually heard this term I could not understand why anyone would want to display a computer on a fashion show runway. Hey, what do you expect? At that time I was a freshman in community college, and obviously this was a long time ago.  I have since had the chance to learn what data modeling truly is through using it. Data modeling is a process of breaking down information and/or requirements in to common categories called objects. Once objects start being defined then relationships start to form based on dependencies found amongst other existing objects.  Currently, there are several tools on the market that help data designer actually map out objects and their relationships through the use of symbols and lines.  These diagrams allow for designs to be review from several perspectives so that designers can ensure that they have the optimal data design for their project and that the design is flexible enough to allow for potential changes and/or extension in the future. Additionally these basic models can always be further refined to show different levels of details depending on the target audience through the use of three different types of models. Conceptual Data Model(CDM)Conceptual Data Models include all key entities and relationships giving a viewer a high level understanding of attributes. Conceptual data model are created by gathering and analyzing information from various sources pertaining to a project during the typical planning phase of a project. Logical Data Model (LDM)Logical Data Models are conceptual data models that have been expanded to include implementation details pertaining to the data that it will store. Additionally, this model typically represents an origination’s business requirements and business rules by defining various attribute data types and relationships regarding each entity. This additional information can be directly translated to the Physical Data Model which reduces the actual time need to implement it. Physical Data Model(PDMs)Physical Data Model are transformed Logical Data Models that include the necessary tables, columns, relationships, database properties for the creation of a database. This model also allows for considerations regarding performance, indexing and denormalization that are applied through database rules, data integrity. Further expanding on why we actually use models in modern application/database development can be seen in the benefits that data modeling provides for data modelers and projects themselves, Benefits of Data Modeling according to Applied Information Science Abstraction that allows data designers remove concepts and ideas form hard facts in the form of data. This gives the data designers the ability to express general concepts and/or ideas in a generic form through the use of symbols to represent data items and the relationships between the items. Transparency through the use of data models allows complex ideas to be translated in to simple symbols so that the concept can be understood by all viewpoints and limits the amount of confusion and misunderstanding. Effectiveness in regards to tuning a model for acceptable performance while maintaining affordable operational costs. In addition it allows systems to be built on a solid foundation in terms of data. I shudder at the thought of a world without data modeling, think about it? Data is everywhere in our lives. Data modeling allows for optimizing a design for performance and the reduction of duplication. If one was to design a database without data modeling then I would think that the first things to get impacted would be database performance due to poorly designed database and there would be greater chances of unnecessary data duplication that would also play in to the excessive query times because unneeded records would need to be processed. You could say that a data designer designing a database is like a box of chocolates. You will never know what kind of database you will get until after it is built.

    Read the article

  • Relational vs. Dimensional Databases, what's the difference?

    - by grautur
    I'm trying to learn about OLAP and data warehousing, and I'm confused about the difference between relational and dimensional modeling. Is dimensional modeling basically relational modeling, but allowing for redundant/un-normalized data? For example, let's say I have historical sales data on (product, city, # sales). I understand that the following would be a relational point-of-view: Product | City | # Sales Apples, San Francisco, 400 Apples, Boston, 700 Apples, Seattle, 600 Oranges, San Francisco, 550 Oranges, Boston, 500 Oranges, Seattle, 600 While the following is a more dimensional point-of-view: Product | San Francisco | Boston | Seattle Apples, 400, 700, 600 Oranges, 550, 500, 600 But it seems like both points of view would nonetheless be implemented in an identical star schema: Fact table: Product ID, Region ID, # Sales Product dimension: Product ID, Product Name City dimension: City ID, City Name And it's not until you start adding some additional details to each dimension that the differences start popping up. For instance, if you wanted to track regions as well, a relational database would tend to have a separate region table, in order to keep everything normalized: City dimension: City ID, City Name, Region ID Region dimension: Region ID, Region Name, Region Manager, # Regional Stores While a dimensional database would allow for denormalization to keep the region data inside the city dimension, in order to make it easier to slice the data: City dimension: City ID, City Name, Region Name, Region Manager, # Regional Stores Is this correct?

    Read the article

  • Schema-less design guidelines for Google App Engine Datastore and other NoSQL DBs

    - by jamesaharvey
    Coming from a relational database background, as I'm sure many others are, I'm looking for some solid guidelines for setting up / designing my datastore on Google App Engine. Are there any good rules of thumb people have for setting up these kinds of schema-less data stores? I understand some of the basics such as denormalizing since you can't do joins, but I was wondering what other recommendations people had. The particular simple example I am working with concerns storing searches and their results. For example I have the following 2 models defined in my Google App Engine app using Python: class Search(db.Model): who = db.StringProperty() what = db.StringProperty() where = db.StringProperty() createDate = db.DateTimeProperty(auto_now_add=True) class SearchResult(db.Model): title = db.StringProperty() content = db.StringProperty() who = db.StringProperty() what = db.StringProperty() where = db.StringProperty() createDate = db.DateTimeProperty(auto_now_add=True) I'm duplicating a bunch of properties between the models for the sake of denormalization since I can't join Search and SearchResult together. Does this make sense? Or should I store a search ID in the SearchResult model and effectively 'join' the 2 models in code when I retrieve them from the datastore? Please keep in mind that this is a simple example. Both models will have a lot more properties and the way I'm approaching this right now, I would put any property I put in the Search model in the SearchResult model as well.

    Read the article

  • Scalable Database Tagging Schema

    - by Longpoke
    EDIT: To people building tagging systems. Don't read this. It is not what you are looking for. I asked this when I wasn't aware that RDBMS all have their own optimization methods, just use a simple many to many scheme. I have a posting system that has millions of posts. Each post can have an infinite number of tags associated with it. Users can create tags which have notes, date created, owner, etc. A tag is almost like a post itself, because people can post notes about the tag. Each tag association has an owner and date, so we can see who added the tag and when. My question is how can I implement this? It has to be fast searching posts by tag, or tags by post. Also, users can add tags to posts by typing the name into a field, kind of like the google search bar, it has to fill in the rest of the tag name for you. I have 3 solutions at the moment, but not sure which is the best, or if there is a better way. Note that I'm not showing the layout of notes since it will be trivial once I get a proper solution for tags. Method 1. Linked list tagId in post points to a linked list in tag_assoc, the application must traverse the list until flink=0 post: id, content, ownerId, date, tagId, notesId tag_assoc: id, tagId, ownerId, flink tag: id, name, notesId Method 2. Denormalization tags is simply a VARCHAR or TEXT field containing a tab delimited array of tagId:ownerId. It cannot be a fixed size. post: id, content, ownerId, date, tags, notesId tag: id, name, notesId Method 3. Toxi (from: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html, also same thing here: http://stackoverflow.com/questions/20856/how-do-you-recommend-implementing-tags-or-tagging) post: id, content, ownerId, date, notesId tag_assoc: ownerId, tagId, postId tag: id, name, notesId Method 3 raises the question, how fast will it be to iterate through every single row in tag_assoc? Methods 1 and 2 should be fast for returning tags by post, but for posts by tag, another lookup table must be made. The last thing I have to worry about is optimizing searching tags by name, I have not worked that out yet. I made an ASCII diagram here: http://pastebin.com/f1c4e0e53

    Read the article

  • Inheritance Mapping Strategies with Entity Framework Code First CTP5 Part 1: Table per Hierarchy (TPH)

    - by mortezam
    A simple strategy for mapping classes to database tables might be “one table for every entity persistent class.” This approach sounds simple enough and, indeed, works well until we encounter inheritance. Inheritance is such a visible structural mismatch between the object-oriented and relational worlds because object-oriented systems model both “is a” and “has a” relationships. SQL-based models provide only "has a" relationships between entities; SQL database management systems don’t support type inheritance—and even when it’s available, it’s usually proprietary or incomplete. There are three different approaches to representing an inheritance hierarchy: Table per Hierarchy (TPH): Enable polymorphism by denormalizing the SQL schema, and utilize a type discriminator column that holds type information. Table per Type (TPT): Represent "is a" (inheritance) relationships as "has a" (foreign key) relationships. Table per Concrete class (TPC): Discard polymorphism and inheritance relationships completely from the SQL schema.I will explain each of these strategies in a series of posts and this one is dedicated to TPH. In this series we'll deeply dig into each of these strategies and will learn about "why" to choose them as well as "how" to implement them. Hopefully it will give you a better idea about which strategy to choose in a particular scenario. Inheritance Mapping with Entity Framework Code FirstAll of the inheritance mapping strategies that we discuss in this series will be implemented by EF Code First CTP5. The CTP5 build of the new EF Code First library has been released by ADO.NET team earlier this month. EF Code-First enables a pretty powerful code-centric development workflow for working with data. I’m a big fan of the EF Code First approach, and I’m pretty excited about a lot of productivity and power that it brings. When it comes to inheritance mapping, not only Code First fully supports all the strategies but also gives you ultimate flexibility to work with domain models that involves inheritance. The fluent API for inheritance mapping in CTP5 has been improved a lot and now it's more intuitive and concise in compare to CTP4. A Note For Those Who Follow Other Entity Framework ApproachesIf you are following EF's "Database First" or "Model First" approaches, I still recommend to read this series since although the implementation is Code First specific but the explanations around each of the strategies is perfectly applied to all approaches be it Code First or others. A Note For Those Who are New to Entity Framework and Code-FirstIf you choose to learn EF you've chosen well. If you choose to learn EF with Code First you've done even better. To get started, you can find a great walkthrough by Scott Guthrie here and another one by ADO.NET team here. In this post, I assume you already setup your machine to do Code First development and also that you are familiar with Code First fundamentals and basic concepts. You might also want to check out my other posts on EF Code First like Complex Types and Shared Primary Key Associations. A Top Down Development ScenarioThese posts take a top-down approach; it assumes that you’re starting with a domain model and trying to derive a new SQL schema. Therefore, we start with an existing domain model, implement it in C# and then let Code First create the database schema for us. However, the mapping strategies described are just as relevant if you’re working bottom up, starting with existing database tables. I’ll show some tricks along the way that help you dealing with nonperfect table layouts. Let’s start with the mapping of entity inheritance. -- The Domain ModelIn our domain model, we have a BillingDetail base class which is abstract (note the italic font on the UML class diagram below). We do allow various billing types and represent them as subclasses of BillingDetail class. As for now, we support CreditCard and BankAccount: Implement the Object Model with Code First As always, we start with the POCO classes. Note that in our DbContext, I only define one DbSet for the base class which is BillingDetail. Code First will find the other classes in the hierarchy based on Reachability Convention. public abstract class BillingDetail  {     public int BillingDetailId { get; set; }     public string Owner { get; set; }             public string Number { get; set; } } public class BankAccount : BillingDetail {     public string BankName { get; set; }     public string Swift { get; set; } } public class CreditCard : BillingDetail {     public int CardType { get; set; }                     public string ExpiryMonth { get; set; }     public string ExpiryYear { get; set; } } public class InheritanceMappingContext : DbContext {     public DbSet<BillingDetail> BillingDetails { get; set; } } This object model is all that is needed to enable inheritance with Code First. If you put this in your application you would be able to immediately start working with the database and do CRUD operations. Before going into details about how EF Code First maps this object model to the database, we need to learn about one of the core concepts of inheritance mapping: polymorphic and non-polymorphic queries. Polymorphic Queries LINQ to Entities and EntitySQL, as object-oriented query languages, both support polymorphic queries—that is, queries for instances of a class and all instances of its subclasses, respectively. For example, consider the following query: IQueryable<BillingDetail> linqQuery = from b in context.BillingDetails select b; List<BillingDetail> billingDetails = linqQuery.ToList(); Or the same query in EntitySQL: string eSqlQuery = @"SELECT VAlUE b FROM BillingDetails AS b"; ObjectQuery<BillingDetail> objectQuery = ((IObjectContextAdapter)context).ObjectContext                                                                          .CreateQuery<BillingDetail>(eSqlQuery); List<BillingDetail> billingDetails = objectQuery.ToList(); linqQuery and eSqlQuery are both polymorphic and return a list of objects of the type BillingDetail, which is an abstract class but the actual concrete objects in the list are of the subtypes of BillingDetail: CreditCard and BankAccount. Non-polymorphic QueriesAll LINQ to Entities and EntitySQL queries are polymorphic which return not only instances of the specific entity class to which it refers, but all subclasses of that class as well. On the other hand, Non-polymorphic queries are queries whose polymorphism is restricted and only returns instances of a particular subclass. In LINQ to Entities, this can be specified by using OfType<T>() Method. For example, the following query returns only instances of BankAccount: IQueryable<BankAccount> query = from b in context.BillingDetails.OfType<BankAccount>() select b; EntitySQL has OFTYPE operator that does the same thing: string eSqlQuery = @"SELECT VAlUE b FROM OFTYPE(BillingDetails, Model.BankAccount) AS b"; In fact, the above query with OFTYPE operator is a short form of the following query expression that uses TREAT and IS OF operators: string eSqlQuery = @"SELECT VAlUE TREAT(b as Model.BankAccount)                       FROM BillingDetails AS b                       WHERE b IS OF(Model.BankAccount)"; (Note that in the above query, Model.BankAccount is the fully qualified name for BankAccount class. You need to change "Model" with your own namespace name.) Table per Class Hierarchy (TPH)An entire class hierarchy can be mapped to a single table. This table includes columns for all properties of all classes in the hierarchy. The concrete subclass represented by a particular row is identified by the value of a type discriminator column. You don’t have to do anything special in Code First to enable TPH. It's the default inheritance mapping strategy: This mapping strategy is a winner in terms of both performance and simplicity. It’s the best-performing way to represent polymorphism—both polymorphic and nonpolymorphic queries perform well—and it’s even easy to implement by hand. Ad-hoc reporting is possible without complex joins or unions. Schema evolution is straightforward. Discriminator Column As you can see in the DB schema above, Code First has to add a special column to distinguish between persistent classes: the discriminator. This isn’t a property of the persistent class in our object model; it’s used internally by EF Code First. By default, the column name is "Discriminator", and its type is string. The values defaults to the persistent class names —in this case, “BankAccount” or “CreditCard”. EF Code First automatically sets and retrieves the discriminator values. TPH Requires Properties in SubClasses to be Nullable in the Database TPH has one major problem: Columns for properties declared by subclasses will be nullable in the database. For example, Code First created an (INT, NULL) column to map CardType property in CreditCard class. However, in a typical mapping scenario, Code First always creates an (INT, NOT NULL) column in the database for an int property in persistent class. But in this case, since BankAccount instance won’t have a CardType property, the CardType field must be NULL for that row so Code First creates an (INT, NULL) instead. If your subclasses each define several non-nullable properties, the loss of NOT NULL constraints may be a serious problem from the point of view of data integrity. TPH Violates the Third Normal FormAnother important issue is normalization. We’ve created functional dependencies between nonkey columns, violating the third normal form. Basically, the value of Discriminator column determines the corresponding values of the columns that belong to the subclasses (e.g. BankName) but Discriminator is not part of the primary key for the table. As always, denormalization for performance can be misleading, because it sacrifices long-term stability, maintainability, and the integrity of data for immediate gains that may be also achieved by proper optimization of the SQL execution plans (in other words, ask your DBA). Generated SQL QueryLet's take a look at the SQL statements that EF Code First sends to the database when we write queries in LINQ to Entities or EntitySQL. For example, the polymorphic query for BillingDetails that you saw, generates the following SQL statement: SELECT  [Extent1].[Discriminator] AS [Discriminator],  [Extent1].[BillingDetailId] AS [BillingDetailId],  [Extent1].[Owner] AS [Owner],  [Extent1].[Number] AS [Number],  [Extent1].[BankName] AS [BankName],  [Extent1].[Swift] AS [Swift],  [Extent1].[CardType] AS [CardType],  [Extent1].[ExpiryMonth] AS [ExpiryMonth],  [Extent1].[ExpiryYear] AS [ExpiryYear] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] IN ('BankAccount','CreditCard') Or the non-polymorphic query for the BankAccount subclass generates this SQL statement: SELECT  [Extent1].[BillingDetailId] AS [BillingDetailId],  [Extent1].[Owner] AS [Owner],  [Extent1].[Number] AS [Number],  [Extent1].[BankName] AS [BankName],  [Extent1].[Swift] AS [Swift] FROM [dbo].[BillingDetails] AS [Extent1] WHERE [Extent1].[Discriminator] = 'BankAccount' Note how Code First adds a restriction on the discriminator column and also how it only selects those columns that belong to BankAccount entity. Change Discriminator Column Data Type and Values With Fluent API Sometimes, especially in legacy schemas, you need to override the conventions for the discriminator column so that Code First can work with the schema. The following fluent API code will change the discriminator column name to "BillingDetailType" and the values to "BA" and "CC" for BankAccount and CreditCard respectively: protected override void OnModelCreating(System.Data.Entity.ModelConfiguration.ModelBuilder modelBuilder) {     modelBuilder.Entity<BillingDetail>()                 .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue("BA"))                 .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue("CC")); } Also, changing the data type of discriminator column is interesting. In the above code, we passed strings to HasValue method but this method has been defined to accepts a type of object: public void HasValue(object value); Therefore, if for example we pass a value of type int to it then Code First not only use our desired values (i.e. 1 & 2) in the discriminator column but also changes the column type to be (INT, NOT NULL): modelBuilder.Entity<BillingDetail>()             .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue(1))             .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue(2)); SummaryIn this post we learned about Table per Hierarchy as the default mapping strategy in Code First. The disadvantages of the TPH strategy may be too serious for your design—after all, denormalized schemas can become a major burden in the long run. Your DBA may not like it at all. In the next post, we will learn about Table per Type (TPT) strategy that doesn’t expose you to this problem. References ADO.NET team blog Java Persistence with Hibernate book a { text-decoration: none; } a:visited { color: Blue; } .title { padding-bottom: 5px; font-family: Segoe UI; font-size: 11pt; font-weight: bold; padding-top: 15px; } .code, .typeName { font-family: consolas; } .typeName { color: #2b91af; } .padTop5 { padding-top: 5px; } .padTop10 { padding-top: 10px; } p.MsoNormal { margin-top: 0in; margin-right: 0in; margin-bottom: 10.0pt; margin-left: 0in; line-height: 115%; font-size: 11.0pt; font-family: "Calibri" , "sans-serif"; }

    Read the article

< Previous Page | 1 2