Search Results

Search found 33242 results on 1330 pages for 'database optimization'.


  • Should I Split Tables Relevant to X Module Into Different DB? Mysql

    - by Michael Robinson
    I've inherited a rather large and somewhat messy codebase, and have been tasked with making it faster, less noodly and generally better. Currently we use one big database to hold all data for all aspects of the site. As we need to plan for significant growth in the future, I'm considering splitting tables relevant to specific sections of the site into different databases, so if/when one gets too large for one server I can more easily migrate some user data to different mysql servers while retaining overall integrity. I would still need to use joins on some tables across the new databases. Is this a normal thing to do? Would I incur a performance hit because of this?


  • How to find foreign-key dependencies pointing to one record in Oracle?

    - by daveslab
    Hi folks, I have a very large Oracle database, with many many tables and millions of rows. I need to delete one of them, but want to make sure that dropping it will not break any other dependent rows that point to it as a foreign key record. Is there a way to get a list of all the other records, or at least table schemas, that point to this row? I know that I could just try to delete it myself, and catch the exception, but I won't be running the script myself and need it to run clean the first time through. I have the tools SQL Developer from Oracle, and PL/SQL Developer from AllRoundAutomations at my disposal. Thanks in advance!


  • Options for storing large text blobs in/with an SQL database?

    - by kdt
    Hi, I have some large volumes of text (log files) which may be very large (up to gigabytes). They are associated with entities which I'm storing in a database, and I'm trying to figure out whether I should store them within the SQL database, or in external files. It seems like in-database storage may be limited to 4GB for LONGTEXT fields in MySQL, and presumably other DBs have similar limits. Also, storing in the database presumably precludes any kind of seeking when viewing this data -- I'd have to load the full length of the data to render any part of it, right? So it seems like I'm leaning towards storing this data out-of-DB: are my misgivings about storing large blobs in the database valid, and if I'm going to store them out of the database then are there any frameworks/libraries to help with that? (I'm working in python but am interested in technologies in other languages too)


  • Need a field / flag / status number for multiple use?

    - by Jules
    I want to create a field in my database which will be easy to query. I think a bit of background will make this clearer. My table holds listings shown on my website. I run a program which looks at the listings and decides whether to hide them from being shown on the site. I also hide listings manually for various reasons. I want to store these reasons in a field, and a listing could be hidden for more than one reason, so I need some way to determine which reasons have been used. Can anyone offer any guidance on an approach that will be future-proof (i.e. can accommodate new reasons) and quick and easy to query?


  • how to design a db like Facebook where users can update their status and of the fb page as admin

    - by Harsha M V
    I am designing a database where users can update their own status messages, and can also create pages or groups (like a Facebook fan page) and post status updates as the admin of the page rather than as a user. This is my setup:

        user(id, name..)
        group(id, name...)
        group_admin(group_id, user_id)

    Is this the way to do it? How do I post under the group as an admin? Will I need to check for every user whether he is the admin or not?


  • Three customer addresses in one table or in separate tables?

    - by DR
    In my application I have a Customer class and an Address class. The Customer class has three instances of the Address class: customerAddress, deliveryAddress, invoiceAddress. What's the best way to reflect this structure in a database? The straightforward way would be a customer table and a separate address table. A more denormalized way would be just a customer table with columns for every address (example for "street": customer_street, delivery_street, invoice_street). What are your experiences with that? Are there any advantages and disadvantages of these approaches?


  • How the "migrations" approach makes database continuous integration possible

    - by David Atkinson
    Testing a database upgrade script as part of a continuous integration process will only work if there is an easy way to automate the generation of the upgrade scripts. There are two common approaches to managing upgrade scripts. The first is to maintain a set of scripts as-you-go-along. Many SQL developers I've encountered will store these in a folder prefixed numerically to ensure they are ordered as they are intended to be run. Occasionally there is an accompanying document or a batch file that ensures that the scripts are run in the defined order. Writing these scripts during the course of development requires discipline. It's all too easy to load up the table designer and to make a change directly to the development database, rather than to save off the ALTER statement that is required when the same change is made to production. This discipline can add considerable overhead to the development process. However, come the end of the project, everything is ready for final testing and deployment. The second development paradigm is to not do the above. Changes are made to the development database without considering the incremental update scripts required to effect the changes. At the end of the project, the SQL developer or DBA, is tasked to work out what changes have been made, and to hand-craft the upgrade scripts retrospectively. The end of the project is the wrong time to be doing this, as the pressure is mounting to ship the product. And where data deployment is involved, it is prudent not to feel rushed. Schema comparison tools such as SQL Compare have made this latter technique more bearable. These tools work by analyzing the before and after states of a database schema, and calculating the SQL required to transition the database. Problem solved? Not entirely. Schema comparison tools are huge time savers, but they have their limitations. 
There are certain changes that can be made to a database that can't be determined purely from observing the static schema states. If a column is split, how do we determine the algorithm required to copy the data into the new columns? If a NOT NULL column is added without a default, how do we populate the new field for existing records in the target? If we rename a table, how do we know we've done a rename, as we could equally have dropped a table and created a new one? All the above are examples of situations where developer intent is required to supplement the script generation engine. SQL Source Control 3 and SQL Compare 10 introduced a new feature, migration scripts, allowing developers to add custom scripts to replace the default script generation behavior. These scripts are committed to source control alongside the schema changes, and are associated with one or more changesets. Before this capability was introduced, any schema change that required additional developer intent would break any attempt at auto-generation of the upgrade script, rendering deployment testing as part of continuous integration useless. SQL Compare will now generate upgrade scripts not only using its diffing engine, but also using the knowledge supplied by developers in the guise of migration scripts. In future posts I will describe the necessary command line syntax to leverage this feature as part of an automated build process such as continuous integration.
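The first approach described above (a folder of numerically prefixed scripts run in a defined order) can be sketched as a tiny runner. This is only an illustrative sketch and says nothing about how SQL Compare or SQL Source Control work internally: the `MIGRATIONS` dictionary, the `schema_version` table and the `apply_migrations` function are hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical ordered upgrade scripts, keyed by their numeric prefix.
MIGRATIONS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",
}


def apply_migrations(conn, migrations):
    """Run each script once, in numeric order, and record what was applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version in sorted(migrations):
        if version in applied:
            continue  # already run against this database
        conn.execute(migrations[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)   # applies scripts 1 and 2
second_run = apply_migrations(conn, MIGRATIONS)  # nothing left to apply
```

Because the runner records each applied version, the same script set can be replayed safely against development, test and production copies, which is what makes the upgrade testable in a CI build.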


  • How can I store large amount of data from a database to XML (speed problem, part three)?

    - by Andrija
    After getting some responses, the current situation is that I'm using this tip: http://www.ibm.com/developerworks/xml/library/x-tipbigdoc5.html (Listing 1. Turning ResultSets into XML), and XMLWriter for Java from http://www.megginson.com/downloads/ . Basically, it reads data from the database and writes it to a file as characters, using column names to create opening and closing tags. While doing so, I need to make two changes to the input stream, namely to the dates and numbers.

        // Iterate over the set
        while (rs.next()) {
            w.startElement("row");
            for (int i = 0; i < count; i++) {
                Object ob = rs.getObject(i + 1);
                if (rs.wasNull()) {
                    ob = null;
                }
                String colName = meta.getColumnLabel(i + 1);
                if (ob != null) {
                    if (ob instanceof Timestamp) {
                        w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
                    } else if (ob instanceof BigDecimal) {
                        w.dataElement(colName, Util.transformToHTML(new Integer(((BigDecimal) ob).intValue())));
                    } else {
                        w.dataElement(colName, ob.toString());
                    }
                } else {
                    w.emptyElement(colName);
                }
            }
            w.endElement("row");
        }

    The SQL that gets the results has the to_number command (e.g. to_number(sif.ID) ID) and the to_date command (e.g. TO_DATE (sif.datum_do, 'DD.MM.RRRR') datum_do). The first problem is that the returned date is a timestamp, meaning I don't get 14.02.2010 but rather 14.02.2010 00:00:000, so I have to format it to the dd.mm.yyyy format. The second problem is the numbers; for some reason, they are stored in the database as varchar2 and can have leading zeroes that need to be stripped. I'm guessing I could do that in my SQL with the trim function, so that Util.transformToHTML is unnecessary (for clarification, here's the method):

        public static String transformToHTML(Integer number) {
            String result = "";
            try {
                result = number.toString();
            } catch (Exception e) {}
            return result;
        }

    What I'd like to know is: a) Can I get the date in the format I want and skip the additional processing, thus shortening the processing time? b) Is there a better way to do this? We're talking about XML files that are in the 50 MB - 250 MB filesize category.


  • Best pattern for storing (product) attributes in SQL Server

    - by EdH
    We are starting a new project where we need to store products and many product attributes in a database. The technology stack is MS SQL 2008 and Entity Framework 4.0 / LINQ for data access. The products (and Products table) are pretty straightforward (a SKU, manufacturer, price, etc.). However, there are also many attributes to store with each product (think industrial widgets). These may range from color to certification(s) to pipe size. Every product may have different attributes, and some may have multiples of the same attribute (e.g. Certifications). The current proposal is that we will basically have a name/value pair table with a FK back to the product ID in each row. An example of the attributes table may look like this:

        ProdID  AttributeName  AttributeValue
        123     Color          Blue
        123     FittingSize    1.25
        123     Certification  AS1111
        123     Certification  EE2212
        123     Certification  FM.3
        456     Pipe           11
        678     Color          Red
        999     Certification  AE1111
        ...

    Note: the attribute name would likely come from a lookup table or enum. So the main question here is: Is this the best pattern for doing something like this? How will the performance be? Queries will be based on a JOIN of the product and attributes tables, and will generally need many WHEREs to filter on specific attributes - the most common search will be to find a product based on a set of known/desired attributes. If anyone has any suggestions or a better pattern for this type of data, please let me know. Thanks! -Ed
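The "find a product with a set of known attributes" search against a name/value table is often written as a GROUP BY with a HAVING count rather than one JOIN per attribute. The sketch below is one possible shape under stated assumptions, not a recommendation from the original post: the table name `product_attribute`, its column names and the helper `find_products` are hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attribute (
    prod_id         INTEGER NOT NULL,
    attribute_name  TEXT    NOT NULL,
    attribute_value TEXT    NOT NULL
);
INSERT INTO product_attribute VALUES
    (123, 'Color', 'Blue'),
    (123, 'FittingSize', '1.25'),
    (123, 'Certification', 'AS1111'),
    (123, 'Certification', 'EE2212'),
    (123, 'Certification', 'FM.3'),
    (456, 'Pipe', '11'),
    (678, 'Color', 'Red'),
    (999, 'Certification', 'AE1111');
""")


def find_products(conn, wanted):
    """Return the product IDs that carry every (name, value) pair in `wanted`."""
    pairs = sorted(set(wanted))
    if not pairs:
        raise ValueError("at least one attribute pair is required")
    predicate = " OR ".join(
        ["(attribute_name = ? AND attribute_value = ?)"] * len(pairs)
    )
    # Keep only rows matching any wanted pair, then require that a product
    # matched all of them (DISTINCT guards against duplicated rows).
    sql = f"""
        SELECT prod_id
        FROM (SELECT DISTINCT prod_id, attribute_name, attribute_value
              FROM product_attribute
              WHERE {predicate})
        GROUP BY prod_id
        HAVING COUNT(*) = ?
        ORDER BY prod_id
    """
    params = [part for pair in pairs for part in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]


blue_and_certified = find_products(
    conn, [("Color", "Blue"), ("Certification", "EE2212")]
)
```

The query stays the same size no matter how many attributes are requested, which is the main attraction over adding one self-join per attribute; how it performs on SQL Server would depend on indexing of the name/value columns.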


  • Struggling with a data modeling problem

    - by rpat
    I am struggling with a data model (I use MySQL for the database). I am uneasy about what I have come up with. If someone could suggest a better approach, or point me to some reference matter, I would appreciate it.

    The data would have organizations of many types. I am trying to do a 3-level classification (Class, Category, Type). Say I have 'Italian Restaurant'; it will have the following classification:

        Food Services > Restaurants > Italian

    However, an organization may belong to multiple groups. A restaurant may serve both Chinese and Italian, so it will fit into 2 classifications:

        Food Services > Restaurants > Italian
        Food Services > Restaurants > Chinese

    The classification reference tables would be like the following:

        ORG_CLASS (RowId, ClassCode, ClassName)
            1, FOOD, Food Services

        ORG_CATEGORY (RowId, ClassCode, CategoryCode, CategoryName)
            1, FOOD, REST, Restaurants

        ORG_TYPE (RowId, ClassCode, CategoryCode, TypeCode, TypeName)
            100, FOOD, REST, ITAL, Italian
            101, FOOD, REST, CHIN, Chinese
            102, FOOD, REST, SPAN, Spanish
            103, FOOD, REST, MEXI, Mexican
            104, FOOD, REST, FREN, French
            105, FOOD, REST, MIDL, Middle Eastern

    The actual data tables would be like the following. I will allow an organization a max of 3 classifications. I will have 3 GroupIds, each pointing to a row in ORG_TYPE. So I have my ORGANIZATION_TABLE:

        ORGANIZATION_TABLE (OrgGroupId1, OrgGroupId2, OrgGroupId3, OrgName, OrgAddres)
            100, 103, NULL, MyRestaurant1, MyAddr1
            100, 102, NULL, MyRestaurant2, MyAddr2
            100, 104, 105, MyRestaurant3, MyAddr3

    During data add, a dialog could let the user choose the class, category and type, and the corresponding GroupId could be populated with the RowId from the ORG_TYPE table.

    During search, if all three classification levels are chosen, the search will be more specific. For example, if Food Services > Restaurants > Italian is the criterion, the where clause would be 'where OrgGroupId1 = 100'. If only 2 levels are chosen (Food Services > Restaurants), I have to do 'where OrgGroupId1 in (100,101,102,103,104,105, .....)' - there could be a hundred in that list. I will disallow class-level search; that is, I will force selection of a class and a category. The Ids would be integers.

    I am trying to foresee performance issues and other issues. Overall, would this work? Or do I need to throw this out and start from scratch?


  • Graph-structured databases and Php

    - by stagas
    I want to use a graph database using php. Can you point out some resources on where to get started? Is there any example code / tutorial out there? Or are there any other methods of storing data that relate to each other in totally random/abstract situations? - Very abstract example of the relations needed: John relates to Mary, both relate to School, John is Tall, Mary is Short, John has Blue Eyes, Mary has Green Eyes, query I want is which people are related to 'Short people that have Green Eyes and go to School' - answer John - Another example: TrackA -> ArtistA -> ArtistB -> AlbumA -----> [ label ] -> AlbumB -----> [ A ] -> TrackA:Remix -> Genre:House -> [ Album ] -----> [ label ] TrackB -> [ C ] [ B ] Example queries: Which Genre is TrackB closer to? answer: House - because it's related to Album C, which is related to TrackA and is related to Genre:House Get all Genre:House related albums of Label A : result: AlbumA, AlbumB - because they both have TrackA which is related to Genre:House - It is possible in MySQL but it would require a fixed set of attributes/columns for each item and a complex non-flexible query, instead I need every attribute to be an item by itself and instead of 'belonging' to something, to be 'related' to something.


  • Building a News-feed that comprises posts "created by user's connections" && "on the topics user is following"

    - by aklin81
    I am working on a Questions & Answers website that allows a user to follow questions on certain topics from his network. A user's news-feed wall consists only of those questions that have been posted by his connections and tagged with the topics that he is following (his expertise topics). I am unsure which database data model would best fit such an application. The project needs to allow for future scalability and high performance. I have been looking at Cassandra and MySQL so far. After studying Cassandra I realized that a simple news feed showing all the posts from a user's network would be easy to design in Cassandra, by executing fast writes of each post to all followers of the posting user. But for my kind of application, where there is an additional filter of 'followed topics' (i.e. the user receives posts "created by his network" && "on topics user is following"), I could not convince myself of a good schema design in Cassandra. Perhaps I have missed something because of my limited understanding of Cassandra; can you please help me out with suggestions on how this news feed could be implemented in Cassandra? Looking for a great project with Cassandra! Edit: A maximum of 5 tags is allowed per question (i.e. at most 5 topics can be tagged on a question).


  • Modifying my website to allow anonymous comments

    - by David
    I write the code for my own website as an educational/fun exercise. Right now part of the website is a blog (like every other site out there :-/) which supports the usual basic blog features, including commenting on posts. But I only have comments enabled for logged-in users; I want to alter the code to allow anonymous comments - that is, I want to allow people to post comments without first creating a user account on my site, although there will still be some sort of authentication involved to prevent spam.

    Question: what information should I save for anonymous comments? I'm thinking at least display name and email address (for displaying a Gravatar), and probably website URL because I eventually want to accept OpenID as well, but would anything else make sense?

    Other question: how should I modify the database to store this information? The schema I have for the comment table is currently

        comment_id smallint(5)       // The unique comment ID
        post_id smallint(5)          // The ID of the post the comment was made on
        user_id smallint(5)          // The ID of the user account who made the comment
        comment_subject varchar(128)
        comment_date timestamp
        comment_text text

    Should I add additional fields for name, email address, etc. to the comment table? (seems like a bad idea) Create a new "anonymous users" table? (and if so, how to keep anonymous user ids from conflicting with regular user ids) Or create fake user accounts for anonymous users in my existing users table? Part of what's making this tricky is that if someone tries to post an anonymous comment using an email address (or OpenID) that's already associated with an account on my site, I'd like to catch that and prompt them to log in.


  • Schema design: many to many plus additional one to many

    - by chrisj
    Hi, I have this scenario and I'm not sure exactly how it should be modeled in the database. The objects I'm trying to model are: teams, players, the team-player membership, and a list of fees due for each player on a given team. So, the fees depend on both the team and the player. My current approach is the following:

        teams:            id, name
        players:          id, name
        team_players:     id, player_id, team_id
        team_player_fees: id, team_players_id, amount, send_reminder_on

    In this schema, team_players is the junction table for teams and players, and the records in team_player_fees belong to records in the junction table. For example, playerA is on teamA and has fees of $10 and $20 due in Aug and Feb. PlayerA is also on teamB and has fees of $25 and $25 due in May and June. Each player/team combination can have a different set of fees.

    Questions: Are there better ways to handle such a scenario? Is there a term for this type of relationship (so I can google it)? Or do you know of any references with similar structures?


  • smallest mysql type that accommodates single decimal

    - by donpal
    Database newbie here. I'm setting up a MySQL table. One of the fields will accept a value in increments of 0.5, e.g. 0.5, 1.0, 1.5, 2.0, .... 200.5, etc. I've tried int but it doesn't capture the decimals.

        `value` int(10),

    What would be the smallest type that can accommodate this value, considering it's only a single decimal? I was also considering that, because the decimal will always be 0.5 if present at all, I could store it in a separate boolean field. So I would have 2 fields instead. Is this a stupid or somewhat overcomplicated idea? I don't know if it really saves me any memory, and it might get slower now that I'm accessing 2 fields instead of 1.

        `value` int(10),
        `half` bool, //or something similar to boolean

    What are your suggestions, guys? Is the first option better, and what's the smallest data type in that case that would get me the 0.5?
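One alternative to the separate boolean field is to store the number of half-steps as a single integer (the value multiplied by 2) and convert on the way in and out. The sketch below shows only that conversion; `encode_halves` and `decode_halves` are hypothetical helper names. It assumes values stay within a few hundred, in which case the doubled value would fit in a 2-byte MySQL SMALLINT UNSIGNED (0 to 65535). A DECIMAL(4,1) column would be the more direct option if readability in SQL matters more than size.

```python
def encode_halves(value):
    """Convert a multiple of 0.5 (e.g. 200.5) to an integer count of half-steps."""
    doubled = value * 2
    if doubled != int(doubled):
        raise ValueError(f"{value!r} is not a multiple of 0.5")
    return int(doubled)


def decode_halves(stored):
    """Convert a stored half-step count back to the original value."""
    return stored / 2


stored = encode_halves(200.5)     # integer written to the column
restored = decode_halves(stored)  # value shown to the user
```

The trade-off is that every query comparing against the column has to remember the factor of 2, so this is mainly worth it when row size really matters.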


  • Joining Tables Based on Foreign Keys

    - by maestrojed
    I have a table that has a lot of fields that are foreign keys referencing a related table. I am writing a script in PHP that will do the db queries. When I query this table for its data I need to know the values associated with these keys, not the keys. How do most people go about this? A 101 way to do this would be to query this table for its data, including the foreign keys, and then query the related tables to get each key's value. This could be a lot of queries (~10).

    Question 1: I think I could write 1 query with a bunch of joins. Would that be better? This approach also requires the querying script to know which table fields are foreign keys. Since I have many tables like this, but all with different fields, writing nice generic functions is hard. MySQL InnoDB tables allow for foreign key constraints. I know the database has these set up correctly.

    Question 2: What about the idea of querying the table, identifying what the constraints are, and then matching them up using whatever process I decide on from Question 1? I like this idea but never see it being used in code, which makes me think it's not a good idea for some reason. I would use something like SHOW CREATE TABLE tbl_name; to find what constraints/relationships exist for that table. Thank you for any suggestions or advice.
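The idea in Question 2 (read the foreign-key constraints from the database, then build one query with a join per key) can be sketched as follows. This is a sketch under stated assumptions rather than a drop-in answer: SQLite stands in for MySQL, so the constraints are read with `PRAGMA foreign_key_list` (in MySQL the comparable metadata is in `information_schema.KEY_COLUMN_USAGE`, via its `REFERENCED_TABLE_NAME` and `REFERENCED_COLUMN_NAME` columns). The tables `item`, `color` and `size`, the helper names, and the convention that every lookup table has a `name` column are all hypothetical. The sketch also assumes each lookup table is referenced at most once and that table names come from trusted code, since they are interpolated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE size  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (
    id       INTEGER PRIMARY KEY,
    label    TEXT,
    color_id INTEGER REFERENCES color(id),
    size_id  INTEGER REFERENCES size(id)
);
INSERT INTO color VALUES (1, 'red'), (2, 'blue');
INSERT INTO size  VALUES (1, 'small'), (2, 'large');
INSERT INTO item  VALUES (10, 'widget', 2, 1);
""")


def foreign_keys(conn, table):
    """Map each FK column of `table` to (referenced table, referenced column)."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Row layout: id, seq, table, from, to, on_update, on_delete, match
    return {row[3]: (row[2], row[4]) for row in rows}


def select_with_lookups(conn, table, display_column="name"):
    """Run one query that LEFT JOINs every lookup table referenced by `table`."""
    columns = [f"{table}.*"]
    joins = []
    for fk_column, (ref_table, ref_column) in sorted(foreign_keys(conn, table).items()):
        columns.append(f"{ref_table}.{display_column} AS {fk_column}_{display_column}")
        joins.append(
            f"LEFT JOIN {ref_table} ON {ref_table}.{ref_column} = {table}.{fk_column}"
        )
    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table} {' '.join(joins)}")
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


rows = select_with_lookups(conn, "item")
```

Each returned row keeps the raw key (for example `color_id`) alongside the looked-up value (`color_id_name`), so one round trip replaces the roughly ten separate lookups.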


  • Help a CRUD programmer think about an "approval workflow"

    - by gerdemb
    I've been working on a web application that is basically a CRUD application (Create, Read, Update, Delete). Recently, I've started working on what I'm calling an "approval workflow". Basically, a request is generated for a material and then sent for approval to a manager. Depending on what is requested, different people need to approve the request or perhaps send it back to the requester for modification. The approvers need to keep track of what to approve and what has been approved, and the requesters need to see the status of their requests. As a "CRUD" developer, I'm having a hard time wrapping my head around how to design this. What database tables should I have? How do I keep track of the state of the request? How should I notify users of actions that have happened to their requests? Is there a design pattern that could help me with this? Should I be drawing state machines in my code? I think this is a generic programming question, but if it makes any difference I'm using Django with MySQL.


  • Delivering activity feed items in a moderately scalable way

    - by sotangochips
    The application I'm working on has an activity feed where each user can see their friends' activity (much like Facebook). I'm looking for a moderately scalable way to show a given user's activity stream on the fly. I say 'moderately' because I'm looking to do this with just a database (PostgreSQL) and maybe memcached. For instance, I want this solution to scale to 200k users, each with 100 friends.

    Currently, there is a master activity table that stores the rendered html for the given activity (Jim added a friend, George installed an application, etc.). This master activity table keeps the source user, the html, and a timestamp. Then, there's a separate ('join') table that simply keeps a pointer to the person who should see this activity in their friend feed, and a pointer to the object in the main activity table. So, if I have 100 friends, and I do 3 activities, then the join table will grow by 300 items. Clearly this table will grow very quickly. It has the nice property, though, that fetching activity to show to a user takes a single (relatively) inexpensive query.

    The other option is to just keep the main activity table and query it by saying something like:

        select * from activity where source_user in (1, 2, 44, 2423, ... my friend list)

    This has the disadvantage that you're querying for users who may never be active, and as your friend list grows, this query can get slower and slower. I see the pros and the cons of both sides, but I'm wondering if some SO folks might help me weigh the options and suggest one way or the other. I'm also open to other solutions, though I'd like to keep it simple and not install something like CouchDB, etc. Many thanks!


  • If we make a number every millisecond, how much data would we have in a day?

    - by Roger Travis
    I'm a bit confused here... I'm being offered a project where there would be an array of certain sensors that give off a reading every millisecond (yes, 1000 readings a second). A reading would be a 3- or 4-digit number, for example 818 or 1529. These readings need to be stored in a database on a server and accessed remotely. I have never worked with such big amounts of data. How much, in MB, do you think the readings from one sensor for a day would be? 4 (digits) x 1000 x 60 x 60 x 24 = 345600000 bits ... right? About 42 MB per day... doesn't seem too bad, right? Therefore a DB of, say, 1 GB would hold 23 days of info from 1 sensor, correct? I understand that MySQL & PHP probably would not be able to handle it... what would you suggest, maybe some aps? Azure? Oracle? ... Thanks!
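The arithmetic in the question can be checked with a few lines. Note that 4 x 1000 x 60 x 60 x 24 counts digits, and a digit stored as a text character takes one byte, so 345,600,000 would be bytes rather than bits. The sketch below compares that with storing each reading as a 2-byte integer (enough for any 4-digit number). It covers the raw payload only; timestamps, sensor IDs, row overhead and indexes are assumptions left out and would add to the real figure.

```python
READINGS_PER_SECOND = 1000
SECONDS_PER_DAY = 60 * 60 * 24

readings_per_day = READINGS_PER_SECOND * SECONDS_PER_DAY  # 86,400,000 readings


def megabytes_per_day(bytes_per_reading):
    """Raw payload per sensor per day, in MB (1 MB = 1,000,000 bytes)."""
    return readings_per_day * bytes_per_reading / 1_000_000


as_text = megabytes_per_day(4)   # four characters per reading
as_int16 = megabytes_per_day(2)  # one 2-byte integer per reading

# Days of raw readings from one sensor that fit in 1 GB (1,000 MB).
days_per_gb_text = 1_000 / as_text
days_per_gb_int16 = 1_000 / as_int16
```

Under these assumptions one sensor produces roughly 346 MB per day as text or 173 MB per day as 2-byte integers, so 1 GB holds about 3 to 6 days of raw readings rather than 23.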


  • Having to insert a record, then update the same record warrants 1:1 relationship design?

    - by dianovich
    Let's say an Order has many Line items and we're storing the total cost of an order (based on the sum of prices on order lines) in the orders table.

        --------------
        orders
        --------------
        id
        ref
        total_cost
        --------------

        --------------
        lines
        --------------
        id
        order_id
        price
        --------------

    In a simple application, the order and lines are created during the same step of the checkout process. So this means

        INSERT INTO orders ....
        -- Get ID of inserted order record
        INSERT into lines VALUES(null, order_id, ...), ...

    where we get the order ID after creating the order record. The problem I'm having is trying to figure out the best way to store the total cost of an order. I don't want to have to

        1. create an order
        2. create lines on an order
        3. calculate cost on order based on lines, then update the record created in 1. in the orders table

    This would mean a nullable total_cost field on orders for starters... My solution thus far is to have an order_totals table with a 1:1 relationship to the orders table. But I think it's redundant. Ideally, since everything required to calculate total costs (lines on an order) is in the database, I would work out the value every time I need it, but this is very expensive. What are your thoughts?
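If the line prices are already known to the application when the checkout step runs, one way to avoid both the nullable column and the later UPDATE is to compute the total first and write the order and its lines in a single transaction. The sketch below illustrates that under stated assumptions: SQLite stands in for the real database, prices are stored as integer cents, and the function `create_order` is a hypothetical name; only the `orders` and `lines` columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    ref        TEXT    NOT NULL,
    total_cost INTEGER NOT NULL          -- never NULL: known at insert time
);
CREATE TABLE lines (
    id       INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    price    INTEGER NOT NULL
);
""")


def create_order(conn, ref, prices_in_cents):
    """Insert an order and its lines atomically, with the total precomputed."""
    total = sum(prices_in_cents)
    with conn:  # commits on success, rolls back if any statement fails
        cursor = conn.execute(
            "INSERT INTO orders (ref, total_cost) VALUES (?, ?)", (ref, total)
        )
        order_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO lines (order_id, price) VALUES (?, ?)",
            [(order_id, price) for price in prices_in_cents],
        )
    return order_id


order_id = create_order(conn, "A-1001", [1999, 501, 2500])

# Cross-check: the stored total should equal the sum of the stored lines.
stored, summed = conn.execute(
    """
    SELECT o.total_cost,
           (SELECT SUM(price) FROM lines WHERE order_id = o.id)
    FROM orders AS o
    WHERE o.id = ?
    """,
    (order_id,),
).fetchone()
```

The stored total is a denormalized copy, so any later change to an order's lines would have to update it in the same transaction; the cross-check query shows how drift could be detected.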


  • guarantee child records either in one table or another, but not both?

    - by user151841
    I have a table with two child tables. For each record in the parent table, I want one and only one record in one of the child tables -- not one in each, not none. How do I define that? Here's the backstory. Feel free to criticize this implementation, but please answer the question above, because this isn't the only time I've encountered it: I have a database that holds data pertaining to user surveys. It was originally designed with one authentication method for starting a survey. Since then, requirements have changed, and now there are two different ways someone could sign on to start a survey. Originally I captured the authentication token in a column in the survey table. Since requirements changed, there are three other bits of data that I want to capture in authentication. So for each record in the survey table, I'm either going to have one token, or a set of three. All four of these are of different types, so my thought was, instead of having four columns where either one is going to be null, or three are going to be null (or even worse, a bad mashup of either of those scenarios), I would have two child tables, one for holding the single authentication token, the other for holding the three. Problem is, I don't know offhand how to define that in DDL. I'm using MySQL, so maybe there's a feature that MySQL doesn't implement that lets me do this.
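One commonly used pattern for "a child row in one table or the other, never both" is a type discriminator on the parent that both child tables must repeat through a composite foreign key. The sketch below demonstrates it in SQLite; all table and column names (`auth_type`, `survey_token_auth`, `survey_triple_auth`, `part_a` and so on) are hypothetical. It guarantees "not both" and "in the matching table" but not "at least one", which would still need application logic or a deferred check. On MySQL the same DDL relies on CHECK constraints being enforced, which as far as I know is only the case from version 8.0.16; on older versions a trigger or a one-row lookup table would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE survey (
    id        INTEGER PRIMARY KEY,
    auth_type TEXT NOT NULL CHECK (auth_type IN ('token', 'triple')),
    UNIQUE (id, auth_type)                -- target for the composite FKs below
);
CREATE TABLE survey_token_auth (
    survey_id INTEGER PRIMARY KEY,        -- at most one row per survey
    auth_type TEXT NOT NULL CHECK (auth_type = 'token'),
    token     TEXT NOT NULL,
    FOREIGN KEY (survey_id, auth_type) REFERENCES survey (id, auth_type)
);
CREATE TABLE survey_triple_auth (
    survey_id INTEGER PRIMARY KEY,
    auth_type TEXT NOT NULL CHECK (auth_type = 'triple'),
    part_a    TEXT NOT NULL,
    part_b    TEXT NOT NULL,
    part_c    TEXT NOT NULL,
    FOREIGN KEY (survey_id, auth_type) REFERENCES survey (id, auth_type)
);
""")


def try_insert(conn, sql, params):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_parent = try_insert(
    conn, "INSERT INTO survey (id, auth_type) VALUES (?, ?)", (1, "token")
)
ok_token = try_insert(
    conn, "INSERT INTO survey_token_auth VALUES (?, ?, ?)", (1, "token", "abc123")
)
# Rejected: survey 1 is of type 'token', so (1, 'triple') has no parent row.
wrong_table = try_insert(
    conn, "INSERT INTO survey_triple_auth VALUES (?, ?, ?, ?, ?)",
    (1, "triple", "x", "y", "z"),
)
# Rejected: the CHECK on survey_triple_auth only allows auth_type = 'triple'.
wrong_type = try_insert(
    conn, "INSERT INTO survey_triple_auth VALUES (?, ?, ?, ?, ?)",
    (1, "token", "x", "y", "z"),
)
```

Because the parent can have only one `auth_type`, a survey can never end up with rows in both child tables, whichever order the inserts arrive in.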


  • SQL SERVER – Copy Data from One Table to Another Table – SQL in Sixty Seconds #031 – Video

    - by pinaldave
    Copying data from one table to another is one of the most frequently asked questions on forums, Facebook and Twitter. The question has come in many forms, and I have seen developers using a cursor instead of this direct method. A few years ago I wrote a similar article - SQL SERVER – Insert Data From One Table to Another Table – INSERT INTO SELECT – SELECT INTO TABLE. The article has been very popular and I have received many interesting and constructive comments. However, two specific comments kept ending up in my mailbox: 1) the SQL Server AdventureWorks sample database does not have the table I used in the example, and 2) whether there is a video tutorial of the same example. After careful thought I decided to build a new set of scripts for the example, very similar to the old ones, as well as a video tutorial. There was no better place than our SQL in Sixty Seconds series to cover this interesting small concept. Let me know what you think of this video. Here is the updated script.

        -- Method 1 : INSERT INTO SELECT
        USE AdventureWorks2012
        GO
        ----Create TestTable
        CREATE TABLE TestTable (FirstName VARCHAR(100), LastName VARCHAR(100))
        ----INSERT INTO TestTable using SELECT
        INSERT INTO TestTable (FirstName, LastName)
        SELECT FirstName, LastName
        FROM Person.Person
        WHERE EmailPromotion = 2
        ----Verify that Data in TestTable
        SELECT FirstName, LastName
        FROM TestTable
        ----Clean Up Database
        DROP TABLE TestTable
        GO
        ---------------------------------------------------------
        ---------------------------------------------------------
        -- Method 2 : SELECT INTO
        USE AdventureWorks2012
        GO
        ----Create new table and insert into table using SELECT INTO
        SELECT FirstName, LastName
        INTO TestTable
        FROM Person.Person
        WHERE EmailPromotion = 2
        ----Verify that Data in TestTable
        SELECT FirstName, LastName
        FROM TestTable
        ----Clean Up Database
        DROP TABLE TestTable
        GO

    Related Tips in SQL in Sixty Seconds:

        SQL SERVER – Insert Data From One Table to Another Table – INSERT INTO SELECT – SELECT INTO TABLE
        Powershell – Importing CSV File Into Database – Video
        SQL SERVER – 2005 – Export Data From SQL Server 2005 to Microsoft Excel Datasheet
        SQL SERVER – Import CSV File into Database Table Using SSIS
        SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server
        SQL SERVER – 2005 – Generate Script with Data from Database – Database Publishing Wizard

    What would you like to see in the next SQL in Sixty Seconds video? Reference: Pinal Dave (http://blog.sqlauthority.com)

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building and studying software systems. He is a top-rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years, Niraj enjoys working on IT innovations that can impact an enterprise’s bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received the Microsoft MVP award for ASP.NET, Connected Systems and, most recently, Windows Azure. When he is away from his laptop, you will find him taking deep dives into automobiles, pottery, rafting, photography, cooking and financial statements, though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As the data in your applications grows, it’s usually the database that becomes the bottleneck. A relational DB is hard to scale, and a preferred approach for large-scale applications is to create separate databases for writes and reads. These databases are referred to as the transactional database and the reporting database. Though there are tools and techniques that let you create a snapshot of your transactional database for reporting purposes, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically include data analytics, an effective schema (so that an information worker can serve herself), historical data, and better performance (flat data, no joins). This is where the need for a data warehouse or an OLAP system arises. A key point to remember is that a data warehouse is mostly a relational database. It’s built on the same concepts: tables, rows, columns, primary keys, foreign keys, etc.
Before we talk about how data warehouses are typically structured, let’s understand the key components that create a data flow between OLTP systems and OLAP systems. There are three major areas to it:
a) The OLTP system should be capable of tracking its changes, as all these changes should go back to the data warehouse for historical recording. For example, if an OLTP transaction moves a customer from the silver to the gold category, the OLTP system needs to ensure that this change is tracked and sent to the data warehouse for reporting purposes. A report in this context could be how many customers, divided by geography, moved from the silver to the gold category. In data warehouse terminology this process is called Change Data Capture. Quite a few systems leverage database triggers to move these changes to corresponding tracking tables. Some databases also provide out-of-the-box features, e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements.
b) After we make the OLTP system capable of tracking its changes, we need to provision a batch process that runs periodically, takes these changes from the OLTP system and loads them into the data warehouse. There are many tools out there that can help you fill this gap; SQL Server Integration Services happens to be one of them.
c) So we have an OLTP system that knows how to track its changes, and we have jobs that run periodically to move these changes to the warehouse. The remaining question is how the warehouse will record these changes. This structural change in the data warehouse arena is often covered under something called a Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. Because this creates multiple records for a given entity in the relational database, data warehouses prefer having their own primary key, often known as a surrogate key.
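The three pieces above (tracking changes, a periodic batch, and a Slowly Changing Dimension) can be sketched end to end. This is only a minimal illustration, not SQL Server's Change Data Capture feature: it uses Python's built-in sqlite3 module as a stand-in database, and every table, column, trigger and function name (customer, customer_changes, dim_customer, trg_customer_category, load_changes) is hypothetical. It also assumes the "Type 2" flavour of SCD, in which each change closes the current row and adds a new one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OLTP table (all names here are hypothetical)
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);

-- (a) Tracking table, filled by a trigger whenever the category changes
CREATE TABLE customer_changes (
    change_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id  INTEGER NOT NULL,
    new_category TEXT NOT NULL,
    changed_on   TEXT NOT NULL
);

CREATE TRIGGER trg_customer_category
AFTER UPDATE OF category ON customer
WHEN OLD.category <> NEW.category
BEGIN
    INSERT INTO customer_changes (customer_id, new_category, changed_on)
    VALUES (NEW.customer_id, NEW.category, DATE('now'));
END;

-- (c) Warehouse dimension with its own surrogate key and a validity range
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id  INTEGER NOT NULL,                   -- business key from OLTP
    category     TEXT NOT NULL,
    valid_from   TEXT NOT NULL,
    valid_to     TEXT                                -- NULL marks the current row
);
""")

# Initial state: one silver customer, already present in the warehouse.
conn.execute("INSERT INTO customer VALUES (1, 'Customer A', 'silver')")
conn.execute(
    "INSERT INTO dim_customer (customer_id, category, valid_from) "
    "VALUES (1, 'silver', '2012-01-01')"
)

# An OLTP transaction promotes the customer; the trigger records the change.
conn.execute("UPDATE customer SET category = 'gold' WHERE customer_id = 1")


def load_changes(conn):
    """(b) Batch step: apply tracked changes to the dimension as SCD Type 2."""
    rows = conn.execute(
        "SELECT change_id, customer_id, new_category, changed_on "
        "FROM customer_changes ORDER BY change_id"
    ).fetchall()
    for change_id, customer_id, new_category, changed_on in rows:
        # Close the current version of this customer...
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (changed_on, customer_id),
        )
        # ...and add a new version under a new surrogate key.
        conn.execute(
            "INSERT INTO dim_customer (customer_id, category, valid_from) "
            "VALUES (?, ?, ?)",
            (customer_id, new_category, changed_on),
        )
        # Remove the change from the tracking table once it is applied.
        conn.execute(
            "DELETE FROM customer_changes WHERE change_id = ?", (change_id,)
        )
    conn.commit()
    return len(rows)


applied = load_changes(conn)
history = conn.execute(
    "SELECT customer_key, category, valid_to IS NULL "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
print(applied, history)
```

After the batch runs, the dimension holds two rows for the same customer_id: the closed silver row and the current gold row, each with its own customer_key. That is why the warehouse cannot reuse the OLTP primary key and needs a surrogate key instead.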
As I mentioned, a data warehouse is just a relational database, but the industry often attributes a specific schema style to data warehouses. These styles are the Star schema and the Snowflake schema. The motivation behind these styles is to create a flat database structure (as opposed to a normalized one) that is easy to understand, easy to query and easy to slice and dice. A Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables hold these numbers and have references (foreign keys) to a set of tables that provide context around those facts. For example, if you have recorded 10,000 USD as sales, that number would go in a sales fact table and could have foreign keys that refer to the sales agent responsible for the sale and to a time table that contains the dates between which the sale was made. These agent and time tables are called dimensions, and they provide context to the numbers stored in fact tables. This schema structure, with the fact at the center surrounded by dimensions, is called a Star schema. A similar structure, with the difference that the dimension tables are normalized, is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called a Cube. Though physically a Cube is a special structure supported by commercial products like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of the cube and facts define the content. Facts are often called Measures inside a cube. Dimensions often tend to form a hierarchy. For example, Product may be broken into categories, and categories in turn into individual items. Category and Item are often referred to as Levels, their constituents as Members, and their overall structure as a Hierarchy. Measures are rolled up as per the dimensional hierarchy. These rolled-up measures are called Aggregates.
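To make the fact/dimension vocabulary concrete, here is a small sketch of a star schema and of measures rolled up along a dimension. It again uses Python's built-in sqlite3 module as a stand-in database, not Analysis Services, and all table names, column names and sample figures (fact_sales, dim_agent, dim_product, the bike products) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: the context around the numbers (hypothetical names and data)
CREATE TABLE dim_agent (
    agent_key  INTEGER PRIMARY KEY,
    agent_name TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT,   -- upper level of the product hierarchy
    item        TEXT    -- lower level of the product hierarchy
);

-- Fact table: the measure plus foreign keys to each dimension
CREATE TABLE fact_sales (
    agent_key   INTEGER REFERENCES dim_agent (agent_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    sales_usd   INTEGER
);

INSERT INTO dim_agent VALUES (1, 'Agent A'), (2, 'Agent B');
INSERT INTO dim_product VALUES
    (1, 'Bikes', 'Road bike'),
    (2, 'Bikes', 'Mountain bike'),
    (3, 'Accessories', 'Helmet');
INSERT INTO fact_sales VALUES
    (1, 1, 10000), (1, 3, 500), (2, 2, 7000), (2, 3, 250);
""")

# Roll the measure up from the item level to the category level (an aggregate).
by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_usd)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Slice the same facts by a different dimension: the sales agent.
by_agent = conn.execute("""
    SELECT a.agent_name, SUM(f.sales_usd)
    FROM fact_sales AS f
    JOIN dim_agent AS a ON a.agent_key = f.agent_key
    GROUP BY a.agent_name
    ORDER BY a.agent_name
""").fetchall()

print(by_category)
print(by_agent)
```

The same fact rows answer both questions. Only the dimension joined and grouped on changes, which is the "slice and dice" property the star layout is designed for. A cube would typically precompute such aggregates rather than calculate them at query time.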
Now this may seem like an overwhelming vocabulary to deal with, but don’t worry, it will sink in as you start working with cubes and the rest. Let’s look at a few other terms that we would run into while talking about data warehouses. ODS, or Operational Data Store, is a frequently misused term. There will be a few users in your organization who want to report on the most current data and can’t afford to miss a single transaction in their report. Then there is another set of users who typically don’t care how current the data is. These are mostly senior-level executives who are interested in trending, mining, forecasting, strategizing, etc. and don’t care about that one specific transaction. This is where an ODS can come in handy. An ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS is short-lived, i.e. kept for a few months, and the ODS syncs with the OLTP system every few minutes. The data warehouse can periodically sync with the ODS, either daily or weekly, depending on business drivers. Data marts are another frequently discussed topic in data warehousing. They are subject-specific data warehouses. Data warehouses that try to span an entire enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called a data mart, which supports a specific segment of the business like sales, marketing, or support. Data marts, too, are often designed using the star schema model discussed earlier. The industry is divided when it comes to the use of data marts. Some experts prefer having data marts along with a central data warehouse. The data warehouse here acts as an information staging and distribution hub, with the spokes being data marts connected via data feeds that serve summarized data. Others eliminate the need for a centralized data warehouse, citing that most users want to report on detailed data.
Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article
