Search Results

Search found 4730 results on 190 pages for 'geek tricks'.

Page 49/190 | < Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >

  • SQL SERVER – ?Finding Out What Changed in a Deleted Database – Notes from the Field #041

    - by Pinal Dave
    [Note from Pinal]: This is a 41th episode of Notes from the Field series. The real world is full of challenges. When we are reading theory or book, we sometimes do not realize how real world reacts works and that is why we have the series notes from the field, which is extremely popular with developers and DBA. Let us talk about interesting problem of how to figure out what has changed in the DELETED database. Well, you think I am just throwing the words but in reality this kind of problems are making our DBA’s life interesting and in this blog post we have amazing story from Brian Kelley about the same subject. In this episode of the Notes from the Field series database expert Brian Kelley explains a how to find out what has changed in deleted database. Read the experience of Brian in his own words. Sometimes, one of the hardest questions to answer is, “What changed?” A similar question is, “Did anything change other than what we expected to change?” The First Place to Check – Schema Changes History Report: Pinal has recently written on the Schema Changes History report and its requirement for the Default Trace to be enabled. This is always the first place I look when I am trying to answer these questions. There are a couple of obvious limitations with the Schema Changes History report. First, while it reports what changed, when it changed, and who changed it, other than the base DDL operation (CREATE, ALTER, DELETE), it does not present what the changes actually were. This is not something covered by the default trace. Second, the default trace has a fixed size. When it hits that size, the changes begin to overwrite. As a result, if you wait too long, especially on a busy database server, you may find your changes rolled off. But the Database Has Been Deleted! Pinal cited another issue, and that’s the inability to run the Schema Changes History report if the database has been dropped. Thankfully, all is not lost. One thing to remember is that the Schema Changes History report is ultimately driven by the Default Trace. As you may have guess, it’s a trace, like any other database trace. And the Default Trace does write to disk. The trace files are written to the defined LOG directory for that SQL Server instance and have a prefix of log_: Therefore, you can read the trace files like any other. Tip: Copy the files to a working directory. Otherwise, you may occasionally receive a file in use error. With the Default Trace files, if you ask the question early enough, you can see the information for a deleted database just the same as any other database. Testing with a Deleted Database: Here’s a short script that will create a database, create a schema, create an object, and then drop the database. Without the database, you can’t do a standard Schema Changes History report. CREATE DATABASE DeleteMe; GO USE DeleteMe; GO CREATE SCHEMA Test AUTHORIZATION dbo; GO CREATE TABLE Test.Foo (FooID INT); GO USE MASTER; GO DROP DATABASE DeleteMe; GO This sets up the perfect situation where we can’t retrieve the information using the Schema Changes History report but where it’s still available. Finding the Information: I’ve sorted the columns so I can see the Event Subclass, the Start Time, the Database Name, the Object Name, and the Object Type at the front, but otherwise, I’m just looking at the trace files using SQL Profiler. As you can see, the information is definitely there: Therefore, even in the case of a dropped/deleted database, you can still determine who did what and when. You can even determine who dropped the database (loginame is captured). The key is to get the default trace files in a timely manner in order to extract the information. If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Security, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Backing Up and Recovering the Tail End of a Transaction Log – Notes from the Field #042

    - by Pinal Dave
    [Notes from Pinal]: The biggest challenge which people face is not taking backup, but the biggest challenge is to restore a backup successfully. I have seen so many different examples where users have failed to restore their database because they made some mistake while they take backup and were not aware of the same. Tail Log backup was such an issue in earlier version of SQL Server but in the latest version of SQL Server, Microsoft team has fixed the confusion with additional information on the backup and restore screen itself. Now they have additional information, there are a few more people confused as they have no clue about this. Previously they did not find this as a issue and now they are finding tail log as a new learning. Linchpin People are database coaches and wellness experts for a data driven world. In this 42nd episode of the Notes from the Fields series database expert Tim Radney (partner at Linchpin People) explains in a very simple words, Backing Up and Recovering the Tail End of a Transaction Log. Many times when restoring a database over an existing database SQL Server will warn you about needing to make a tail end of the log backup. This might be your reminder that you have to choose to overwrite the database or could be your reminder that you are about to write over and lose any transactions since the last transaction log backup. You might be asking yourself “What is the tail end of the transaction log”. The tail end of the transaction log is simply any committed transactions that have occurred since the last transaction log backup. This is a very crucial part of a recovery strategy if you are lucky enough to be able to capture this part of the log. Most organizations have chosen to accept some amount of data loss. You might be shaking your head at this statement however if your organization is taking transaction logs backup every 15 minutes, then your potential risk of data loss is up to 15 minutes. Depending on the extent of the issue causing you to have to perform a restore, you may or may not have access to the transaction log (LDF) to be able to back up those vital transactions. For example, if the storage array or disk that holds your transaction log file becomes corrupt or damaged then you wouldn’t be able to recover the tail end of the log. If you do have access to the physical log file then you can still back up the tail end of the log. In 2013 I presented a session at the PASS Summit called “The Ultimate Tail Log Backup and Restore” and have been invited back this year to present it again. During this session I demonstrate how you can back up the tail end of the log even after the data file becomes corrupt. In my demonstration I set my database offline and then delete the data file (MDF). The database can’t become more corrupt than that. I attempt to bring the database back online to change the state to RECOVERY PENDING and then backup the tail end of the log. I can do this by specifying WITH NO_TRUNCATE. Using NO_TRUNCATE is equivalent to specifying both COPY_ONLY and CONTINUE_AFTER_ERROR. It as its name says, does not try to truncate the log. This is a great demo however how could I achieve backing up the tail end of the log if the failure destroys my entire instance of SQL and all I had was the LDF file? During my demonstration I also demonstrate that I can attach the log file to a database on another instance and then back up the tail end of the log. If I am performing proper backups then my most recent full, differential and log files should be on a server other than the one that crashed. I am able to achieve this task by creating new database with the same name as the failed database. I then set the database offline, delete my data file and overwrite the log with my good log file. I attempt to bring the database back online and then backup the log with NO_TRUNCATE just like in the first example. I encourage each of you to view my blog post and watch the video demonstration on how to perform these tasks. I really hope that none of you ever have to perform this in production, however it is a really good idea to know how to do this just in case. It really isn’t a matter of “IF” you will have to perform a restore of a production system but more of a “WHEN”. Being able to recover the tail end of the log in these sever cases could be the difference of having to notify all your business customers of data loss or not. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Developer Training Resources and Summary Roundup

    - by pinaldave
    It is always pleasure for any author when other renowned authors in the industry write about you. Earlier I wrote a five part blog series on Developer Training and I have received a phenomenal response to the series. I have received plenty of comments, questions and feedback. I thought it would be nice to sum up the whole series as well answer a few of the questions received. Quick Recap Developer Training - Importance and Significance - Part 1 In this part we discussed the importance of training in the real world. The most important and valuable resource any company is its employee. Employees who have been well-trained will be better at their jobs and produce a better product.  An employee who is well trained obviously knows more about their job and all the technical aspects. I have a very high opinion about training employees and it is the most important task. Developer Training – Employee Morals and Ethics – Part 2 In this part we discussed the most crucial components of training. Often employees are expecting the company to pay for their training and the company expresses no interest in training the employee. Quite often training expenses are the real issue for both the employee and employer. There are companies that pay for 100% of the expenses and there are employees who opt for training on their own expense during their personal time. Training is often looked at as vacation by employee and employers and we need to change this mind-set. One of the ways is to report back the learning to your manager and implement newly learned knowledge in day-to-day work. Developer Training – Difficult Questions and Alternative Perspective - Part 3 This part was the most difficult to write as I tried to address a few difficult questions and answers. Training is such a sensitive issue that many developers when not receiving chance for training think about leaving the organization. The manager often feels pressure to accommodate every single employee for training even though his training budget is limited. It is indeed the responsibility of the developer to get maximum advantage from the training. Training immediately helps organizations but stays as a part of an employee’s knowledge forever. Developer Training – Various Options for Developer Training – Part 4 In this part I tried to explore a few methods and options for training. The generic feedback I received on this blog post was short and I should have explored each of the subject of the training in details. I believe there are two big buckets of training 1) Instructor Lead Training and 2) Self Lead Training. The common element between both the methods is “learning material”. Learning material can be of any format – videos, books, paper notes or just a plain black board. Instructor-led training is a very effective mode but not possible every single time. During the course of the developer’s career, one has to learn lots of new technology and it is almost impossible to have a quality trainer available on that subject at that time. Books are most effective and proven methods, however, it always helps if someone explains the concepts of the book with a demonstration. In recent times I have started to believe in online trainings which leads to a hybrid experience. Online trainings take the best part of the books and the best part of the instructor-led training and gives effective training in a matter of hours. Developer Training – A Conclusive Summary- Part 5 In this part, I shared what I was continuously thinking about developer training. There is no better teacher than oneself. There is no better motivation than a personal desire to learn new technology. Honestly there is nothing more personal learning. That “change is the only constant” and “adapt & overcome” are the essential lessons of life. One cannot stop the learning and resist the change. In the IT industry “ego of knowing all” and the “resistance to change” are the most challenging issues. Once someone overcomes them, life is much easier. I believe that proper and appropriate high quality training can help to address the burning issues. Opinion of Friends I invited a few of my friends to express their opinion about developer training and here are their opinions. I am listing them here in the order of the blog post publishing date. Nakul Vachhrajani - Developer Trainings-Importance, Benefits, Tips and follow-up Nakul’s sums of many of the concepts which are complementary to my blog posts. Nakul addresses the burning question of developer training with different angles. I am personally very impressed by his following statement - “Being skilled does not mean having just a stack of certifications, but it also means having an understanding about the internals of the products that you are working on – and using that knowledge to improve the efficiency & productivity at the workplace in turn resulting in better products, better consulting abilities and a happier self.” Nakul also suggests the online training options of Pluralsight. Vinod Kumar - Training–a necessity or bonus Vinod Kumar comes up with excellent follow up on developer training. Vinod is known for his inspirational writing about SQL Server. Vinod starts with a story of a student who is extremely eager to learn the wisdom of life from a monk but the monk does not accept him as a disciple for a long time. The conversation between student and monk is indeed an essence of all learning. We all want to learn quickly and be successful but the most important thing in life is to have the right attitude towards learning and more so towards life. The blog post end with a very important thought about how to avoid the famous excuse – “I don’t have enough time.” Ritesh Shah - Training – useful or useless? Ritesh brings up very important concept related to training. Ritesh in his meticulous style explains why training is an important and lifelong process. Training must not stop at any age but should continue forever. The moment training stops, progress stops along with. Paras Doshi - Professional Development Resource Paras is known for his to–the-point writing, and has summarized the five part series very precisely. He read the five part series and created a digest summary of the blog post. If you are in a rush and have no time to read my five series – I suggest you read his blog post. Training Resources I am often asked what the best resources for learning new technology are. This is the most difficult question EVER. There are plenty of good training resources available. When it is about training our needs are different, our preference of learning is different and we all have an opinion. Additionally, we all are located in different geographic locations worldwide and there is no way one solution will fit all. However, let me list a few of the training resources which I have built so far and you can consume them if you find it relevant to your need. SQL Server Books SQL Server Interview Questions and Answers SQL Wait Stats SQL Programming Joes 2 Pros SQL Server Video Tutorials SQL Server Questions and Answers SQL Server Performance: Indexing Basics SQL Server Performance: Introduction to Query Tuning SQL in Sixty Seconds Series of Sixty Seconds Learning Video on YouTube Trust me worldwide web is very big and there are plenty of high quality learning materials available worldwide – trainer-led as well online. I suggest you explore various options and make the best choice for yourself. Remember, training is your personal journey and it should never stop. Are you ready? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Developer Training, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – How to Recover SQL Database Data Deleted by Accident

    - by Pinal Dave
    In Repair a SQL Server database using a transaction log explorer, I showed how to use ApexSQL Log, a SQL Server transaction log viewer, to recover a SQL Server database after a disaster. In this blog, I’ll show you how to use another SQL Server disaster recovery tool from ApexSQL in a situation when data is accidentally deleted. You can download ApexSQL Recover here, install, and play along. With a good SQL Server disaster recovery strategy, data recovery is not a problem. You have a reliable full database backup with valid data, a full database backup and subsequent differential database backups, or a full database backup and a chain of transaction log backups. But not all situations are ideal. Here we’ll address some sub-optimal scenarios, where you can still successfully recover data. If you have only a full database backup This is the least optimal SQL Server disaster recovery strategy, as it doesn’t ensure minimal data loss. For example, data was deleted on Wednesday. Your last full database backup was created on Sunday, three days before the records were deleted. By using the full database backup created on Sunday, you will be able to recover SQL database records that existed in the table on Sunday. If there were any records inserted into the table on Monday or Tuesday, they will be lost forever. The same goes for records modified in this period. This method will not bring back modified records, only the old records that existed on Sunday. If you restore this full database backup, all your changes (intentional and accidental) will be lost and the database will be reverted to the state it had on Sunday. What you have to do is compare the records that were in the table on Sunday to the records on Wednesday, create a synchronization script, and execute it against the Wednesday database. If you have a full database backup followed by differential database backups Let’s say the situation is the same as in the example above, only you create a differential database backup every night. Use the full database backup created on Sunday, and the last differential database backup (created on Tuesday). In this scenario, you will lose only the data inserted and updated after the differential backup created on Tuesday. If you have a full database backup and a chain of transaction log backups This is the SQL Server disaster recovery strategy that provides minimal data loss. With a full chain of transaction logs, you can recover the SQL database to an exact point in time. To provide optimal results, you have to know exactly when the records were deleted, because restoring to a later point will not bring back the records. This method requires restoring the full database backup first. If you have any differential log backup created after the last full database backup, restore the most recent one. Then, restore transaction log backups, one by one, it the order they were created starting with the first created after the restored differential database backup. Now, the table will be in the state before the records were deleted. You have to identify the deleted records, script them and run the script against the original database. Although this method is reliable, it is time-consuming and requires a lot of space on disk. How to easily recover deleted records? The following solution enables you to recover SQL database records even if you have no full or differential database backups and no transaction log backups. To understand how ApexSQL Recover works, I’ll explain what happens when table data is deleted. Table data is stored in data pages. When you delete table records, they are not immediately deleted from the data pages, but marked to be overwritten by new records. Such records are not shown as existing anymore, but ApexSQL Recover can read them and create undo script for them. How long will deleted records stay in the MDF file? It depends on many factors, as time passes it’s less likely that the records will not be overwritten. The more transactions occur after the deletion, the more chances the records will be overwritten and permanently lost. Therefore, it’s recommended to create a copy of the database MDF and LDF files immediately (if you cannot take your database offline until the issue is solved) and run ApexSQL Recover on them. Note that a full database backup will not help here, as the records marked for overwriting are not included in the backup. First, I’ll delete some records from the Person.EmailAddress table in the AdventureWorks database.   I can delete these records in SQL Server Management Studio, or execute a script such as DELETE FROM Person.EmailAddress WHERE BusinessEntityID BETWEEN 70 AND 80 Then, I’ll start ApexSQL Recover and select From DELETE operation in the Recovery tab.   In the Select the database to recover step, first select the SQL Server instance. If it’s not shown in the drop-down list, click the Server icon right to the Server drop-down list and browse for the SQL Server instance, or type the instance name manually. Specify the authentication type and select the database in the Database drop-down list.   In the next step, you’re prompted to add additional data sources. As this can be a tricky step, especially for new users, ApexSQL Recover offers help via the Help me decide option.   The Help me decide option guides you through a series of questions about the database transaction log and advises what files to add. If you know that you have no transaction log backups or detached transaction logs, or the online transaction log file has been truncated after the data was deleted, select No additional transaction logs are available. If you know that you have transaction log backups that contain the delete transactions you want to recover, click Add transaction logs. The online transaction log is listed and selected automatically.   Click Add if to add transaction log backups. It would be best if you have a full transaction log chain, as explained above. The next step for this option is to specify the time range.   Selecting a small time range for the time of deletion will create the recovery script just for the accidentally deleted records. A wide time range might script the records deleted on purpose, and you don’t want that. If needed, you can check the script generated and manually remove such records. After that, for all data sources options, the next step is to select the tables. Be careful here, if you deleted some data from other tables on purpose, and don’t want to recover them, don’t select all tables, as ApexSQL Recover will create the INSERT script for them too.   The next step offers two options: to create a recovery script that will insert the deleted records back into the Person.EmailAddress table, or to create a new database, create the Person.EmailAddress table in it, and insert the deleted records. I’ll select the first one.   The recovery process is completed and 11 records are found and scripted, as expected.   To see the script, click View script. ApexSQL Recover has its own script editor, where you can review, modify, and execute the recovery script. The insert into statements look like: INSERT INTO Person.EmailAddress( BusinessEntityID, EmailAddressID, EmailAddress, rowguid, ModifiedDate) VALUES( 70, 70, N'[email protected]' COLLATE SQL_Latin1_General_CP1_CI_AS, 'd62c5b4e-c91f-403f-b630-7b7e0fda70ce', '20030109 00:00:00.000' ); To execute the script, click Execute in the menu.   If you want to check whether the records are really back, execute SELECT * FROM Person.EmailAddress WHERE BusinessEntityID BETWEEN 70 AND 80 As shown, ApexSQL Recover recovers SQL database data after accidental deletes even without the database backup that contains the deleted data and relevant transaction log backups. ApexSQL Recover reads the deleted data from the database data file, so this method can be used even for databases in the Simple recovery model. Besides recovering SQL database records from a DELETE statement, ApexSQL Recover can help when the records are lost due to a DROP TABLE, or TRUNCATE statement, as well as repair a corrupted MDF file that cannot be attached to as SQL Server instance. You can find more information about how to recover SQL database lost data and repair a SQL Server database on ApexSQL Solution center. There are solutions for various situations when data needs to be recovered. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQLAuthority News – Pluralsight Course Review – Practices for Software Startups – Part 1 of 2

    - by pinaldave
    This is first part of the two part series of Practices for Software Startup Pluralsight Course. The course is written by Stephen Forte (Blog | Twitter). Stephen Forte is the Chief Strategy Officer of the venture backed company, Telerik, a leading vendor of developer and team productivity tools. Stephen is also a Certified Scrum Master, Certified Scrum Professional, PMP, and also speaks regularly at industry conferences around the world. He has written several books on application and database development.  Stephen is also a board member of the Scrum Alliance. Startups – Everybodies Dream Start-up companies are an important topic right now – everyone wants to start their own business.  It is also important to remember that all companies were a start up at one point – from your corner store to the giants like Microsoft and Apple.  Research proves that not every start-up succeeds, in fact, most will fail before their first year.  There are many reasons for this, and this could be due to the fact that there are many stages to a start-up company, and stumbling at any of these stages can lead to failure.  It is important to understand what makes a start-up company succeed at all its hurdles to become successful.  It is even important to define success.  For most start-ups this would mean becoming their own independently functioning company or to be bought out for a hefty profit by a larger company.  The idea of making a hefty profit by living your dream is extremely important, and you can even think of start-ups as the new craze.  That’s why studying them is so important – they are very popular, but things have changed a lot since their inception. Starting the Startups Beginning a start-up company used to be difficult, but now facilities and information is widely available, and it is much easier.  But that means it is much easier to fail, also.  Previously to start your own company, everything was planned and organized, resources were ensured and backed up before beginning; even the idea of starting your own business was a big thing.  Now anybody can do it, and the steps are simple and outlines everywhere – you can get online software and easily outsource , cloud source, or crowdsource a lot of your material.  But without the type of planning previously required, things can often go badly. New Products – New Ideas – New World There are so many fantastic new products, but they don’t reach success all the time.  I find start-up companies very interesting, and whenever I meet someone who is interested in the subject or already starting their own company, I always ask what they are doing, their plans, goals, market, etc.  I am sorry to say that in most cases, they cannot answer my questions.  It is true that many fantastic ideas fail because of bad decisions.  These bad decisions were not made intentionally, but people were simply unaware of what they should be doing.  This will always lead to failure.  But I am happy to say that all these issues can be gone because Pluralsight is now offering a course all about start-ups by Stephen Forte.  Stephen is a start up leader.  He has successfully started many companies and most are still going strong, or have gone on to even bigger and better things. Beginning Course on Startup I have always thought start-ups are a fascinating subject, and decided to take his course, but it is three hours long.  This would be hard to fit into my busy work day all at once, so I decided to do half of his course before my daughter wakes up, and the other half after she goes to sleep.  The course is divided into six modules, so this would be easy to do.  I began the first chapter early in the morning, at 5 am.  Stephen jumped right into the middle of the subject in the very first module – designing your business plan.  The first question you will have to answer to yourself, to others, and to investors is: What is your product and when will we be able to see it?  So a very important concept is a “minimal viable product.”  This means setting goals for yourself and your product.  We all have large dreams, but your minimal viable product doesn’t have to be your final vision at the very first.  For example: Apple is a giant company, but it is still evolving.  Steve Jobs didn’t envision the iPhone 6 at the very beginning.  He had to start at the first iPhone and do his market research, and the idea evolved into the technology you see now.  So for yourself, you should decide a beginning and stop point.  Do your market research.  Determine who you want to reach, what audience you want for your product.  You can have a great idea that simply will not work in the market, do need, bottlenecks, lack of resources, or competition.  There is a lot of research that needs to be done before you even write a business plan, and Stephen covers it in the very first chapter. The Team – Unique Key to Success After jumping right into the subject in the very first module, I wondered what Stephen could have in store for me for the rest of the course.  Chapter number two is building a team.  Having a team is important regardless of what your startup is.  You can be a true visionary with endless ideas and energy, but one person can still not do everything.  It is important to decide from the very beginning if you will have cofounders, team leaders, and how many employees you’ll need.  Even more important, you’ll need to decide what kind of team you want – what personalities, skills, and type of energy you want each of your employees to bring.  Do you want to have an A+ team with a B- idea, or do you have a B- idea that needs an A+ team to sell it?  Stephen asks all the hard questions!  I was especially impressed by his insight on developing.  You have to decide if you need developers, how many, and what their skills should be. I found this insight extremely useful for everyday usage, not just for start-up companies.  I would apply this kind of information in management at any position.  An amazing team will build an amazing product – and that doesn’t matter if you’re a start-up company or a small team working for a much larger business. Customer Development – The Ultimate Obective Chapter three was about customer development. According to Stephen, there are four different steps to develop a customer base.  The first question to ask yourself is if you are envisioning a large customer base buying a few products each, or a small, dedicated base that buys a lot of your product – quantity vs. Quality.  He also discusses how to earn, retain, and get more customers.  He also says that each customer should be placed in a different role – some will be like investors, who regularly spend with you and invest their money in your business.  It is then your job to take that investment and turn it into a better product in the future.  You need to deal with their money properly – think of it is as theirs as investors, not yours as profit.  At the end of this module I felt that only Stephen could provide this kind of insight, and then he listed all the resources he took his information from.  I have never seen a group of people so passionate about their customers. It was indeed a long day for me. In tomorrow’s part 2 we will discuss rest of the three module and also will see a quick video of the Practices for Software Startup Pluralsight Course. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Best Practices, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQLAuthority News – Memories at Anniversary of SQL Wait Stats Book

    - by pinaldave
    SQL Wait Stats About a year ago, I had very proud moment. I had published my second book SQL Server Wait Stats with me as a primary author. It has been a long journey since then. The book got great response and it was widely accepted in the community. It was first of its kind of book written specifically on Wait Stats and Performance. The book was based on my earlier month long series written on the same subject SQL Server Wait Stats. Today, on the anniversary of the book, lots of things come to my mind let me share a few here. Idea behind Blog Series A very common question I often receive is why I wrote a 30 day series on Wait Stats. There were two reasons for it. 1) I have been working with SQL Server for a long time and have troubleshoot more than hundreds of SQL Server which are related to performance tuning. It was a great experience and it taught me a lot of new things. I always documented my experience. After a while I found that I was able to completely rely on my own notes when I was troubleshooting any servers. It is right then I decided to document my experience for the community. 2) While working with wait stats there were a few things, which I thought I knew it well as they were working. However, there was always a fear in the back of mind that what happens if what I believed was incorrect and I was on the wrong path all the time. There was only one way to get it validated. Put it out in front community with my understanding and request further help to improve my understanding. It worked, it worked beautifully. I received plenty of conversations, emails and comments. I refined my content based on various conversations and make it more relevant and near accurate. I guess above two are the major reasons for beginning my journey on writing Wait Stats blog series. Idea behind Book After writing a blog series there was a good amount of request I keep on receiving that I should convert it to eBook or proper book as reading blog posts is great but it goes not give a comprehensive understanding of the subject. The very common feedback from users who were beginning the subject that they will prefer to read it in a structured method. After hearing the feedback for more than 4 months, I decided to write a book based on the blog posts. When I envisioned book, I wanted to make sure this book addresses the wait stats concepts from the fundamentals and fill the gaps of blogs I wrote earlier. Rick Morelan and Joes 2 Pros Team I must acknowledge my co-author Rick Morelan for his unconditional support in writing this book. I had already authored one book before I published this book. The experience to write the book was out of the world. Writing blog posts are much much easier than writing books. The efforts it takes to write a book is 100 times more even though the content is ready. I could have not done it myself if there was not tremendous support of my co-author and editor’s team. We spend days and days researching and discussing various concepts covered in the book. When we were in doubt we reached out to experts as well did a practical reproduction of the scenarios to validate the concepts and claims. After continuous 3 months of hard work we were able to get this book out in the community. September 1st – the lucky day Well, we had to select any day to publish the books. When book was completed in August last week we felt very glad. We all had worked hard and having a sample draft book in hand was feeling like having a newborn baby in our hand. Every time my books are published I feel the same joy which I had when my daughter was born. The feeling of holding a new book in hand is the (almost) same feeling as holding newborn baby. I am excited. For me September 1st has been the luckiest day in mind life. My daughter Shaivi was born on September 1st. Since then every September first has been excellent day and have taken me to the next step in life. I believe anything and everything I do on September 1st it is turning out to be successful and blessed. Rick and I had finished a book in the last week of August. We sent it to the publisher (printer) and asked him to take the book live as soon as possible. We did not decide on any date as we wanted the book to get out as fast as it can. Interesting enough, the publisher/printer selected September 1st for publishing the book. He published the book on 1st September and I knew it at the same time that this book will go next level. Book Model – The Most Beautiful Girl We were done with book. We had no budget left for marketing. Rick and I had a long conversation regarding how to spread the words for the book so it can reach to many people. While we were talking about marketing Rick come up with the idea that we should hire a most beautiful girl around who acknowledge our book and genuinely care for book. It was a difficult task and Rick asked me to find a more beautiful girl. I am a father and the most beautiful girl for me my daughter. This was not a difficult task for me. Rick had given me task to find the most beautiful girl and I just could not think of anyone else than my own daughter. I still do not know what Rick thought about this idea but I had already made up my mind. You can see the detailed blog post here. The Fun Experiments Book Signing Event We had lots of fun moments along this book. We have given away more books to people for free than we have sold them actually. We had done book signing events, contests, and just plain give away when we found people can be benefited from this book. There was never an intention to make money and get rich. We just wanted that more and more people know about this new concept and learn from it. Today when I look back to the earnings there is nothing much we have earned if you talk about dollars. However the best reward which we have received is the satisfaction and love of community. The amount of emails, conversations we have so far received for this book is over thousands. We had fun writing this book, it was indeed a very satisfying journey. I have earned lots of friends while learning and exploring. Availability The book is one year old but still very relevant when it is about performance tuning. It is available at various online book stores. If you have read the book, do let me know what you think of it. Amazon | Kindle | Flipkart | Indiaplaza Reference:  Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority, SQLAuthority Book Review, T SQL, Technology

    Read the article

  • SQL SERVER – Guest Post – Jonathan Kehayias – Wait Type – Day 16 of 28

    - by pinaldave
    Jonathan Kehayias (Blog | Twitter) is a MCITP Database Administrator and Developer, who got started in SQL Server in 2004 as a database developer and report writer in the natural gas industry. After spending two and a half years working in TSQL, in late 2006, he transitioned to the role of SQL Database Administrator. His primary passion is performance tuning, where he frequently rewrites queries for better performance and performs in depth analysis of index implementation and usage. Jonathan blogs regularly on SQLBlog, and was a coauthor of Professional SQL Server 2008 Internals and Troubleshooting. On a personal note, I think Jonathan is extremely positive person. In every conversation with him I have found that he is always eager to help and encourage. Every time he finds something needs to be approved, he has contacted me without hesitation and guided me to improve, change and learn. During all the time, he has not lost his focus to help larger community. I am honored that he has accepted to provide his views on complex subject of Wait Types and Queues. Currently I am reading his series on Extended Events. Here is the guest blog post by Jonathan: SQL Server troubleshooting is all about correlating related pieces of information together to indentify where exactly the root cause of a problem lies. In my daily work as a DBA, I generally get phone calls like, “So and so application is slow, what’s wrong with the SQL Server.” One of the funny things about the letters DBA is that they go so well with Default Blame Acceptor, and I really wish that I knew exactly who the first person was that pointed that out to me, because it really fits at times. A lot of times when I get this call, the problem isn’t related to SQL Server at all, but every now and then in my initial quick checks, something pops up that makes me start looking at things further. The SQL Server is slow, we see a number of tasks waiting on ASYNC_IO_COMPLETION, IO_COMPLETION, or PAGEIOLATCH_* waits in sys.dm_exec_requests and sys.dm_exec_waiting_tasks. These are also some of the highest wait types in sys.dm_os_wait_stats for the server, so it would appear that we have a disk I/O bottleneck on the machine. A quick check of sys.dm_io_virtual_file_stats() and tempdb shows a high write stall rate, while our user databases show high read stall rates on the data files. A quick check of some performance counters and Page Life Expectancy on the server is bouncing up and down in the 50-150 range, the Free Page counter consistently hits zero, and the Free List Stalls/sec counter keeps jumping over 10, but Buffer Cache Hit Ratio is 98-99%. Where exactly is the problem? In this case, which happens to be based on a real scenario I faced a few years back, the problem may not be a disk bottleneck at all; it may very well be a memory pressure issue on the server. A quick check of the system spec’s and it is a dual duo core server with 8GB RAM running SQL Server 2005 SP1 x64 on Windows Server 2003 R2 x64. Max Server memory is configured at 6GB and we think that this should be enough to handle the workload; or is it? This is a unique scenario because there are a couple of things happening inside of this system, and they all relate to what the root cause of the performance problem is on the system. If we were to query sys.dm_exec_query_stats for the TOP 10 queries, by max_physical_reads, max_logical_reads, and max_worker_time, we may be able to find some queries that were using excessive I/O and possibly CPU against the system in their worst single execution. We can also CROSS APPLY to sys.dm_exec_sql_text() and see the statement text, and also CROSS APPLY sys.dm_exec_query_plan() to get the execution plan stored in cache. Ok, quick check, the plans are pretty big, I see some large index seeks, that estimate 2.8GB of data movement between operators, but everything looks like it is optimized the best it can be. Nothing really stands out in the code, and the indexing looks correct, and I should have enough memory to handle this in cache, so it must be a disk I/O problem right? Not exactly! If we were to look at how much memory the plan cache is taking by querying sys.dm_os_memory_clerks for the CACHESTORE_SQLCP and CACHESTORE_OBJCP clerks we might be surprised at what we find. In SQL Server 2005 RTM and SP1, the plan cache was allowed to take up to 75% of the memory under 8GB. I’ll give you a second to go back and read that again. Yes, you read it correctly, it says 75% of the memory under 8GB, but you don’t have to take my word for it, you can validate this by reading Changes in Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2. In this scenario the application uses an entirely adhoc workload against SQL Server and this leads to plan cache bloat, and up to 4.5GB of our 6GB of memory for SQL can be consumed by the plan cache in SQL Server 2005 SP1. This in turn reduces the size of the buffer cache to just 1.5GB, causing our 2.8GB of data movement in this expensive plan to cause complete flushing of the buffer cache, not just once initially, but then another time during the queries execution, resulting in excessive physical I/O from disk. Keep in mind that this is not the only query executing at the time this occurs. Remember the output of sys.dm_io_virtual_file_stats() showed high read stalls on the data files for our user databases versus higher write stalls for tempdb? The memory pressure is also forcing heavier use of tempdb to handle sorting and hashing in the environment as well. The real clue here is the Memory counters for the instance; Page Life Expectancy, Free List Pages, and Free List Stalls/sec. The fact that Page Life Expectancy is fluctuating between 50 and 150 constantly is a sign that the buffer cache is experiencing constant churn of data, once every minute to two and a half minutes. If you add to the Page Life Expectancy counter, the consistent bottoming out of Free List Pages along with Free List Stalls/sec consistently spiking over 10, and you have the perfect memory pressure scenario. All of sudden it may not be that our disk subsystem is the problem, but is instead an innocent bystander and victim. Side Note: The Page Life Expectancy counter dropping briefly and then returning to normal operating values intermittently is not necessarily a sign that the server is under memory pressure. The Books Online and a number of other references will tell you that this counter should remain on average above 300 which is the time in seconds a page will remain in cache before being flushed or aged out. This number, which equates to just five minutes, is incredibly low for modern systems and most published documents pre-date the predominance of 64 bit computing and easy availability to larger amounts of memory in SQL Servers. As food for thought, consider that my personal laptop has more memory in it than most SQL Servers did at the time those numbers were posted. I would argue that today, a system churning the buffer cache every five minutes is in need of some serious tuning or a hardware upgrade. Back to our problem and its investigation: There are two things really wrong with this server; first the plan cache is excessively consuming memory and bloated in size and we need to look at that and second we need to evaluate upgrading the memory to accommodate the workload being performed. In the case of the server I was working on there were a lot of single use plans found in sys.dm_exec_cached_plans (where usecounts=1). Single use plans waste space in the plan cache, especially when they are adhoc plans for statements that had concatenated filter criteria that is not likely to reoccur with any frequency.  SQL Server 2005 doesn’t natively have a way to evict a single plan from cache like SQL Server 2008 does, but MVP Kalen Delaney, showed a hack to evict a single plan by creating a plan guide for the statement and then dropping that plan guide in her blog post Geek City: Clearing a Single Plan from Cache. We could put that hack in place in a job to automate cleaning out all the single use plans periodically, minimizing the size of the plan cache, but a better solution would be to fix the application so that it uses proper parameterized calls to the database. You didn’t write the app, and you can’t change its design? Ok, well you could try to force parameterization to occur by creating and keeping plan guides in place, or we can try forcing parameterization at the database level by using ALTER DATABASE <dbname> SET PARAMETERIZATION FORCED and that might help. If neither of these help, we could periodically dump the plan cache for that database, as discussed as being a problem in Kalen’s blog post referenced above; not an ideal scenario. The other option is to increase the memory on the server to 16GB or 32GB, if the hardware allows it, which will increase the size of the plan cache as well as the buffer cache. In SQL Server 2005 SP1, on a system with 16GB of memory, if we set max server memory to 14GB the plan cache could use at most 9GB  [(8GB*.75)+(6GB*.5)=(6+3)=9GB], leaving 5GB for the buffer cache.  If we went to 32GB of memory and set max server memory to 28GB, the plan cache could use at most 16GB [(8*.75)+(20*.5)=(6+10)=16GB], leaving 12GB for the buffer cache. Thankfully we have SQL Server 2005 Service Pack 2, 3, and 4 these days which include the changes in plan cache sizing discussed in the Changes to Caching Behavior between SQL Server 2000, SQL Server 2005 RTM and SQL Server 2005 SP2 blog post. In real life, when I was troubleshooting this problem, I spent a week trying to chase down the cause of the disk I/O bottleneck with our Server Admin and SAN Admin, and there wasn’t much that could be done immediately there, so I finally asked if we could increase the memory on the server to 16GB, which did fix the problem. It wasn’t until I had this same problem occur on another system that I actually figured out how to really troubleshoot this down to the root cause.  I couldn’t believe the size of the plan cache on the server with 16GB of memory when I actually learned about this and went back to look at it. SQL Server is constantly telling a story to anyone that will listen. As the DBA, you have to sit back and listen to all that it’s telling you and then evaluate the big picture and how all the data you can gather from SQL about performance relate to each other. One of the greatest tools out there is actually a free in the form of Diagnostic Scripts for SQL Server 2005 and 2008, created by MVP Glenn Alan Berry. Glenn’s scripts collect a majority of the information that SQL has to offer for rapid troubleshooting of problems, and he includes a lot of notes about what the outputs of each individual query might be telling you. When I read Pinal’s blog post SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28, I noticed that he referenced Checking Memory Related Performance Counters in his post, but there was no real explanation about why checking memory counters is so important when looking at an I/O related wait type. I thought I’d chat with him briefly on Google Talk/Twitter DM and point this out, and offer a couple of other points I noted, so that he could add the information to his blog post if he found it useful.  Instead he asked that I write a guest blog for this. I am honored to be a guest blogger, and to be able to share this kind of information with the community. The information contained in this blog post is a glimpse at how I do troubleshooting almost every day of the week in my own environment. SQL Server provides us with a lot of information about how it is running, and where it may be having problems, it is up to us to play detective and find out how all that information comes together to tell us what’s really the problem. This blog post is written by Jonathan Kehayias (Blog | Twitter). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: MVP, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

    Read the article

  • SQLAuthority News – Pluralsight Course Review – Practices for Software Startups – Part 2 of 2

    - by pinaldave
    This is the second part of the two part series of Practices for Software Startup Pluralsight Course. Please read the first part of this series over here. The course is written by Stephen Forte (Blog | Twitter). Stephen Forte is the Chief Strategy Officer of the venture backed company, Telerik. Personal Learning Schedule After these three sessions it was 6:30 am and time to do my own blog.  But for the rest of the day, I kept thinking about the course, and wanted to go back and finish.  I was wishing that I had woken up at 3 am so I could finish all at one go.  All day long I was digesting what I had learned.  At 10 pm, after my daughter had gone to bed, I sighed on again.  I was not disappointed by the long wait.  As I mentioned before, Stephen has started four to six companies, and all of them are very successful today. Here is the video I promised yesterday – it discusses the importance of Right Sizing Your Startup. The Heartbeat of Startup – Technology Stephen has combined all technology knowledge into one 30 minute session.  He discussed  how to start your project, how to deal with opinions, and how to deal with multiple ideas – every start up has multiple directions it can go. He spent a lot of time emphasized deciding which direction to go and how to decide which will be the best for you.  He called it a continuous development cycle. One of the biggest hazards for a start-up company is one person deciding the direction the company will go, until down the road another team member announces that there is a glitch in their part of the work and that everyone will have to start over.  Even though a team of two or five people can move quickly, often the decision has gone too long and cannot be easily fixed.   Stephen used an example from his own life:  he was biased for one type of technology, and his teammate for another.  In the end they opted for his teammate’s  choice , and in the end it was a good decision, even though he was unfamiliar with that particular program.  He argues that technology should not be a barrier to progress, that you cannot rely on your experience only.  This really spoke to me because I am a big fan of SQL, but I know there is more out there, and I should be more open to it.  I give my thanks to Stephen, I learned something in this module besides startups. Money, Success and Epic Win! The longest, but most interesting, the module was funding your start-up.  You need to fund the start-up right at the very beginning, if not done right you will run into trouble.  The good news is that a few years ago start-ups required a lot more money – think millions of dollars – but now start-ups can get off the ground for thousands.  Stephen used an example of a company that years ago would have needed a million dollars, but today could be started for $600.  It is true that things have changed, but you still need money.  For $600 you can start small and add dynamically, as needed.  But the truth is that if you have $600, $6000, or $6 million, it will be spent.  Don’t think of it as trying to save money, think of it as investing in your future.   You will need money, and you will need to (quickly) decide what you do with the money: shares, stakeholders, investing in a team, hiring a CEO.  This is so important because once you have money and start the company, the company IS your money.  It is your biggest currency – having a percentage of ownership in the company.  Investors will want percentages as repayment for their investment, and they will want a say in the business as well.  You will have to decide how far you will dilute your shares, and how the company will be divided, if at all.  If you don’t plan in advance, you will find that after gaining three or four investors, suddenly you are the minority owner in your own dream.  You need to understand funding carefully.  This single module is worth all the money you would have spent on the whole course alone.  I encourage everyone to listen to this single module even if they don’t watch any of the others.     Press End to Start the Game – Exists! The final module is exit strategies.  You did all this work, dealt with all political and legal issues.  What are you going to get out of it? The answer is simple: money.  Maybe you want your company to be bought out, for you talent to bring you a profit.  You can sell the company to someone and still head it.  Many options are available.  You could sell and still work as an employee but no longer own the company.  There are many exit strategies.  This is where all your hard work comes into play.  It is important not to feel fooled at any step.  There are so many good ideas that end up in the garbage because of poor planning, so that if you find yourself successful, you don’t want to blow it at this step!  The exit is important.  I thought that this aspect of the course was completely unique, and I loved Stephen’s point of view.  I was lost deep in thought after this module ended.  I actually took two hours worth of notes on this section alone – and it was only a three hour course.  I am planning on attending this course one more time next week, just to catch up on all the small bits of wisdom I’m sure I missed. Thank you Stephen for bringing your real world experience with us!  I recommend that everyone attends this course, even if they don’t want to begin their own start-up company. It was indeed a long day for me. Do not forget to read part 1 of this story and attend course Practices for Software Startup Pluralsight Course. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Best Practices, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Introduction – Day 1 of 31

    - by pinaldave
    List of all the Interview Questions and Answers Series blogs Posts covering interview questions and answers always make for interesting reading.  Some people like the subject for their helpful hints and thought provoking subject, and others dislike these posts because they feel it is nothing more than cheating.  I’d like to discuss the pros and cons of a Question and Answer format here. Interview Questions and Answers are Helpful Just like blog posts, books, and articles, interview Question and Answer discussions are learning material.  The popular Dummy’s books or Idiots Guides are not only for “dummies,” but can help everyone relearn the fundamentals.  Question and Answer discussions can serve the same purpose.  You could call this SQL Server Fundamentals or SQL Server 101. I have administrated hundreds of interviews during my career and I have noticed that sometimes an interviewee with several years of experience lacks an understanding of the fundamentals.  These individuals have been in the industry for so long, usually working on a very specific project, that the ABCs of the business have slipped their mind. Or, when a college graduate is looking to get into the industry, he is not expected to have experience since he is just graduated. However, the new grad is expected to have an understanding of fundamentals and theory.  Sometimes after the stress of final exams and graduation, it can be difficult to remember the correct answers to interview questions, though. An interview Question and Answer discussion can be very helpful to both these individuals.  It is simply a way to go back over the building blocks of a topic.  Many times a simple review like this will help “jog” your memory, and all those previously-memorized facts will come flooding back to you.  It is not a way to re-learn a topic, but a way to remind yourself of what you already know. A Question and Answer discussion can also be a way to go over old topics in a more interesting manner.  Especially if you have been working in the industry, or taking lots of classes on the topic, everything you read can sound like a repeat of what you already know.  Going over a topic in a new format can make the material seem fresh and interesting.  And an interested mind will be more engaged and remember more in the end. Interview Questions and Answers are Harmful A common argument against a Question and Answer discussion is that it will give someone a “cheat sheet.” A new guy with relatively little experience can read the interview questions and answers, and then memorize them. When an interviewer asks him the same questions, he will repeat the answers and get the job. Honestly, is he good hire because he memorized the interview questions? Wouldn’t it be better for the interviewer to hire someone with actual experience?  The answer is not as easy as it seems – there are many different factors to be considered. If the interviewer is asking fundamentals-related questions only, he gets the answers he wants to hear, and then hires this first candidate – there is a good chance that he is hiring based on personality rather than experience.  If the interviewer is smart he will ask deeper questions, have more than one person on the interview team, and interview a variety of candidates.  If one interviewee happens to memorize some answers, it usually doesn’t mean he will automatically get the job at the expense of more qualified candidates. Another argument against interview Question and Answers is that it will give candidates a false sense of confidence, and that they will appear more qualified than they are. Well, if that is true, it will not last after the first interview when the candidate is asked difficult questions and he cannot find the answers in the list of interview Questions and Answers.  Besides, confidence is one of the best things to walk into an interview with! In today’s competitive job market, there are often hundreds of candidates applying for the same position.  With so many applicants to choose from, interviewers must make decisions about who to call back and who to hire based on their gut feeling.  One drawback to reading an interview Question and Answer article is that you might sound very boring in your interview – saying the same thing as every single candidate, and parroting answers that sound like someone else wrote them for you – because they did.  However, it is definitely better to go to an interview prepared, just make sure that you give a lot of thought to your answers to make them sound like your own voice.  Remember that you will be hired based on your skills as well as your personality, so don’t think that having all the right answers will make get you hired.  A good interviewee will be prepared, confident, and know how to stand out. My Opinion A list of interview Questions and Answers is really helpful as a refresher or for beginners. To really ace an interview, one needs to have real-world, hands-on experience with SQL Server as well. Interview questions just serve as a starter or easy read for experienced professionals. When I have to learn new technology, I often search online for interview questions and get an idea about the breadth and depth of the technology. Next Action I am going to write about interview Questions and Answers for next 30 days. I have previously written a series of interview questions and answers; now I have re-written them keeping the latest version of SQL Server and current industry progress in mind. If you have faced interesting interview questions or situations, please write to me and I will publish them as a guest post. If you want me to add few more details, leave a comment and I will make sure that I do my best to accommodate. Tomorrow we will start the interview Questions and Answers series, with a few interesting stories, best practices and guest posts. We will have a prize give-away and other awards when the series ends. List of all the Interview Questions and Answers Series blogs Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Interview Questions and Answers, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Big Data – Is Big Data Relevant to me? – Big Data Questionnaires – Guest Post by Vinod Kumar

    - by Pinal Dave
    This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions of SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. I think the series from Pinal is a good one for anyone planning to start on Big Data journey from the basics. In my daily customer interactions this buzz of “Big Data” always comes up, I react generally saying – “Sir, do you really have a ‘Big Data’ problem or do you have a big Data problem?” Generally, there is a silence in the air when I ask this question. Data is everywhere in organizations – be it big data, small data, all data and for few it is bad data which is same as no data :). Wow, don’t discount me as someone who opposes “Big Data”, I am a big supporter as much as I am a critic of the abuse of this term by the people. In this post, I wanted to let my mind flow so that you can also think in the direction I want you to see these concepts. In any case, this is not an exhaustive dump of what is in my mind – but you will surely get the drift how I am going to question Big Data terms from customers!!! Is Big Data Relevant to me? Many of my customers talk to me like blank whiteboard with no idea – “why Big Data”. They want to jump into the bandwagon of technology and they want to decipher insights from their unexplored data a.k.a. unstructured data with structured data. So what are these industry scenario’s that come to mind? Here are some of them: Financials Fraud detection: Banks and Credit cards are monitoring your spending habits on real-time basis. Customer Segmentation: applies in every industry from Banking to Retail to Aviation to Utility and others where they deal with end customer who consume their products and services. Customer Sentiment Analysis: Responding to negative brand perception on social or amplify the positive perception. Sales and Marketing Campaign: Understand the impact and get closer to customer delight. Call Center Analysis: attempt to take unstructured voice recordings and analyze them for content and sentiment. Medical Reduce Re-admissions: How to build a proactive follow-up engagements with patients. Patient Monitoring: How to track Inpatient, Out-Patient, Emergency Visits, Intensive Care Units etc. Preventive Care: Disease identification and Risk stratification is a very crucial business function for medical. Claims fraud detection: There is no precise dollars that one can put here, but this is a big thing for the medical field. Retail Customer Sentiment Analysis, Customer Care Centers, Campaign Management. Supply Chain Analysis: Every sensors and RFID data can be tracked for warehouse space optimization. Location based marketing: Based on where a check-in happens retail stores can be optimize their marketing. Telecom Price optimization and Plans, Finding Customer churn, Customer loyalty programs Call Detail Record (CDR) Analysis, Network optimizations, User Location analysis Customer Behavior Analysis Insurance Fraud Detection & Analysis, Pricing based on customer Sentiment Analysis, Loyalty Management Agents Analysis, Customer Value Management This list can go on to other areas like Utility, Manufacturing, Travel, ITES etc. So as you can see, there are obviously interesting use cases for each of these industry verticals. These are just representative list. Where to start? A lot of times I try to quiz customers on a number of dimensions before starting a Big Data conversation. Are you getting the data you need the way you want it and in a timely manner? Can you get in and analyze the data you need? How quickly is IT to respond to your BI Requests? How easily can you get at the data that you need to run your business/department/project? How are you currently measuring your business? Can you get the data you need to react WITHIN THE QUARTER to impact behaviors to meet your numbers or is it always “rear-view mirror?” How are you measuring: The Brand Customer Sentiment Your Competition Your Pricing Your performance Supply Chain Efficiencies Predictive product / service positioning What are your key challenges of driving collaboration across your global business?  What the challenges in innovation? What challenges are you facing in getting more information out of your data? Note: Garbage-in is Garbage-out. Hold good for all reporting / analytics requirements Big Data POCs? A number of customers get into the realm of setting a small team to work on Big Data – well it is a great start from an understanding point of view, but I tend to ask a number of other questions to such customers. Some of these common questions are: To what degree is your advanced analytics (natural language processing, sentiment analysis, predictive analytics and classification) paired with your Big Data’s efforts? Do you have dedicated resources exploring the possibilities of advanced analytics in Big Data for your business line? Do you plan to employ machine learning technology while doing Advanced Analytics? How is Social Media being monitored in your organization? What is your ability to scale in terms of storage and processing power? Do you have a system in place to sort incoming data in near real time by potential value, data quality, and use frequency? Do you use event-driven architecture to manage incoming data? Do you have specialized data services that can accommodate different formats, security, and the management requirements of multiple data sources? Is your organization currently using or considering in-memory analytics? To what degree are you able to correlate data from your Big Data infrastructure with that from your enterprise data warehouse? Have you extended the role of Data Stewards to include ownership of big data components? Do you prioritize data quality based on the source system (that is Facebook/Twitter data has lower quality thresholds than radio frequency identification (RFID) for a tracking system)? Do your retention policies consider the different legal responsibilities for storing Big Data for a specific amount of time? Do Data Scientists work in close collaboration with Data Stewards to ensure data quality? How is access to attributes of Big Data being given out in the organization? Are roles related to Big Data (Advanced Analyst, Data Scientist) clearly defined? How involved is risk management in the Big Data governance process? Is there a set of documented policies regarding Big Data governance? Is there an enforcement mechanism or approach to ensure that policies are followed? Who is the key sponsor for your Big Data governance program? (The CIO is best) Do you have defined policies surrounding the use of social media data for potential employees and customers, as well as the use of customer Geo-location data? How accessible are complex analytic routines to your user base? What is the level of involvement with outside vendors and third parties in regard to the planning and execution of Big Data projects? What programming technologies are utilized by your data warehouse/BI staff when working with Big Data? These are some of the important questions I ask each customer who is actively evaluating Big Data trends for their organizations. These questions give you a sense of direction where to start, what to use, how to secure, how to analyze and more. Sign off Any Big data is analysis is incomplete without a compelling story. The best way to understand this is to watch Hans Rosling – Gapminder (2:17 to 6:06) videos about the third world myths. Don’t get overwhelmed with the Big Data buzz word, the destination to what your data speaks is important. In this blog post, we did not particularly look at any Big Data technologies. This is a set of questionnaire one needs to keep in mind as they embark their journey of Big Data. I did write some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make sense?  Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – 5 Tips for Improving Your Data with expressor Studio

    - by pinaldave
    It’s no secret that bad data leads to bad decisions and poor results.  However, how do you prevent dirty data from taking up residency in your data store?  Some might argue that it’s the responsibility of the person sending you the data.  While that may be true, in practice that will rarely hold up.  It doesn’t matter how many times you ask, you will get the data however they decide to provide it. So now you have bad data.  What constitutes bad data?  There are quite a few valid answers, for example: Invalid date values Inappropriate characters Wrong data Values that exceed a pre-set threshold While it is certainly possible to write your own scripts and custom SQL to identify and deal with these data anomalies, that effort often takes too long and becomes difficult to maintain.  Instead, leveraging an ETL tool like expressor Studio makes the data cleansing process much easier and faster.  Below are some tips for leveraging expressor to get your data into tip-top shape. Tip 1:     Build reusable data objects with embedded cleansing rules One of the new features in expressor Studio 3.2 is the ability to define constraints at the metadata level.  Using expressor’s concept of Semantic Types, you can define reusable data objects that have embedded logic such as constraints for dealing with dirty data.  Once defined, they can be saved as a shared atomic type and then re-applied to other data attributes in other schemas. As you can see in the figure above, I’ve defined a constraint on zip code.  I can then save the constraint rules I defined for zip code as a shared atomic type called zip_type for example.   The next time I get a different data source with a schema that also contains a zip code field, I can simply apply the shared atomic type (shown below) and the previously defined constraints will be automatically applied. Tip 2:     Unlock the power of regular expressions in Semantic Types Another powerful feature introduced in expressor Studio 3.2 is the option to use regular expressions as a constraint.   A regular expression is used to identify patterns within data.   The patterns could be something as simple as a date format or something much more complex such as a street address.  For example, I could define that a valid IP address should be made up of 4 numbers, each 0 to 255, and separated by a period.  So 192.168.23.123 might be a valid IP address whereas 888.777.0.123 would not be.   How can I account for this using regular expressions? A very simple regular expression that would look for any 4 sets of 3 digits separated by a period would be:  ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ Alternatively, the following would be the exact check for truly valid IP addresses as we had defined above:  ^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$ .  In expressor, we would enter this regular expression as a constraint like this: Here we select the corrective action to be ‘Escalate’, meaning that the expressor Dataflow operator will decide what to do.  Some of the options include rejecting the offending record, skipping it, or aborting the dataflow. Tip 3:     Email pattern expressions that might come in handy In the example schema that I am using, there’s a field for email.  Email addresses are often entered incorrectly because people are trying to avoid spam.  While there are a lot of different ways to define what constitutes a valid email address, a quick search online yields a couple of really useful regular expressions for validating email addresses: This one is short and sweet:  \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b (Source: http://www.regular-expressions.info/) This one is more specific about which characters are allowed:  ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ (Source: http://regexlib.com/REDetails.aspx?regexp_id=26 ) Tip 4:     Reject “dirty data” for analysis or further processing Yet another feature introduced in expressor Studio 3.2 is the ability to reject records based on constraint violations.  To capture reject records on input, simply specify Reject Record in the Error Handling setting for the Read File operator.  Then attach a Write File operator to the reject port of the Read File operator as such: Next, in the Write File operator, you can configure the expressor operator in a similar way to the Read File.  The key difference would be that the schema needs to be derived from the upstream operator as shown below: Once configured, expressor will output rejected records to the file you specified.  In addition to the rejected records, expressor also captures some diagnostic information that will be helpful towards identifying why the record was rejected.  This makes diagnosing errors much easier! Tip 5:    Use a Filter or Transform after the initial cleansing to finish the job Sometimes you may want to predicate the data cleansing on a more complex set of conditions.  For example, I may only be interested in processing data containing males over the age of 25 in certain zip codes.  Using an expressor Filter operator, you can define the conditional logic which isolates the records of importance away from the others. Alternatively, the expressor Transform operator can be used to alter the input value via a user defined algorithm or transformation.  It also supports the use of conditional logic and data can be rejected based on constraint violations. However, the best tip I can leave you with is to not constrain your solution design approach – expressor operators can be combined in many different ways to achieve the desired results.  For example, in the expressor Dataflow below, I can post-process the reject data from the Filter which did not meet my pre-defined criteria and, if successful, Funnel it back into the flow so that it gets written to the target table. I continue to be impressed that expressor offers all this functionality as part of their FREE expressor Studio desktop ETL tool, which you can download from here.  Their Studio ETL tool is absolutely free and they are very open about saying that if you want to deploy their software on a dedicated Windows Server, you need to purchase their server software, whose pricing is posted on their website. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – Shrinking Database is Bad – Increases Fragmentation – Reduces Performance

    - by pinaldave
    Earlier, I had written two articles related to Shrinking Database. I wrote about why Shrinking Database is not good. SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server SQL SERVER – What the Business Says Is Not What the Business Wants I received many comments on Why Database Shrinking is bad. Today we will go over a very interesting example that I have created for the same. Here are the quick steps of the example. Create a test database Create two tables and populate with data Check the size of both the tables Size of database is very low Check the Fragmentation of one table Fragmentation will be very low Truncate another table Check the size of the table Check the fragmentation of the one table Fragmentation will be very low SHRINK Database Check the size of the table Check the fragmentation of the one table Fragmentation will be very HIGH REBUILD index on one table Check the size of the table Size of database is very HIGH Check the fragmentation of the one table Fragmentation will be very low Here is the script for the same. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO Let us check the table size and fragmentation. Now let us TRUNCATE the table and check the size and Fragmentation. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can clearly see that after TRUNCATE, the size of the database is not reduced and it is still the same as before TRUNCATE operation. After the Shrinking database operation, we were able to reduce the size of the database. If you notice the fragmentation, it is considerably high. The major problem with the Shrink operation is that it increases fragmentation of the database to very high value. Higher fragmentation reduces the performance of the database as reading from that particular table becomes very expensive. One of the ways to reduce the fragmentation is to rebuild index on the database. Let us rebuild the index and observe fragmentation and database size. -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REBUILD GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can notice that after rebuilding, Fragmentation reduces to a very low value (almost same to original value); however the database size increases way higher than the original. Before rebuilding, the size of the database was 5 MB, and after rebuilding, it is around 20 MB. Regular rebuilding the index is rebuild in the same user database where the index is placed. This usually increases the size of the database. Look at irony of the Shrinking database. One person shrinks the database to gain space (thinking it will help performance), which leads to increase in fragmentation (reducing performance). To reduce the fragmentation, one rebuilds index, which leads to size of the database to increase way more than the original size of the database (before shrinking). Well, by Shrinking, one did not gain what he was looking for usually. Rebuild indexing is not the best suggestion as that will create database grow again. I have always remembered the excellent post from Paul Randal regarding Shrinking the database is bad. I suggest every one to read that for accuracy and interesting conversation. Let us run following script where we Shrink the database and REORGANIZE. -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Shrink the Database DBCC SHRINKDATABASE (ShrinkIsBed); GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REORGANIZE GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can see that REORGANIZE does not increase the size of the database or remove the fragmentation. Again, I no way suggest that REORGANIZE is the solution over here. This is purely observation using demo. Read the blog post of Paul Randal. Following script will clean up the database -- Clean up USE MASTER GO ALTER DATABASE ShrinkIsBed SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE ShrinkIsBed GO There are few valid cases of the Shrinking database as well, but that is not covered in this blog post. We will cover that area some other time in future. Additionally, one can rebuild index in the tempdb as well, and we will also talk about the same in future. Brent has written a good summary blog post as well. Are you Shrinking your database? Well, when are you going to stop Shrinking it? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

    - by pinaldave
    Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base  Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

  • SQL SERVER – Introduction to SQL Server 2014 In-Memory OLTP

    - by Pinal Dave
    In SQL Server 2014 Microsoft has introduced a new database engine component called In-Memory OLTP aka project “Hekaton” which is fully integrated into the SQL Server Database Engine. It is optimized for OLTP workloads accessing memory resident data. In-memory OLTP helps us create memory optimized tables which in turn offer significant performance improvement for our typical OLTP workload. The main objective of memory optimized table is to ensure that highly transactional tables could live in memory and remain in memory forever without even losing out a single record. The most significant part is that it still supports majority of our Transact-SQL statement. Transact-SQL stored procedures can be compiled to machine code for further performance improvements on memory-optimized tables. This engine is designed to ensure higher concurrency and minimal blocking. In-Memory OLTP alleviates the issue of locking, using a new type of multi-version optimistic concurrency control. It also substantially reduces waiting for log writes by generating far less log data and needing fewer log writes. Points to remember Memory-optimized tables refer to tables using the new data structures and key words added as part of In-Memory OLTP. Disk-based tables refer to your normal tables which we used to create in SQL Server since its inception. These tables use a fixed size 8 KB pages that need to be read from and written to disk as a unit. Natively compiled stored procedures refer to an object Type which is new and is supported by in-memory OLTP engine which convert it into machine code, which can further improve the data access performance for memory –optimized tables. Natively compiled stored procedures can only reference memory-optimized tables, they can’t be used to reference any disk –based table. Interpreted Transact-SQL stored procedures, which is what SQL Server has always used. Cross-container transactions refer to transactions that reference both memory-optimized tables and disk-based tables. Interop refers to interpreted Transact-SQL that references memory-optimized tables. Using In-Memory OLTP In-Memory OLTP engine has been available as part of SQL Server 2014 since June 2013 CTPs. Installation of In-Memory OLTP is part of the SQL Server setup application. The In-Memory OLTP components can only be installed with a 64-bit edition of SQL Server 2014 hence they are not available with 32-bit editions. Creating Databases Any database that will store memory-optimized tables must have a MEMORY_OPTIMIZED_DATA filegroup. This filegroup is specifically designed to store the checkpoint files needed by SQL Server to recover the memory-optimized tables, and although the syntax for creating the filegroup is almost the same as for creating a regular filestream filegroup, it must also specify the option CONTAINS MEMORY_OPTIMIZED_DATA. Here is an example of a CREATE DATABASE statement for a database that can support memory-optimized tables: CREATE DATABASE InMemoryDB ON PRIMARY(NAME = [InMemoryDB_data], FILENAME = 'D:\data\InMemoryDB_data.mdf', size=500MB), FILEGROUP [SampleDB_mod_fg] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [InMemoryDB_mod_dir], FILENAME = 'S:\data\InMemoryDB_mod_dir'), (NAME = [InMemoryDB_mod_dir], FILENAME = 'R:\data\InMemoryDB_mod_dir') LOG ON (name = [SampleDB_log], Filename='L:\log\InMemoryDB_log.ldf', size=500MB) COLLATE Latin1_General_100_BIN2; Above example code creates files on three different drives (D:  S: and R:) for the data files and in memory storage so if you would like to run this code kindly change the drive and folder locations as per your convenience. Also notice that binary collation was specified as Windows (non-SQL). BIN2 collation is the only collation support at this point for any indexes on memory optimized tables. It is also possible to add a MEMORY_OPTIMIZED_DATA file group to an existing database, use the below command to achieve the same. ALTER DATABASE AdventureWorks2012 ADD FILEGROUP hekaton_mod CONTAINS MEMORY_OPTIMIZED_DATA; GO ALTER DATABASE AdventureWorks2012 ADD FILE (NAME='hekaton_mod', FILENAME='S:\data\hekaton_mod') TO FILEGROUP hekaton_mod; GO Creating Tables There is no major syntactical difference between creating a disk based table or a memory –optimized table but yes there are a few restrictions and a few new essential extensions. Essentially any memory-optimized table should use the MEMORY_OPTIMIZED = ON clause as shown in the Create Table query example. DURABILITY clause (SCHEMA_AND_DATA or SCHEMA_ONLY) Memory-optimized table should always be defined with a DURABILITY value which can be either SCHEMA_AND_DATA or  SCHEMA_ONLY the former being the default. A memory-optimized table defined with DURABILITY=SCHEMA_ONLY will not persist the data to disk which means the data durability is compromised whereas DURABILITY= SCHEMA_AND_DATA ensures that data is also persisted along with the schema. Indexing Memory Optimized Table A memory-optimized table must always have an index for all tables created with DURABILITY= SCHEMA_AND_DATA and this can be achieved by declaring a PRIMARY KEY Constraint at the time of creating a table. The following example shows a PRIMARY KEY index created as a HASH index, for which a bucket count must also be specified. CREATE TABLE Mem_Table ( [Name] VARCHAR(32) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000), [City] VARCHAR(32) NULL, [State_Province] VARCHAR(32) NULL, [LastModified] DATETIME NOT NULL, ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); Now as you can see in the above query example we have used the clause MEMORY_OPTIMIZED = ON to make sure that it is considered as a memory optimized table and not just a normal table and also used the DURABILITY Clause= SCHEMA_AND_DATA which means it will persist data along with metadata and also you can notice this table has a PRIMARY KEY mentioned upfront which is also a mandatory clause for memory-optimized tables. We will talk more about HASH Indexes and BUCKET_COUNT in later articles on this topic which will be focusing more on Row and Index storage on Memory-Optimized tables. So stay tuned for that as well. Now as we covered the basics of Memory Optimized tables and understood the key things to remember while using memory optimized tables, let’s explore more using examples to understand the Performance gains using memory-optimized tables. I will be using the database which i created earlier in this article i.e. InMemoryDB in the below Demo Exercise. USE InMemoryDB GO -- Creating a disk based table CREATE TABLE dbo.Disktable ( Id INT IDENTITY, Name CHAR(40) ) GO CREATE NONCLUSTERED INDEX IX_ID ON dbo.Disktable (Id) GO -- Creating a memory optimized table with similar structure and DURABILITY = SCHEMA_AND_DATA CREATE TABLE dbo.Memorytable_durable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA) GO -- Creating an another memory optimized table with similar structure but DURABILITY = SCHEMA_Only CREATE TABLE dbo.Memorytable_nondurable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_only) GO -- Now insert 100000 records in dbo.Disktable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Disktable(Name) VALUES('sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Do the same inserts for Memory table dbo.Memorytable_durable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_durable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Now finally do the same inserts for Memory table dbo.Memorytable_nondurable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_nondurable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END The above 3 Inserts took 1.20 minutes, 54 secs, and 2 secs respectively to insert 100000 records on my machine with 8 Gb RAM. This proves the point that memory-optimized tables can definitely help businesses achieve better performance for their highly transactional business table and memory- optimized tables with Durability SCHEMA_ONLY is even faster as it does not bother persisting its data to disk which makes it supremely fast. Koenig Solutions is one of the few organizations which offer IT training on SQL Server 2014 and all its updates. Now, I leave the decision on using memory_Optimized tables on you, I hope you like this article and it helped you understand  the fundamentals of IN-Memory OLTP . Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Koenig

    Read the article

  • SQL SERVER – Extending SQL Azure with Azure worker role – Guest Post by Paras Doshi

    - by pinaldave
    This is guest post by Paras Doshi. Paras Doshi is a research Intern at SolidQ.com and a Microsoft student partner. He is currently working in the domain of SQL Azure. SQL Azure is nothing but a SQL server in the cloud. SQL Azure provides benefits such as on demand rapid provisioning, cost-effective scalability, high availability and reduced management overhead. To see an introduction on SQL Azure, check out the post by Pinal here In this article, we are going to discuss how to extend SQL Azure with the Azure worker role. In other words, we will attempt to write a custom code and host it in the Azure worker role; the aim is to add some features that are not available with SQL Azure currently or features that need to be customized for flexibility. This way we extend the SQL Azure capability by building some solutions that run on Azure as worker roles. To understand Azure worker role, think of it as a windows service in cloud. Azure worker role can perform background processes, and to handle processes such as synchronization and backup, it becomes our ideal tool. First, we will focus on writing a worker role code that synchronizes SQL Azure databases. Before we do so, let’s see some scenarios in which synchronization between SQL Azure databases is beneficial: scaling out access over multiple databases enables us to handle workload efficiently As of now, SQL Azure database can be hosted in one of any six datacenters. By synchronizing databases located in different data centers, one can extend the data by enabling access to geographically distributed data Let us see some scenarios in which SQL server to SQL Azure database synchronization is beneficial To backup SQL Azure database on local infrastructure Rather than investing in local infrastructure for increased workloads, such workloads could be handled by cloud Ability to extend data to different datacenters located across the world to enable efficient data access from remote locations Now, let us develop cloud-based app that synchronizes SQL Azure databases. For an Introduction to developing cloud based apps, click here Now, in this article, I aim to provide a bird’s eye view of how a code that synchronizes SQL Azure databases look like and then list resources that can help you develop the solution from scratch. Now, if you newly add a worker role to the cloud-based project, this is how the code will look like. (Note: I have added comments to the skeleton code to point out the modifications that will be required in the code to carry out the SQL Azure synchronization. Note the placement of Setup() and Sync() function.) Click here (http://parasdoshi1989.files.wordpress.com/2011/06/code-snippet-1-for-extending-sql-azure-with-azure-worker-role1.pdf ) Enabling SQL Azure databases synchronization through sync framework is a two-step process. In the first step, the database is provisioned and sync framework creates tracking tables, stored procedures, triggers, and tables to store metadata to enable synchronization. This is one time step. The code for the same is put in the setup() function which is called once when the worker role starts. Now, the second step is continuous (or on demand) synchronization of SQL Azure databases by propagating changes between databases. This is done on a continuous basis by calling the sync() function in the while loop. The code logic to synchronize changes between SQL Azure databases should be put in the sync() function. Discussing the coding part step by step is out of the scope of this article. Therefore, let me suggest you a resource, which is given here. Also, note that before you start developing the code, you will need to install SYNC framework 2.1 SDK (download here). Further, you will reference some libraries before you start coding. Details regarding the same are available in the article that I just pointed to. You will be charged for data transfers if the databases are not in the same datacenter. For pricing information, go here Currently, a tool named DATA SYNC, which is built on top of sync framework, is available in CTP that allows SQL Azure <-> SQL server and SQL Azure <-> SQL Azure synchronization (without writing single line of code); however, in some cases, the custom code shown in this blogpost provides flexibility that is not available with Data SYNC. For instance, filtering is not supported in the SQL Azure DATA SYNC CTP2; if you wish to have such a functionality now, then you have the option of developing a custom code using SYNC Framework. Now, this code can be easily extended to synchronize at some schedule. Let us say we want the databases to get synchronized every day at 10:00 pm. This is what the code will look like now: (http://parasdoshi1989.files.wordpress.com/2011/06/code-snippet-2-for-extending-sql-azure-with-azure-worker-role.pdf) Don’t you think that by writing such a code, we are imitating the functionality provided by the SQL server agent for a SQL server? Think about it. We are scheduling our administrative task by writing custom code – in other words, we have developed a “Light weight SQL server agent for SQL Azure!” Since the SQL server agent is not currently available in cloud, we have developed a solution that enables us to schedule tasks, and thus we have extended SQL Azure with the Azure worker role! Now if you wish to track jobs, you can do so by storing this data in SQL Azure (or Azure tables). The reason is that Windows Azure is a stateless platform, and we will need to store the state of the job ourselves and the choice that you have is SQL Azure or Azure tables. Note that this solution requires custom code and also it is not UI driven; however, for now, it can act as a temporary solution until SQL server agent is made available in the cloud. Moreover, this solution does not encompass functionalities that a SQL server agent provides, but it does open up an interesting avenue to schedule some of the tasks such as backup and synchronization of SQL Azure databases by writing some custom code in the Azure worker role. Now, let us see one more possibility – i.e., running BCP through a worker role in Azure-hosted services and then uploading the backup files either locally or on blobs. If you upload it locally, then consider the data transfer cost. If you upload it to blobs residing in the same datacenter, then no transfer cost applies but the cost on blob size applies. So, before choosing the option, you need to evaluate your preferences keeping the cost associated with each option in mind. In this article, I have shown that Azure worker role solution could be developed to synchronize SQL Azure databases. Moreover, a light-weight SQL server agent for SQL Azure can be developed. Also we discussed the possibility of running BCP through a worker role in Azure-hosted services for backing up our precious SQL Azure data. Thus, we can extend SQL Azure with the Azure worker role. But remember: you will be charged for running Azure worker roles. So at the end of the day, you need to ask – am I willing to build a custom code and pay money to achieve this functionality? I hope you found this blog post interesting. If you have any questions/feedback, you can comment below or you can mail me at Paras[at]student-partners[dot]com Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Azure, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Developer’s Life – Disaster Lessons – Notes from the Field #039

    - by Pinal Dave
    [Note from Pinal]: This is a 39th episode of Notes from the Field series. What is the best solution do you have when you encounter a disaster in your organization. Now many of you would answer that in this scenario you would have another standby machine or alternative which you will plug in. Now let me ask second question – What would you do if you as an individual faces disaster?  In this episode of the Notes from the Field series database expert Mike Walsh explains a very crucial issue we face in our career, which is not technical but more to relate to human nature. Read on this may be the best blog post you might read in recent times. Howdy! When it was my turn to share the Notes from the Field last time, I took a departure from my normal technical content to talk about Attitude and Communication.(http://blog.sqlauthority.com/2014/05/08/developers-life-attitude-and-communication-they-can-cause-problems-notes-from-the-field-027/) Pinal said it was a popular topic so I hope he won’t mind if I stick with Professional Development for another of my turns at sharing some information here. Like I said last time, the “soft skills” of the IT world are often just as important – sometimes more important – than the technical skills. As a consultant with Linchpin People – I see so many situations where the professional skills I’ve gained and use are more valuable to clients than knowing the best way to tune a query. Today I want to continue talking about professional development and tell you about the way I almost got myself hit by a train – and why that matters in our day jobs. Sometimes we can learn a lot from disasters. Whether we caused them or someone else did. If you are interested in learning about some of my observations in these lessons you can see more where I talk about lessons from disasters on my blog. For now, though, onto how I almost got my vehicle hit by a train… The Train Crash That Almost Was…. My family and I own a little schoolhouse building about a 10 mile drive away from our house. We use it as a free resource for families in the area that homeschool their children – so they can have some class space. I go up there a lot to check in on the property, to take care of the trash and to do work on the property. On the way there, there is a very small Stop Sign controlled railroad intersection. There is only two small freight trains a day passing there. Actually the same train, making a journey south and then back North. That’s it. This road is a small rural road, barely ever a second car driving in the neighborhood there when I am. The stop sign is pretty much there only for the train crossing. When we first bought the building, I was up there a lot doing renovations on the property. Being familiar with the area, I am also familiar with the train schedule and know the tracks are normally free of trains. So I developed a bad habit. You see, I’d approach the stop sign and slow down as I roll through it. Sometimes I’d do a quick look and come to an “almost” stop there but keep on going. I let my impatience and complacency take over. And that is because most of the time I was going there long after the train was done for the day or in between the runs. This habit became pretty well established after a couple years of driving the route. The behavior reinforced a bit by the success ratio. I saw others doing it as well from the neighborhood when I would happen to be there around the time another car was there. Well. You already know where this ends up by the title and backstory here. A few months ago I came to that little crossing, and I started to do the normal routine. I’d pretty much stopped looking in some respects because of the pattern I’d gotten into.  For some reason I looked and heard and saw the train slowly approaching and slammed on my brakes and stopped. It was an abrupt stop, and it was close. I probably would have made it okay, but I sat there thinking about lessons for IT professionals from the situation once I started breathing again and watched the cars loaded with sand and propane slowly labored down the tracks… Here are Those Lessons… It’s easy to get stuck into a routine – That isn’t always bad. Except when it’s a bad routine. Momentum and inertia are powerful. Once you have a habit and a routine developed – it’s really hard to break that. Make sure you are setting the right routines and habits TODAY. What almost dangerous things are you doing today? How are you almost messing up your production environment today? Stop doing that. Be Deliberate – (Even when you are the only one) – Like I said – a lot of people roll through that stop sign. Perhaps the neighbors or other drivers think “why is he fully stopping and looking… The train only comes two times a day!” – they can think that all they want. Through deliberate actions and forcing myself to pay attention, I will avoid that oops again. Slow down. Take a deep breath. Be Deliberate in your job. Pay attention to the small stuff and go out of your way to be careful. It will save you later. Be Observant – Keep your eyes open. By looking around, observing the situation and understanding what your servers, databases, users and vendors are doing – you’ll notice when something is out of place. But if you don’t know what is normal, if you don’t look to make sure nothing has changed – that train will come and get you. Where can you be more observant? What warning signs are you ignoring in your environment today? In the IT world – trains are everywhere. Projects move fast. Decisions happen fast. Problems turn from a warning sign to a disaster quickly. If you get stuck in a complacent pattern of “Everything is okay, it always has been and always will be” – that’s the time that you will most likely get stuck in a bad situation. Don’t let yourself get complacent, don’t let your team get complacent. That will lead to being proactive. And a proactive environment spends less money on consultants for troubleshooting problems you should have seen ahead of time. You can spend your money and IT budget on improving for your customers. If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

    - by Pinal Dave
    Today, I’d like to discuss getting better MySQL scalability on Amazon RDS. The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?” Let’s take a look. In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc.  Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas.   Figure 1. A typical RDS instance is a single-instance database, with read replicas.  This is not very good at handling high write-based throughput. As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data.  User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures. What can you do? Broadly speaking, there are four options available to improve MySQL scalability on RDS. 1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up – to larger hardware.  Bigger CPUs, more compute power, more memory et cetera. But the largest available RDS instance is still limited.  And they get expensive. “High-Memory Quadruple Extra Large DB Instance”: 68 GB of memory 26 ECUs (8 virtual cores with 3.25 ECUs each) 64-bit platform High I/O Capacity Provisioned IOPS Optimized: 1000Mbps 2. Provisioned IOPs – You can get provisioned IOPs and higher throughput on the I/O level. However, there is a hard limit with a maximum instance size and maximum number of provisioned IOPs you can buy from Amazon and you simply cannot scale beyond these hardware specifications. 3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master databases. But there are a limited number of replicas you can utilize and Amazon generally requires some modifications to your existing application. And read-replicas don’t help with write-intensive applications. 4. Multiple Database Instances – Amazon offers a fourth option: “You can implement partitioning,thereby spreading your data across multiple database Instances” (Link) However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature.  And Amazon doesn’t explain how to implement this idea. In fact, when asked, this is the response on an Amazon forum: Q: Is there any documents that describe the partition DB across multiple RDS? I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it. A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.” So now what? The answer is to scale out with ScaleBase. Amazon RDS with ScaleBase: What you get – MySQL Scalability! ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances. Critically, this is accomplished with no changes to your application code.  Your application continues to “see” one database.   ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances. With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance. The result: now you can cost-effectively leverage multiple MySQL RDS instance to scale out write-intensive workloads to an unlimited number of users, transactions, and data. Amazon RDS with ScaleBase: What you keep – Everything! And how does this change your Amazon environment? 1. Keep your application, unchanged – There is no change your application development life-cycle at all.  You still use your existing development tools, frameworks and libraries.  Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment. 2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi A-Z.  And, if it benefits youra application throughput, you can still use read replicas. 3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before. With your one large MySQL instance, now split into multiple instances, you can actually use less expensive, smallersmaller available RDS hardware and continue to see better database performance. Conclusion Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive.  And when you max-out on the available hardware, you’re stuck.  Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day. It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by “DIY” sharding andwith no changes to your applications . With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value added services like read replicas, multi A-Z, maintenance/updates and administration with monitoring tools and provisioning. SCALEBASE ON AMAZON If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: MySQL, PostADay, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL – Migrate Database from SQL Server to NuoDB – A Quick Tutorial

    - by Pinal Dave
    Data is growing exponentially and every organization with growing data is thinking of next big innovation in the world of Big Data. Big data is a indeed a future for every organization at one point of the time. Just like every other next big thing, big data has its own challenges and issues. The biggest challenge associated with the big data is to find the ideal platform which supports the scalability and growth of the data. If you are a regular reader of this blog, you must be familiar with NuoDB. I have been working with NuoDB for a while and their recent release is the best thus far. NuoDB is an elastically scalable SQL database that can run on local host, datacenter and cloud-based resources. A key feature of the product is that it does not require sharding (read more here). Last week, I was able to install NuoDB in less than 90 seconds and have explored their Explorer and Admin sections. You can read about my experiences in these posts: SQL – Step by Step Guide to Download and Install NuoDB – Getting Started with NuoDB SQL – Quick Start with Admin Sections of NuoDB – Manage NuoDB Database SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database Many SQL Authority readers have been following me in my journey to evaluate NuoDB. One of the frequently asked questions I’ve received from you is if there is any way to migrate data from SQL Server to NuoDB. The fact is that there is indeed a way to do so and NuoDB provides a fantastic tool which can help users to do it. NuoDB Migrator is a command line utility that supports the migration of Microsoft SQL Server, MySQL, Oracle, and PostgreSQL schemas and data to NuoDB. The migration to NuoDB is a three-step process: NuoDB Migrator generates a schema for a target NuoDB database It loads data into the target NuoDB database It dumps data from the source database Let’s see how we can migrate our data from SQL Server to NuoDB using a simple three-step approach. But before we do that we will create a sample database in MSSQL and later we will migrate the same database to NuoDB: Setup Step 1: Build a sample data CREATE DATABASE [Test]; CREATE TABLE [Department]( [DepartmentID] [smallint] NOT NULL, [Name] VARCHAR(100) NOT NULL, [GroupName] VARCHAR(100) NOT NULL, [ModifiedDate] [datetime] NOT NULL, CONSTRAINT [PK_Department_DepartmentID] PRIMARY KEY CLUSTERED ( [DepartmentID] ASC ) ) ON [PRIMARY]; INSERT INTO Department SELECT * FROM AdventureWorks2012.HumanResources.Department; Note that I am using the SQL Server AdventureWorks database to build this sample table but you can build this sample table any way you prefer. Setup Step 2: Install Java 64 bit Before you can begin the migration process to NuoDB, make sure you have 64-bit Java installed on your computer. This is due to the fact that the NuoDB Migrator tool is built in Java. You can download 64-bit Java for Windows, Mac OSX, or Linux from the following link: http://java.com/en/download/manual.jsp. One more thing to remember is that you make sure that the path in your environment settings is set to your JAVA_HOME directory or else the tool will not work. Here is how you can do it: Go to My Computer >> Right Click >> Select Properties >> Click on Advanced System Settings >> Click on Environment Variables >> Click on New and enter the following values. Variable Name: JAVA_HOME Variable Value: C:\Program Files\Java\jre7 Make sure you enter your Java installation directory in the Variable Value field. Setup Step 3: Install JDBC driver for SQL Server. There are two JDBC drivers available for SQL Server.  Select the one you prefer to use by following one of the two links below: Microsoft JDBC Driver jTDS JDBC Driver In this example we will be using jTDS JDBC driver. Once you download the driver, move the driver to your NuoDB installation folder. In my case, I have moved the JAR file of the driver into the C:\Program Files\NuoDB\tools\migrator\jar folder as this is my NuoDB installation directory. Now we are all set to start the three-step migration process from SQL Server to NuoDB: Migration Step 1: NuoDB Schema Generation Here is the command I use to generate a schema of my SQL Server Database in NuoDB. First I go to the folder C:\Program Files\NuoDB\tools\migrator\bin and execute the nuodb-migrator.bat file. Note that my database name is ‘test’. Additionally my username and password is also ‘test’. You can see that my SQL Server database is running on my localhost on port 1433. Additionally, the schema of the table is ‘dbo’. nuodb-migrator schema –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.path=/tmp/schema.sql The above script will generate a schema of all my SQL Server tables and will put it in the folder C:\tmp\schema.sql . You can open the schema.sql file and execute this file directly in your NuoDB instance. You can follow the link here to see how you can execute the SQL script in NuoDB. Please note that if you have not yet created the schema in the NuoDB database, you should create it before executing this step. Step 2: Generate the Dump File of the Data Once you have recreated your schema in NuoDB from SQL Server, the next step is very easy. Here we create a CSV format dump file, which will contain all the data from all the tables from the SQL Server database. The command to do so is very similar to the above command. Be aware that this step may take a bit of time based on your database size. nuodb-migrator dump –source.driver=net.sourceforge.jtds.jdbc.Driver –source.url=jdbc:jtds:sqlserver://localhost:1433/ –source.username=test –source.password=test –source.catalog=test –source.schema=dbo –output.type=csv –output.path=/tmp/dump.cat Once the above command is successfully executed you can find your CSV file in the C:\tmp\ folder. However, you do not have to do anything manually. The third and final step will take care of completing the migration process. Migration Step 3: Load the Data into NuoDB After building schema and taking a dump of the data, the very next step is essential and crucial. It will take the CSV file and load it into the NuoDB database. nuodb-migrator load –target.url=jdbc:com.nuodb://localhost:48004/mytest –target.schema=dbo –target.username=test –target.password=test –input.path=/tmp/dump.cat Please note that in the above script we are now targeting the NuoDB database, which we have already created with the name of “MyTest”. If the database does not exist, create it manually before executing the above script. I have kept the username and password as “test”, but please make sure that you create a more secure password for your database for security reasons. Voila!  You’re Done That’s it. You are done. It took 3 setup and 3 migration steps to migrate your SQL Server database to NuoDB.  You can now start exploring the database and build excellent, scale-out applications. In this blog post, I have done my best to come up with simple and easy process, which you can follow to migrate your app from SQL Server to NuoDB. Download NuoDB I strongly encourage you to download NuoDB and go through my 3-step migration tutorial from SQL Server to NuoDB. Additionally here are two very important blog post from NuoDB CTO Seth Proctor. He has written excellent blog posts on the concept of the Administrative Domains. NuoDB has this concept of an Administrative Domain, which is a collection of hosts that can run one or multiple databases.  Each database has its own TEs and SMs, but all are managed within the Admin Console for that particular domain. http://www.nuodb.com/techblog/2013/03/11/getting-started-provisioning-a-domain/ http://www.nuodb.com/techblog/2013/03/14/getting-started-running-a-database/ Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

    Read the article

  • Developer’s Life – Attitude and Communication – They Can Cause Problems – Notes from the Field #027

    - by Pinal Dave
    [Note from Pinal]: This is a 27th episode of Notes from the Field series. The biggest challenge for anyone is to understand human nature. We human have so many things on our mind at any moment of time. There are cases when what we say is not what we mean and there are cases where what we mean we do not say. We do say and things as per our mood and our agenda in mind. Sometimes there are incidents when our attitude creates confusion in the communication and we end up creating a situation which is absolutely not warranted. In this episode of the Notes from the Field series database expert Mike Walsh explains a very crucial issue we face in our career, which is not technical but more to relate to human nature. Read on this may be the best blog post you might read in recent times. In this week’s note from the field, I’m taking a slight departure from technical knowledge and concepts explained. We’ll be back to it next week, I’m sure. Pinal wanted us to explain some of the issues we bump into and how we see some of our customers arrive at problem situations and how we have helped get them back on the right track. Often it is a technical problem we are officially solving – but in a lot of cases as a consultant, we are really helping fix some communication difficulties. This is a technical blog post and not an “advice column” in a newspaper – but the longer I am a consultant, the more years I add to my experience in technology the more I learn that the vast majority of the problems we encounter have “soft skills” included in the chain of causes for the issue we are helping overcome. This is not going to be exhaustive but I hope that sharing four pieces of advice inspired by real issues starts a process of searching for places where we can be the cause of these challenges and look at fixing them in ourselves. Or perhaps we can begin looking at resolving them in teams that we manage. I’ll share three statements that I’ve either heard, read or said and talk about some of the communication or attitude challenges highlighted by the statement. 1 – “But that’s the SAN Administrator’s responsibility…” I heard that early on in my consulting career when talking with a customer who had serious corruption and no good recent backups – potentially no good backups at all. The statement doesn’t have to be this one exactly, but the attitude here is an attitude of “my job stops here, and I don’t care about the intent or principle of why I’m here.” It’s also a situation of having the attitude that as long as there is someone else to blame, I’m fine…  You see in this case, the DBA had a suspicion that the backups were not being handled right.  They were the DBA and they knew that they had responsibility to ensure SQL backups were good to go – it’s a basic requirement of a production DBA. In my “As A DBA Where Do I start?!” presentation, I argue that is job #1 of a DBA. But in this case, the thought was that there was someone else to blame. Rather than create extra work and take on responsibility it was decided to just let it be another team’s responsibility. This failed the company, the company’s customers and no one won. As technologists – we should strive to go the extra mile. If there is a lack of clarity around roles and responsibilities and we know it – we should push to get it resolved. Especially as the DBAs who should act as the advocates of the data contained in the databases we are responsible for. 2 – “We’ve always done it this way, it’s never caused a problem before!” Complacency. I have to say that many failures I’ve been paid good money to help recover from would have not happened had it been for an attitude of complacency. If any thoughts like this have entered your mind about your situation you may be suffering from it. If, while reading this, you get this sinking feeling in your stomach about that one thing you know should be fixed but haven’t done it.. Why don’t you stop and go fix it then come back.. “We should have better backups, but we’re on a SAN so we should be fine really.” “Technically speaking that could happen, but what are the chances?” “We’ll just clean that up as a fast follow” ..and so on. In the age of tightening IT budgets, increased expectations of up time, availability and performance there is no room for complacency. Our customers and business units expect – no demand – the best. Complacency says “we will give you second best or hopefully good enough and we accept the risk and know this may hurt us later. Sometimes an organization will opt for “good enough” and I agree with the concept that at times the perfect can be the enemy of the good. But when we make those decisions in a vacuum and are not reporting them up and discussing them as an organization that is different. That is us unilaterally choosing to do something less than the best and purposefully playing a game of chance. 3 – “This device must accept interference from other devices but not create any” I’ve paraphrased this one – but it’s something the Federal Communications Commission – a federal agency in the United States that regulates electronic communication – requires of all manufacturers of any device that could cause or receive interference electronically. I blogged in depth about this here (http://www.straightpathsql.com/archives/2011/07/relationship-advice-from-the-fcc/) so I won’t go into much detail other than to say this… If we all operated more on the premise that we should do our best to not be the cause of conflict, and to be less easily offended and less upset when we perceive offense life would be easier in many areas! This doesn’t always cause the issues we are called in to help out. Not directly. But where we see it is in unhealthy relationships between the various technology teams at a client. We’ll see teams hoarding knowledge, not sharing well with others and almost working against other teams instead of working with them. If you trace these problems back far enough it often stems from someone or some group of people violating this principle from the FCC. To Sum It Up Technology problems are easy to solve. At Linchpin People we help many customers get past the toughest technological challenge – and at the end of the day it is really just a repeatable process of pattern based troubleshooting, logical thinking and starting at the beginning and carefully stepping through to the end. It’s easy at the end of the day. The tough part of what we do as consultants is the people skills. Being able to help get teams working together, being able to help teams take responsibility, to improve team to team communication? That is the difficult part, and we get to use the soft skills on every engagement. Work on professional development (http://professionaldevelopment.sqlpass.org/) and see continuing improvement here, not just with technology. I can teach just about anyone how to be an excellent DBA and performance tuner, but some of these soft skills are much more difficult to teach. If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – A Quick Look at Logging and Ideas around Logging

    - by pinaldave
    This blog post is written in response to the T-SQL Tuesday post on Logging. When someone talks about logging, personally I get lots of ideas about it. I have seen logging as a very generic term. Let me ask you this question first before I continue writing about logging. What is the first thing comes to your mind when you hear word “Logging”? Now ask the same question to the guy standing next to you. I am pretty confident that you will get  a different answer from different people. I decided to do this activity and asked 5 SQL Server person the same question. Question: What is the first thing comes to your mind when you hear the word “Logging”? Strange enough I got a different answer every single time. Let me just list what answer I got from my friends. Let us go over them one by one. Output Clause The very first person replied output clause. Pretty interesting answer to start with. I see what exactly he was thinking. SQL Server 2005 has introduced a new OUTPUT clause. OUTPUT clause has access to inserted and deleted tables (virtual tables) just like triggers. OUTPUT clause can be used to return values to client clause. OUTPUT clause can be used with INSERT, UPDATE, or DELETE to identify the actual rows affected by these statements. Here are some references for Output Clause: OUTPUT Clause Example and Explanation with INSERT, UPDATE, DELETE Reasons for Using Output Clause – Quiz Tips from the SQL Joes 2 Pros Development Series – Output Clause in Simple Examples Error Logs I was expecting someone to mention Error logs when it is about logging. The error log is the most looked place when there is any error either with the application or there is an error with the operating system. I have kept the policy to check my server’s error log every day. The reason is simple – enough time in my career I have figured out that when I am looking at error logs I find something which I was not expecting. There are cases, when I noticed errors in the error log and I fixed them before end user notices it. Other common practices I always tell my DBA friends to do is that when any error happens they should find relevant entries in the error logs and document the same. It is quite possible that they will see the same error in the error log  and able to fix the error based on the knowledge base which they have created. There can be many different kinds of error log files exists in SQL Server as well – 1) SQL Server Error Logs 2) Windows Event Log 3) SQL Server Agent Log 4) SQL Server Profile Log 5) SQL Server Setup Log etc. Here are some references for Error Logs: Recycle Error Log – Create New Log file without Server Restart SQL Error Messages Change Data Capture I got surprised with this answer. I think more than the answer I was surprised by the person who had answered me this one. I always thought he was expert in HTML, JavaScript but I guess, one should never assume about others. Indeed one of the cool logging feature is Change Data Capture. Change Data Capture records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables, and makes a record available of what changed, where, and when, in simple relational ‘change tables’ rather than in an esoteric chopped salad of XML. These change tables contain columns that reflect the column structure of the source table you have chosen to track, along with the metadata needed to understand the changes that have been made. Here are some references for Change Data Capture: Introduction to Change Data Capture (CDC) in SQL Server 2008 Tuning the Performance of Change Data Capture in SQL Server 2008 Download Script of Change Data Capture (CDC) CDC and TRUNCATE – Cannot truncate table because it is published for replication or enabled for Change Data Capture Dynamic Management View (DMV) I like this answer. If asked I would have not come up with DMV right away but in the spirit of the original question, I think DMV does log the data. DMV logs or stores or records the various data and activity on the SQL Server. Dynamic management views return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance. One can get plethero of information from DMVs – High Availability Status, Query Executions Details, SQL Server Resources Status etc. Here are some references for Dynamic Management View (DMV): SQL SERVER – Denali – DMV Enhancement – sys.dm_exec_query_stats – New Columns DMV – sys.dm_os_windows_info – Information about Operating System DMV – sys.dm_os_wait_stats Explanation – Wait Type – Day 3 of 28 DMV sys.dm_exec_describe_first_result_set_for_object – Describes the First Result Metadata for the Module Transaction Log Impact Detection Using DMV – dm_tran_database_transactions Log Files I almost flipped with this final answer from my friend. This should be probably the first answer. Yes, indeed log file logs the SQL Server activities. One can write infinite things about log file. SQL Server uses log file with the extension .ldf to manage transactions and maintain database integrity. Log file ensures that valid data is written out to database and system is in a consistent state. Log files are extremely useful in case of the database failures as with the help of full backup file database can be brought in the desired state (point in time recovery is also possible). SQL Server database has three recovery models – 1) Simple, 2) Full and 3) Bulk Logged. Each of the model uses the .ldf file for performing various activities. It is very important to take the backup of the log files (along with full backup) as one never knows when backup of the log file come into the action and save the day! How to Stop Growing Log File Too Big Reduce the Virtual Log Files (VLFs) from LDF file Log File Growing for Model Database – model Database Log File Grew Too Big master Database Log File Grew Too Big SHRINKFILE and TRUNCATE Log File in SQL Server 2008 Can I just say I loved this month’s T-SQL Tuesday Question. It really provoked very interesting conversation around me. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Optimization, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – SSMS Automatically Generates TOP (100) PERCENT in Query Designer

    - by pinaldave
    Earlier this week, I was surfing various SQL forums to see what kind of help developer need in the SQL Server world. One of the question indeed caught my attention. I am here regenerating complete question as well scenario to illustrate the point in a precise manner. Additionally, I have added added second part of the question to give completeness. Question: I am trying to create a view in Query Designer (not in the New Query Window). Every time I am trying to create a view it always adds  TOP (100) PERCENT automatically on the T-SQL script. No matter what I do, it always automatically adds the TOP (100) PERCENT to the script. I have attempted to copy paste from notepad, build a query and a few other things – there is no success. I am really not sure what I am doing wrong with Query Designer. Here is my query script: (I use AdventureWorks as a sample database) SELECT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID This script automatically replaces by following query: SELECT TOP (100) PERCENT Person.Address.AddressID FROM Person.Address INNER JOIN Person.AddressType ON Person.Address.AddressID = Person.AddressType.AddressTypeID ORDER BY Person.Address.AddressID However, when I try to do the same from New Query Window it works totally fine. However, when I attempt to create a view of the same query it gives following error. Msg 1033, Level 15, State 1, Procedure myView, Line 6 The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. It is pretty clear to me now that the script which I have written seems to need TOP (100) PERCENT, so Query . Why do I need it? Is there any work around to this issue. I particularly find this question pretty interesting as it really touches the fundamentals of the T-SQL query writing. Please note that the query which is automatically changed is not in New Query Editor but opened from SSMS using following way. Database >> Views >> Right Click >> New View (see the image below) Answer: The answer to the above question can be very long but I will keep it simple and to the point. There are three things to discuss in above script 1) Reason for Error 2) Reason for Auto generates TOP (100) PERCENT and 3) Potential solutions to the above error. Let us quickly see them in detail. 1) Reason for Error The reason for error is already given in the error. ORDER BY is invalid in the views and a few other objects. One has to use TOP or other keywords along with it. The way semantics of the query works where optimizer only follows(honors) the ORDER BY in the same scope or the same SELECT/UPDATE/DELETE statement. There is a possibility that one can order after the scope of the view again the efforts spend to order view will be wasted. The final resultset of the query always follows the final ORDER BY or outer query’s order and due to the same reason optimizer follows the final order of the query and not of the views (as view will be used in another query for further processing e.g. in SELECT statement). Due to same reason ORDER BY is now allowed in the view. For further accuracy and clear guidance I suggest you read this blog post by Query Optimizer Team. They have explained it very clear manner the same subject. 2) Reason for Auto Generated TOP (100) PERCENT One of the most popular workaround to above error is to use TOP (100) PERCENT in the view. Now TOP (100) PERCENT allows user to use ORDER BY in the query and allows user to overcome above error which we discussed. This gives the impression to the user that they have resolved the error and successfully able to use ORDER BY in the View. Well, this is incorrect as well. The way this works is when TOP (100) PERCENT is used the result is not guaranteed as well it is ignored in our the query where the view is used. Here is the blog post on this subject: Interesting Observation – TOP 100 PERCENT and ORDER BY. Now when you create a new view in the SSMS and build a query with ORDER BY to avoid the error automatically it adds the TOP 100 PERCENT. Here is the connect item for the same issue. I am sure there will be more connect items as well but I could not find them. 3) Potential Solutions If you are reading this post from the beginning in that case, it is clear by now that ORDER BY should not be used in the View as it does not serve any purpose unless there is a specific need of it. If you are going to use TOP 100 PERCENT with ORDER BY there is absolutely no need of using ORDER BY rather avoid using it all together. Here is another blog post of mine which describes the same subject ORDER BY Does Not Work – Limitation of the Views Part 1. It is valid to use ORDER BY in a view if there is a clear business need of using TOP with any other percentage lower than 100 (for example TOP 10 PERCENT or TOP 50 PERCENT etc). In most of the cases ORDER BY is not needed in the view and it should be used in the most outer query for present result in desired order. User can remove TOP 100 PERCENT and ORDER BY from the view before using the view in any query or procedure. In the most outer query there should be ORDER BY as per the business need. I think this sums up the concept in a few words. This is a very long topic and not easy to illustrate in one single blog post. I welcome your comments and suggestions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, SQL View, T SQL, Technology

    Read the article

  • SQL SERVER – SSMS: Disk Usage Report

    - by Pinal Dave
    Let us start with humor!  I think we the series on various reports, we come to a logical point. We covered all the reports at server level. This means the reports we saw were targeted towards activities that are related to instance level operations. These are mostly like how a doctor diagnoses a patient. At this point I am reminded of a dialog which I read somewhere: Patient: Doc, It hurts when I touch my head. Doc: Ok, go on. What else have you experienced? Patient: It hurts even when I touch my eye, it hurts when I touch my arms, it even hurts when I touch my feet, etc. Doc: Hmmm … Patient: I feel it hurts when I touch anywhere in my body. Doc: Ahh … now I get it. You need a plaster to your finger John. Sometimes the server level gives an indicator to what is happening in the system, but we need to get to the root cause for a specific database. So, this is the first blog in series where we would start discussing about database level reports. To launch database level reports, expand selected server in Object Explorer, expand the Databases folder, and then right-click any database for which we want to look at reports. From the menu, select Reports, then Standard Reports, and then any of database level reports. In this blog, we would talk about four “disk” reports because they are similar: Disk Usage Disk Usage by Top Tables Disk Usage by Table Disk Usage by Partition Disk Usage This report shows multiple information about the database. Let us discuss them one by one.  We have divided the output into 5 different sections. Section 1 shows the high level summary of the database. It shows the space used by database files (mdf and ldf). Under the hood, the report uses, various DMVs and DBCC Commands, it is using sys.data_spaces and DBCC SHOWFILESTATS. Section 2 and 3 are pie charts. One for data file allocation and another for the transaction log file. Pie chart for “Data Files Space Usage (%)” shows space consumed data, indexes, allocated to the SQL Server database, and unallocated space which is allocated to the SQL Server database but not yet filled with anything. “Transaction Log Space Usage (%)” used DBCC SQLPERF (LOGSPACE) and shows how much empty space we have in the physical transaction log file. Section 4 shows the data from Default Trace and looks at Event IDs 92, 93, 94, 95 which are for “Data File Auto Grow”, “Log File Auto Grow”, “Data File Auto Shrink” and “Log File Auto Shrink” respectively. Here is an expanded view for that section. If default trace is not enabled, then this section would be replaced by the message “Trace Log is disabled” as highlighted below. Section 5 of the report uses DBCC SHOWFILESTATS to get information. Here is the enhanced version of that section. This shows the physical layout of the file. In case you have In-Memory Objects in the database (from SQL Server 2014), then report would show information about those as well. Here is the screenshot taken for a different database, which has In-Memory table. I have highlighted new things which are only shown for in-memory database. The new sections which are highlighted above are using sys.dm_db_xtp_checkpoint_files, sys.database_files and sys.data_spaces. The new type for in-memory OLTP is ‘FX’ in sys.data_space. The next set of reports is targeted to get information about a table and its storage. These reports can answer questions like: Which is the biggest table in the database? How many rows we have in table? Is there any table which has a lot of reserved space but its unused? Which partition of the table is having more data? Disk Usage by Top Tables This report provides detailed data on the utilization of disk space by top 1000 tables within the Database. The report does not provide data for memory optimized tables. Disk Usage by Table This report is same as earlier report with few difference. First Report shows only 1000 rows First Report does order by values in DMV sys.dm_db_partition_stats whereas second one does it based on name of the table. Both of the reports have interactive sort facility. We can click on any column header and change the sorting order of data. Disk Usage by Partition This report shows the distribution of the data in table based on partition in the table. This is so similar to previous output with the partition details now. Here is the query taken from profiler. SELECT row_number() OVER (ORDER BY a1.used_page_count DESC, a1.index_id) AS row_number ,      (dense_rank() OVER (ORDER BY a5.name, a2.name))%2 AS l1 ,      a1.OBJECT_ID ,      a5.name AS [schema] ,       a2.name ,       a1.index_id ,       a3.name AS index_name ,       a3.type_desc ,       a1.partition_number ,       a1.used_page_count * 8 AS total_used_pages ,       a1.reserved_page_count * 8 AS total_reserved_pages ,       a1.row_count FROM sys.dm_db_partition_stats a1 INNER JOIN sys.all_objects a2  ON ( a1.OBJECT_ID = a2.OBJECT_ID) AND a1.OBJECT_ID NOT IN (SELECT OBJECT_ID FROM sys.tables WHERE is_memory_optimized = 1) INNER JOIN sys.schemas a5 ON (a5.schema_id = a2.schema_id) LEFT OUTER JOIN  sys.indexes a3  ON ( (a1.OBJECT_ID = a3.OBJECT_ID) AND (a1.index_id = a3.index_id) ) WHERE (SELECT MAX(DISTINCT partition_number) FROM sys.dm_db_partition_stats a4 WHERE (a4.OBJECT_ID = a1.OBJECT_ID)) >= 1 AND a2.TYPE <> N'S' AND  a2.TYPE <> N'IT' ORDER BY a5.name ASC, a2.name ASC, a1.index_id, a1.used_page_count DESC, a1.partition_number Using all of the above reports, you should be able to get the usage of database files and also space used by tables. I think this is too much disk information for a single blog and I hope you have used them in the past to get data. Do let me know if you found anything interesting using these reports in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #035

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Row Overflow Data Explanation  In SQL Server 2005 one table row can contain more than one varchar(8000) fields. One more thing, the exclusions has exclusions also the limit of each individual column max width of 8000 bytes does not apply to varchar(max), nvarchar(max), varbinary(max), text, image or xml data type columns. Comparison Index Fragmentation, Index De-Fragmentation, Index Rebuild – SQL SERVER 2000 and SQL SERVER 2005 An old but like a gold article. Talks about lots of concepts related to Index and the difference from earlier version to the newer version. I strongly suggest that everyone should read this article just to understand how SQL Server has moved forward with the technology. Improvements in TempDB SQL Server 2005 had come up with quite a lots of improvements and this blog post describes them and explains the same. If you ask me what is my the most favorite article from early career. I must point out to this article as when I wrote this one I personally have learned a lot of new things. Recompile All The Stored Procedure on Specific TableI prefer to recompile all the stored procedure on the table, which has faced mass insert or update. sp_recompiles marks stored procedures to recompile when they execute next time. This blog post explains the same with the help of a script.  2008 SQLAuthority Download – SQL Server Cheatsheet You can download and print this cheat sheet and use it for your personal reference. If you have any suggestions, please let me know and I will see if I can update this SQL Server cheat sheet. Difference Between DBMS and RDBMS What is the difference between DBMS and RDBMS? DBMS – Data Base Management System RDBMS – Relational Data Base Management System or Relational DBMS High Availability – Hot Add Memory Hot Add CPU and Hot Add Memory are extremely interesting features of the SQL Server, however, personally I have not witness them heavily used. These features also have few restriction as well. I blogged about them in detail. 2009 Delete Duplicate Rows I have demonstrated in this blog post how one can identify and delete duplicate rows. Interesting Observation of Logon Trigger On All Servers – Solution The question I put forth in my previous article was – In single login why the trigger fires multiple times; it should be fired only once. I received numerous answers in thread as well as in my MVP private news group. Now, let us discuss the answer for the same. The answer is – It happens because multiple SQL Server services are running as well as intellisense is turned on. Blog post demonstrates how we can do the same with the help of SQL scripts. Management Studio New Features I have selected my favorite 5 features and blogged about it. IntelliSense for Query Editing Multi Server Query Query Editor Regions Object Explorer Enhancements Activity Monitors Maximum Number of Index per Table One of the questions I asked in my user group was – What is the maximum number of Index per table? I received lots of answers to this question but only two answers are correct. Let us now take a look at them in this blog post. 2010 Default Statistics on Column – Automatic Statistics on Column The truth is, Statistics can be in a table even though there is no Index in it. If you have the auto- create and/or auto-update Statistics feature turned on for SQL Server database, Statistics will be automatically created on the Column based on a few conditions. Please read my previously posted article, SQL SERVER – When are Statistics Updated – What triggers Statistics to Update, for the specific conditions when Statistics is updated. 2011 T-SQL Scripts to Find Maximum between Two Numbers In this blog post there are two different scripts listed which demonstrates way to find the maximum number between two numbers. I need your help, which one of the script do you think is the most accurate way to find maximum number? Find Details for Statistics of Whole Database – DMV – T-SQL Script I was recently asked is there a single script which can provide all the necessary details about statistics for any database. This question made me write following script. I was initially planning to use sp_helpstats command but I remembered that this is marked to be deprecated in future. 2012 Introduction to Function SIGN SIGN Function is very fundamental function. It will return the value 1, -1 or 0. If your value is negative it will return you negative -1 and if it is positive it will return you positive +1. Let us start with a simple small example. Template Browser – A Very Important and Useful Feature of SSMS Templates are like a quick cheat sheet or quick reference. Templates are available to create objects like databases, tables, views, indexes, stored procedures, triggers, statistics, and functions. Templates are also available for Analysis Services as well. The template scripts contain parameters to help you customize the code. You can Replace Template Parameters dialog box to insert values into the script. An invalid floating point operation occurred If you run any of the above functions they will give you an error related to invalid floating point. Honestly there is no workaround except passing the function appropriate values. SQRT of a negative number will give you result in real numbers which is not supported at this point of time as well LOG of a negative number is not possible (because logarithm is the inverse function of an exponential function and the exponential function is NEVER negative). Validating Spatial Object with IsValidDetailed Function SQL Server 2012 has introduced the new function IsValidDetailed(). This function has made my life very easy. In simple words, this function will check if the spatial object passed is valid or not. If it is valid it will give information that it is valid. If the spatial object is not valid it will return the answer that it is not valid and the reason for the same. This makes it very easy to debug the issue and make the necessary correction. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #048

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Order of Result Set of SELECT Statement on Clustered Indexed Table When ORDER BY is Not Used Above theory is true in most of the cases. However SQL Server does not use that logic when returning the resultset. SQL Server always returns the resultset which it can return fastest.In most of the cases the resultset which can be returned fastest is the resultset which is returned using clustered index. Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT One of the Jr. Developer asked me this question (What will be the Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT?) while I was rushing to an important meeting. I was getting late so I asked him to talk with his Application Tech Lead. When I came back from meeting both of them were looking for me. They said they are confused. I quickly wrote down following example for them. 2008 SQL SERVER – Guidelines and Coding Standards Complete List Download Coding standards and guidelines are very important for any developer on the path of a successful career. A coding standard is a set of guidelines, rules and regulations on how to write code. Coding standards should be flexible enough or should take care of the situation where they should not prevent best practices for coding. They are basically the guidelines that one should follow for better understanding. Download Guidelines and Coding Standards complete List Download Get Answer in Float When Dividing of Two Integer Many times we have requirements of some calculations amongst different fields in Tables. One of the software developers here was trying to calculate some fields having integer values and divide it which gave incorrect results in integer where accurate results including decimals was expected. Puzzle – Computed Columns Datatype Explanation SQL Server automatically does a cast to the data type having the highest precedence. So the result of INT and INT will be INT, but INT and FLOAT will be FLOAT because FLOAT has a higher precedence. If you want a different data type, you need to do an EXPLICIT cast. Renaming SP is Not Good Idea – Renaming Stored Procedure Does Not Update sys.procedures I have written many articles about renaming a tables, columns and procedures SQL SERVER – How to Rename a Column Name or Table Name, here I found something interesting about renaming the stored procedures and felt like sharing it with you all. The interesting fact is that when we rename a stored procedure using SP_Rename command, the Stored Procedure is successfully renamed. But when we try to test the procedure using sp_helptext, the procedure will be having the old name instead of new names. 2009 Insert Values of Stored Procedure in Table – Use Table Valued Function It is clear from the result set that , where I have converted stored procedure logic into the table valued function, is much better in terms of logic as it saves a large number of operations. However, this option should be used carefully. The performance of the stored procedure is “usually” better than that of functions. Interesting Observation – Index on Index View Used in Similar Query Recently, I was working on an optimization project for one of the largest organizations. While working on one of the queries, we came across a very interesting observation. We found that there was a query on the base table and when the query was run, it used the index, which did not exist in the base table. On careful examination, we found that the query was using the index that was on another view. This was very interesting as I have personally never experienced a scenario like this. In simple words, “Query on the base table can use the index created on the indexed view of the same base table.” Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries Working with SQL Server has never seemed to be monotonous – no matter how long one has worked with it. Quite often, I come across some excellent comments that I feel like acknowledging them as blog posts. Recently, I wrote an article on SQL SERVER – Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location, which is well received in the community. 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 2011 Startup Parameters Easy to Configure If you are a regular reader of this blog, you must be aware that I have written about SQL Server Denali recently. Here is the quickest way to reach into the screen where we can change the startup parameters. Go to SQL Server Configuration Manager >> SQL Server Services >> Right Click on the Server >> Properties >> Startup Parameters 2012 Validating Unique Columnname Across Whole Database I sometimes come across very strange requirements and often I do not receive a proper explanation of the same. Here is the one of those examples. For example “Our business requirement is when we add new column we want it unique across current database.” Read the solution to this strange request in this blog post. Excel Losing Decimal Values When Value Pasted from SSMS ResultSet It is very common when users are coping the resultset to Excel, the floating point or decimals are missed. The solution is very much simple and it requires a small adjustment in the Excel. By default Excel is very smart and when it detects the value which is getting pasted is numeric it changes the column format to accommodate that. Basic Calculation and PEMDAS Order of Operation Read this interesting blog post for fantastic conversation about the subject. Copy Column Headers from Resultset – SQL in Sixty Seconds #027 – Video http://www.youtube.com/watch?v=x_-3tLqTRv0 Delete From Multiple Table – Update Multiple Table in Single Statement There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? Read the answer in the blog post. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

< Previous Page | 45 46 47 48 49 50 51 52 53 54 55 56  | Next Page >