Search Results

Search found 848 results on 34 pages for 'sqlauthority'.

Page 33/34 | < Previous Page | 29 30 31 32 33 34 | Next Page >

SQL SERVER – SQL in Sixty Seconds – 5 Videos from Joes 2 Pros Series – SQL Exam Prep Series 70-433

- by pinaldave

Joes 2 Pros SQL Server Learning series is indeed fun. Joes 2 Pros series is written for beginners and who wants to build expertise for SQL Server programming and development from fundamental. In the beginning of the series author Rick Morelan is not shy to explain the simplest concept of how to open SQL Server Management Studio. Honestly the book starts with that much basic but as it progresses further Rick discussing about various advanced concepts from query tuning to Core Architecture. This five part series is written with keeping SQL Server Exam 70-433. Instead of just focusing on what will be there in exam, this series is focusing on learning the important concepts thoroughly. This book no way take short cut to explain any concepts and at times, will go beyond the topic at length. The best part is that all the books has many companion videos explaining the concepts and videos. Every Wednesday I like to post a video which explains something in quick few seconds. Today we will go over five videos which I posted in my earlier posts related to Joes 2 Pros series. Introduction to XML Data Type Methods – SQL in Sixty Seconds #015 The XML data type was first introduced with SQL Server 2005. This data type continues with SQL Server 2008 where expanded XML features are available, most notably is the power of the XQuery language to analyze and query the values contained in your XML instance. There are five XML data type methods available in SQL Server 2008: query() – Used to extract XML fragments from an XML data type. value() – Used to extract a single value from an XML document. exist() – Used to determine if a specified node exists. Returns 1 if yes and 0 if no. modify() – Updates XML data in an XML data type. node() – Shreds XML data into multiple rows (not covered in this blog post). [Detailed Blog Post] | [Quiz with Answer] Introduction to SQL Error Actions – SQL in Sixty Seconds #014 Most people believe that when SQL Server encounters an error severity level 11 or higher the remaining SQL statements will not get executed. In addition, people also believe that if any error severity level of 11 or higher is hit inside an explicit transaction, then the whole statement will fail as a unit. While both of these beliefs are true 99% of the time, they are not true in all cases. It is these outlying cases that frequently cause unexpected results in your SQL code. To understand how to achieve consistent results you need to know the four ways SQL Error Actions can react to error severity levels 11-16: Statement Termination – The statement with the procedure fails but the code keeps on running to the next statement. Transactions are not affected. Scope Abortion – The current procedure, function or batch is aborted and the next calling scope keeps running. That is, if Stored Procedure A calls B and C, and B fails, then nothing in B runs but A continues to call C. @@Error is set but the procedure does not have a return value. Batch Termination – The entire client call is terminated. XACT_ABORT – (ON = The entire client call is terminated.) or (OFF = SQL Server will choose how to handle all errors.) [Detailed Blog Post] | [Quiz with Answer] Introduction to Basics of a Query Hint – SQL in Sixty Seconds #013 Query hints specify that the indicated hints should be used throughout the query. Query hints affect all operators in the statement and are implemented using the OPTION clause. Cautionary Note: Because the SQL Server Query Optimizer typically selects the best execution plan for a query, it is highly recommended that hints be used as a last resort for experienced developers and database administrators to achieve the desired results. [Detailed Blog Post] | [Quiz with Answer] Introduction to Hierarchical Query – SQL in Sixty Seconds #012 A CTE can be thought of as a temporary result set and are similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. A CTE is generally considered to be more readable than a derived table and does not require the extra effort of declaring a Temp Table while providing the same benefits to the user. However; a CTE is more powerful than a derived table as it can also be self-referencing, or even referenced multiple times in the same query. A recursive CTE requires four elements in order to work properly: Anchor query (runs once and the results ‘seed’ the Recursive query) Recursive query (runs multiple times and is the criteria for the remaining results) UNION ALL statement to bind the Anchor and Recursive queries together. INNER JOIN statement to bind the Recursive query to the results of the CTE. [Detailed Blog Post] | [Quiz with Answer] Introduction to SQL Server Security – SQL in Sixty Seconds #011 Let’s get some basic definitions down first. Take the workplace example where “Tom” needs “Read” access to the “Financial Folder”. What are the Securable, Principal, and Permissions from that last sentence? A Securable is a resource that someone might want to access (like the Financial Folder). A Principal is anything that might want to gain access to the securable (like Tom). A Permission is the level of access a principal has to a securable (like Read). [Detailed Blog Post] | [Quiz with Answer] Please leave a comment explain which one was your favorite video as that will help me understand what works and what needs improvement. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology, Video

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – SSMS: Database Consistency History Report

- by Pinal Dave

Doctor and Database The last place I like to visit is always a hospital. With the monsoon season starting, intermittent rains, it has become sort of a routine to get a cycle of fever every other year (seriously I hate it). So when I visit my doctor, it is always interesting in the way he quizzes me. The routine question of – “How many days have you had this?”, “Is there any pattern?”, “Did you drench in rain?”, “Do you have any other symptom?” and so on. The idea here is that the doctor wants to find any anomaly or a pattern that will guide him to a viral or bacterial type. Most of the time they get it based on experience and sometimes after a battery of tests. So if there is consistent behavior to your problem, there is always a solution out. SQL Server has its way to find if the server data / files are in consistent state using the DBCC commands. Back to SQL Server In real life, Database consistency check is one of the critical operations a DBA generally doesn’t give much priority. Many readers of my blogs have asked many times, how do we know if the database is consistent? How do I read output of DBCC CHECKDB and find if everything is right or not? My common answer to all of them is – look at the bottom of checkdb (or checktable) output and look for below line. CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DatabaseName’. Above is a “good sign” because we are seeing zero allocation and zero consistency error. If you are seeing non-zero errors then there is some problem with the database. Sample output is shown as below: CHECKDB found 0 allocation errors and 2 consistency errors in database ‘DatabaseName’. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DatabaseName). If we see non-zero error then most of the time (not always) we get repair options depending on the level of corruption. There is risk involved with above option (repair_allow_data_loss), that is – we would lose the data. Sometimes the option would be repair_rebuild which is little safer. Though these options are available, it is important to find the root cause to the problem. In standard report, there is a report which can show the history of checkdb executed for the selected database. Since this is a database level report, we need to right click on database, click Reports, click Standard Reports and then choose “Database Consistency History” report. The information in this report is picked from default trace. If default trace is disabled or there is no checkdb run or information is not there in default trace (because it’s rolled over), we would get report like below. As we can see report says it very clearly: Currently, no execution history of CHECKDB is available or default trace is not enabled. To demonstrate, I have caused corruption in one of the database and did below steps. Run CheckDB so that errors are reported. Fix the corruption by losing the data using repair option Run CheckDB again to check if corruption is cleared. After that I have launched the report and below is what we would see. If you are lazy like me and don’t want to run the report manually for each database then below query would be handy to provide same report for all database. This query is runs behind the scenes by the report. All I have done is remove the filter for database name (at the last – highlighted). DECLARE @curr_tracefilename VARCHAR(500); DECLARE @base_tracefilename VARCHAR(500); DECLARE @indx INT; SELECT @curr_tracefilename = path FROM sys.traces WHERE is_default = 1; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SELECT @indx = PATINDEX('%\%', @curr_tracefilename) ; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SET @base_tracefilename = LEFT( @curr_tracefilename,LEN(@curr_tracefilename) - @indx) + '\log.trc'; SELECT SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),36, PATINDEX('%executed%',TEXTData)-36) AS command , LoginName , StartTime , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%found%',TEXTData) +6,PATINDEX('%errors %',TEXTData)-PATINDEX('%found%',TEXTData)-6)) AS errors , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%repaired%',TEXTData) +9,PATINDEX('%errors.%',TEXTData)-PATINDEX('%repaired%',TEXTData)-9)) repaired , SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%time:%',TEXTData)+6,PATINDEX('%hours%',TEXTData)-PATINDEX('%time:%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%hours%',TEXTData) +6,PATINDEX('%minutes%',TEXTData)-PATINDEX('%hours%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%minutes%',TEXTData) +8,PATINDEX('%seconds.%',TEXTData)-PATINDEX('%minutes%',TEXTData)-8) AS time FROM::fn_trace_gettable( @base_tracefilename, DEFAULT) WHERE EventClass = 22 AND SUBSTRING(TEXTData,36,12) = 'DBCC CHECKDB' -- AND DatabaseName = @DatabaseName; Don’t get worried about the logic above. All it is doing is reading the trace files, parsing below entry and getting out information for underlined words. DBCC CHECKDB (CorruptedDatabase) executed by sa found 2 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds. Internal database snapshot has split point LSN = 00000029:00000030:0001 and first LSN = 00000029:00000020:0001. Hopefully now onwards you would run checkdb and understand the importance of it. As responsible DBAs I am sure you are already doing it, let me know how often do you actually run them on you production environment? Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Buzz Words: What is Hadoop – Day 6 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is NoSQL. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – Hadoop. What is Hadoop? Apache Hadoop is an open-source, free and Java based software framework offers a powerful distributed platform to store and manage Big Data. It is licensed under an Apache V2 license. It runs applications on large clusters of commodity hardware and it processes thousands of terabytes of data on thousands of the nodes. Hadoop is inspired from Google’s MapReduce and Google File System (GFS) papers. The major advantage of Hadoop framework is that it provides reliability and high availability. What are the core components of Hadoop? There are two major components of the Hadoop framework and both fo them does two of the important task for it. Hadoop MapReduce is the method to split a larger data problem into smaller chunk and distribute it to many different commodity servers. Each server have their own set of resources and they have processed them locally. Once the commodity server has processed the data they send it back collectively to main server. This is effectively a process where we process large data effectively and efficiently. (We will understand this in tomorrow’s blog post). Hadoop Distributed File System (HDFS) is a virtual file system. There is a big difference between any other file system and Hadoop. When we move a file on HDFS, it is automatically split into many small pieces. These small chunks of the file are replicated and stored on other servers (usually 3) for the fault tolerance or high availability. (We will understand this in the day after tomorrow’s blog post). Besides above two core components Hadoop project also contains following modules as well. Hadoop Common: Common utilities for the other Hadoop modules Hadoop Yarn: A framework for job scheduling and cluster resource management There are a few other projects (like Pig, Hive) related to above Hadoop as well which we will gradually explore in later blog posts. A Multi-node Hadoop Cluster Architecture Now let us quickly see the architecture of the a multi-node Hadoop cluster. A small Hadoop cluster includes a single master node and multiple worker or slave node. As discussed earlier, the entire cluster contains two layers. One of the layer of MapReduce Layer and another is of HDFC Layer. Each of these layer have its own relevant component. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave or worker node consists of a DataNode and TaskTracker. It is also possible that slave node or worker node is only data or compute node. The matter of the fact that is the key feature of the Hadoop. In this introductory blog post we will stop here while describing the architecture of Hadoop. In a future blog post of this 31 day series we will explore various components of Hadoop Architecture in Detail. Why Use Hadoop? There are many advantages of using Hadoop. Let me quickly list them over here: Robust and Scalable – We can add new nodes as needed as well modify them. Affordable and Cost Effective – We do not need any special hardware for running Hadoop. We can just use commodity server. Adaptive and Flexible – Hadoop is built keeping in mind that it will handle structured and unstructured data. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically fails over to another node. Why Hadoop is named as Hadoop? In year 2005 Hadoop was created by Doug Cutting and Mike Cafarella while working at Yahoo. Doug Cutting named Hadoop after his son’s toy elephant. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – MapReduce. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Number-Crunching with SQL Server – Exceed the Functionality of Excel

- by Pinal Dave

Imagine this. Your users have developed an Excel spreadsheet that extracts data from your SQL Server database, manipulates that data through the use of Excel formulas and, possibly, some VBA code which is then used to calculate P&L, hedging requirements or even risk numbers. Management comes to you and tells you that they need to get rid of the spreadsheet and that the results of the spreadsheet calculations need to be persisted on the database. SQL Server has a very small set of functions for analyzing data. Excel has hundreds of functions for analyzing data, with many of them focused on specific financial and statistical calculations. Is it even remotely possible that you can use SQL Server to replace the complex calculations being done in a spreadsheet? Westclintech has developed a library of functions that match or exceed the functionality of Excel’s functions and contains many functions that are not available in EXCEL. Their XLeratorDB library of functions contains over 700 functions that can be incorporated into T-SQL statements. XLeratorDB takes advantage of the SQL CLR architecture introduced in SQL Server 2005. SQL CLR permits managed code to be compiled into the database and run alongside built-in SQL Server functions like COUNT or SUM. The Westclintech developers have taken advantage of this architecture to bring robust analytical functions to the database. In our hypothetical spreadsheet, let’s assume that our users are using the YIELD function and that the data are extracted from a table in our database called BONDS. Here’s what the spreadsheet might look like. We go to column G and see that it contains the following formula. Obviously, SQL Server does not offer a native YIELD function. However, with XLeratorDB we can replicate this calculation in SQL Server with the following statement: SELECT *, wct.YIELD(CAST(GETDATE() AS date),Maturity,Rate,Price,100,Frequency,Basis) AS YIELD FROM BONDS This produces the following result. This illustrates one of the best features about XLeratorDB; it is so easy to use. Since I knew that the spreadsheet was using the YIELD function I could use the same function with the same calling structure to do the calculation in SQL Server. I didn’t need to know anything at all about the mechanics of calculating the yield on a bond. It was pretty close to cut and paste. In fact, that’s one way to construct the SQL. Just copy the function call from the cell in the spreadsheet and paste it into SMS and change the cell references to column names. I built the SQL for this query by starting with this. SELECT * ,YIELD(TODAY(),B2,C2,D2,100,E2,F2) FROM BONDS I then changed the cell references to column names. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) ,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Finally, I replicated the TODAY() function using GETDATE() and added the schema name to the function name. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) --,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) ,wct.YIELD(GETDATE(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Then I am able to execute the statement returning the results seen above. The XLeratorDB libraries are heavy on financial, statistical, and mathematical functions. Where there is an analog to an Excel function, the XLeratorDB function uses the same naming conventions and calling structure as the Excel function, but there are also hundreds of additional functions for SQL Server that are not found in Excel. You can find the functions by opening Object Explorer in SQL Server Management Studio (SSMS) and expanding the Programmability folder under the database where the functions have been installed. The Functions folder expands to show 3 sub-folders: Table-valued Functions; Scalar-valued functions, Aggregate Functions, and System Functions. You can expand any of the first three folders to see the XLeratorDB functions. Since the wct.YIELD function is a scalar function, we will open the Scalar-valued Functions folder, scroll down to the wct.YIELD function and and click the plus sign (+) to display the input parameters. The functions are also Intellisense-enabled, with the input parameters displayed directly in the query tab. The Westclintech website contains documentation for all the functions including examples that can be copied directly into a query window and executed. There are also more one hundred articles on the site which go into more detail about how some of the functions work and demonstrate some of the extensive business processes that can be done in SQL Server using XLeratorDB functions and some T-SQL. XLeratorDB is organized into libraries: finance, statistics; math; strings; engineering; and financial options. There is also a windowing library for SQL Server 2005, 2008, and 2012 which provides functions for calculating things like running and moving averages (which were introduced in SQL Server 2012), FIFO inventory calculations, financial ratios and more, without having to use triangular joins. To get started you can download the XLeratorDB 15-day free trial from the Westclintech web site. It is a fully-functioning, unrestricted version of the software. If you need more than 15 days to evaluate the software, you can simply download another 15-day free trial. XLeratorDB is an easy and cost-effective way to start adding sophisticated data analysis to your SQL Server database without having to know anything more than T-SQL. Get XLeratorDB Today and Now! Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Excel

Read the article
SQL SERVER – Columnstore Index and sys.dm_db_index_usage_stats

- by pinaldave

As you know I have been writing on Columnstore Index for quite a while. Recently my friend Vinod Kumar wrote about SQL Server 2012: ColumnStore Characteristics. A fantastic read on the subject if you have yet not caught up on that subject. After the blog post I called him and asked what should I write next on this subject. He suggested that I should write on DMV script which I have prepared related to Columnstore when I was writing our SQL Server Questions and Answers book. When we were writing this book SQL Server 2012 CTP versions were available. I had written few scripts related to SQL Server columnstore Index. I like Vinod’s idea and I decided to write about DMV, which we did not cover in the book as SQL Server 2012 was not released yet. We did not want to talk about the product which was not yet released. The first script which I had written was with DMV - sys.column_store_index_stats. This DMV was displaying the statistics of the columnstore indexes. When I attempted to run it on SQL Server 2012 RTM it gave me error suggesting that this DMV does not exists. Here is the script which I ran: SELECT * FROM sys.column_store_index_stats; It generated following error: Msg 208, Level 16, State 1, Line 1 Invalid object name ‘column_store_index_stats’. I was pretty confident that this DMV was available when I had written the scripts. The next reaction was to type ‘sys.’ only in SSMS and wait for intelisense to popup DMV list. I scrolled down and noticed that above said DMV did not exists there as well. Now this is not bug or missing feature. This was indeed something can happen because the version which I was practicing was early CTP version. If you go to the page of the DMV here, it clearly stats notice on the top of the page. This documentation is for preview only, and is subject to change in later releases. Now this was not alarming but my next thought was if this DMV is not there where can I find the information which this DMV was providing. Well, while I was thinking about this, I noticed that my another friend Balmukund Lakhani was online on personal messenger. Well, Balmukund is “Know All” kid. I have yet to find situation where I have not got my answers from him. I immediately pinged him and asked the question regarding where can I find information of ‘column_store_index_stats’. His answer was very abrupt but enlightening for sure. Here is our conversation: Pinal: Where can I find information of column_store_index_stats? Balmukund: Assume you have never worked with CTP before and now try to find the information which you are trying to find. Honestly it was fantastic response from him. I was confused as I have played extensively with CTP versions of SQL Server 2012. Now his response give me big hint. I should have not looked for DMV but rather should have focused on what I wanted to do. I wanted to retrieve the statistics related to the index. In SQL Server 2008/R2, I was able to retrieve the statistics of the index from the DMV - sys.dm_db_index_usage_stats. I used the same DMV on SQL Server 2012 and it did retrieved the necessary information for me. Here is the updated script which gave me all the necessary information I was looking for. Matter of the fact, if I have used my earlier SQL Server 2008 R2 script this would have just worked fine. SELECT DB_NAME(Database_ID) DBName, SCHEMA_NAME(schema_id) AS SchemaName, OBJECT_NAME(ius.OBJECT_ID) ObjName, i.type_desc, i.name, user_seeks, user_scans, user_lookups, user_updates,* FROM sys.dm_db_index_usage_stats ius INNER JOIN sys.indexes i ON i.index_id = ius.index_id AND ius.OBJECT_ID = i.OBJECT_ID INNER JOIN sys.tables t ON t.OBJECT_ID = i.OBJECT_ID GO Let us see the resultset of above query. You will notice that column Type_desc describes the type of the index. You can additionally write WHERE condition on the column and only retrieve only selected type of Index. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Index, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1

- by Pinal Dave

This is the first part of the series Incremental Statistics. Here is the index of the complete series. What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1 Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2 DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3 Statistics are considered one of the most important aspects of SQL Server Performance Tuning. You might have often heard the phrase, with related to performance tuning. “Update Statistics before you take any other steps to tune performance”. Honestly, I have said above statement many times and many times, I have personally updated statistics before I start to do any performance tuning exercise. You may agree or disagree to the point, but there is no denial that Statistics play an extremely vital role in the performance tuning. SQL Server 2014 has a new feature called Incremental Statistics. I have been playing with this feature for quite a while and I find that very interesting. After spending some time with this feature, I decided to write about this subject over here. New in SQL Server 2014 – Incremental Statistics Well, it seems like lots of people wants to start using SQL Server 2014′s new feature of Incremetnal Statistics. However, let us understand what actually this feature does and how it can help. I will try to simplify this feature first before I start working on the demo code. Code for all versions of SQL Server Here is the code which you can execute on all versions of SQL Server and it will update the statistics of your table. The keyword which you should pay attention is WITH FULLSCAN. It will scan the entire table and build brand new statistics for you which your SQL Server Performance Tuning engine can use for better estimation of your execution plan. UPDATE STATISTICS TableName(StatisticsName) WITH FULLSCAN Who should learn about this? Why? If you are using partitions in your database, you should consider about implementing this feature. Otherwise, this feature is pretty much not applicable to you. Well, if you are using single partition and your table data is in a single place, you still have to update your statistics the same way you have been doing. If you are using multiple partitions, this may be a very useful feature for you. In most cases, users have multiple partitions because they have lots of data in their table. Each partition will have data which belongs to itself. Now it is very common that each partition are populated separately in SQL Server. Real World Example For example, if your table contains data which is related to sales, you will have plenty of entries in your table. It will be a good idea to divide the partition into multiple filegroups for example, you can divide this table into 3 semesters or 4 quarters or even 12 months. Let us assume that we have divided our table into 12 different partitions. Now for the month of January, our first partition will be populated and for the month of February our second partition will be populated. Now assume, that you have plenty of the data in your first and second partition. Now the month of March has just started and your third partition has started to populate. Due to some reason, if you want to update your statistics, what will you do? In SQL Server 2012 and earlier version You will just use the code of WITH FULLSCAN and update the entire table. That means even though you have only data in third partition you will still update the entire table. This will be VERY resource intensive process as you will be updating the statistics of the partition 1 and 2 where data has not changed at all. In SQL Server 2014 You will just update the partition of Partition 3. There is a special syntax where you can now specify which partition you want to update now. The impact of this is that it is smartly merging the new data with old statistics and update the entire statistics without doing FULLSCAN of your entire table. This has a huge impact on performance. Remember that the new feature in SQL Server 2014 does not change anything besides the capability to update a single partition. However, there is one feature which is indeed attractive. Previously, when table data were changed 20% at that time, statistics update were triggered. However, now the same threshold is applicable to a single partition. That means if your partition faces 20% data, change it will also trigger partition level statistics update which, when merged to your final statistics will give you better performance. In summary If you are not using a partition, this feature is not applicable to you. If you are using a partition, this feature can be very helpful to you. Tomorrow: We will see working code of SQL Server 2014 Incremental Statistics. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Statistics, Statistics

Read the article
SQL SERVER – Recover the Accidentally Renamed Table

- by pinaldave

I have no answer to following question. I saw a desperate email marked as urgent delivered in my mailbox. “I accidentally renamed table in my SSMS. I was scrolling very fast and I made mistakes. It was either because I double clicked or clicked on F2 (shortcut key for renaming). However, I have made the mistake and now I have no idea how to fix this. I am in big trouble. Help me get my original tablename.” I have seen many similar scenarios in my life and they give me a very good opportunity to preach wisdom but when the house is burning, we cannot talk about how we should have conserved the water earlier. The goal at that point is to put off the fire as fast as we can. I decided to answer this email with my best knowledge. If you have renamed the table, I think you pretty much is out of luck. Here are few things which you can do which can give you idea about what your tablename can be if you are lucky. Method 1: (Not Recommended but try your luck) Check your naming convention of your system. I have often seen that many organizations name their index as IX_TableName_Colms or name their keys as FK_TableName1_TableName2_Cols. If your organization is following the same you can get the name from your table, you may refer your keys. Again, note that this is quite possible that your tablename was already renamed and your keys were not updated. This can easily lead you to select incorrect name. I think follow this if you are confident or move to the next method. Method 2: (Not Recommended but try your luck) This method is also based on your orgs naming convention. If you use the name of the table in any columnname (some organizations use tablename in their incremental identity column name), you can get that name from there. Method 3: (Not Recommended but try your luck) If you know where your table was used in your stored procedures, you can script your stored procedure and find the name of the table back. Method 4: (Try your luck) All the best organizations first create a data model of the schema and there is good chance that this table is used there, you should take your chances and refer original document. If your organization is good at managing docs or source code, you will get the name of the table back for sure. Method 5: (It WORKS but try on a development server) There is no sure way to get you the name of the table which you accidentally renamed however, there is one way which will work for sure. You need to take your latest full backup and restore it on your development server (remember not on production or where you have renamed this column). Now restore latest differential file of the full backup. Now restore all the log files one by one making sure that you are restoring before the point of time of you renamed the tablename. Now go to explore and this will give you the name of the table which you have renamed. If you are confident that the same table existed with the same name when the last full backup was made, you do not have to go to all the steps. You can just get the name of the table directly from last backup’s restore. Read the article about Backup Timeline. Wisdom: How can I miss to preach wisdom when I get the opportunity to do so? Here are a few points to remember. Use a different account to explore production environment. Do not use the same account which have all the rights and permissions all the time. Use the account which has read only permissions if there are no modification required. Use policy based management to prevent changes which are accidental. If there was policy of valid names, the accidental change of the table was not possible unless it was intentional delibarate changes. Have a proper auditing of the system in place. You can use DDL triggers but be careful with its usage (get it reviewed properly first). (Add your suggestion here) I guess Method 5 will work all the time (using point in time restore). Everything else is chance of luck and if you are lucky are bad – you will get further incorrect name. Now go back and read the first line of this blog. Out of five method four methods are just lucky guesses. The method 5 will work but again it is a lengthy process if the size of the database is huge or if you do not have full backup. Did I miss anything obvious? Please leave a comment and I will publish your answer with due credit. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Not Possible – Delete From Multiple Table – Update Multiple Table in Single Statement

- by pinaldave

There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? The answer is – No, You cannot and you should not. SQL Server does not support deleting or updating from two tables in a single update. If you want to delete or update two different tables – you may want to write two different delete or update statements for it. This method has many issues – from the consistency of the data to SQL syntax. Now here is the real reason for this blog post – yesterday I was asked this question again and I replied my canned answer saying it is not possible and it should not be any way implemented that day. In the response to my reply I was pointed out to my own blog post where user suggested that I had previously mentioned this is possible and with demo example. Let us go over my conversation – you may find it interesting. Let us call the user DJ. DJ: Pinal, can we delete multiple table in a single statement or with single delete statement? Pinal: No, you cannot and you should not. DJ: Oh okey, if that is the case, why do you suggest to do that? Pinal: (baffled) I am not suggesting that. I am rather suggesting that it is not possible and it should not be possible. DJ: Hmm… but in that case why did you blog about it earlier? Pinal: (What?) No, I did not. I am pretty confident. DJ: Well, I am confident as well. You did. Pinal: In that case, it is my word against your word. Isn’t it? DJ: I have proof. Do you want to see it that you suggest it is possible? Pinal: Yes, I will be delighted too. (After 10 Minutes) DJ: Here are not one but two of your blog posts which talks about it - SQL SERVER – Curious Case of Disappearing Rows – ON UPDATE CASCADE and ON DELETE CASCADE – Part 1 of 2 SQL SERVER – Curious Case of Disappearing Rows – ON UPDATE CASCADE and ON DELETE CASCADE – T-SQL Example – Part 2 of 2 Pinal: Oh! DJ: I know I was correct. Pinal: Well, oh man, I did not mean there what you mean here. DJ: I did not understand can you explain it further. Pinal: Here we go. The example in the other blog is the example of the cascading delete or cascading update. I think you may want to understand the concept of the foreign keys and cascading update/delete. The concept of cascading exists to maintain data integrity. If there primary keys get deleted the update or delete reflects on the foreign key table to maintain the key integrity and data consistency. SQL Server follows ANSI Entry SQL with regard to referential integrity between PrimaryKey and ForeignKey columns which requires the inserting, updating, and deleting of data in related tables to be restricted to values that preserve the integrity. This is all together different concept than deleting multiple values in a single statement. When I hear that someone wants to delete or update multiple table in a single statement what I assume is something very similar to following. DELETE/UPDATE Table 1 (cols) Table 2 (cols) VALUES … which is not valid statement/syntax as well it is not ASNI standards as well. I guess, after this discussion with DJ, I realize I need to do a blog post so I can add the link to this blog post in my canned answer. Well, it was a fun conversation with DJ and I hope it the message is very clear now. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Joins, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Curious Case of Disappearing Rows – ON UPDATE CASCADE and ON DELETE CASCADE – Part 1 of 2

- by pinaldave

Social media has created an Always Connected World for us. Recently I enrolled myself to learn new technologies as a student. I had decided to focus on learning and decided not to stay connected on the internet while I am in the learning session. On the second day of the event after the learning was over, I noticed lots of notification from my friend on my various social media handle. He had connected with me on Twitter, Facebook, Google+, LinkedIn, YouTube as well SMS, WhatsApp on the phone, Skype messages and not to forget with a few emails. I right away called him up. The problem was very unique – let us hear the problem in his own words. “Pinal – we are in big trouble we are not able to figure out what is going on. Our product details table is continuously loosing rows. Lots of rows have disappeared since morning and we are unable to find why the rows are getting deleted. We have made sure that there is no DELETE command executed on the table as well. The matter of the fact, we have removed every single place the code which is referencing the table. We have done so many crazy things out of desperation but no luck. The rows are continuously deleted in a random pattern. Do you think we have problems with intrusion or virus?” After describing the problems he had pasted few rants about why I was not available during the day. I think it will be not smart to post those exact words here (due to many reasons). Well, my immediate reaction was to get online with him. His problem was unique to him and his team was all out to fix the issue since morning. As he said he has done quite a lot out in desperation. I started asking questions from audit, policy management and profiling the data. Very soon I realize that I think this problem was not as advanced as it looked. There was no intrusion, SQL Injection or virus issue. Well, long story short first - It was a very simple issue of foreign key created with ON UPDATE CASCADE and ON DELETE CASCADE. CASCADE allows deletions or updates of key values to cascade through the tables defined to have foreign key relationships that can be traced back to the table on which the modification is performed. ON DELETE CASCADE specifies that if an attempt is made to delete a row with a key referenced by foreign keys in existing rows in other tables, all rows containing those foreign keys are also deleted. ON UPDATE CASCADE specifies that if an attempt is made to update a key value in a row, where the key value is referenced by foreign keys in existing rows in other tables, all of the foreign key values are also updated to the new value specified for the key. (Reference: BOL) In simple words – due to ON DELETE CASCASE whenever is specified when the data from Table A is deleted and if it is referenced in another table using foreign key it will be deleted as well. In my friend’s case, they had two tables, Products and ProductDetails. They had created foreign key referential integrity of the product id between the table. Now the as fall was up they were updating their catalogue. When they were updating the catalogue they were deleting products which are no more available. As the changes were cascading the corresponding rows were also deleted from another table. This is CORRECT. The matter of the fact, there is no error or anything and SQL Server is behaving how it should be behaving. The problem was in the understanding and inappropriate implementations of business logic. What they needed was Product Master Table, Current Product Catalogue, and Product Order Details History tables. However, they were using only two tables and without proper understanding the relation between them was build using foreign keys. If there were only two table, they should have used soft delete which will not actually delete the record but just hide it from the original product table. This workaround could have got them saved from cascading delete issues. I will be writing a detailed post on the design implications etc in my future post as in above three lines I cannot cover every issue related to designing and it is also not the scope of the blog post. More about designing in future blog posts. Once they learn their mistake, they were happy as there was no intrusion but trust me sometime we are our own enemy and this is a great example of it. In tomorrow’s blog post we will go over their code and workarounds. Feel free to share your opinions, experiences and comments. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Professional Development – Difference Between Bio, CV and Resume

- by Pinal Dave

Applying for work can be very stressful – you want to put your best foot forward, and it can be very hard to sell yourself to a potential employer while highlighting your best characteristics and answering questions. On top of that, some jobs require different application materials – a biography (or bio), a curriculum vitae (or CV), or a resume. These things seem so interchangeable, so what is the difference? Let’s start with the one most of us have heard of – the resume. A resume is a summary of your job and education history. If you have ever applied for a job, you will have used a resume. The ability to write a good resume that highlights your best characteristics and emphasizes your qualifications for a specific job is a skill that will take you a long way in the world. For such an essential skill, unfortunately it is one that many people struggle with. RESUME So let’s discuss what makes a great resume. First, make sure that your name and contact information are at the top, in large print (slightly larger font than the rest of the text, size 14 or 16 if the rest is size 12, for example). You need to make sure that if you catch the recruiter’s attention and they know how to get a hold of you. As for qualifications, be quick and to the point. Make your job title and the company the headline, and include your skills, accomplishments, and qualifications as bullet points. Use good action verbs, like “finished,” “arranged,” “solved,” and “completed.” Include hard numbers – don’t just say you “changed the filing system,” say that you “revolutionized the storage of over 250 files in less than five days.” Doesn’t that sentence sound much more powerful? Curriculum Vitae (CV) Now let’s talk about curriculum vitae, or “CVs”. A CV is more like an expanded resume. The same rules are still true: put your name front and center, keep your contact info up to date, and summarize your skills with bullet points. However, CVs are often required in more technical fields – like science, engineering, and computer science. This means that you need to really highlight your education and technical skills. Difference between Resume and CV Resumes are expected to be one or two pages long – CVs can be as many pages as necessary. If you are one of those people lucky enough to feel limited by the size constraint of resumes, a CV is for you! On a CV you can expand on your projects, highlight really exciting accomplishments, and include more educational experience – including GPA and test scores from the GRE or MCAT (as applicable). You can also include awards, associations, teaching and research experience, and certifications. A CV is a place to really expand on all your experience and how great you will be in this particular position. Biography (Bio) Chances are, you already know what a bio is, and you have even read a few of them. Think about the one or two paragraphs that every author includes in the back flap of a book. Think about the sentences under a blogger’s photo on every “About Me” page. That is a bio. It is a way to quickly highlight your life experiences. It is essentially the way you would introduce yourself at a party. Where a bio is required for a job, chances are they won’t want to know about where you were born and how many pets you have, though. This is a way to summarize your entire job history in quick-to-read format – and sometimes during a job hunt, being able to get to the point and grab the recruiter’s interest is the best way to get your foot in the door. Think of a bio as your entire resume put into words. Most bios have a standard format. In paragraph one, talk about your most recent position and accomplishments there, specifically how they relate to the job you are applying for. If you have teaching or research experience, training experience, certifications, or management experience, talk about them in paragraph two. Paragraph three and four are for highlighting publications, education, certifications, associations, etc. To wrap up your bio, provide your contact info and availability (dates and times). Where to use What? For most positions, you will know exactly what kind of application to use, because the job announcement will state what materials are needed – resume, CV, bio, cover letter, skill set, etc. If there is any confusion, choose whatever the industry standard is (CV for technical fields, resume for everything else) or choose which of your documents is the strongest. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: About Me, PostADay, Professional Development, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Buzz Words: What is HDFS – Day 8 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is MapReduce. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – HDFS. What is HDFS ? HDFS stands for Hadoop Distributed File System and it is a primary storage system used by Hadoop. It provides high performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware. In commodity hardware deployment server failures are very common. Due to the same reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to reduced risk of failure. HDFS creates smaller pieces of the big data and distributes it on different nodes. It also copies each smaller piece to multiple times on different nodes. Hence when any node with the data crashes the system is automatically able to use the data from a different node and continue the process. This is the key feature of the HDFS system. Architecture of HDFS The architecture of the HDFS is master/slave architecture. An HDFS cluster always consists of single NameNode. This single NameNode is a master server and it manages the file system as well regulates access to various files. In additional to NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks and those blocks are stored in a set of DataNodes. The primary task of the NameNode is to open, close or rename files and directory and regulate access to the file system, whereas the primary task of the DataNode is read and write to the file systems. DataNode is also responsible for the creation, deletion or replication of the data based on the instruction from NameNode. In reality, NameNode and DataNode are software designed to run on commodity machine build in Java language. Visual Representation of HDFS Architecture Let us understand how HDFS works with the help of the diagram. Client APP or HDFS Client connects to NameSpace as well as DataNode. Client App access to the DataNode is regulated by NameSpace Node. NameSpace Node allows Client App to connect to the DataNode based by allowing the connection to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A,B,C and D. Client App will later on write data blocks directly to the DataNode. Client App does not have to directly write to all the node. It just has to write to any one of the node and NameNode will decide on which other DataNode it will have to replicate the data. In our example Client App directly writes to DataNode 1 and detained 3. However, data chunks are automatically replicated to other nodes. All the information like in which DataNode which data block is placed is written back to NameNode. High Availability During Disaster Now as multiple DataNode have same data blocks in the case of any DataNode which faces the disaster, the entire process will continue as other DataNode will assume the role to serve the specific data block which was on the failed node. This system provides very high tolerance to disaster and provides high availability. If you notice there is only single NameNode in our architecture. If that node fails our entire Hadoop Application will stop performing as it is a single node where we store all the metadata. As this node is very critical, it is usually replicated on another clustered as well as on another data rack. Though, that replicated node is not operational in architecture, it has all the necessary data to perform the task of the NameNode in the case of the NameNode fails. The entire Hadoop architecture is built to function smoothly even there are node failures or hardware malfunction. It is built on the simple concept that data is so big it is impossible to have come up with a single piece of the hardware which can manage it properly. We need lots of commodity (cheap) hardware to manage our big data and hardware failure is part of the commodity servers. To reduce the impact of hardware failure Hadoop architecture is built to overcome the limitation of the non-functioning hardware. Tomorrow In tomorrow’s blog post we will discuss the importance of the relational database in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – ?Finding Out What Changed in a Deleted Database – Notes from the Field #041

- by Pinal Dave

[Note from Pinal]: This is a 41th episode of Notes from the Field series. The real world is full of challenges. When we are reading theory or book, we sometimes do not realize how real world reacts works and that is why we have the series notes from the field, which is extremely popular with developers and DBA. Let us talk about interesting problem of how to figure out what has changed in the DELETED database. Well, you think I am just throwing the words but in reality this kind of problems are making our DBA’s life interesting and in this blog post we have amazing story from Brian Kelley about the same subject. In this episode of the Notes from the Field series database expert Brian Kelley explains a how to find out what has changed in deleted database. Read the experience of Brian in his own words. Sometimes, one of the hardest questions to answer is, “What changed?” A similar question is, “Did anything change other than what we expected to change?” The First Place to Check – Schema Changes History Report: Pinal has recently written on the Schema Changes History report and its requirement for the Default Trace to be enabled. This is always the first place I look when I am trying to answer these questions. There are a couple of obvious limitations with the Schema Changes History report. First, while it reports what changed, when it changed, and who changed it, other than the base DDL operation (CREATE, ALTER, DELETE), it does not present what the changes actually were. This is not something covered by the default trace. Second, the default trace has a fixed size. When it hits that size, the changes begin to overwrite. As a result, if you wait too long, especially on a busy database server, you may find your changes rolled off. But the Database Has Been Deleted! Pinal cited another issue, and that’s the inability to run the Schema Changes History report if the database has been dropped. Thankfully, all is not lost. One thing to remember is that the Schema Changes History report is ultimately driven by the Default Trace. As you may have guess, it’s a trace, like any other database trace. And the Default Trace does write to disk. The trace files are written to the defined LOG directory for that SQL Server instance and have a prefix of log_: Therefore, you can read the trace files like any other. Tip: Copy the files to a working directory. Otherwise, you may occasionally receive a file in use error. With the Default Trace files, if you ask the question early enough, you can see the information for a deleted database just the same as any other database. Testing with a Deleted Database: Here’s a short script that will create a database, create a schema, create an object, and then drop the database. Without the database, you can’t do a standard Schema Changes History report. CREATE DATABASE DeleteMe; GO USE DeleteMe; GO CREATE SCHEMA Test AUTHORIZATION dbo; GO CREATE TABLE Test.Foo (FooID INT); GO USE MASTER; GO DROP DATABASE DeleteMe; GO This sets up the perfect situation where we can’t retrieve the information using the Schema Changes History report but where it’s still available. Finding the Information: I’ve sorted the columns so I can see the Event Subclass, the Start Time, the Database Name, the Object Name, and the Object Type at the front, but otherwise, I’m just looking at the trace files using SQL Profiler. As you can see, the information is definitely there: Therefore, even in the case of a dropped/deleted database, you can still determine who did what and when. You can even determine who dropped the database (loginame is captured). The key is to get the default trace files in a timely manner in order to extract the information. If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Security, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Backing Up and Recovering the Tail End of a Transaction Log – Notes from the Field #042

- by Pinal Dave

[Notes from Pinal]: The biggest challenge which people face is not taking backup, but the biggest challenge is to restore a backup successfully. I have seen so many different examples where users have failed to restore their database because they made some mistake while they take backup and were not aware of the same. Tail Log backup was such an issue in earlier version of SQL Server but in the latest version of SQL Server, Microsoft team has fixed the confusion with additional information on the backup and restore screen itself. Now they have additional information, there are a few more people confused as they have no clue about this. Previously they did not find this as a issue and now they are finding tail log as a new learning. Linchpin People are database coaches and wellness experts for a data driven world. In this 42nd episode of the Notes from the Fields series database expert Tim Radney (partner at Linchpin People) explains in a very simple words, Backing Up and Recovering the Tail End of a Transaction Log. Many times when restoring a database over an existing database SQL Server will warn you about needing to make a tail end of the log backup. This might be your reminder that you have to choose to overwrite the database or could be your reminder that you are about to write over and lose any transactions since the last transaction log backup. You might be asking yourself “What is the tail end of the transaction log”. The tail end of the transaction log is simply any committed transactions that have occurred since the last transaction log backup. This is a very crucial part of a recovery strategy if you are lucky enough to be able to capture this part of the log. Most organizations have chosen to accept some amount of data loss. You might be shaking your head at this statement however if your organization is taking transaction logs backup every 15 minutes, then your potential risk of data loss is up to 15 minutes. Depending on the extent of the issue causing you to have to perform a restore, you may or may not have access to the transaction log (LDF) to be able to back up those vital transactions. For example, if the storage array or disk that holds your transaction log file becomes corrupt or damaged then you wouldn’t be able to recover the tail end of the log. If you do have access to the physical log file then you can still back up the tail end of the log. In 2013 I presented a session at the PASS Summit called “The Ultimate Tail Log Backup and Restore” and have been invited back this year to present it again. During this session I demonstrate how you can back up the tail end of the log even after the data file becomes corrupt. In my demonstration I set my database offline and then delete the data file (MDF). The database can’t become more corrupt than that. I attempt to bring the database back online to change the state to RECOVERY PENDING and then backup the tail end of the log. I can do this by specifying WITH NO_TRUNCATE. Using NO_TRUNCATE is equivalent to specifying both COPY_ONLY and CONTINUE_AFTER_ERROR. It as its name says, does not try to truncate the log. This is a great demo however how could I achieve backing up the tail end of the log if the failure destroys my entire instance of SQL and all I had was the LDF file? During my demonstration I also demonstrate that I can attach the log file to a database on another instance and then back up the tail end of the log. If I am performing proper backups then my most recent full, differential and log files should be on a server other than the one that crashed. I am able to achieve this task by creating new database with the same name as the failed database. I then set the database offline, delete my data file and overwrite the log with my good log file. I attempt to bring the database back online and then backup the log with NO_TRUNCATE just like in the first example. I encourage each of you to view my blog post and watch the video demonstration on how to perform these tasks. I really hope that none of you ever have to perform this in production, however it is a really good idea to know how to do this just in case. It really isn’t a matter of “IF” you will have to perform a restore of a production system but more of a “WHEN”. Being able to recover the tail end of the log in these sever cases could be the difference of having to notify all your business customers of data loss or not. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Developer Training Resources and Summary Roundup

- by pinaldave

It is always pleasure for any author when other renowned authors in the industry write about you. Earlier I wrote a five part blog series on Developer Training and I have received a phenomenal response to the series. I have received plenty of comments, questions and feedback. I thought it would be nice to sum up the whole series as well answer a few of the questions received. Quick Recap Developer Training - Importance and Significance - Part 1 In this part we discussed the importance of training in the real world. The most important and valuable resource any company is its employee. Employees who have been well-trained will be better at their jobs and produce a better product. An employee who is well trained obviously knows more about their job and all the technical aspects. I have a very high opinion about training employees and it is the most important task. Developer Training – Employee Morals and Ethics – Part 2 In this part we discussed the most crucial components of training. Often employees are expecting the company to pay for their training and the company expresses no interest in training the employee. Quite often training expenses are the real issue for both the employee and employer. There are companies that pay for 100% of the expenses and there are employees who opt for training on their own expense during their personal time. Training is often looked at as vacation by employee and employers and we need to change this mind-set. One of the ways is to report back the learning to your manager and implement newly learned knowledge in day-to-day work. Developer Training – Difficult Questions and Alternative Perspective - Part 3 This part was the most difficult to write as I tried to address a few difficult questions and answers. Training is such a sensitive issue that many developers when not receiving chance for training think about leaving the organization. The manager often feels pressure to accommodate every single employee for training even though his training budget is limited. It is indeed the responsibility of the developer to get maximum advantage from the training. Training immediately helps organizations but stays as a part of an employee’s knowledge forever. Developer Training – Various Options for Developer Training – Part 4 In this part I tried to explore a few methods and options for training. The generic feedback I received on this blog post was short and I should have explored each of the subject of the training in details. I believe there are two big buckets of training 1) Instructor Lead Training and 2) Self Lead Training. The common element between both the methods is “learning material”. Learning material can be of any format – videos, books, paper notes or just a plain black board. Instructor-led training is a very effective mode but not possible every single time. During the course of the developer’s career, one has to learn lots of new technology and it is almost impossible to have a quality trainer available on that subject at that time. Books are most effective and proven methods, however, it always helps if someone explains the concepts of the book with a demonstration. In recent times I have started to believe in online trainings which leads to a hybrid experience. Online trainings take the best part of the books and the best part of the instructor-led training and gives effective training in a matter of hours. Developer Training – A Conclusive Summary- Part 5 In this part, I shared what I was continuously thinking about developer training. There is no better teacher than oneself. There is no better motivation than a personal desire to learn new technology. Honestly there is nothing more personal learning. That “change is the only constant” and “adapt & overcome” are the essential lessons of life. One cannot stop the learning and resist the change. In the IT industry “ego of knowing all” and the “resistance to change” are the most challenging issues. Once someone overcomes them, life is much easier. I believe that proper and appropriate high quality training can help to address the burning issues. Opinion of Friends I invited a few of my friends to express their opinion about developer training and here are their opinions. I am listing them here in the order of the blog post publishing date. Nakul Vachhrajani - Developer Trainings-Importance, Benefits, Tips and follow-up Nakul’s sums of many of the concepts which are complementary to my blog posts. Nakul addresses the burning question of developer training with different angles. I am personally very impressed by his following statement - “Being skilled does not mean having just a stack of certifications, but it also means having an understanding about the internals of the products that you are working on – and using that knowledge to improve the efficiency & productivity at the workplace in turn resulting in better products, better consulting abilities and a happier self.” Nakul also suggests the online training options of Pluralsight. Vinod Kumar - Training–a necessity or bonus Vinod Kumar comes up with excellent follow up on developer training. Vinod is known for his inspirational writing about SQL Server. Vinod starts with a story of a student who is extremely eager to learn the wisdom of life from a monk but the monk does not accept him as a disciple for a long time. The conversation between student and monk is indeed an essence of all learning. We all want to learn quickly and be successful but the most important thing in life is to have the right attitude towards learning and more so towards life. The blog post end with a very important thought about how to avoid the famous excuse – “I don’t have enough time.” Ritesh Shah - Training – useful or useless? Ritesh brings up very important concept related to training. Ritesh in his meticulous style explains why training is an important and lifelong process. Training must not stop at any age but should continue forever. The moment training stops, progress stops along with. Paras Doshi - Professional Development Resource Paras is known for his to–the-point writing, and has summarized the five part series very precisely. He read the five part series and created a digest summary of the blog post. If you are in a rush and have no time to read my five series – I suggest you read his blog post. Training Resources I am often asked what the best resources for learning new technology are. This is the most difficult question EVER. There are plenty of good training resources available. When it is about training our needs are different, our preference of learning is different and we all have an opinion. Additionally, we all are located in different geographic locations worldwide and there is no way one solution will fit all. However, let me list a few of the training resources which I have built so far and you can consume them if you find it relevant to your need. SQL Server Books SQL Server Interview Questions and Answers SQL Wait Stats SQL Programming Joes 2 Pros SQL Server Video Tutorials SQL Server Questions and Answers SQL Server Performance: Indexing Basics SQL Server Performance: Introduction to Query Tuning SQL in Sixty Seconds Series of Sixty Seconds Learning Video on YouTube Trust me worldwide web is very big and there are plenty of high quality learning materials available worldwide – trainer-led as well online. I suggest you explore various options and make the best choice for yourself. Remember, training is your personal journey and it should never stop. Are you ready? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Developer Training, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – How to Recover SQL Database Data Deleted by Accident

- by Pinal Dave

In Repair a SQL Server database using a transaction log explorer, I showed how to use ApexSQL Log, a SQL Server transaction log viewer, to recover a SQL Server database after a disaster. In this blog, I’ll show you how to use another SQL Server disaster recovery tool from ApexSQL in a situation when data is accidentally deleted. You can download ApexSQL Recover here, install, and play along. With a good SQL Server disaster recovery strategy, data recovery is not a problem. You have a reliable full database backup with valid data, a full database backup and subsequent differential database backups, or a full database backup and a chain of transaction log backups. But not all situations are ideal. Here we’ll address some sub-optimal scenarios, where you can still successfully recover data. If you have only a full database backup This is the least optimal SQL Server disaster recovery strategy, as it doesn’t ensure minimal data loss. For example, data was deleted on Wednesday. Your last full database backup was created on Sunday, three days before the records were deleted. By using the full database backup created on Sunday, you will be able to recover SQL database records that existed in the table on Sunday. If there were any records inserted into the table on Monday or Tuesday, they will be lost forever. The same goes for records modified in this period. This method will not bring back modified records, only the old records that existed on Sunday. If you restore this full database backup, all your changes (intentional and accidental) will be lost and the database will be reverted to the state it had on Sunday. What you have to do is compare the records that were in the table on Sunday to the records on Wednesday, create a synchronization script, and execute it against the Wednesday database. If you have a full database backup followed by differential database backups Let’s say the situation is the same as in the example above, only you create a differential database backup every night. Use the full database backup created on Sunday, and the last differential database backup (created on Tuesday). In this scenario, you will lose only the data inserted and updated after the differential backup created on Tuesday. If you have a full database backup and a chain of transaction log backups This is the SQL Server disaster recovery strategy that provides minimal data loss. With a full chain of transaction logs, you can recover the SQL database to an exact point in time. To provide optimal results, you have to know exactly when the records were deleted, because restoring to a later point will not bring back the records. This method requires restoring the full database backup first. If you have any differential log backup created after the last full database backup, restore the most recent one. Then, restore transaction log backups, one by one, it the order they were created starting with the first created after the restored differential database backup. Now, the table will be in the state before the records were deleted. You have to identify the deleted records, script them and run the script against the original database. Although this method is reliable, it is time-consuming and requires a lot of space on disk. How to easily recover deleted records? The following solution enables you to recover SQL database records even if you have no full or differential database backups and no transaction log backups. To understand how ApexSQL Recover works, I’ll explain what happens when table data is deleted. Table data is stored in data pages. When you delete table records, they are not immediately deleted from the data pages, but marked to be overwritten by new records. Such records are not shown as existing anymore, but ApexSQL Recover can read them and create undo script for them. How long will deleted records stay in the MDF file? It depends on many factors, as time passes it’s less likely that the records will not be overwritten. The more transactions occur after the deletion, the more chances the records will be overwritten and permanently lost. Therefore, it’s recommended to create a copy of the database MDF and LDF files immediately (if you cannot take your database offline until the issue is solved) and run ApexSQL Recover on them. Note that a full database backup will not help here, as the records marked for overwriting are not included in the backup. First, I’ll delete some records from the Person.EmailAddress table in the AdventureWorks database. I can delete these records in SQL Server Management Studio, or execute a script such as DELETE FROM Person.EmailAddress WHERE BusinessEntityID BETWEEN 70 AND 80 Then, I’ll start ApexSQL Recover and select From DELETE operation in the Recovery tab. In the Select the database to recover step, first select the SQL Server instance. If it’s not shown in the drop-down list, click the Server icon right to the Server drop-down list and browse for the SQL Server instance, or type the instance name manually. Specify the authentication type and select the database in the Database drop-down list. In the next step, you’re prompted to add additional data sources. As this can be a tricky step, especially for new users, ApexSQL Recover offers help via the Help me decide option. The Help me decide option guides you through a series of questions about the database transaction log and advises what files to add. If you know that you have no transaction log backups or detached transaction logs, or the online transaction log file has been truncated after the data was deleted, select No additional transaction logs are available. If you know that you have transaction log backups that contain the delete transactions you want to recover, click Add transaction logs. The online transaction log is listed and selected automatically. Click Add if to add transaction log backups. It would be best if you have a full transaction log chain, as explained above. The next step for this option is to specify the time range. Selecting a small time range for the time of deletion will create the recovery script just for the accidentally deleted records. A wide time range might script the records deleted on purpose, and you don’t want that. If needed, you can check the script generated and manually remove such records. After that, for all data sources options, the next step is to select the tables. Be careful here, if you deleted some data from other tables on purpose, and don’t want to recover them, don’t select all tables, as ApexSQL Recover will create the INSERT script for them too. The next step offers two options: to create a recovery script that will insert the deleted records back into the Person.EmailAddress table, or to create a new database, create the Person.EmailAddress table in it, and insert the deleted records. I’ll select the first one. The recovery process is completed and 11 records are found and scripted, as expected. To see the script, click View script. ApexSQL Recover has its own script editor, where you can review, modify, and execute the recovery script. The insert into statements look like: INSERT INTO Person.EmailAddress( BusinessEntityID, EmailAddressID, EmailAddress, rowguid, ModifiedDate) VALUES( 70, 70, N'[email protected]' COLLATE SQL_Latin1_General_CP1_CI_AS, 'd62c5b4e-c91f-403f-b630-7b7e0fda70ce', '20030109 00:00:00.000' ); To execute the script, click Execute in the menu. If you want to check whether the records are really back, execute SELECT * FROM Person.EmailAddress WHERE BusinessEntityID BETWEEN 70 AND 80 As shown, ApexSQL Recover recovers SQL database data after accidental deletes even without the database backup that contains the deleted data and relevant transaction log backups. ApexSQL Recover reads the deleted data from the database data file, so this method can be used even for databases in the Simple recovery model. Besides recovering SQL database records from a DELETE statement, ApexSQL Recover can help when the records are lost due to a DROP TABLE, or TRUNCATE statement, as well as repair a corrupted MDF file that cannot be attached to as SQL Server instance. You can find more information about how to recover SQL database lost data and repair a SQL Server database on ApexSQL Solution center. There are solutions for various situations when data needs to be recovered. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Introduction – Day 1 of 31

- by pinaldave

List of all the Interview Questions and Answers Series blogs Posts covering interview questions and answers always make for interesting reading. Some people like the subject for their helpful hints and thought provoking subject, and others dislike these posts because they feel it is nothing more than cheating. I’d like to discuss the pros and cons of a Question and Answer format here. Interview Questions and Answers are Helpful Just like blog posts, books, and articles, interview Question and Answer discussions are learning material. The popular Dummy’s books or Idiots Guides are not only for “dummies,” but can help everyone relearn the fundamentals. Question and Answer discussions can serve the same purpose. You could call this SQL Server Fundamentals or SQL Server 101. I have administrated hundreds of interviews during my career and I have noticed that sometimes an interviewee with several years of experience lacks an understanding of the fundamentals. These individuals have been in the industry for so long, usually working on a very specific project, that the ABCs of the business have slipped their mind. Or, when a college graduate is looking to get into the industry, he is not expected to have experience since he is just graduated. However, the new grad is expected to have an understanding of fundamentals and theory. Sometimes after the stress of final exams and graduation, it can be difficult to remember the correct answers to interview questions, though. An interview Question and Answer discussion can be very helpful to both these individuals. It is simply a way to go back over the building blocks of a topic. Many times a simple review like this will help “jog” your memory, and all those previously-memorized facts will come flooding back to you. It is not a way to re-learn a topic, but a way to remind yourself of what you already know. A Question and Answer discussion can also be a way to go over old topics in a more interesting manner. Especially if you have been working in the industry, or taking lots of classes on the topic, everything you read can sound like a repeat of what you already know. Going over a topic in a new format can make the material seem fresh and interesting. And an interested mind will be more engaged and remember more in the end. Interview Questions and Answers are Harmful A common argument against a Question and Answer discussion is that it will give someone a “cheat sheet.” A new guy with relatively little experience can read the interview questions and answers, and then memorize them. When an interviewer asks him the same questions, he will repeat the answers and get the job. Honestly, is he good hire because he memorized the interview questions? Wouldn’t it be better for the interviewer to hire someone with actual experience? The answer is not as easy as it seems – there are many different factors to be considered. If the interviewer is asking fundamentals-related questions only, he gets the answers he wants to hear, and then hires this first candidate – there is a good chance that he is hiring based on personality rather than experience. If the interviewer is smart he will ask deeper questions, have more than one person on the interview team, and interview a variety of candidates. If one interviewee happens to memorize some answers, it usually doesn’t mean he will automatically get the job at the expense of more qualified candidates. Another argument against interview Question and Answers is that it will give candidates a false sense of confidence, and that they will appear more qualified than they are. Well, if that is true, it will not last after the first interview when the candidate is asked difficult questions and he cannot find the answers in the list of interview Questions and Answers. Besides, confidence is one of the best things to walk into an interview with! In today’s competitive job market, there are often hundreds of candidates applying for the same position. With so many applicants to choose from, interviewers must make decisions about who to call back and who to hire based on their gut feeling. One drawback to reading an interview Question and Answer article is that you might sound very boring in your interview – saying the same thing as every single candidate, and parroting answers that sound like someone else wrote them for you – because they did. However, it is definitely better to go to an interview prepared, just make sure that you give a lot of thought to your answers to make them sound like your own voice. Remember that you will be hired based on your skills as well as your personality, so don’t think that having all the right answers will make get you hired. A good interviewee will be prepared, confident, and know how to stand out. My Opinion A list of interview Questions and Answers is really helpful as a refresher or for beginners. To really ace an interview, one needs to have real-world, hands-on experience with SQL Server as well. Interview questions just serve as a starter or easy read for experienced professionals. When I have to learn new technology, I often search online for interview questions and get an idea about the breadth and depth of the technology. Next Action I am going to write about interview Questions and Answers for next 30 days. I have previously written a series of interview questions and answers; now I have re-written them keeping the latest version of SQL Server and current industry progress in mind. If you have faced interesting interview questions or situations, please write to me and I will publish them as a guest post. If you want me to add few more details, leave a comment and I will make sure that I do my best to accommodate. Tomorrow we will start the interview Questions and Answers series, with a few interesting stories, best practices and guest posts. We will have a prize give-away and other awards when the series ends. List of all the Interview Questions and Answers Series blogs Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Interview Questions and Answers, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Big Data – Is Big Data Relevant to me? – Big Data Questionnaires – Guest Post by Vinod Kumar

- by Pinal Dave

This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions of SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. I think the series from Pinal is a good one for anyone planning to start on Big Data journey from the basics. In my daily customer interactions this buzz of “Big Data” always comes up, I react generally saying – “Sir, do you really have a ‘Big Data’ problem or do you have a big Data problem?” Generally, there is a silence in the air when I ask this question. Data is everywhere in organizations – be it big data, small data, all data and for few it is bad data which is same as no data :). Wow, don’t discount me as someone who opposes “Big Data”, I am a big supporter as much as I am a critic of the abuse of this term by the people. In this post, I wanted to let my mind flow so that you can also think in the direction I want you to see these concepts. In any case, this is not an exhaustive dump of what is in my mind – but you will surely get the drift how I am going to question Big Data terms from customers!!! Is Big Data Relevant to me? Many of my customers talk to me like blank whiteboard with no idea – “why Big Data”. They want to jump into the bandwagon of technology and they want to decipher insights from their unexplored data a.k.a. unstructured data with structured data. So what are these industry scenario’s that come to mind? Here are some of them: Financials Fraud detection: Banks and Credit cards are monitoring your spending habits on real-time basis. Customer Segmentation: applies in every industry from Banking to Retail to Aviation to Utility and others where they deal with end customer who consume their products and services. Customer Sentiment Analysis: Responding to negative brand perception on social or amplify the positive perception. Sales and Marketing Campaign: Understand the impact and get closer to customer delight. Call Center Analysis: attempt to take unstructured voice recordings and analyze them for content and sentiment. Medical Reduce Re-admissions: How to build a proactive follow-up engagements with patients. Patient Monitoring: How to track Inpatient, Out-Patient, Emergency Visits, Intensive Care Units etc. Preventive Care: Disease identification and Risk stratification is a very crucial business function for medical. Claims fraud detection: There is no precise dollars that one can put here, but this is a big thing for the medical field. Retail Customer Sentiment Analysis, Customer Care Centers, Campaign Management. Supply Chain Analysis: Every sensors and RFID data can be tracked for warehouse space optimization. Location based marketing: Based on where a check-in happens retail stores can be optimize their marketing. Telecom Price optimization and Plans, Finding Customer churn, Customer loyalty programs Call Detail Record (CDR) Analysis, Network optimizations, User Location analysis Customer Behavior Analysis Insurance Fraud Detection & Analysis, Pricing based on customer Sentiment Analysis, Loyalty Management Agents Analysis, Customer Value Management This list can go on to other areas like Utility, Manufacturing, Travel, ITES etc. So as you can see, there are obviously interesting use cases for each of these industry verticals. These are just representative list. Where to start? A lot of times I try to quiz customers on a number of dimensions before starting a Big Data conversation. Are you getting the data you need the way you want it and in a timely manner? Can you get in and analyze the data you need? How quickly is IT to respond to your BI Requests? How easily can you get at the data that you need to run your business/department/project? How are you currently measuring your business? Can you get the data you need to react WITHIN THE QUARTER to impact behaviors to meet your numbers or is it always “rear-view mirror?” How are you measuring: The Brand Customer Sentiment Your Competition Your Pricing Your performance Supply Chain Efficiencies Predictive product / service positioning What are your key challenges of driving collaboration across your global business? What the challenges in innovation? What challenges are you facing in getting more information out of your data? Note: Garbage-in is Garbage-out. Hold good for all reporting / analytics requirements Big Data POCs? A number of customers get into the realm of setting a small team to work on Big Data – well it is a great start from an understanding point of view, but I tend to ask a number of other questions to such customers. Some of these common questions are: To what degree is your advanced analytics (natural language processing, sentiment analysis, predictive analytics and classification) paired with your Big Data’s efforts? Do you have dedicated resources exploring the possibilities of advanced analytics in Big Data for your business line? Do you plan to employ machine learning technology while doing Advanced Analytics? How is Social Media being monitored in your organization? What is your ability to scale in terms of storage and processing power? Do you have a system in place to sort incoming data in near real time by potential value, data quality, and use frequency? Do you use event-driven architecture to manage incoming data? Do you have specialized data services that can accommodate different formats, security, and the management requirements of multiple data sources? Is your organization currently using or considering in-memory analytics? To what degree are you able to correlate data from your Big Data infrastructure with that from your enterprise data warehouse? Have you extended the role of Data Stewards to include ownership of big data components? Do you prioritize data quality based on the source system (that is Facebook/Twitter data has lower quality thresholds than radio frequency identification (RFID) for a tracking system)? Do your retention policies consider the different legal responsibilities for storing Big Data for a specific amount of time? Do Data Scientists work in close collaboration with Data Stewards to ensure data quality? How is access to attributes of Big Data being given out in the organization? Are roles related to Big Data (Advanced Analyst, Data Scientist) clearly defined? How involved is risk management in the Big Data governance process? Is there a set of documented policies regarding Big Data governance? Is there an enforcement mechanism or approach to ensure that policies are followed? Who is the key sponsor for your Big Data governance program? (The CIO is best) Do you have defined policies surrounding the use of social media data for potential employees and customers, as well as the use of customer Geo-location data? How accessible are complex analytic routines to your user base? What is the level of involvement with outside vendors and third parties in regard to the planning and execution of Big Data projects? What programming technologies are utilized by your data warehouse/BI staff when working with Big Data? These are some of the important questions I ask each customer who is actively evaluating Big Data trends for their organizations. These questions give you a sense of direction where to start, what to use, how to secure, how to analyze and more. Sign off Any Big data is analysis is incomplete without a compelling story. The best way to understand this is to watch Hans Rosling – Gapminder (2:17 to 6:06) videos about the third world myths. Don’t get overwhelmed with the Big Data buzz word, the destination to what your data speaks is important. In this blog post, we did not particularly look at any Big Data technologies. This is a set of questionnaire one needs to keep in mind as they embark their journey of Big Data. I did write some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make sense? Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – 5 Tips for Improving Your Data with expressor Studio

- by pinaldave

It’s no secret that bad data leads to bad decisions and poor results. However, how do you prevent dirty data from taking up residency in your data store? Some might argue that it’s the responsibility of the person sending you the data. While that may be true, in practice that will rarely hold up. It doesn’t matter how many times you ask, you will get the data however they decide to provide it. So now you have bad data. What constitutes bad data? There are quite a few valid answers, for example: Invalid date values Inappropriate characters Wrong data Values that exceed a pre-set threshold While it is certainly possible to write your own scripts and custom SQL to identify and deal with these data anomalies, that effort often takes too long and becomes difficult to maintain. Instead, leveraging an ETL tool like expressor Studio makes the data cleansing process much easier and faster. Below are some tips for leveraging expressor to get your data into tip-top shape. Tip 1: Build reusable data objects with embedded cleansing rules One of the new features in expressor Studio 3.2 is the ability to define constraints at the metadata level. Using expressor’s concept of Semantic Types, you can define reusable data objects that have embedded logic such as constraints for dealing with dirty data. Once defined, they can be saved as a shared atomic type and then re-applied to other data attributes in other schemas. As you can see in the figure above, I’ve defined a constraint on zip code. I can then save the constraint rules I defined for zip code as a shared atomic type called zip_type for example. The next time I get a different data source with a schema that also contains a zip code field, I can simply apply the shared atomic type (shown below) and the previously defined constraints will be automatically applied. Tip 2: Unlock the power of regular expressions in Semantic Types Another powerful feature introduced in expressor Studio 3.2 is the option to use regular expressions as a constraint. A regular expression is used to identify patterns within data. The patterns could be something as simple as a date format or something much more complex such as a street address. For example, I could define that a valid IP address should be made up of 4 numbers, each 0 to 255, and separated by a period. So 192.168.23.123 might be a valid IP address whereas 888.777.0.123 would not be. How can I account for this using regular expressions? A very simple regular expression that would look for any 4 sets of 3 digits separated by a period would be: ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ Alternatively, the following would be the exact check for truly valid IP addresses as we had defined above: ^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$ . In expressor, we would enter this regular expression as a constraint like this: Here we select the corrective action to be ‘Escalate’, meaning that the expressor Dataflow operator will decide what to do. Some of the options include rejecting the offending record, skipping it, or aborting the dataflow. Tip 3: Email pattern expressions that might come in handy In the example schema that I am using, there’s a field for email. Email addresses are often entered incorrectly because people are trying to avoid spam. While there are a lot of different ways to define what constitutes a valid email address, a quick search online yields a couple of really useful regular expressions for validating email addresses: This one is short and sweet: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b (Source: http://www.regular-expressions.info/) This one is more specific about which characters are allowed: ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ (Source: http://regexlib.com/REDetails.aspx?regexp_id=26 ) Tip 4: Reject “dirty data” for analysis or further processing Yet another feature introduced in expressor Studio 3.2 is the ability to reject records based on constraint violations. To capture reject records on input, simply specify Reject Record in the Error Handling setting for the Read File operator. Then attach a Write File operator to the reject port of the Read File operator as such: Next, in the Write File operator, you can configure the expressor operator in a similar way to the Read File. The key difference would be that the schema needs to be derived from the upstream operator as shown below: Once configured, expressor will output rejected records to the file you specified. In addition to the rejected records, expressor also captures some diagnostic information that will be helpful towards identifying why the record was rejected. This makes diagnosing errors much easier! Tip 5: Use a Filter or Transform after the initial cleansing to finish the job Sometimes you may want to predicate the data cleansing on a more complex set of conditions. For example, I may only be interested in processing data containing males over the age of 25 in certain zip codes. Using an expressor Filter operator, you can define the conditional logic which isolates the records of importance away from the others. Alternatively, the expressor Transform operator can be used to alter the input value via a user defined algorithm or transformation. It also supports the use of conditional logic and data can be rejected based on constraint violations. However, the best tip I can leave you with is to not constrain your solution design approach – expressor operators can be combined in many different ways to achieve the desired results. For example, in the expressor Dataflow below, I can post-process the reject data from the Filter which did not meet my pre-defined criteria and, if successful, Funnel it back into the flow so that it gets written to the target table. I continue to be impressed that expressor offers all this functionality as part of their FREE expressor Studio desktop ETL tool, which you can download from here. Their Studio ETL tool is absolutely free and they are very open about saying that if you want to deploy their software on a dedicated Windows Server, you need to purchase their server software, whose pricing is posted on their website. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Shrinking Database is Bad – Increases Fragmentation – Reduces Performance

- by pinaldave

Earlier, I had written two articles related to Shrinking Database. I wrote about why Shrinking Database is not good. SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server SQL SERVER – What the Business Says Is Not What the Business Wants I received many comments on Why Database Shrinking is bad. Today we will go over a very interesting example that I have created for the same. Here are the quick steps of the example. Create a test database Create two tables and populate with data Check the size of both the tables Size of database is very low Check the Fragmentation of one table Fragmentation will be very low Truncate another table Check the size of the table Check the fragmentation of the one table Fragmentation will be very low SHRINK Database Check the size of the table Check the fragmentation of the one table Fragmentation will be very HIGH REBUILD index on one table Check the size of the table Size of database is very HIGH Check the fragmentation of the one table Fragmentation will be very low Here is the script for the same. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO Let us check the table size and fragmentation. Now let us TRUNCATE the table and check the size and Fragmentation. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can clearly see that after TRUNCATE, the size of the database is not reduced and it is still the same as before TRUNCATE operation. After the Shrinking database operation, we were able to reduce the size of the database. If you notice the fragmentation, it is considerably high. The major problem with the Shrink operation is that it increases fragmentation of the database to very high value. Higher fragmentation reduces the performance of the database as reading from that particular table becomes very expensive. One of the ways to reduce the fragmentation is to rebuild index on the database. Let us rebuild the index and observe fragmentation and database size. -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REBUILD GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can notice that after rebuilding, Fragmentation reduces to a very low value (almost same to original value); however the database size increases way higher than the original. Before rebuilding, the size of the database was 5 MB, and after rebuilding, it is around 20 MB. Regular rebuilding the index is rebuild in the same user database where the index is placed. This usually increases the size of the database. Look at irony of the Shrinking database. One person shrinks the database to gain space (thinking it will help performance), which leads to increase in fragmentation (reducing performance). To reduce the fragmentation, one rebuilds index, which leads to size of the database to increase way more than the original size of the database (before shrinking). Well, by Shrinking, one did not gain what he was looking for usually. Rebuild indexing is not the best suggestion as that will create database grow again. I have always remembered the excellent post from Paul Randal regarding Shrinking the database is bad. I suggest every one to read that for accuracy and interesting conversation. Let us run following script where we Shrink the database and REORGANIZE. -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Shrink the Database DBCC SHRINKDATABASE (ShrinkIsBed); GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REORGANIZE GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can see that REORGANIZE does not increase the size of the database or remove the fragmentation. Again, I no way suggest that REORGANIZE is the solution over here. This is purely observation using demo. Read the blog post of Paul Randal. Following script will clean up the database -- Clean up USE MASTER GO ALTER DATABASE ShrinkIsBed SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE ShrinkIsBed GO There are few valid cases of the Shrinking database as well, but that is not covered in this blog post. We will cover that area some other time in future. Additionally, one can rebuild index in the tempdb as well, and we will also talk about the same in future. Brent has written a good summary blog post as well. Are you Shrinking your database? Well, when are you going to stop Shrinking it? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – Introduction to SQL Server 2014 In-Memory OLTP

- by Pinal Dave

In SQL Server 2014 Microsoft has introduced a new database engine component called In-Memory OLTP aka project “Hekaton” which is fully integrated into the SQL Server Database Engine. It is optimized for OLTP workloads accessing memory resident data. In-memory OLTP helps us create memory optimized tables which in turn offer significant performance improvement for our typical OLTP workload. The main objective of memory optimized table is to ensure that highly transactional tables could live in memory and remain in memory forever without even losing out a single record. The most significant part is that it still supports majority of our Transact-SQL statement. Transact-SQL stored procedures can be compiled to machine code for further performance improvements on memory-optimized tables. This engine is designed to ensure higher concurrency and minimal blocking. In-Memory OLTP alleviates the issue of locking, using a new type of multi-version optimistic concurrency control. It also substantially reduces waiting for log writes by generating far less log data and needing fewer log writes. Points to remember Memory-optimized tables refer to tables using the new data structures and key words added as part of In-Memory OLTP. Disk-based tables refer to your normal tables which we used to create in SQL Server since its inception. These tables use a fixed size 8 KB pages that need to be read from and written to disk as a unit. Natively compiled stored procedures refer to an object Type which is new and is supported by in-memory OLTP engine which convert it into machine code, which can further improve the data access performance for memory –optimized tables. Natively compiled stored procedures can only reference memory-optimized tables, they can’t be used to reference any disk –based table. Interpreted Transact-SQL stored procedures, which is what SQL Server has always used. Cross-container transactions refer to transactions that reference both memory-optimized tables and disk-based tables. Interop refers to interpreted Transact-SQL that references memory-optimized tables. Using In-Memory OLTP In-Memory OLTP engine has been available as part of SQL Server 2014 since June 2013 CTPs. Installation of In-Memory OLTP is part of the SQL Server setup application. The In-Memory OLTP components can only be installed with a 64-bit edition of SQL Server 2014 hence they are not available with 32-bit editions. Creating Databases Any database that will store memory-optimized tables must have a MEMORY_OPTIMIZED_DATA filegroup. This filegroup is specifically designed to store the checkpoint files needed by SQL Server to recover the memory-optimized tables, and although the syntax for creating the filegroup is almost the same as for creating a regular filestream filegroup, it must also specify the option CONTAINS MEMORY_OPTIMIZED_DATA. Here is an example of a CREATE DATABASE statement for a database that can support memory-optimized tables: CREATE DATABASE InMemoryDB ON PRIMARY(NAME = [InMemoryDB_data], FILENAME = 'D:\data\InMemoryDB_data.mdf', size=500MB), FILEGROUP [SampleDB_mod_fg] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [InMemoryDB_mod_dir], FILENAME = 'S:\data\InMemoryDB_mod_dir'), (NAME = [InMemoryDB_mod_dir], FILENAME = 'R:\data\InMemoryDB_mod_dir') LOG ON (name = [SampleDB_log], Filename='L:\log\InMemoryDB_log.ldf', size=500MB) COLLATE Latin1_General_100_BIN2; Above example code creates files on three different drives (D: S: and R:) for the data files and in memory storage so if you would like to run this code kindly change the drive and folder locations as per your convenience. Also notice that binary collation was specified as Windows (non-SQL). BIN2 collation is the only collation support at this point for any indexes on memory optimized tables. It is also possible to add a MEMORY_OPTIMIZED_DATA file group to an existing database, use the below command to achieve the same. ALTER DATABASE AdventureWorks2012 ADD FILEGROUP hekaton_mod CONTAINS MEMORY_OPTIMIZED_DATA; GO ALTER DATABASE AdventureWorks2012 ADD FILE (NAME='hekaton_mod', FILENAME='S:\data\hekaton_mod') TO FILEGROUP hekaton_mod; GO Creating Tables There is no major syntactical difference between creating a disk based table or a memory –optimized table but yes there are a few restrictions and a few new essential extensions. Essentially any memory-optimized table should use the MEMORY_OPTIMIZED = ON clause as shown in the Create Table query example. DURABILITY clause (SCHEMA_AND_DATA or SCHEMA_ONLY) Memory-optimized table should always be defined with a DURABILITY value which can be either SCHEMA_AND_DATA or SCHEMA_ONLY the former being the default. A memory-optimized table defined with DURABILITY=SCHEMA_ONLY will not persist the data to disk which means the data durability is compromised whereas DURABILITY= SCHEMA_AND_DATA ensures that data is also persisted along with the schema. Indexing Memory Optimized Table A memory-optimized table must always have an index for all tables created with DURABILITY= SCHEMA_AND_DATA and this can be achieved by declaring a PRIMARY KEY Constraint at the time of creating a table. The following example shows a PRIMARY KEY index created as a HASH index, for which a bucket count must also be specified. CREATE TABLE Mem_Table ( [Name] VARCHAR(32) NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000), [City] VARCHAR(32) NULL, [State_Province] VARCHAR(32) NULL, [LastModified] DATETIME NOT NULL, ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA); Now as you can see in the above query example we have used the clause MEMORY_OPTIMIZED = ON to make sure that it is considered as a memory optimized table and not just a normal table and also used the DURABILITY Clause= SCHEMA_AND_DATA which means it will persist data along with metadata and also you can notice this table has a PRIMARY KEY mentioned upfront which is also a mandatory clause for memory-optimized tables. We will talk more about HASH Indexes and BUCKET_COUNT in later articles on this topic which will be focusing more on Row and Index storage on Memory-Optimized tables. So stay tuned for that as well. Now as we covered the basics of Memory Optimized tables and understood the key things to remember while using memory optimized tables, let’s explore more using examples to understand the Performance gains using memory-optimized tables. I will be using the database which i created earlier in this article i.e. InMemoryDB in the below Demo Exercise. USE InMemoryDB GO -- Creating a disk based table CREATE TABLE dbo.Disktable ( Id INT IDENTITY, Name CHAR(40) ) GO CREATE NONCLUSTERED INDEX IX_ID ON dbo.Disktable (Id) GO -- Creating a memory optimized table with similar structure and DURABILITY = SCHEMA_AND_DATA CREATE TABLE dbo.Memorytable_durable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA) GO -- Creating an another memory optimized table with similar structure but DURABILITY = SCHEMA_Only CREATE TABLE dbo.Memorytable_nondurable ( Id INT NOT NULL PRIMARY KEY NONCLUSTERED Hash WITH (bucket_count =1000000), Name CHAR(40) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_only) GO -- Now insert 100000 records in dbo.Disktable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Disktable(Name) VALUES('sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Do the same inserts for Memory table dbo.Memorytable_durable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_durable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END -- Now finally do the same inserts for Memory table dbo.Memorytable_nondurable and observe the Time Taken DECLARE @i_t bigint SET @i_t =1 WHILE @i_t<= 100000 BEGIN INSERT INTO dbo.Memorytable_nondurable VALUES(@i_t, 'sachin' + CONVERT(VARCHAR,@i_t)) SET @i_t+=1 END The above 3 Inserts took 1.20 minutes, 54 secs, and 2 secs respectively to insert 100000 records on my machine with 8 Gb RAM. This proves the point that memory-optimized tables can definitely help businesses achieve better performance for their highly transactional business table and memory- optimized tables with Durability SCHEMA_ONLY is even faster as it does not bother persisting its data to disk which makes it supremely fast. Koenig Solutions is one of the few organizations which offer IT training on SQL Server 2014 and all its updates. Now, I leave the decision on using memory_Optimized tables on you, I hope you like this article and it helped you understand the fundamentals of IN-Memory OLTP . Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Koenig

Read the article
SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

- by pinaldave

Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

Read the article
SQL SERVER – Extending SQL Azure with Azure worker role – Guest Post by Paras Doshi

- by pinaldave

This is guest post by Paras Doshi. Paras Doshi is a research Intern at SolidQ.com and a Microsoft student partner. He is currently working in the domain of SQL Azure. SQL Azure is nothing but a SQL server in the cloud. SQL Azure provides benefits such as on demand rapid provisioning, cost-effective scalability, high availability and reduced management overhead. To see an introduction on SQL Azure, check out the post by Pinal here In this article, we are going to discuss how to extend SQL Azure with the Azure worker role. In other words, we will attempt to write a custom code and host it in the Azure worker role; the aim is to add some features that are not available with SQL Azure currently or features that need to be customized for flexibility. This way we extend the SQL Azure capability by building some solutions that run on Azure as worker roles. To understand Azure worker role, think of it as a windows service in cloud. Azure worker role can perform background processes, and to handle processes such as synchronization and backup, it becomes our ideal tool. First, we will focus on writing a worker role code that synchronizes SQL Azure databases. Before we do so, let’s see some scenarios in which synchronization between SQL Azure databases is beneficial: scaling out access over multiple databases enables us to handle workload efficiently As of now, SQL Azure database can be hosted in one of any six datacenters. By synchronizing databases located in different data centers, one can extend the data by enabling access to geographically distributed data Let us see some scenarios in which SQL server to SQL Azure database synchronization is beneficial To backup SQL Azure database on local infrastructure Rather than investing in local infrastructure for increased workloads, such workloads could be handled by cloud Ability to extend data to different datacenters located across the world to enable efficient data access from remote locations Now, let us develop cloud-based app that synchronizes SQL Azure databases. For an Introduction to developing cloud based apps, click here Now, in this article, I aim to provide a bird’s eye view of how a code that synchronizes SQL Azure databases look like and then list resources that can help you develop the solution from scratch. Now, if you newly add a worker role to the cloud-based project, this is how the code will look like. (Note: I have added comments to the skeleton code to point out the modifications that will be required in the code to carry out the SQL Azure synchronization. Note the placement of Setup() and Sync() function.) Click here (http://parasdoshi1989.files.wordpress.com/2011/06/code-snippet-1-for-extending-sql-azure-with-azure-worker-role1.pdf ) Enabling SQL Azure databases synchronization through sync framework is a two-step process. In the first step, the database is provisioned and sync framework creates tracking tables, stored procedures, triggers, and tables to store metadata to enable synchronization. This is one time step. The code for the same is put in the setup() function which is called once when the worker role starts. Now, the second step is continuous (or on demand) synchronization of SQL Azure databases by propagating changes between databases. This is done on a continuous basis by calling the sync() function in the while loop. The code logic to synchronize changes between SQL Azure databases should be put in the sync() function. Discussing the coding part step by step is out of the scope of this article. Therefore, let me suggest you a resource, which is given here. Also, note that before you start developing the code, you will need to install SYNC framework 2.1 SDK (download here). Further, you will reference some libraries before you start coding. Details regarding the same are available in the article that I just pointed to. You will be charged for data transfers if the databases are not in the same datacenter. For pricing information, go here Currently, a tool named DATA SYNC, which is built on top of sync framework, is available in CTP that allows SQL Azure <-> SQL server and SQL Azure <-> SQL Azure synchronization (without writing single line of code); however, in some cases, the custom code shown in this blogpost provides flexibility that is not available with Data SYNC. For instance, filtering is not supported in the SQL Azure DATA SYNC CTP2; if you wish to have such a functionality now, then you have the option of developing a custom code using SYNC Framework. Now, this code can be easily extended to synchronize at some schedule. Let us say we want the databases to get synchronized every day at 10:00 pm. This is what the code will look like now: (http://parasdoshi1989.files.wordpress.com/2011/06/code-snippet-2-for-extending-sql-azure-with-azure-worker-role.pdf) Don’t you think that by writing such a code, we are imitating the functionality provided by the SQL server agent for a SQL server? Think about it. We are scheduling our administrative task by writing custom code – in other words, we have developed a “Light weight SQL server agent for SQL Azure!” Since the SQL server agent is not currently available in cloud, we have developed a solution that enables us to schedule tasks, and thus we have extended SQL Azure with the Azure worker role! Now if you wish to track jobs, you can do so by storing this data in SQL Azure (or Azure tables). The reason is that Windows Azure is a stateless platform, and we will need to store the state of the job ourselves and the choice that you have is SQL Azure or Azure tables. Note that this solution requires custom code and also it is not UI driven; however, for now, it can act as a temporary solution until SQL server agent is made available in the cloud. Moreover, this solution does not encompass functionalities that a SQL server agent provides, but it does open up an interesting avenue to schedule some of the tasks such as backup and synchronization of SQL Azure databases by writing some custom code in the Azure worker role. Now, let us see one more possibility – i.e., running BCP through a worker role in Azure-hosted services and then uploading the backup files either locally or on blobs. If you upload it locally, then consider the data transfer cost. If you upload it to blobs residing in the same datacenter, then no transfer cost applies but the cost on blob size applies. So, before choosing the option, you need to evaluate your preferences keeping the cost associated with each option in mind. In this article, I have shown that Azure worker role solution could be developed to synchronize SQL Azure databases. Moreover, a light-weight SQL server agent for SQL Azure can be developed. Also we discussed the possibility of running BCP through a worker role in Azure-hosted services for backing up our precious SQL Azure data. Thus, we can extend SQL Azure with the Azure worker role. But remember: you will be charged for running Azure worker roles. So at the end of the day, you need to ask – am I willing to build a custom code and pay money to achieve this functionality? I hope you found this blog post interesting. If you have any questions/feedback, you can comment below or you can mail me at Paras[at]student-partners[dot]com Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Azure, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

- by pinaldave

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Search Results

Search found 848 results on 34 pages for 'sqlauthority'.

Page 33/34 | < Previous Page | 29 30 31 32 33 34 | Next Page >

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by pinaldave

< Previous Page | 29 30 31 32 33 34 | Next Page >