Search Results

Search found 2468 results on 99 pages for 'dave johansen'.

Page 21/99 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >

Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service. Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia: Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database

- by Pinal Dave

This is the third post in the series of the blog posts I am writing about NuoDB. NuoDB is very innovative and easy-to-use product. I can clearly see how one can scale-out NuoDB with so much ease and confidence. In my very first blog post we discussed how we can install NuoDB (link), and in my second post I discussed how we can manage the NuoDB database transaction engines and storage managers with a few clicks (link). Note: You can Download NuoDB from here. In this post, we will learn how we can use the Explorer feature of NuoDB to do various SQL operations. NuoDB has a browser-based Explorer, which is very powerful and has many of the features any IDE would normally have. Let us see how it works in the following step-by-step tutorial. Let us go to the NuoDBNuoDB Console by typing the following URL in your browser: http://localhost:8080/ It will bring you to the QuickStart screen. Make sure that you have created the sample database. If you have not created sample database, click on Create Database and create it successfully. Now go to the NuoDB Explorer by clicking on the main tab, and it will ask you for your domain username and password. Enter the username as a domain and password as a bird. Alternatively you can also enter username as a quickstart and password as a quickstart. Once you enter the password you will be able to see the databases. In our example we have installed the Sample Database hence you will see the Test database in our Database Hierarchy screen. When you click on database it will ask for the database login. Note that Database Login is different from Domain login and you will have to enter your database login over here. In our case the database username is dba and password is goalie. Once you enter a valid username and password it will display your database. Further expand your database and you will notice various objects in your database. Once you explore various objects, select any database and click on Open. When you click on execute, it will display the SQL script to select the data from the table. The autogenerated script displays entire result set from the database. The NuoDB Explorer is very powerful and makes the life of developers very easy. If you click on List SQL Statements it will list all the available SQL statements right away in Query Editor. You can see the popup window in following image. Here is the cool thing for geeks. You can even click on Query Plan and it will display the text based query plan as well. In case of a SELECT, the query plan will be much simpler, however, when we write complex queries it will be very interesting. We can use the query plan tab for performance tuning of the database. Here is another feature, when we click on List Tables in NuoDB Explorer. It lists all the available tables in the query editor. This is very helpful when we are writing a long complex query. Here is a relatively complex example I have built using Inner Join syntax. Right below I have displayed the Query Plan. The query plan displays all the little details related to the query. Well, we just wrote multi-table query and executed it against the NuoDB database. You can use the NuoDB Admin section and do various analyses of the query and its performance. NuoDB is a distributed database built on a patented emergent architecture with full support for SQL and ACID guarantees. It allows you to add Transaction Engine processes to a running system to improve the performance of your system. You can also add a second Storage Engine to your running system for redundancy purposes. Conversely, you can shut down processes when you don’t need the extra database resources. NuoDB also provides developers and administrators with a single intuitive interface for centrally monitoring deployments. If you have read my blog posts and have not tried out NuoDB, I strongly suggest that you download it today and catch up with the learnings with me. Trust me though the product is very powerful, it is extremely easy to learn and use. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

Read the article
SQL SERVER – Example of Performance Tuning for Advanced Users with DB Optimizer

- by Pinal Dave

Performance tuning is such a subject that everyone wants to master it. In beginning everybody is at a novice level and spend lots of time learning how to master the art of performance tuning. However, as we progress further the tuning of the system keeps on getting very difficult. I have understood in my early career there should be no need of ego in the technology field. There are always better solutions and better ideas out there and we should not resist them. Instead of resisting the change and new wave I personally adopt it. Here is a similar example, as I personally progress to the master level of performance tuning, I face that it is getting harder to come up with optimal solutions. In such scenarios I rely on various tools to teach me how I can do things better. Once I learn about tools, I am often able to come up with better solutions when I face the similar situation next time. A few days ago I had received a query where the user wanted to tune it further to get the maximum out of the performance. I have re-written the similar query with the help of AdventureWorks sample database. SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID INNER JOIN HumanResources.Shift s ON edh.ShiftID = s.ShiftID; User had similar query to above query was used in very critical report and wanted to get best out of the query. When I looked at the query – here were my initial thoughts Use only column in the select statements as much as you want in the application Let us look at the query pattern and data workload and find out the optimal index for it Before I give further solutions I was told by the user that they need all the columns from all the tables and creating index was not allowed in their system. He can only re-write queries or use hints to further tune this query. Now I was in the constraint box – I believe * was not a great idea but if they wanted all the columns, I believe we can’t do much besides using *. Additionally, if I cannot create a further index, I must come up with some creative way to write this query. I personally do not like to use hints in my application but there are cases when hints work out magically and gives optimal solutions. Finally, I decided to use Embarcadero’s DB Optimizer. It is a fantastic tool and very helpful when it is about performance tuning. I have previously explained how it works over here. First open DBOptimizer and open Tuning Job from File >> New >> Tuning Job. Once you open DBOptimizer Tuning Job follow the various steps indicates in the following diagram. Essentially we will take our original script and will paste that into Step 1: New SQL Text and right after that we will enable Step 2 for Generating Various cases, Step 3 for Detailed Analysis and Step 4 for Executing each generated case. Finally we will click on Analysis in Step 5 which will generate the report detailed analysis in the result pan. The detailed pan looks like. It generates various cases of T-SQL based on the original query. It applies various hints and available hints to the query and generate various execution plans of the query and displays them in the resultant. You can clearly notice that original query had a cost of 0.0841 and logical reads about 607 pages. Whereas various options which are just following it has different execution cost as well logical read. There are few cases where we have higher logical read and there are few cases where as we have very low logical read. If we pay attention the very next row to original query have Merge_Join_Query in description and have lowest execution cost value of 0.044 and have lowest Logical Reads of 29. This row contains the query which is the most optimal re-write of the original query. Let us double click over it. Here is the query: SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID INNER JOIN HumanResources.Shift s ON edh.ShiftID = s.ShiftID OPTION (MERGE JOIN) If you notice above query have additional hint of Merge Join. With the help of this Merge Join query hint this query is now performing much better than before. The entire process takes less than 60 seconds. Please note that it the join hint Merge Join was optimal for this query but it is not necessary that the same hint will be helpful in all the queries. Additionally, if the workload or data pattern changes the query hint of merge join may be no more optimal join. In that case, we will have to redo the entire exercise once again. This is the reason I do not like to use hints in my queries and I discourage all of my users to use the same. However, if you look at this example, this is a great case where hints are optimizing the performance of the query. It is humanly not possible to test out various query hints and index options with the query to figure out which is the most optimal solution. Sometimes, we need to depend on the efficiency tools like DB Optimizer to guide us the way and select the best option from the suggestion provided. Let me know what you think of this article as well your experience with DB Optimizer. Please leave a comment. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Joins, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Parsing SSIS Catalog Messages – Notes from the Field #030

- by Pinal Dave

[Note from Pinal]: This is a new episode of Notes from the Field series. SQL Server Integration Service (SSIS) is one of the most key essential part of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data. In this episode of the Notes from the Field series I requested SSIS Expert Andy Leonard to discuss one of the most interesting concepts of SSIS Catalog Messages. There are plenty of interesting and useful information captured in the SSIS catalog and we will learn together how to explore the same. The SSIS Catalog captures a lot of cool information by default. Here’s a query I use to parse messages from the catalog.operation_messages table in the SSISDB database, where the logged messages are stored. This query is set up to parse a default message transmitted by the Lookup Transformation. It’s one of my favorite messages in the SSIS log because it gives me excellent information when I’m tuning SSIS data flows. The message reads similar to: Data Flow Task:Information: The Lookup processed 4485 rows in the cache. The processing time was 0.015 seconds. The cache used 1376895 bytes of memory. The query: USE SSISDB GO DECLARE @MessageSourceType INT = 60 DECLARE @StartOfIDString VARCHAR(100) = 'The Lookup processed ' DECLARE @ProcessingTimeString VARCHAR(100) = 'The processing time was ' DECLARE @CacheUsedString VARCHAR(100) = 'The cache used ' DECLARE @StartOfIDSearchString VARCHAR(100) = '%' + @StartOfIDString + '%' DECLARE @ProcessingTimeSearchString VARCHAR(100) = '%' + @ProcessingTimeString + '%' DECLARE @CacheUsedSearchString VARCHAR(100) = '%' + @CacheUsedString + '%' SELECT operation_id , SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1))) AS LookupRowsCount , SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1))) AS LookupProcessingTime , CASE WHEN (CONVERT(numeric(3,3),SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1))))) = 0 THEN 0 ELSE CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))) / CONVERT(numeric(3,3),SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1)))) END AS LookupRowsPerSecond , SUBSTRING(MESSAGE, (PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1)) - (PATINDEX(@CacheUsedSearchString, MESSAGE) + LEN(@CacheUsedString) + 1))) AS LookupBytesUsed ,CASE WHEN (CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))))= 0 THEN 0 ELSE CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1)) - (PATINDEX(@CacheUsedSearchString, MESSAGE) + LEN(@CacheUsedString) + 1)))) / CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))) END AS LookupBytesPerRow FROM [catalog].[operation_messages] WHERE message_source_type = @MessageSourceType AND MESSAGE LIKE @StartOfIDSearchString GO Note that you have to set some parameter values: @MessageSourceType [int] – represents the message source type value from the following results: Value Description 10 Entry APIs, such as T-SQL and CLR Stored procedures 20 External process used to run package (ISServerExec.exe) 30 Package-level objects 40 Control Flow tasks 50 Control Flow containers 60 Data Flow task 70 Custom execution message Note: Taken from Reza Rad’s (excellent!) helper.MessageSourceType table found here. @StartOfIDString [VarChar(100)] – use this to uniquely identify the message field value you wish to parse. In this case, the string ‘The Lookup processed ‘ identifies all the Lookup Transformation messages I desire to parse. @ProcessingTimeString [VarChar(100)] – this parameter is message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Processing Time value. For this execution, I use the string ‘The processing time was ‘. @CacheUsedString [VarChar(100)] – this parameter is also message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Cache Used value. It returns the memory used, in bytes. For this execution, I use the string ‘The cache used ‘. The other parameters are built from variations of the parameters listed above. The query parses the values into text. The string values are converted to numeric values for ratio calculations; LookupRowsPerSecond and LookupBytesPerRow. Since ratios involve division, CASE statements check for denominators that equal 0. Here are the results in an SSMS grid: This is not the only way to retrieve this information. And much of the code lends itself to conversion to functions. If there is interest, I will share the functions in an upcoming post. If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

Read the article
Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21

- by Pinal Dave

Data is forever. Think about it – it is indeed true. Are you using any application as it is which was built 10 years ago? Are you using any piece of hardware which was built 10 years ago? The answer is most certainly No. However, if I ask you – are you using any data which were captured 50 years ago, the answer is most certainly Yes. For example, look at the history of our nation. I am from India and we have documented history which goes back as over 1000s of year. Well, just look at our birthday data – atleast we are using it till today. Data never gets old and it is going to stay there forever. Application which interprets and analysis data got changed but the data remained in its purest format in most cases. As organizations have grown the data associated with them also grew exponentially and today there are lots of complexity to their data. Most of the big organizations have data in multiple applications and in different formats. The data is also spread out so much that it is hard to categorize with a single algorithm or logic. The mobile revolution which we are experimenting right now has completely changed how we capture the data and build intelligent systems. Big organizations are indeed facing challenges to keep all the data on a platform which give them a single consistent view of their data. This unique challenge to make sense of all the data coming in from different sources and deriving the useful actionable information out of is the revolution Big Data world is facing. Defining Big Data The 3Vs that define Big Data are Variety, Velocity and Volume. Volume We currently see the exponential growth in the data storage as the data is now more than text data. We can find data in the format of videos, musics and large images on our social media channels. It is very common to have Terabytes and Petabytes of the storage system for enterprises. As the database grows the applications and architecture built to support the data needs to be reevaluated quite often. Sometimes the same data is re-evaluated with multiple angles and even though the original data is the same the new found intelligence creates explosion of the data. The big volume indeed represents Big Data. Velocity The data growth and social media explosion have changed how we look at the data. There was a time when we used to believe that data of yesterday is recent. The matter of the fact newspapers is still following that logic. However, news channels and radios have changed how fast we receive the news. Today, people reply on social media to update them with the latest happening. On social media sometimes a few seconds old messages (a tweet, status updates etc.) is not something interests users. They often discard old messages and pay attention to recent updates. The data movement is now almost real time and the update window has reduced to fractions of the seconds. This high velocity data represent Big Data. Variety Data can be stored in multiple format. For example database, excel, csv, access or for the matter of the fact, it can be stored in a simple text file. Sometimes the data is not even in the traditional format as we assume, it may be in the form of video, SMS, pdf or something we might have not thought about it. It is the need of the organization to arrange it and make it meaningful. It will be easy to do so if we have data in the same format, however it is not the case most of the time. The real world have data in many different formats and that is the challenge we need to overcome with the Big Data. This variety of the data represent represent Big Data. Big Data in Simple Words Big Data is not just about lots of data, it is actually a concept providing an opportunity to find new insight into your existing data as well guidelines to capture and analysis your future data. It makes any business more agile and robust so it can adapt and overcome business challenges. Tomorrow In tomorrow’s blog post we will try to answer discuss Evolution of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – An Efficiency Tool to Compare and Synchronize SQL Server Databases

- by Pinal Dave

There is no need to reinvent the wheel if it is already invented and if the wheel is already available at ease, there is no need to wait to grab it. Here is the similar situation. I came across a very interesting situation and I had to look for an efficient tool which can make my life easier and solve my business problem. Here is the scenario. One of the developers had deleted few rows from the very important mapping table of our development server (thankfully, it was not the production server). Though it was a development server, the entire development team had to stop working as the application started to crash on every page. Think about the lost of manpower and efficiency which we started to loose. Pretty much every department had to stop working as our internal development application stopped working. Thankfully, we even take a backup of our development server and we had access to full backup of the entire database at 6 AM morning. We do not take as a frequent backup of development server as production server (naturally!). Even though we had a full backup, the solution was not to restore the database. Think about it, there were plenty of the other operations since the last good full backup and if we restore a full backup, we will pretty much overwrite on the top of the work done by developers since morning. Now, as restoring the full backup was not an option we decided to restore the same database on another server. Once we had restored our database to another server, the challenge was to compare the table from where the database was deleted. The mapping table from where the data were deleted contained over 5000 rows and it was humanly impossible to compare both the tables manually. Finally we decided to use efficiency tool dbForge Data Compare for SQL Server from DevArt. dbForge Data Compare for SQL Server is a powerful, fast and easy to use SQL compare tool, capable of using native SQL Server backups as metadata source. (FYI we Downloaded dbForge Data Compare) Once we discovered the product, we immediately downloaded the product and installed on our development server. After we installed the product, we were greeted with the following screen. We clicked on the New Data Comparision to start our new comparison project. It brought up following screen. Here is the best part of the product, we just had to enter our database connection username and password along with source and destination details and we are done. The entire process is very simple and self intuiting. The best part was that for the source, we can either select database or even backup. This was indeed fantastic feature. Think about this, if you have a very big database, it will take long time to restore on the server. Once it is restored, you will be able to work with it. However, when you are working with dbForge Data Compare it will accept database backup as your source or destination. Once I click on the execute it brought up following screen where it displayed an excellent summary of the data compare. It has dedicated tabs for the what is changing in what table as well had details of the changed data. The best part is that, once we had reviewed the change. We click on the Synchronize button in the menu bar and it brought up following screen. You can see that the screen has very simple straight forward but very powerful features. You can generate a script to synchronize from target to source or even from source to target. Additionally, the database is a very complicated world and there are extensive options to configure various database options on the next screen. We also have the option to either generate script or directly execute the script to target server. I like to play on the safe side and I generated the script for my synchronization and later on after review I deployed the scripts on the server. Well, my team and we were able to get going from our disaster in less than 10 minutes. There were few people in our team were indeed disappointed as they were thinking of going home early that day but in less than 10 minutes they had to get back to work. There are so many other features in dbForge Data Compare for SQL Server, I am already planning to make this product company wide recommended product for Data Compare tool. Hats off to the team who have build this product. Here are few of the features salient features of the dbForge Data Compare for SQL Server Perform SQL Server database comparison to detect changes Compare SQL Server backups with live databases Analyze data differences between two databases Synchronize two databases that went out of sync Restore data of a particular table from the backup Generate data comparison reports in Excel and HTML formats Copy look-up data from development database to production Automate routine data synchronization tasks with command-line interface Go Ahead and Download the dbForge Data Compare for SQL Server right away. It is always a good idea to get familiar with the important tools before hand instead of learning it under pressure of disaster. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
SQL SERVER – Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2

- by Pinal Dave

This is the second part of the series Incremental Statistics. Here is the index of the complete series. What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1 Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2 DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3 In part 1 we have understood what is incremental statistics and now in this second part we will see a simple example of incremental statistics. This blog post is heavily inspired from my friend Balmukund’s must read blog post. If you have partitioned table and lots of data, this feature can be specifically very useful. Prerequisite Here are two things you must know before you start with the demonstrations. AdventureWorks – For the demonstration purpose I have installed AdventureWorks 2012 as an AdventureWorks 2014 in this demonstration. Partitions – You should know how partition works with databases. Setup Script Here is the setup script for creating Partition Function, Scheme, and the Table. We will populate the table based on the SalesOrderDetails table from AdventureWorks. -- Use Database USE AdventureWorks2014 GO -- Create Partition Function CREATE PARTITION FUNCTION IncrStatFn (INT) AS RANGE LEFT FOR VALUES (44000, 54000, 64000, 74000) GO -- Create Partition Scheme CREATE PARTITION SCHEME IncrStatSch AS PARTITION [IncrStatFn] TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]) GO -- Create Table Incremental_Statistics CREATE TABLE [IncrStatTab]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [ModifiedDate] [datetime] NOT NULL) ON IncrStatSch(SalesOrderID) GO -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate]) SELECT [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE SalesOrderID < 54000 GO Check Details Now we will check details in the partition table IncrStatSch. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO You will notice that only a few of the partition are filled up with data and remaining all the partitions are empty. Now we will create statistics on the Table on the column SalesOrderID. However, here we will keep adding one more keyword which is INCREMENTAL = ON. Please note this is the new keyword and feature added in SQL Server 2014. It did not exist in earlier versions. -- Create Statistics CREATE STATISTICS IncrStat ON [IncrStatTab] (SalesOrderID) WITH FULLSCAN, INCREMENTAL = ON GO Now we have successfully created statistics let us check the statistical histogram of the table. Now let us once again populate the table with more data. This time the data are entered into a different partition than earlier populated partition. -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate]) SELECT [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE SalesOrderID > 54000 GO Let us check the status of the partition once again with following script. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO Statistics Update Now here has the new feature come into action. Previously, if we have to update the statistics, we will have to FULLSCAN the entire table irrespective of which partition got the data. However, in SQL Server 2014 we can just specify which partition we want to update in terms of Statistics. Here is the script for the same. -- Update Statistics Manually UPDATE STATISTICS IncrStatTab (IncrStat) WITH RESAMPLE ON PARTITIONS(3, 4) GO Now let us check the statistics once again. -- Show Statistics DBCC SHOW_STATISTICS('IncrStatTab', IncrStat) WITH HISTOGRAM GO Upon examining statistics histogram, you will notice that now the distribution has changed and there is way more rows in the histogram. Summary The new feature of Incremental Statistics is indeed a boon for the scenario where there are partitions and statistics needs to be updated frequently on the partitions. In earlier version to update statistics one has to do FULLSCAN on the entire table which was wasting too many resources. With the new feature in SQL Server 2014, now only those partitions which are significantly changed can be specified in the script to update statistics. Cleanup You can clean up the database by executing following scripts. -- Clean up DROP TABLE [IncrStatTab] DROP PARTITION SCHEME [IncrStatSch] DROP PARTITION FUNCTION [IncrStatFn] GO Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Statistics, Statistics

Read the article
SQL SERVER – Auditing and Profiling Database Made Easy with SQL Audit and Comply

- by Pinal Dave

Do you like auditing your database, or can you think of about a million other things you’d rather do? Unfortunately, auditing is incredibly important. As with tax audits, it is important to audit databases to ensure they are following all the rules, but they are also important for troubleshooting and security. There are several ways to audit SQL Server. There is manual auditing, which is going through your database “by hand,” and obviously takes a long time and is quite inefficient. SQL Server also provides programs to help you audit your systems. Different administrators will have different opinions about best practices and which tools to use, and each one will be perfected for certain systems and certain users. Today, though, I would like to talk about Apex SQL Audit. It is an auditing tool that acts like “track changes” in a word processing document. It will log what has changed on the database, who made the changes, and what effects these changes have had (i.e. what objects were affected down the line). All this information is logged, and can be easily viewed or printed for easy access. One of the best features of Apex is that it is so customizable (and easy to use!). First, start Apex. Then you can connect to the database you would like to monitor. Once you select your database, you can select which table you want to audit. You can customize right down to the field you’d like to audit, and then select which types of actions you’d like tracked – insert, delete, or update. Repeat these steps for every database you want monitored. To create the logs, choose “Create triggers” in the menu. The script written here will be what logs each insert, delete, and update function. Press F5 to execute. All this tracking information will be stored in AUDIT_LOG_DATA and AUDIT_LOG_TRANSACTIONS tables. View these tables using ApexSQL Audit reports. These transaction logs can be extremely detailed – especially on very busy servers, where every move it traced. Reading them can be overwhelming, to say the least. Apex has tried to make things easier for the average DBA, though. You can read these tracking logs in Apex, and it will display data and objects that affect your server – even things that were happening on your server before you installed Apex! To read these logs, open Apex, and connect to that database you want to audit. Go to the Transaction Logs tab, and add the logs you want to read. To narrow down what results you want to see, you can use the Filter tab to choose time, operation type, name, users, and more. Click Open, and you can see the results in a grid (as shown below). You can export these results to CSV, HTML, XML or SQL files and save on the hard disk. One of the advantages is that since there are no triggers here, there are no other processes that will affect SQL Server performance. Using this method is also how to view history from your database that occurred before Apex was installed. This type of tracking does require storage space for the data sources, as the database must be fully running, and the transaction logs must exist (things not stored in the transactions logs will not be recoverable). Apex can also replace SQL Server Profiler and SQL Server Traces – which are much more complex and error-prone – with its ApexSQL Comply. It can do fault tolerant auditing, centralized reporting, and “who saw what” information in an easy-to-use interface. The tracking settings can be altered by the user, or the default options will provide solutions to the most common auditing problems. To get started: open ApexSQL Comply, and selected Database Filter Settings to choose which database you’d like to audit. You can select which tracking you’re like in Operation Types – DML, DDL, queries executed, execute statements, and more. To get started, click Start Auditing. After this, every action will be stored in the central repository database (ApexSQLCrd). You can view the audit and create a report (or view the standard default report) using a wizard. You can see how easy it is to use ApexSQL Comply. You can easily set audits, including the type and time, and create customized reports. Remote users can easily access the reports through the user interface (available online, as well), and security concerns are all taken care of by the program. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
SQL SERVER – SSMS: Top Object and Batch Execution Statistics Reports

- by Pinal Dave

The month of June till mid of July has been the fever of sports. First, it was Wimbledon Tennis and then the Soccer fever was all over. There is a huge number of fan followers and it is great to see the level at which people sometimes worship these sports. Being an Indian, I cannot forget to mention the India tour of England later part of July. Following these sports and as the events unfold to the finals, there are a number of ways the statisticians can slice and dice the numbers. Cue from soccer I can surely say there is a team performance against another team and then there is individual member fairs against a particular opponent. Such statistics give us a fair idea to how a team in the past or in the recent past has fared against each other, head-to-head stats during World cup and during other neutral venue games. All these statistics are just pointers. In reality, they don’t reflect the calibre of the current team because the individuals who performed in each of these games are totally different (Typical example being the Brazil Vs Germany semi-final match in FIFA 2014). So at times these numbers are misleading. It is worth investigating and get the next level information. Similar to these statistics, SQL Server Management studio is also equipped with a number of reports like a) Object Execution Statistics report and b) Batch Execution Statistics reports. As discussed in the example, the team scorecard is like the Batch Execution statistics and individual stats is like Object Level statistics. The analogy can be taken only this far, trust me there is no correlation between SQL Server functioning and playing sports – It is like I think about diet all the time except while I am eating. Performance – Batch Execution Statistics Let us view the first report which can be invoked from Server Node -> Reports -> Standard Reports -> Performance – Batch Execution Statistics. Most of the values that are displayed in this report come from the DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text(sql_handle). This report contains 3 distinctive sections as outline below. Section 1: This is a graphical bar graph representation of Average CPU Time, Average Logical reads and Average Logical Writes for individual batches. The Batch numbers are indicative and the details of individual batch is available in section 3 (detailed below). Section 2: This represents a Pie chart of all the batches by Total CPU Time (%) and Total Logical IO (%) by batches. This graphical representation tells us which batch consumed the highest CPU and IO since the server started, provided plan is available in the cache. Section 3: This is the section where we can find the SQL statements associated with each of the batch Numbers. This also gives us the details of Average CPU / Average Logical Reads and Average Logical Writes in the system for the given batch with object details. Expanding the rows, I will also get the # Executions and # Plans Generated for each of the queries. Performance – Object Execution Statistics The second report worth a look is Object Execution statistics. This is a similar report as the previous but turned on its head by SQL Server Objects. The report has 3 areas to look as above. Section 1 gives the Average CPU, Average IO bar charts for specific objects. The section 2 is a graphical representation of Total CPU by objects and Total Logical IO by objects. The final section details the various objects in detail with the Avg. CPU, IO and other details which are self-explanatory. At a high-level both the reports are based on queries on two DMVs (sys.dm_exec_query_stats and sys.dm_exec_sql_text) and it builds values based on calculations using columns in them: SELECT * FROM sys.dm_exec_query_stats s1 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 WHERE s2.objectid IS NOT NULL AND DB_NAME(s2.dbid) IS NOT NULL ORDER BY s1.sql_handle; This is one of the simplest form of reports and in future blogs we will look at more complex reports. I truly hope that these reports can give DBAs and developers a hint about what is the possible performance tuning area. As a closing point I must emphasize that all above reports pick up data from the plan cache. If a particular query has consumed a lot of resources earlier, but plan is not available in the cache, none of the above reports would show that bad query. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
SQL SERVER – SSIS Parameters in Parent-Child ETL Architectures – Notes from the Field #040

- by Pinal Dave

[Notes from Pinal]: SSIS is very well explored subject, however, there are so many interesting elements when we read, we learn something new. A similar concept has been Parent-Child ETL architecture’s relationship in SSIS. Linchpin People are database coaches and wellness experts for a data driven world. In this 40th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to understand SSIS Parameters in Parent-Child ETL Architectures. In this brief Notes from the Field post, I will review the use of SSIS parameters in parent-child ETL architectures. A very common design pattern used in SQL Server Integration Services is one I call the parent-child pattern. Simply put, this is a pattern in which packages are executed by other packages. An ETL infrastructure built using small, single-purpose packages is very often easier to develop, debug, and troubleshoot than large, monolithic packages. For a more in-depth look at parent-child architectures, check out my earlier blog post on this topic. When using the parent-child design pattern, you will frequently need to pass values from the calling (parent) package to the called (child) package. In older versions of SSIS, this process was possible but not necessarily simple. When using SSIS 2005 or 2008, or even when using SSIS 2012 or 2014 in package deployment mode, you would have to create package configurations to pass values from parent to child packages. Package configurations, while effective, were not the easiest tool to work with. Fortunately, starting with SSIS in SQL Server 2012, you can now use package parameters for this purpose. In the example I will use for this demonstration, I’ll create two packages: one intended for use as a child package, and the other configured to execute said child package. In the parent package I’m going to build a for each loop container in SSIS, and use package parameters to pass in a value – specifically, a ClientID – for each iteration of the loop. The child package will be executed from within the for each loop, and will create one output file for each client, with the source query and filename dependent on the ClientID received from the parent package. Configuring the Child and Parent Packages When you create a new package, you’ll see the Parameters tab at the package level. Clicking over to that tab allows you to add, edit, or delete package parameters. As shown above, the sample package has two parameters. Note that I’ve set the name, data type, and default value for each of these. Also note the column entitled Required: this allows me to specify whether the parameter value is optional (the default behavior) or required for package execution. In this example, I have one parameter that is required, and the other is not. Let’s shift over to the parent package briefly, and demonstrate how to supply values to these parameters in the child package. Using the execute package task, you can easily map variable values in the parent package to parameters in the child package. The execute package task in the parent package, shown above, has the variable vThisClient from the parent package mapped to the pClientID parameter shown earlier in the child package. Note that there is no value mapped to the child package parameter named pOutputFolder. Since this parameter has the Required property set to False, we don’t have to specify a value for it, which will cause that parameter to use the default value we supplied when designing the child pacakge. The last step in the parent package is to create the for each loop container I mentioned earlier, and place the execute package task inside it. I’m using an object variable to store the distinct client ID values, and I use that as the iterator for the loop (I describe how to do this more in depth here). For each iteration of the loop, a different client ID value will be passed into the child package parameter. The final step is to configure the child package to actually do something meaningful with the parameter values passed into it. In this case, I’ve modified the OleDB source query to use the pClientID value in the WHERE clause of the query to restrict results for each iteration to a single client’s data. Additionally, I’ll use both the pClientID and pOutputFolder parameters to dynamically build the output filename. As shown, the pClientID is used in the WHERE clause, so we only get the current client’s invoices for each iteration of the loop. For the flat file connection, I’m setting the Connection String property using an expression that engages both of the parameters for this package, as shown above. Parting Thoughts There are many uses for package parameters beyond a simple parent-child design pattern. For example, you can create standalone packages (those not intended to be used as a child package) and still use parameters. Parameter values may be supplied to a package directly at runtime by a SQL Server Agent job, through the command line (via dtexec.exe), or through T-SQL. Also, you can also have project parameters as well as package parameters. Project parameters work in much the same way as package parameters, but the parameters apply to all packages in a project, not just a single package. Conclusion Of the numerous advantages of using catalog deployment model in SSIS 2012 and beyond, package parameters are near the top of the list. Parameters allow you to easily share values from parent to child packages, enabling more dynamic behavior and better code encapsulation. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Planned and Unplanned Availablity Group Failovers – Notes from the Field #031

- by Pinal Dave

[Note from Pinal]: This is a new episode of Notes from the Fields series. AlwaysOn is a very complex subject and not everyone knows many things about this. The matter of the fact is there is very little information available on this subject online and not everyone knows everything about this. This is why when a very common question related to AlwaysOn comes, people get confused. In this episode of the Notes from the Field series database expert John Sterrett (Group Principal at Linchpin People) explains a very common issue DBAs and Developer faces in their career and is related to Planned and Unplanned Availablity Group Failovers. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words. Whenever a disaster occurs it will be a stressful scenario regardless of how small or big the disaster is. This gets multiplied when it is your first time working with newer technology or the first time you are going through a disaster without a proper run book. Today, were going to help you establish a run book for creating a planned failover with availability groups. To make today’s session simple were going to have two instances of SQL Server 2012 included in an availability group and walk through the steps of doing an unplanned failover. We will focus on using the user interface and T-SQL to complete the failovers. We are going to use a two replica Availability Group where each replica is in another location. Therefore, we will be covering Asynchronous (non automatic failover) the following is a breakdown of our availability group utilized today. Seeing the following screen might be scary the first time you come across an unplanned failover. It looks like our test database used in this Availability Group is not functional and it currently isn’t. The database status is not synchronizing which makes sense because the primary replica went down so it couldn’t synchronize. With that said, we can still failover and make it functional while we troubleshoot why we lost our primary replica. To start we are going to right click on the availability group that needs to be restarted and select failover. This will bring up the following wizard, which will walk you through several steps needed to complete the failover using the graphical user interface provided with SQL Server Management Studio (SSMS). You are going to see warning messages simply because we are in Asynchronous commit mode and can not guarantee ‘no data loss’ when we do failover. Just incase you missed it; you get another screen warning you about potential data loss because we are in Asynchronous mode. Next we get to connect to the specific replica we want to become the primary replica after the failover occurs. In our case, we only have two replicas so this is trivial. In order to failover, it’s required to connect to the replica that will become primary. The following screen shows that the connection has been made successfully. Next, you will see the final summary screen. Once again, this reminds you that the failover action will cause data loss as were using Asynchronous commit mode due to the distance between instances used for disaster recovery. Finally, once the failover is completed you will see the following screen. If you followed along this long you might be wondering what T-SQL scripts are generated for clicking through all the sections of the wizard. If you have used Database Mirroring in the past you might be surprised. It’s not too different, which makes sense because the data is being replicated via SQL Server endpoints just like the good old database mirroring. Now were going to take a look at how to do a failover with just T-SQL. First, were going to need to open a new query window and run our query in SQLCMD mode. Just incase you haven’t used SQLCMD mode before we will show you how to enable it below. Now you can run the following statement. Notice, we connect to the replica we want to become primary after failover and specify to force failover to allow data loss. We can use the following script to failback over when our primary instance comes back online. -- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE. :Connect SQL2012PROD1 ALTER AVAILABILITY GROUP [AGSQL2] FORCE_FAILOVER_ALLOW_DATA_LOSS; GO Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Evolution of Big Data – Day 3 of 21

- by Pinal Dave

In yesterday’s blog post we answered what is the Big Data. Today we will understand why and how the evolution of Big Data has happened. Though the answer is very simple, I would like to tell it in the form of a history lesson. Data in Flat File In earlier days data was stored in the flat file and there was no structure in the flat file. If any data has to be retrieved from the flat file it was a project by itself. There was no possibility of retrieving the data efficiently and data integrity has been just a term discussed without any modeling or structure around. Database residing in the flat file had more issues than we would like to discuss in today’s world. It was more like a nightmare when there was any data processing involved in the application. Though, applications developed at that time were also not that advanced the need of the data was always there and there was always need of proper data management. Edgar F Codd and 12 Rules Edgar Frank Codd was a British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases. He presented 12 rules for the Relational Database and suddenly the chaotic world of the database seems to see discipline in the rules. Relational Database was a promising land for all the unstructured database users. Relational Database brought into the relationship between data as well improved the performance of the data retrieval. Database world had immediately seen a major transformation and every single vendors and database users suddenly started to adopt the relational database models. Relational Database Management Systems Since Edgar F Codd proposed 12 rules for the RBDMS there were many different vendors who started them to build applications and tools to support the relationship between database. This was indeed a learning curve for many of the developer who had never worked before with the modeling of the database. However, as time passed by pretty much everybody accepted the relationship of the database and started to evolve product which performs its best with the boundaries of the RDBMS concepts. This was the best era for the databases and it gave the world extreme experts as well as some of the best products. The Entity Relationship model was also evolved at the same time. In software engineering, an Entity–relationship model (ER model) is a data model for describing a database in an abstract way. Enormous Data Growth Well, everything was going fine with the RDBMS in the database world. As there were no major challenges the adoption of the RDBMS applications and tools was pretty much universal. There was a race at times to make the developer’s life much easier with the RDBMS management tools. Due to the extreme popularity and easy to use system pretty much every data was stored in the RDBMS system. New age applications were built and social media took the world by the storm. Every organizations was feeling pressure to provide the best experience for their users based the data they had with them. While this was all going on at the same time data was growing pretty much every organization and application. Data Warehousing The enormous data growth now presented a big challenge for the organizations who wanted to build intelligent systems based on the data and provide near real time superior user experience to their customers. Various organizations immediately start building data warehousing solutions where the data was stored and processed. The trend of the business intelligence becomes the need of everyday. Data was received from the transaction system and overnight was processed to build intelligent reports from it. Though this is a great solution it has its own set of challenges. The relational database model and data warehousing concepts are all built with keeping traditional relational database modeling in the mind and it still has many challenges when unstructured data was present. Interesting Challenge Every organization had expertise to manage structured data but the world had already changed to unstructured data. There was intelligence in the videos, photos, SMS, text, social media messages and various other data sources. All of these needed to now bring to a single platform and build a uniform system which does what businesses need. The way we do business has also been changed. There was a time when user only got the features what technology supported, however, now users ask for the feature and technology is built to support the same. The need of the real time intelligence from the fast paced data flow is now becoming a necessity. Large amount (Volume) of difference (Variety) of high speed data (Velocity) is the properties of the data. The traditional database system has limits to resolve the challenges this new kind of the data presents. Hence the need of the Big Data Science. We need innovation in how we handle and manage data. We need creative ways to capture data and present to users. Big Data is Reality! Tomorrow In tomorrow’s blog post we will try to answer discuss Basics of Big Data Architecture. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – The Story of a Lesser Known Startup Parameter in SQL Server – Guest Post by Balmukund Lakhani

- by Pinal Dave

This is a fantastic blog post from my dear friend Balmukund ( blog | twitter | facebook ). He had presented a fantastic session in our last UG and there were lots of requests from attendees that he blogs about it. Well, here is the blog post about the same very popular UG session. Let us read the entire blog post in the voice of the Balmukund himself. During my last session in SQL Bangalore User Group (Facebook) meeting, I was lucky enough to deliver a session on SQL Server Startup issue. The name of the session was “SQL Engine Starting Trouble – How to start?” From the feedback, I realized that one of the “not well known” startup parameter is “-m”. Okay, you might say “I know that this is used to start the SQL in single user mode”. But what you might not know is that you can pass a string with -m which has special meaning and use. I have used this parameter in my blog here but looks like not many of you have seen that. It happens most of the time when we want to start SQL Server in single user mode, someone else makes connection before you can. The only choice you have is to repeat same process again till you succeed. Some smart DBAs may disable the remote network protocols (TCP/IP and Named Pipes) of SQL Instance and allow only local connections to SQL. Once the activity is complete, our dear smart DBA has to remember to re-enable network protocols. Sometimes, it may be a local service or application getting connection to SQL before we can. There is a better way to deal with it. Yes, you have guessed it correctly: -m parameter which a string. Since I work with SQL Product Support team, I may know little more undocumented commands and parameters, but this is not an undocumented stuff. It’s already documented in books online. So in this blog, I am going to show a demo of its usage. As documentation shows, “Do not use this option as a security feature.” So please read this blog as knowledge enhancer and troubleshooting issues not security feature. In my laptop, I have a default instance of SQL Server 2012 and here is what we would in the configuration manager. Now, I would go ahead and stop SQL Service by selecting SQL Server (MSSQLServer) > Right Click > Stop. There are multiple ways to start SQL with startup parameter. 1) Use Net Start Command from command prompt Net Start MSSQLServer /mSQLCMD The above command is the simplest way to add startup parameter to SQL. This parameter would be cleared once we stop and start SQL. 2) Add Startup Parameter via configuration manager. Step is already listed here. We need to add -mSQLCMD If we compare 1 and 2, it’s clear that unless we modify startup parameter and remove -m, it would be in effect. 3) Start SQL Service via command line SQLServr.exe –mSQLCMD –s<InstanceName> Wait, what does SQLCMD mean with /m? It’s the instruction to SQL that start SQL Server in Single User Mode and allow only the application which is SQLCMD. Any other application would fail with Login Failed for User Error message. It would be important to note that string is case sensitive. This value should be picked up from application_name column from sys.dm_exec_sessions. I have made a connection using SQLCMD and as we can see it comes as upper case “SQLCMD”. If we want only management studio query windows to connect then we need to give -m” Microsoft SQL Server Management Studio – Query” as startup parameter. In below example, I have given it as SQLCMd (lower case d at the end) and we would notice that we would not be able to connect to SQL Instance. Above proves that parameter works as expected and it’s case sensitive. Error Log would show below information. How to get error log location? I have already blogged about it. Hope you have learned something new. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – SSMS: Database Consistency History Report

- by Pinal Dave

Doctor and Database The last place I like to visit is always a hospital. With the monsoon season starting, intermittent rains, it has become sort of a routine to get a cycle of fever every other year (seriously I hate it). So when I visit my doctor, it is always interesting in the way he quizzes me. The routine question of – “How many days have you had this?”, “Is there any pattern?”, “Did you drench in rain?”, “Do you have any other symptom?” and so on. The idea here is that the doctor wants to find any anomaly or a pattern that will guide him to a viral or bacterial type. Most of the time they get it based on experience and sometimes after a battery of tests. So if there is consistent behavior to your problem, there is always a solution out. SQL Server has its way to find if the server data / files are in consistent state using the DBCC commands. Back to SQL Server In real life, Database consistency check is one of the critical operations a DBA generally doesn’t give much priority. Many readers of my blogs have asked many times, how do we know if the database is consistent? How do I read output of DBCC CHECKDB and find if everything is right or not? My common answer to all of them is – look at the bottom of checkdb (or checktable) output and look for below line. CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DatabaseName’. Above is a “good sign” because we are seeing zero allocation and zero consistency error. If you are seeing non-zero errors then there is some problem with the database. Sample output is shown as below: CHECKDB found 0 allocation errors and 2 consistency errors in database ‘DatabaseName’. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DatabaseName). If we see non-zero error then most of the time (not always) we get repair options depending on the level of corruption. There is risk involved with above option (repair_allow_data_loss), that is – we would lose the data. Sometimes the option would be repair_rebuild which is little safer. Though these options are available, it is important to find the root cause to the problem. In standard report, there is a report which can show the history of checkdb executed for the selected database. Since this is a database level report, we need to right click on database, click Reports, click Standard Reports and then choose “Database Consistency History” report. The information in this report is picked from default trace. If default trace is disabled or there is no checkdb run or information is not there in default trace (because it’s rolled over), we would get report like below. As we can see report says it very clearly: Currently, no execution history of CHECKDB is available or default trace is not enabled. To demonstrate, I have caused corruption in one of the database and did below steps. Run CheckDB so that errors are reported. Fix the corruption by losing the data using repair option Run CheckDB again to check if corruption is cleared. After that I have launched the report and below is what we would see. If you are lazy like me and don’t want to run the report manually for each database then below query would be handy to provide same report for all database. This query is runs behind the scenes by the report. All I have done is remove the filter for database name (at the last – highlighted). DECLARE @curr_tracefilename VARCHAR(500); DECLARE @base_tracefilename VARCHAR(500); DECLARE @indx INT; SELECT @curr_tracefilename = path FROM sys.traces WHERE is_default = 1; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SELECT @indx = PATINDEX('%\%', @curr_tracefilename) ; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SET @base_tracefilename = LEFT( @curr_tracefilename,LEN(@curr_tracefilename) - @indx) + '\log.trc'; SELECT SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),36, PATINDEX('%executed%',TEXTData)-36) AS command , LoginName , StartTime , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%found%',TEXTData) +6,PATINDEX('%errors %',TEXTData)-PATINDEX('%found%',TEXTData)-6)) AS errors , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%repaired%',TEXTData) +9,PATINDEX('%errors.%',TEXTData)-PATINDEX('%repaired%',TEXTData)-9)) repaired , SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%time:%',TEXTData)+6,PATINDEX('%hours%',TEXTData)-PATINDEX('%time:%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%hours%',TEXTData) +6,PATINDEX('%minutes%',TEXTData)-PATINDEX('%hours%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%minutes%',TEXTData) +8,PATINDEX('%seconds.%',TEXTData)-PATINDEX('%minutes%',TEXTData)-8) AS time FROM::fn_trace_gettable( @base_tracefilename, DEFAULT) WHERE EventClass = 22 AND SUBSTRING(TEXTData,36,12) = 'DBCC CHECKDB' -- AND DatabaseName = @DatabaseName; Don’t get worried about the logic above. All it is doing is reading the trace files, parsing below entry and getting out information for underlined words. DBCC CHECKDB (CorruptedDatabase) executed by sa found 2 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds. Internal database snapshot has split point LSN = 00000029:00000030:0001 and first LSN = 00000029:00000020:0001. Hopefully now onwards you would run checkdb and understand the importance of it. As responsible DBAs I am sure you are already doing it, let me know how often do you actually run them on you production environment? Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Number-Crunching with SQL Server – Exceed the Functionality of Excel

- by Pinal Dave

Imagine this. Your users have developed an Excel spreadsheet that extracts data from your SQL Server database, manipulates that data through the use of Excel formulas and, possibly, some VBA code which is then used to calculate P&L, hedging requirements or even risk numbers. Management comes to you and tells you that they need to get rid of the spreadsheet and that the results of the spreadsheet calculations need to be persisted on the database. SQL Server has a very small set of functions for analyzing data. Excel has hundreds of functions for analyzing data, with many of them focused on specific financial and statistical calculations. Is it even remotely possible that you can use SQL Server to replace the complex calculations being done in a spreadsheet? Westclintech has developed a library of functions that match or exceed the functionality of Excel’s functions and contains many functions that are not available in EXCEL. Their XLeratorDB library of functions contains over 700 functions that can be incorporated into T-SQL statements. XLeratorDB takes advantage of the SQL CLR architecture introduced in SQL Server 2005. SQL CLR permits managed code to be compiled into the database and run alongside built-in SQL Server functions like COUNT or SUM. The Westclintech developers have taken advantage of this architecture to bring robust analytical functions to the database. In our hypothetical spreadsheet, let’s assume that our users are using the YIELD function and that the data are extracted from a table in our database called BONDS. Here’s what the spreadsheet might look like. We go to column G and see that it contains the following formula. Obviously, SQL Server does not offer a native YIELD function. However, with XLeratorDB we can replicate this calculation in SQL Server with the following statement: SELECT *, wct.YIELD(CAST(GETDATE() AS date),Maturity,Rate,Price,100,Frequency,Basis) AS YIELD FROM BONDS This produces the following result. This illustrates one of the best features about XLeratorDB; it is so easy to use. Since I knew that the spreadsheet was using the YIELD function I could use the same function with the same calling structure to do the calculation in SQL Server. I didn’t need to know anything at all about the mechanics of calculating the yield on a bond. It was pretty close to cut and paste. In fact, that’s one way to construct the SQL. Just copy the function call from the cell in the spreadsheet and paste it into SMS and change the cell references to column names. I built the SQL for this query by starting with this. SELECT * ,YIELD(TODAY(),B2,C2,D2,100,E2,F2) FROM BONDS I then changed the cell references to column names. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) ,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Finally, I replicated the TODAY() function using GETDATE() and added the schema name to the function name. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) --,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) ,wct.YIELD(GETDATE(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Then I am able to execute the statement returning the results seen above. The XLeratorDB libraries are heavy on financial, statistical, and mathematical functions. Where there is an analog to an Excel function, the XLeratorDB function uses the same naming conventions and calling structure as the Excel function, but there are also hundreds of additional functions for SQL Server that are not found in Excel. You can find the functions by opening Object Explorer in SQL Server Management Studio (SSMS) and expanding the Programmability folder under the database where the functions have been installed. The Functions folder expands to show 3 sub-folders: Table-valued Functions; Scalar-valued functions, Aggregate Functions, and System Functions. You can expand any of the first three folders to see the XLeratorDB functions. Since the wct.YIELD function is a scalar function, we will open the Scalar-valued Functions folder, scroll down to the wct.YIELD function and and click the plus sign (+) to display the input parameters. The functions are also Intellisense-enabled, with the input parameters displayed directly in the query tab. The Westclintech website contains documentation for all the functions including examples that can be copied directly into a query window and executed. There are also more one hundred articles on the site which go into more detail about how some of the functions work and demonstrate some of the extensive business processes that can be done in SQL Server using XLeratorDB functions and some T-SQL. XLeratorDB is organized into libraries: finance, statistics; math; strings; engineering; and financial options. There is also a windowing library for SQL Server 2005, 2008, and 2012 which provides functions for calculating things like running and moving averages (which were introduced in SQL Server 2012), FIFO inventory calculations, financial ratios and more, without having to use triangular joins. To get started you can download the XLeratorDB 15-day free trial from the Westclintech web site. It is a fully-functioning, unrestricted version of the software. If you need more than 15 days to evaluate the software, you can simply download another 15-day free trial. XLeratorDB is an easy and cost-effective way to start adding sophisticated data analysis to your SQL Server database without having to know anything more than T-SQL. Get XLeratorDB Today and Now! Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Excel

Read the article
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – ?Finding Out What Changed in a Deleted Database – Notes from the Field #041

- by Pinal Dave

[Note from Pinal]: This is a 41th episode of Notes from the Field series. The real world is full of challenges. When we are reading theory or book, we sometimes do not realize how real world reacts works and that is why we have the series notes from the field, which is extremely popular with developers and DBA. Let us talk about interesting problem of how to figure out what has changed in the DELETED database. Well, you think I am just throwing the words but in reality this kind of problems are making our DBA’s life interesting and in this blog post we have amazing story from Brian Kelley about the same subject. In this episode of the Notes from the Field series database expert Brian Kelley explains a how to find out what has changed in deleted database. Read the experience of Brian in his own words. Sometimes, one of the hardest questions to answer is, “What changed?” A similar question is, “Did anything change other than what we expected to change?” The First Place to Check – Schema Changes History Report: Pinal has recently written on the Schema Changes History report and its requirement for the Default Trace to be enabled. This is always the first place I look when I am trying to answer these questions. There are a couple of obvious limitations with the Schema Changes History report. First, while it reports what changed, when it changed, and who changed it, other than the base DDL operation (CREATE, ALTER, DELETE), it does not present what the changes actually were. This is not something covered by the default trace. Second, the default trace has a fixed size. When it hits that size, the changes begin to overwrite. As a result, if you wait too long, especially on a busy database server, you may find your changes rolled off. But the Database Has Been Deleted! Pinal cited another issue, and that’s the inability to run the Schema Changes History report if the database has been dropped. Thankfully, all is not lost. One thing to remember is that the Schema Changes History report is ultimately driven by the Default Trace. As you may have guess, it’s a trace, like any other database trace. And the Default Trace does write to disk. The trace files are written to the defined LOG directory for that SQL Server instance and have a prefix of log_: Therefore, you can read the trace files like any other. Tip: Copy the files to a working directory. Otherwise, you may occasionally receive a file in use error. With the Default Trace files, if you ask the question early enough, you can see the information for a deleted database just the same as any other database. Testing with a Deleted Database: Here’s a short script that will create a database, create a schema, create an object, and then drop the database. Without the database, you can’t do a standard Schema Changes History report. CREATE DATABASE DeleteMe; GO USE DeleteMe; GO CREATE SCHEMA Test AUTHORIZATION dbo; GO CREATE TABLE Test.Foo (FooID INT); GO USE MASTER; GO DROP DATABASE DeleteMe; GO This sets up the perfect situation where we can’t retrieve the information using the Schema Changes History report but where it’s still available. Finding the Information: I’ve sorted the columns so I can see the Event Subclass, the Start Time, the Database Name, the Object Name, and the Object Type at the front, but otherwise, I’m just looking at the trace files using SQL Profiler. As you can see, the information is definitely there: Therefore, even in the case of a dropped/deleted database, you can still determine who did what and when. You can even determine who dropped the database (loginame is captured). The key is to get the default trace files in a timely manner in order to extract the information. If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Security, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Buzz Words: What is Hadoop – Day 6 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is NoSQL. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – Hadoop. What is Hadoop? Apache Hadoop is an open-source, free and Java based software framework offers a powerful distributed platform to store and manage Big Data. It is licensed under an Apache V2 license. It runs applications on large clusters of commodity hardware and it processes thousands of terabytes of data on thousands of the nodes. Hadoop is inspired from Google’s MapReduce and Google File System (GFS) papers. The major advantage of Hadoop framework is that it provides reliability and high availability. What are the core components of Hadoop? There are two major components of the Hadoop framework and both fo them does two of the important task for it. Hadoop MapReduce is the method to split a larger data problem into smaller chunk and distribute it to many different commodity servers. Each server have their own set of resources and they have processed them locally. Once the commodity server has processed the data they send it back collectively to main server. This is effectively a process where we process large data effectively and efficiently. (We will understand this in tomorrow’s blog post). Hadoop Distributed File System (HDFS) is a virtual file system. There is a big difference between any other file system and Hadoop. When we move a file on HDFS, it is automatically split into many small pieces. These small chunks of the file are replicated and stored on other servers (usually 3) for the fault tolerance or high availability. (We will understand this in the day after tomorrow’s blog post). Besides above two core components Hadoop project also contains following modules as well. Hadoop Common: Common utilities for the other Hadoop modules Hadoop Yarn: A framework for job scheduling and cluster resource management There are a few other projects (like Pig, Hive) related to above Hadoop as well which we will gradually explore in later blog posts. A Multi-node Hadoop Cluster Architecture Now let us quickly see the architecture of the a multi-node Hadoop cluster. A small Hadoop cluster includes a single master node and multiple worker or slave node. As discussed earlier, the entire cluster contains two layers. One of the layer of MapReduce Layer and another is of HDFC Layer. Each of these layer have its own relevant component. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave or worker node consists of a DataNode and TaskTracker. It is also possible that slave node or worker node is only data or compute node. The matter of the fact that is the key feature of the Hadoop. In this introductory blog post we will stop here while describing the architecture of Hadoop. In a future blog post of this 31 day series we will explore various components of Hadoop Architecture in Detail. Why Use Hadoop? There are many advantages of using Hadoop. Let me quickly list them over here: Robust and Scalable – We can add new nodes as needed as well modify them. Affordable and Cost Effective – We do not need any special hardware for running Hadoop. We can just use commodity server. Adaptive and Flexible – Hadoop is built keeping in mind that it will handle structured and unstructured data. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically fails over to another node. Why Hadoop is named as Hadoop? In year 2005 Hadoop was created by Doug Cutting and Mike Cafarella while working at Yahoo. Doug Cutting named Hadoop after his son’s toy elephant. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – MapReduce. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQLAuthority News – #TechEdIn – TechEd India 2012 Memories and Photos

- by pinaldave

TechEd India 2012 was held in Bangalore last March 21 to 23, 2012. Just like every year, this event is bigger, grander and inspiring. Pinal Dave at TechEd India 2012 Family Event Every single year, TechEd is a special affair for my entire family. Four months before the start of TechEd, I usually start to build the mental image of the event. I start to think about various things. For the most part, what excites me most is presenting a session and meeting friends. Seriously, I start thinking about presenting my session 4 months earlier than the event! I work on my presentation day and night. I want to make sure that what I present is accurate and that I have experienced it firsthand. My wife and my daughter also contribute to my efforts. For us, TechEd is a family event, and the two of them feel equally responsible as well. They give up their family time so I can bring out the best content for the Community. Pinal, Shaivi and Nupur at TechEd India 2012 Guinea Pigs (My Experiment Victims) I do not rehearse my session, ever. However, I test my demo almost every single day till the last moment that I have to present it already. I sometimes go over the demo more than 2-3 times a day even though the event is more than a month away. I have two “guinea pigs”: 1) Nupur Dave and 2) Vinod Kumar. When I am at home, I present my demos to my wife Nupur. At times I feel that people often backup their demo, but in my case, I have backup demo presenters. In the office during lunch time, I present the demos to Vinod. I am sure he can walk my demos easily with eyes closed. Pinal and Vinod at TechEd India 2012 My Sessions I’ve been determined to present my sessions in a real and practical manner. I prefer to present the subject that I myself would be eager to attend to and sit through if I were an audience. Just keeping that principle in mind, I have created two sessions this year. SQL Server Misconception and Resolution Pinal and Vinod at TechEd India 2012 We believe all kinds of stuff – that the earth is flat, or that the forbidden fruit is apple, or that the big bang theory explains the origin of the universe, and so many other things. Just like these, we have plenty of misconceptions in SQL Server as well. I have had this dream of co-presenting a session with Vinod Kumar for the past 3 years. I have been asking him every year if we could present a session together, but we never got it to work out, until this year came. Fortunately, we got a chance to stand on the same stage and present a single subject. I believe that Vinod Kumar and I have an excellent synergy when we are working together. We know each other’s strengths and weakness. We know when the other person will speak and when he will keep quiet. The reason behind this synergy is that we have worked on 2 Video Learning Courses (SQL Server Indexes and SQL Server Questions and Answers) and authored 1 book (SQL Server Questions and Answers) together. Crowd Outside Session Hall This session was inspired from the “Laurel and Hardy” show so we performed a role-playing of those famous characters. We had an excellent time at the stage and, for sure, the audience had a wonderful time, too. We had an extremely large audience for this session and had a great time interacting with them. Speed Up! – Parallel Processes and Unparalleled Performance Pinal Dave at TechEd India 2012 I wanted to approach this session at level 400 and I was very determined to do so. The biggest challenge I had was that this was a total of 60 minutes of session and the audience profile was very generic. I had to present at level 100 as well at 400. I worked hard to tune up these demos. I wanted to make sure that my messages would land perfectly to the minds of the attendees, and when they walk out of the session, they could use the knowledge I shared on their servers. After the session, I felt an extreme satisfaction as I received lots of positive feedback at the event. At one point, so many people rushed towards me that I was a bit scared that the stage might break and someone would get injured. Fortunately, nothing like that happened and I was able to shake hands with everybody. Pinal Dave at TechEd India 2012 Crowd rushing to Pinal at TechEd India 2012 Networking This is one of the primary reasons many of us visit the annual TechEd event. I had a fantastic time meeting SQL Server enthusiasts. Well, it was a terrific time meeting old friends, user group members, MVPs and SQL Enthusiasts. I have taken many photographs with lots of people, but I have received a very few back. If you are reading this blog and have a photo of us at the event, would you please send it to me so I could keep it in my memory lane? SQL Track Speaker: Jacob and Pinal at TechEd India 2012 SQL Community: Pinal, Tejas, Nakul, Jacob, Balmukund, Manas, Sudeepta, Sahal at TechEd India 2012 Star Speakers: Amit and Balmukund at TechEd India 2012 TechED Rockstars: Nakul, Tejas and Pinal at TechEd India 2012 I guess TechEd is a mix of family affair and culture for me! Hamara TechEd (Our TechEd) Please tell me which photo you like the most! Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, SQLServer, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article
SQL SERVER – What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1

- by Pinal Dave

This is the first part of the series Incremental Statistics. Here is the index of the complete series. What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1 Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2 DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3 Statistics are considered one of the most important aspects of SQL Server Performance Tuning. You might have often heard the phrase, with related to performance tuning. “Update Statistics before you take any other steps to tune performance”. Honestly, I have said above statement many times and many times, I have personally updated statistics before I start to do any performance tuning exercise. You may agree or disagree to the point, but there is no denial that Statistics play an extremely vital role in the performance tuning. SQL Server 2014 has a new feature called Incremental Statistics. I have been playing with this feature for quite a while and I find that very interesting. After spending some time with this feature, I decided to write about this subject over here. New in SQL Server 2014 – Incremental Statistics Well, it seems like lots of people wants to start using SQL Server 2014′s new feature of Incremetnal Statistics. However, let us understand what actually this feature does and how it can help. I will try to simplify this feature first before I start working on the demo code. Code for all versions of SQL Server Here is the code which you can execute on all versions of SQL Server and it will update the statistics of your table. The keyword which you should pay attention is WITH FULLSCAN. It will scan the entire table and build brand new statistics for you which your SQL Server Performance Tuning engine can use for better estimation of your execution plan. UPDATE STATISTICS TableName(StatisticsName) WITH FULLSCAN Who should learn about this? Why? If you are using partitions in your database, you should consider about implementing this feature. Otherwise, this feature is pretty much not applicable to you. Well, if you are using single partition and your table data is in a single place, you still have to update your statistics the same way you have been doing. If you are using multiple partitions, this may be a very useful feature for you. In most cases, users have multiple partitions because they have lots of data in their table. Each partition will have data which belongs to itself. Now it is very common that each partition are populated separately in SQL Server. Real World Example For example, if your table contains data which is related to sales, you will have plenty of entries in your table. It will be a good idea to divide the partition into multiple filegroups for example, you can divide this table into 3 semesters or 4 quarters or even 12 months. Let us assume that we have divided our table into 12 different partitions. Now for the month of January, our first partition will be populated and for the month of February our second partition will be populated. Now assume, that you have plenty of the data in your first and second partition. Now the month of March has just started and your third partition has started to populate. Due to some reason, if you want to update your statistics, what will you do? In SQL Server 2012 and earlier version You will just use the code of WITH FULLSCAN and update the entire table. That means even though you have only data in third partition you will still update the entire table. This will be VERY resource intensive process as you will be updating the statistics of the partition 1 and 2 where data has not changed at all. In SQL Server 2014 You will just update the partition of Partition 3. There is a special syntax where you can now specify which partition you want to update now. The impact of this is that it is smartly merging the new data with old statistics and update the entire statistics without doing FULLSCAN of your entire table. This has a huge impact on performance. Remember that the new feature in SQL Server 2014 does not change anything besides the capability to update a single partition. However, there is one feature which is indeed attractive. Previously, when table data were changed 20% at that time, statistics update were triggered. However, now the same threshold is applicable to a single partition. That means if your partition faces 20% data, change it will also trigger partition level statistics update which, when merged to your final statistics will give you better performance. In summary If you are not using a partition, this feature is not applicable to you. If you are using a partition, this feature can be very helpful to you. Tomorrow: We will see working code of SQL Server 2014 Incremental Statistics. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Statistics, Statistics

Read the article
SQLAuthority News – 7th Anniversary of Blog – A Personal Note

- by Pinal Dave

Special Day Today is a very special day – seven years ago I blogged for the very first time. Seven years ago, I didn’t know what I was doing, I didn’t know how to blog, or even what a blog was or what to write. I was working as a DBA, and I was trying to solve a problem – at my job, there were a few issues I had to fix again and again and again. There were days when I was rewriting the same solution over and over, and there were times when I would get very frustrated because I could not write the same elegant solution that I had written before. I came up with a solution to this problem – posting these solutions online, where I could access them whenever I needed them. At that point, I had no idea what a blog was, or even how the internet worked, I had no idea that a blog would be visible to others. Can you believe it? Google it on Yahoo! After a few posts on this “blog,” there was a surprise for me – an e-mail saying that someone had left me a comment. I was surprised, because I didn’t even know you could comment on a blog! I logged on and read my comment. It said: “I like your script,but there is a small bug. If you could fix it, it will run on multiple other versions of SQL Server.” I was like, “wow, someone figured out how to find my blog, and they figured out how to fix my script!” I found the bug, I fixed the script, and a wrote a thank you note to the guy. My first question for him was: how did you figure it out – not the script, but how to find my blog? He said he found it from Yahoo Search (this was in the time before Google, believe it or not). From that day, my life changed. I wrote a few more posts, I got a few more comments, and I started to watch my traffic. People were reading, commenting, and giving feedback. At the end of the day, people enjoyed what I was writing. This was a fantastic feeling! I never thought I would be writing for others. Even today, I don’t feel like I am writing for others, but that I am simply posting what I am learning every day. From that very first day, I decided that I would not change my intent or my blog’s purpose. 72 Million Views – 2600 Posts – 57000 comments – 10 books – 9 courses Today, this blog is my habit, my addiction, my baby. Every day I try to learn something new, and that lesson gets posted on the blog. Lately there have been days where I am traveling for a full 24 hours, but even on those days I try to learn something new, and later when I have free time, I will still post it to the blog. Because of this habit, this blog has over 72 millions views, I have written more than 2600 posts, and there are 57,000 comments and counting. I have also written 10 books, 9 courses, and learned so many things. This blog has given me back so much more than I ever put it into it. It gave me an education, a reason to learn something new every day, and a way to connect to people. I like to think of it as a learning chain, a relay where we all pass knowledge from one to another. Never Ending Journey When I started the blog, I thought I would write for a few days and stop, but now after seven years I haven’t stopped and I have no intention of stopping! However, change happens, and for this blog it will start today. This blog started as a single resource for SQL Server, but now it has grown beyond, to Sharepoint, Personal Development, Developer Training, MySQL, Big Data, and lots of other things. Truly speaking, this blog is more than just SQL Server, and that was always my intention. I named it “SQL Authority,” not “SQL Server Authority”! Loudly and clearly, I would like to announce that I am going to go back to my roots and start writing more about SQL, more about big data, and more about the other technology like relational databases, MySQL, Oracle, and others. My goal is not to become a comprehensive resource for every technology, my goal is to learn something new every day – and now it can be so much more than just SQL Server. I will learn it, and post it here for you. I have written a very long post on this anniversary, but here is the summary: Thank You. You all have been wonderful. Seven years is a long journey, and it makes me emotional. I have been “with” this blog before I met my wife, before we had our daughter. This blog is like a fourth member of the family. Keep reading, keep commenting, keep supporting. Thank you all. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: About Me, MySQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority News, T SQL

Read the article
SQL SERVER – Weekly Series – Memory Lane – #053 – Final Post in Series

- by Pinal Dave

It has been a fantastic journey to write memory lane series for an entire year. This series gave me the opportunity to go back and see what I have contributed to this blog throughout the last 7 years. This was indeed fantastic series as this provided me the opportunity to witness how technology has grown throughout the year and how I have progressed in my career while writing this blog post. This series was indeed fantastic experience readers as many joined during the last few years and were not sure what they have missed in recent years. Let us continue with the final episode of the Memory Lane Series. Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Get Current User – Get Logged In User Here is the straight script which list logged in SQL Server users. Disable All Triggers on a Database – Disable All Triggers on All Servers Question : How to disable all the triggers for a database? Additionally, how to disable all the triggers for all servers? For answer execute the script in the blog post. Importance of Master Database for SQL Server Startup I have received following questions many times. I will list all the questions here and answer them together. What is the purpose of Master database? Should our backup Master database? Which database is must have database for SQL Server for startup? Which are the default system database created when SQL Server 2005 is installed for the first time? What happens if Master database is corrupted? Answers to all of the questions are very much related. 2008 DECLARE Multiple Variables in One Statement SQL Server is a great product and it has many features which are very unique to SQL Server. Regarding feature of SQL Server where multiple variable can be declared in one statement, it is absolutely possible to do. 2009 How to Enable Index – How to Disable Index – Incorrect syntax near ‘ENABLE’ Many times I have seen that the index is disabled when there is a large update operation on the table. Bulk insert of very large file updates in any table using SSIS is usually preceded by disabling the index and followed by enabling the index. I have seen many developers running the following query to disable the index. 2010 List of all the Views from Database Many emails I received suggesting that they have hundreds of the view and now have no clue what is going on and how many of them have indexes and how many does not have an index. Some even asked me if there is any way they can get a list of the views with the property of Index along with it. Here is the quick script which does exactly the same. You can also include many other columns from the same view. Minimum Maximum Memory – Server Memory Options I was recently reading about SQL Server Memory Options over here. While reading this one line really caught my attention is minimum value allowed for maximum memory options. The default setting for min server memory is 0, and the default setting for max server memory is 2147483647. The minimum amount of memory you can specify for max server memory is 16 megabytes (MB). 2011 Fundamentals of Columnstore Index There are two kinds of storage in a database. Row Store and Column Store. Row store does exactly as the name suggests – stores rows of data on a page – and column store stores all the data in a column on the same page. These columns are much easier to search – instead of a query searching all the data in an entire row whether the data are relevant or not, column store queries need only to search a much lesser number of the columns. How to Ignore Columnstore Index Usage in Query In summary the question in simple words “How can we ignore using the column store index in selective queries?” Very interesting question – you can use I can understand there may be the cases when the column store index is not ideal and needs to be ignored the same. You can use the query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX to ignore the column store index. The SQL Server Engine will use any other index which is best after ignoring the column store index. 2012 Storing Variable Values in Temporary Array or Temporary List SQL Server does not support arrays or a dynamic length storage mechanism like list. Absolutely there are some clever workarounds and few extra-ordinary solutions but everybody can;t come up with such solution. Additionally, sometime the requirements are very simple that doing extraordinary coding is not required. Here is the simple case. Move Database Files MDF and LDF to Another Location It is not common to keep the Database on the same location where OS is installed. Usually Database files are in SAN, Separate Disk Array or on SSDs. This is done usually for performance reason and manageability perspective. Now the challenges comes up when database which was installed at not preferred default location and needs to move to a different location. Here is the quick tutorial how you can do it. UNION ALL and ORDER BY – How to Order Table Separately While Using UNION ALL If your requirement is such that you want your top and bottom query of the UNION resultset independently sorted but in the same result set you can add an additional static column and order by that column. Let us re-create the same scenario. Copy Data from One Table to Another Table – SQL in Sixty Seconds #031 – Video http://www.youtube.com/watch?v=FVWIA-ACMNo Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Professional Development – Difference Between Bio, CV and Resume

- by Pinal Dave

Applying for work can be very stressful – you want to put your best foot forward, and it can be very hard to sell yourself to a potential employer while highlighting your best characteristics and answering questions. On top of that, some jobs require different application materials – a biography (or bio), a curriculum vitae (or CV), or a resume. These things seem so interchangeable, so what is the difference? Let’s start with the one most of us have heard of – the resume. A resume is a summary of your job and education history. If you have ever applied for a job, you will have used a resume. The ability to write a good resume that highlights your best characteristics and emphasizes your qualifications for a specific job is a skill that will take you a long way in the world. For such an essential skill, unfortunately it is one that many people struggle with. RESUME So let’s discuss what makes a great resume. First, make sure that your name and contact information are at the top, in large print (slightly larger font than the rest of the text, size 14 or 16 if the rest is size 12, for example). You need to make sure that if you catch the recruiter’s attention and they know how to get a hold of you. As for qualifications, be quick and to the point. Make your job title and the company the headline, and include your skills, accomplishments, and qualifications as bullet points. Use good action verbs, like “finished,” “arranged,” “solved,” and “completed.” Include hard numbers – don’t just say you “changed the filing system,” say that you “revolutionized the storage of over 250 files in less than five days.” Doesn’t that sentence sound much more powerful? Curriculum Vitae (CV) Now let’s talk about curriculum vitae, or “CVs”. A CV is more like an expanded resume. The same rules are still true: put your name front and center, keep your contact info up to date, and summarize your skills with bullet points. However, CVs are often required in more technical fields – like science, engineering, and computer science. This means that you need to really highlight your education and technical skills. Difference between Resume and CV Resumes are expected to be one or two pages long – CVs can be as many pages as necessary. If you are one of those people lucky enough to feel limited by the size constraint of resumes, a CV is for you! On a CV you can expand on your projects, highlight really exciting accomplishments, and include more educational experience – including GPA and test scores from the GRE or MCAT (as applicable). You can also include awards, associations, teaching and research experience, and certifications. A CV is a place to really expand on all your experience and how great you will be in this particular position. Biography (Bio) Chances are, you already know what a bio is, and you have even read a few of them. Think about the one or two paragraphs that every author includes in the back flap of a book. Think about the sentences under a blogger’s photo on every “About Me” page. That is a bio. It is a way to quickly highlight your life experiences. It is essentially the way you would introduce yourself at a party. Where a bio is required for a job, chances are they won’t want to know about where you were born and how many pets you have, though. This is a way to summarize your entire job history in quick-to-read format – and sometimes during a job hunt, being able to get to the point and grab the recruiter’s interest is the best way to get your foot in the door. Think of a bio as your entire resume put into words. Most bios have a standard format. In paragraph one, talk about your most recent position and accomplishments there, specifically how they relate to the job you are applying for. If you have teaching or research experience, training experience, certifications, or management experience, talk about them in paragraph two. Paragraph three and four are for highlighting publications, education, certifications, associations, etc. To wrap up your bio, provide your contact info and availability (dates and times). Where to use What? For most positions, you will know exactly what kind of application to use, because the job announcement will state what materials are needed – resume, CV, bio, cover letter, skill set, etc. If there is any confusion, choose whatever the industry standard is (CV for technical fields, resume for everything else) or choose which of your documents is the strongest. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: About Me, PostADay, Professional Development, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Buzz Words: What is HDFS – Day 8 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is MapReduce. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – HDFS. What is HDFS ? HDFS stands for Hadoop Distributed File System and it is a primary storage system used by Hadoop. It provides high performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware. In commodity hardware deployment server failures are very common. Due to the same reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to reduced risk of failure. HDFS creates smaller pieces of the big data and distributes it on different nodes. It also copies each smaller piece to multiple times on different nodes. Hence when any node with the data crashes the system is automatically able to use the data from a different node and continue the process. This is the key feature of the HDFS system. Architecture of HDFS The architecture of the HDFS is master/slave architecture. An HDFS cluster always consists of single NameNode. This single NameNode is a master server and it manages the file system as well regulates access to various files. In additional to NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks and those blocks are stored in a set of DataNodes. The primary task of the NameNode is to open, close or rename files and directory and regulate access to the file system, whereas the primary task of the DataNode is read and write to the file systems. DataNode is also responsible for the creation, deletion or replication of the data based on the instruction from NameNode. In reality, NameNode and DataNode are software designed to run on commodity machine build in Java language. Visual Representation of HDFS Architecture Let us understand how HDFS works with the help of the diagram. Client APP or HDFS Client connects to NameSpace as well as DataNode. Client App access to the DataNode is regulated by NameSpace Node. NameSpace Node allows Client App to connect to the DataNode based by allowing the connection to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A,B,C and D. Client App will later on write data blocks directly to the DataNode. Client App does not have to directly write to all the node. It just has to write to any one of the node and NameNode will decide on which other DataNode it will have to replicate the data. In our example Client App directly writes to DataNode 1 and detained 3. However, data chunks are automatically replicated to other nodes. All the information like in which DataNode which data block is placed is written back to NameNode. High Availability During Disaster Now as multiple DataNode have same data blocks in the case of any DataNode which faces the disaster, the entire process will continue as other DataNode will assume the role to serve the specific data block which was on the failed node. This system provides very high tolerance to disaster and provides high availability. If you notice there is only single NameNode in our architecture. If that node fails our entire Hadoop Application will stop performing as it is a single node where we store all the metadata. As this node is very critical, it is usually replicated on another clustered as well as on another data rack. Though, that replicated node is not operational in architecture, it has all the necessary data to perform the task of the NameNode in the case of the NameNode fails. The entire Hadoop architecture is built to function smoothly even there are node failures or hardware malfunction. It is built on the simple concept that data is so big it is impossible to have come up with a single piece of the hardware which can manage it properly. We need lots of commodity (cheap) hardware to manage our big data and hardware failure is part of the commodity servers. To reduce the impact of hardware failure Hadoop architecture is built to overcome the limitation of the non-functioning hardware. Tomorrow In tomorrow’s blog post we will discuss the importance of the relational database in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Search Results

Search found 2468 results on 99 pages for 'dave johansen'.

Page 21/99 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

< Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >