Search Results

Search found 1456 results on 59 pages for 'authority'.

Page 29/59 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >

Developer Training – Difficult Questions and Alternative Perspective – Part 3

- by pinaldave

Developer Training - Importance and Significance - Part 1 Developer Training – Employee Morals and Ethics – Part 2 Developer Training – Difficult Questions and Alternative Perspective - Part 3 Developer Training – Various Options for Developer Training – Part 4 Developer Training – A Conclusive Summary- Part 5 Congratulations! You are now a fully trained developer! You spent hours in a classroom, watching webinars, and reading materials. You are now more educated and more prepared than ever before. Now what? Stay or Quit The simple answer is that you now have two options – stay where you are or move on to a new job. Even though you might now be smarter than you have ever felt before, this can still be a tough decision to make. You feel extra trained and ready for a promotion or a raise, but you and your employer might not see eye to eye on this issue. The logical conclusion is to go on a job hunt, but that might not be the most ethical thing to do. Click Image to Enlarge Manager’s Perspective Click Image to Enlarge Try to see the issue from your manager’s perspective. You feel that you have just spent a lot of time and energy getting trained, and you should be rewarded. But they have invested their time and energy in you. They might see the training as a way to help you complete the goals they require from you, or as a way to help you complete tasks that will ultimately end in a reward or promotion. Moral Compass As in most cases, honesty is the best policy. Be open with your manager about your expectations, and ask them to explain their goals. When there is open and honest communication, everyone can walk away happy. If you’re unable to discuss with your manager for one reason or another, just try to keep the company policy in mind and follow your own moral compass. If all else fails, and your company is unwilling to make allowances for your new value, offer to pay the company back for the training before moving on your way. Whether you stay at your old job or move on to a new one, you are still faced with the question of what you’re going to do with all your new knowledge. If you feel comfortable, offer to train others around you who are interested in the same subject. This can look very good on your resume, and if you are working in a team environment it is sure to help you in the long run! What Next? You can even offer to train other trainers at the company – managers, those above you, or even report back to your original trainer about how your education is helping you in the work place. Obviously this should be completely voluntary on the trainer’s part. Taking advice from a “newbie” may not be their favorite idea, but it could also show the company that you are open to expanding your horizons and being helpful to everyone around you. Last in Line for Opportunity Click Image to Enlarge At this time, let us address a subject related to training and what to do with it – what if you are always overlooked for training? This can as thorny a problem as receiving training in the first place. The best advice is to let your supervisors know that you are always open to training and very interested in certain topics. If you are consistently passed over, be patient. Your turn will probably come, but the company as a whole has to focus on other problems at the moment. If you feel that there are more personal issues at play, be sure to bring this up with your supervisor in a calm and professional manager so that everything can be worked out best for both parties. You, Yourself and Your Future! If all else fails, offer to pay for training yourself. Perhaps money problems are at the root of being passed over. Even if there are other reasons, offering to pay your own way shows your dedication and could work out well for you in the long run. Always remember – in life you have to go out and make your own way, you cannot always sit and wait for things to land in your lap. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Developer Training, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – An Efficiency Tool to Compare and Synchronize SQL Server Databases

- by Pinal Dave

There is no need to reinvent the wheel if it is already invented and if the wheel is already available at ease, there is no need to wait to grab it. Here is the similar situation. I came across a very interesting situation and I had to look for an efficient tool which can make my life easier and solve my business problem. Here is the scenario. One of the developers had deleted few rows from the very important mapping table of our development server (thankfully, it was not the production server). Though it was a development server, the entire development team had to stop working as the application started to crash on every page. Think about the lost of manpower and efficiency which we started to loose. Pretty much every department had to stop working as our internal development application stopped working. Thankfully, we even take a backup of our development server and we had access to full backup of the entire database at 6 AM morning. We do not take as a frequent backup of development server as production server (naturally!). Even though we had a full backup, the solution was not to restore the database. Think about it, there were plenty of the other operations since the last good full backup and if we restore a full backup, we will pretty much overwrite on the top of the work done by developers since morning. Now, as restoring the full backup was not an option we decided to restore the same database on another server. Once we had restored our database to another server, the challenge was to compare the table from where the database was deleted. The mapping table from where the data were deleted contained over 5000 rows and it was humanly impossible to compare both the tables manually. Finally we decided to use efficiency tool dbForge Data Compare for SQL Server from DevArt. dbForge Data Compare for SQL Server is a powerful, fast and easy to use SQL compare tool, capable of using native SQL Server backups as metadata source. (FYI we Downloaded dbForge Data Compare) Once we discovered the product, we immediately downloaded the product and installed on our development server. After we installed the product, we were greeted with the following screen. We clicked on the New Data Comparision to start our new comparison project. It brought up following screen. Here is the best part of the product, we just had to enter our database connection username and password along with source and destination details and we are done. The entire process is very simple and self intuiting. The best part was that for the source, we can either select database or even backup. This was indeed fantastic feature. Think about this, if you have a very big database, it will take long time to restore on the server. Once it is restored, you will be able to work with it. However, when you are working with dbForge Data Compare it will accept database backup as your source or destination. Once I click on the execute it brought up following screen where it displayed an excellent summary of the data compare. It has dedicated tabs for the what is changing in what table as well had details of the changed data. The best part is that, once we had reviewed the change. We click on the Synchronize button in the menu bar and it brought up following screen. You can see that the screen has very simple straight forward but very powerful features. You can generate a script to synchronize from target to source or even from source to target. Additionally, the database is a very complicated world and there are extensive options to configure various database options on the next screen. We also have the option to either generate script or directly execute the script to target server. I like to play on the safe side and I generated the script for my synchronization and later on after review I deployed the scripts on the server. Well, my team and we were able to get going from our disaster in less than 10 minutes. There were few people in our team were indeed disappointed as they were thinking of going home early that day but in less than 10 minutes they had to get back to work. There are so many other features in dbForge Data Compare for SQL Server, I am already planning to make this product company wide recommended product for Data Compare tool. Hats off to the team who have build this product. Here are few of the features salient features of the dbForge Data Compare for SQL Server Perform SQL Server database comparison to detect changes Compare SQL Server backups with live databases Analyze data differences between two databases Synchronize two databases that went out of sync Restore data of a particular table from the backup Generate data comparison reports in Excel and HTML formats Copy look-up data from development database to production Automate routine data synchronization tasks with command-line interface Go Ahead and Download the dbForge Data Compare for SQL Server right away. It is always a good idea to get familiar with the important tools before hand instead of learning it under pressure of disaster. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
SQL SERVER – Fundamentals of Columnstore Index

- by pinaldave

There are two kind of storage in database. Row Store and Column Store. Row store does exactly as the name suggests – stores rows of data on a page – and column store stores all the data in a column on the same page. These columns are much easier to search – instead of a query searching all the data in an entire row whether the data is relevant or not, column store queries need only to search much lesser number of the columns. This means major increases in search speed and hard drive use. Additionally, the column store indexes are heavily compressed, which translates to even greater memory and faster searches. I am sure this looks very exciting and it does not mean that you convert every single index from row store to column store index. One has to understand the proper places where to use row store or column store indexes. Let us understand in this article what is the difference in Columnstore type of index. Column store indexes are run by Microsoft’s VertiPaq technology. However, all you really need to know is that this method of storing data is columns on a single page is much faster and more efficient. Creating a column store index is very easy, and you don’t have to learn new syntax to create them. You just need to specify the keyword “COLUMNSTORE” and enter the data as you normally would. Keep in mind that once you add a column store to a table, though, you cannot delete, insert or update the data – it is READ ONLY. However, since column store will be mainly used for data warehousing, this should not be a big problem. You can always use partitioning to avoid rebuilding the index. A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. The difference between column store and row store approaches is illustrated below: In case of the row store indexes multiple pages will contain multiple rows of the columns spanning across multiple pages. In case of column store indexes multiple pages will contain multiple single columns. This will lead only the columns needed to solve a query will be fetched from disk. Additionally there is good chance that there will be redundant data in a single column which will further help to compress the data, this will have positive effect on buffer hit rate as most of the data will be in memory and due to same it will not need to be retrieved. Let us see small example of how columnstore index improves the performance of the query on a large table. As a first step let us create databaseset which is large enough to show performance impact of columnstore index. The time taken to create sample database may vary on different computer based on the resources. USE AdventureWorks GO -- Create New Table CREATE TABLE [dbo].[MySalesOrderDetail]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [numeric](38, 6) NOT NULL, [rowguid] [uniqueidentifier] NOT NULL, [ModifiedDate] [datetime] NOT NULL ) ON [PRIMARY] GO -- Create clustered index CREATE CLUSTERED INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail] ( [SalesOrderDetailID]) GO -- Create Sample Data Table -- WARNING: This Query may run upto 2-10 minutes based on your systems resources INSERT INTO [dbo].[MySalesOrderDetail] SELECT S1.* FROM Sales.SalesOrderDetail S1 GO 100 Now let us do quick performance test. I have kept STATISTICS IO ON for measuring how much IO following queries take. In my test first I will run query which will use regular index. We will note the IO usage of the query. After that we will create columnstore index and will measure the IO of the same. -- Performance Test -- Comparing Regular Index with ColumnStore Index USE AdventureWorks GO SET STATISTICS IO ON GO -- Select Table with regular Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID GO -- Table 'MySalesOrderDetail'. Scan count 1, logical reads 342261, physical reads 0, read-ahead reads 0. -- Create ColumnStore Index CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MySalesOrderDetail_ColumnStore] ON [MySalesOrderDetail] (UnitPrice, OrderQty, ProductID) GO -- Select Table with Columnstore Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID GO It is very clear from the results that query is performance extremely fast after creating ColumnStore Index. The amount of the pages it has to read to run query is drastically reduced as the column which are needed in the query are stored in the same page and query does not have to go through every single page to read those columns. If we enable execution plan and compare we can see that column store index performance way better than regular index in this case. Let us clean up the database. -- Cleanup DROP INDEX [IX_MySalesOrderDetail_ColumnStore] ON [dbo].[MySalesOrderDetail] GO TRUNCATE TABLE dbo.MySalesOrderDetail GO DROP TABLE dbo.MySalesOrderDetail GO In future posts we will see cases where Columnstore index is not appropriate solution as well few other tricks and tips of the columnstore index. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQLAuthority News – Storing Data and Files in Cloud – Dropbox – Personal Technology Tip

- by pinaldave

I thought long and hard about doing a Personal Technology Tips series for this blog. I have so many tips I’d like to share. I am on my computer almost all day, every day, so I have a treasure trove of interesting tidbits I like to share if given the chance. The only thing holding me back – which tip to share first? The first tip obviously has the weight of seeming like the most important. But this would mean choosing amongst my favorite tricks and shortcuts. This is a hard task. Source: Dropbox.com My Dropbox I have finally decided, though, and have determined that the first Personal Technology Tip may not be the most secret or even trickier to master – in fact, it is probably the easiest. My today’s Personal Technology Tip is Dropbox. I hope that all of you are nodding along in recognition right now. If you do not use Dropbox, or have not even heard of it before, get on the internet and find their site. You won’t be disappointed. A quick recap for those in the dark: Dropbox is an online storage site with a lot of additional syncing and cloud-computing capabilities. Now that we’ve covered the basics, let’s explore some of my favorite options in Dropbox. Collaborate with All The first thing I love about Dropbox is the ability it gives you to collaborate with others. You can share files easily with other Dropbox users, and they can alter them, share them with you, all while keeping track of different versions in on easy place. I’d like to see anyone try to accomplish that key idea – “easily” – using e-mail versions and multiple computers. It’s even difficult to accomplish using a shared network. Afraid that this kind of ease looks too good to be true? Afraid that maybe there isn’t enough storage space, or the user interface is confusing? Think again. There is plenty of space – you can get 2 GB with just a free account, and upgrades are inexpensive and go up to 100 GB of storage. And the user interface is so easy that anyone can learn to use it. What I use Dropbox for I love Dropbox because I give a lot of presentations and often they are far from home. I can keep my presentations on Dropbox and have easy access to them anywhere, without needing to have my whole computer with me. This is just one small way that you can use Dropbox. You can sync your entire hard drive, or hard drives if you have multiple computers (home, work, office, shared), and you can set Dropbox to automatically sync files on a certain timeline, or whenever Dropbox notices that they’ve been changed. Why I love Dropbox Dropbox has plenty of storage, but 2 GB still has a hard time competing with the average desktop’s storage space. So what if you want to sync most of your files, but only the ones you use the most and share between work and home, and not all your files (especially large files like pictures and videos)? You can use selective sync to choose which files to sync. Above all, my favorite feature is LanSync. Dropbox will search your Local Area Network (LAN) for new files and sync them to Dropbox, as well as downloading the new version to all the shared files across the network. That means that if move around on different computers at work or at home, you will have the same version of the file every time. Or, other users on the LAN will have access to the new version, which makes collaboration extremely easy. Ref: rzfeeser.com Dropbox has so many other features that I feel like I could create a Personal Technology Tips series devoted entirely to Dropbox. I’m going to create a bullet list here to make things shorter, but I strongly encourage you to look further into these into options if it sounds like something you would use. Theft Recover Home Security File Hosting and Sharing Portable Dropbox Sync your iCal calendar Password Storage What is your favorite tool and why? I could go on and on, but I will end here. In summary – I strongly encourage everyone to investigate Dropbox to see if it’s something they would find useful. If you use Dropbox and know of a great feature I failed to mention, please share it with me, I’d love to hear how everyone uses this program. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority News, T SQL, Technology Tagged: Personal Technology

Read the article
SQL SERVER – Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2

- by Pinal Dave

This is the second part of the series Incremental Statistics. Here is the index of the complete series. What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1 Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2 DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3 In part 1 we have understood what is incremental statistics and now in this second part we will see a simple example of incremental statistics. This blog post is heavily inspired from my friend Balmukund’s must read blog post. If you have partitioned table and lots of data, this feature can be specifically very useful. Prerequisite Here are two things you must know before you start with the demonstrations. AdventureWorks – For the demonstration purpose I have installed AdventureWorks 2012 as an AdventureWorks 2014 in this demonstration. Partitions – You should know how partition works with databases. Setup Script Here is the setup script for creating Partition Function, Scheme, and the Table. We will populate the table based on the SalesOrderDetails table from AdventureWorks. -- Use Database USE AdventureWorks2014 GO -- Create Partition Function CREATE PARTITION FUNCTION IncrStatFn (INT) AS RANGE LEFT FOR VALUES (44000, 54000, 64000, 74000) GO -- Create Partition Scheme CREATE PARTITION SCHEME IncrStatSch AS PARTITION [IncrStatFn] TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]) GO -- Create Table Incremental_Statistics CREATE TABLE [IncrStatTab]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [ModifiedDate] [datetime] NOT NULL) ON IncrStatSch(SalesOrderID) GO -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate]) SELECT [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE SalesOrderID < 54000 GO Check Details Now we will check details in the partition table IncrStatSch. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO You will notice that only a few of the partition are filled up with data and remaining all the partitions are empty. Now we will create statistics on the Table on the column SalesOrderID. However, here we will keep adding one more keyword which is INCREMENTAL = ON. Please note this is the new keyword and feature added in SQL Server 2014. It did not exist in earlier versions. -- Create Statistics CREATE STATISTICS IncrStat ON [IncrStatTab] (SalesOrderID) WITH FULLSCAN, INCREMENTAL = ON GO Now we have successfully created statistics let us check the statistical histogram of the table. Now let us once again populate the table with more data. This time the data are entered into a different partition than earlier populated partition. -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate]) SELECT [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount], [ModifiedDate] FROM [Sales].[SalesOrderDetail] WHERE SalesOrderID > 54000 GO Let us check the status of the partition once again with following script. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO Statistics Update Now here has the new feature come into action. Previously, if we have to update the statistics, we will have to FULLSCAN the entire table irrespective of which partition got the data. However, in SQL Server 2014 we can just specify which partition we want to update in terms of Statistics. Here is the script for the same. -- Update Statistics Manually UPDATE STATISTICS IncrStatTab (IncrStat) WITH RESAMPLE ON PARTITIONS(3, 4) GO Now let us check the statistics once again. -- Show Statistics DBCC SHOW_STATISTICS('IncrStatTab', IncrStat) WITH HISTOGRAM GO Upon examining statistics histogram, you will notice that now the distribution has changed and there is way more rows in the histogram. Summary The new feature of Incremental Statistics is indeed a boon for the scenario where there are partitions and statistics needs to be updated frequently on the partitions. In earlier version to update statistics one has to do FULLSCAN on the entire table which was wasting too many resources. With the new feature in SQL Server 2014, now only those partitions which are significantly changed can be specified in the script to update statistics. Cleanup You can clean up the database by executing following scripts. -- Clean up DROP TABLE [IncrStatTab] DROP PARTITION SCHEME [IncrStatSch] DROP PARTITION FUNCTION [IncrStatFn] GO Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Statistics, Statistics

Read the article
SQL SERVER – Auditing and Profiling Database Made Easy with SQL Audit and Comply

- by Pinal Dave

Do you like auditing your database, or can you think of about a million other things you’d rather do? Unfortunately, auditing is incredibly important. As with tax audits, it is important to audit databases to ensure they are following all the rules, but they are also important for troubleshooting and security. There are several ways to audit SQL Server. There is manual auditing, which is going through your database “by hand,” and obviously takes a long time and is quite inefficient. SQL Server also provides programs to help you audit your systems. Different administrators will have different opinions about best practices and which tools to use, and each one will be perfected for certain systems and certain users. Today, though, I would like to talk about Apex SQL Audit. It is an auditing tool that acts like “track changes” in a word processing document. It will log what has changed on the database, who made the changes, and what effects these changes have had (i.e. what objects were affected down the line). All this information is logged, and can be easily viewed or printed for easy access. One of the best features of Apex is that it is so customizable (and easy to use!). First, start Apex. Then you can connect to the database you would like to monitor. Once you select your database, you can select which table you want to audit. You can customize right down to the field you’d like to audit, and then select which types of actions you’d like tracked – insert, delete, or update. Repeat these steps for every database you want monitored. To create the logs, choose “Create triggers” in the menu. The script written here will be what logs each insert, delete, and update function. Press F5 to execute. All this tracking information will be stored in AUDIT_LOG_DATA and AUDIT_LOG_TRANSACTIONS tables. View these tables using ApexSQL Audit reports. These transaction logs can be extremely detailed – especially on very busy servers, where every move it traced. Reading them can be overwhelming, to say the least. Apex has tried to make things easier for the average DBA, though. You can read these tracking logs in Apex, and it will display data and objects that affect your server – even things that were happening on your server before you installed Apex! To read these logs, open Apex, and connect to that database you want to audit. Go to the Transaction Logs tab, and add the logs you want to read. To narrow down what results you want to see, you can use the Filter tab to choose time, operation type, name, users, and more. Click Open, and you can see the results in a grid (as shown below). You can export these results to CSV, HTML, XML or SQL files and save on the hard disk. One of the advantages is that since there are no triggers here, there are no other processes that will affect SQL Server performance. Using this method is also how to view history from your database that occurred before Apex was installed. This type of tracking does require storage space for the data sources, as the database must be fully running, and the transaction logs must exist (things not stored in the transactions logs will not be recoverable). Apex can also replace SQL Server Profiler and SQL Server Traces – which are much more complex and error-prone – with its ApexSQL Comply. It can do fault tolerant auditing, centralized reporting, and “who saw what” information in an easy-to-use interface. The tracking settings can be altered by the user, or the default options will provide solutions to the most common auditing problems. To get started: open ApexSQL Comply, and selected Database Filter Settings to choose which database you’d like to audit. You can select which tracking you’re like in Operation Types – DML, DDL, queries executed, execute statements, and more. To get started, click Start Auditing. After this, every action will be stored in the central repository database (ApexSQLCrd). You can view the audit and create a report (or view the standard default report) using a wizard. You can see how easy it is to use ApexSQL Comply. You can easily set audits, including the type and time, and create customized reports. Remote users can easily access the reports through the user interface (available online, as well), and security concerns are all taken care of by the program. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
SQL SERVER – Simple Example to Configure Resource Governor – Introduction to Resource Governor

- by pinaldave

Let us jump right away with question and answer mode. What is resource governor? Resource Governor is a feature which can manage SQL Server Workload and System Resource Consumption. We can limit the amount of CPU and memory consumption by limiting /governing /throttling on the SQL Server. Why is resource governor required? If there are different workloads running on SQL Server and each of the workload needs different resources or when workloads are competing for resources with each other and affecting the performance of the whole server resource governor is a very important task. What will be the real world example of need of resource governor? Here are two simple scenarios where the resource governor can be very useful. Scenario 1: A server which is running OLTP workload and various resource intensive reports on the same server. The ideal situation is where there are two servers which are data synced with each other and one server runs OLTP transactions and the second server runs all the resource intensive reports. However, not everybody has the luxury to set up this kind of environment. In case of the situation where reports and OLTP transactions are running on the same server, limiting the resources to the reporting workload it can be ensured that OTLP’s critical transaction is not throttled. Scenario 2: There are two DBAs in one organization. One DBA A runs critical queries for business and another DBA B is doing maintenance of the database. At any point in time the DBA A’s work should not be affected but at the same time DBA B should be allowed to work as well. The ideal situation is that when DBA B starts working he get some resources but he can’t get more than defined resources. Does SQL Server have any default resource governor component? Yes, SQL Server have two by default created resource governor component. 1) Internal –This is used by database engine exclusives and user have no control. 2) Default – This is used by all the workloads which are not assigned to any other group. What are the major components of the resource governor? Resource Pools Workload Groups Classification In simple words here is what the process of resource governor is. Create resource pool Create a workload group Create classification function based on the criteria specified Enable Resource Governor with classification function Let me further explain you the same with graphical image. Is it possible to configure resource governor with T-SQL? Yes, here is the code for it with explanation in between. Step 0: Here we are assuming that there are separate login accounts for Reporting server and OLTP server. /*----------------------------------------------- Step 0: (Optional and for Demo Purpose) Create Two User Logins 1) ReportUser, 2) PrimaryUser Use ReportUser login for Reports workload Use PrimaryUser login for OLTP workload -----------------------------------------------*/ Step 1: Creating Resource Pool We are creating two resource pools. 1) Report Server and 2) Primary OLTP Server. We are giving only a few resources to the Report Server Pool as described in the scenario 1 the other server is mission critical and not the report server. ----------------------------------------------- -- Step 1: Create Resource Pool ----------------------------------------------- -- Creating Resource Pool for Report Server CREATE RESOURCE POOL ReportServerPool WITH ( MIN_CPU_PERCENT=0, MAX_CPU_PERCENT=30, MIN_MEMORY_PERCENT=0, MAX_MEMORY_PERCENT=30) GO -- Creating Resource Pool for OLTP Primary Server CREATE RESOURCE POOL PrimaryServerPool WITH ( MIN_CPU_PERCENT=50, MAX_CPU_PERCENT=100, MIN_MEMORY_PERCENT=50, MAX_MEMORY_PERCENT=100) GO Step 2: Creating Workload Group We are creating two workloads each mapping to each of the resource pool which we have just created. ----------------------------------------------- -- Step 2: Create Workload Group ----------------------------------------------- -- Creating Workload Group for Report Server CREATE WORKLOAD GROUP ReportServerGroup USING ReportServerPool ; GO -- Creating Workload Group for OLTP Primary Server CREATE WORKLOAD GROUP PrimaryServerGroup USING PrimaryServerPool ; GO Step 3: Creating user defiled function which routes the workload to the appropriate workload group. In this example we are checking SUSER_NAME() and making the decision of Workgroup selection. We can use other functions such as HOST_NAME(), APP_NAME(), IS_MEMBER() etc. ----------------------------------------------- -- Step 3: Create UDF to Route Workload Group ----------------------------------------------- CREATE FUNCTION dbo.UDFClassifier() RETURNS SYSNAME WITH SCHEMABINDING AS BEGIN DECLARE @WorkloadGroup AS SYSNAME IF(SUSER_NAME() = 'ReportUser') SET @WorkloadGroup = 'ReportServerGroup' ELSE IF (SUSER_NAME() = 'PrimaryServerPool') SET @WorkloadGroup = 'PrimaryServerGroup' ELSE SET @WorkloadGroup = 'default' RETURN @WorkloadGroup END GO Step 4: In this final step we enable the resource governor with the classifier function created in earlier step 3. ----------------------------------------------- -- Step 4: Enable Resource Governer -- with UDFClassifier ----------------------------------------------- ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION=dbo.UDFClassifier); GO ALTER RESOURCE GOVERNOR RECONFIGURE GO Step 5: If you are following this demo and want to clean up your example, you should run following script. Running them will disable your resource governor as well delete all the objects created so far. ----------------------------------------------- -- Step 5: Clean Up -- Run only if you want to clean up everything ----------------------------------------------- ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL) GO ALTER RESOURCE GOVERNOR DISABLE GO DROP FUNCTION dbo.UDFClassifier GO DROP WORKLOAD GROUP ReportServerGroup GO DROP WORKLOAD GROUP PrimaryServerGroup GO DROP RESOURCE POOL ReportServerPool GO DROP RESOURCE POOL PrimaryServerPool GO ALTER RESOURCE GOVERNOR RECONFIGURE GO I hope this introductory example give enough light on the subject of Resource Governor. In future posts we will take this same example and learn a few more details. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Resource Governor

Read the article
SQL SERVER – Solution – Puzzle – SELECT * vs SELECT COUNT(*)

- by pinaldave

Earlier I have published Puzzle Why SELECT * throws an error but SELECT COUNT(*) does not. This question have received many interesting comments. Let us go over few of the answers, which are valid. Before I start the same, let me acknowledge Rob Farley who has not only answered correctly very first but also started interesting conversation in the same thread. The usual question will be what is the right answer. I would like to point to official Microsoft Connect Items which discusses the same. RGarvao https://connect.microsoft.com/SQLServer/feedback/details/671475/select-test-where-exists-select tiberiu utan http://connect.microsoft.com/SQLServer/feedback/details/338532/count-returns-a-value-1 Rob Farley count(*) is about counting rows, not a particular column. It doesn’t even look to see what columns are available, it’ll just count the rows, which in the case of a missing FROM clause, is 1. “select *” is designed to return columns, and therefore barfs if there are none available. Even more odd is this one: select ‘blah’ where exists (select *) You might be surprised at the results… Koushik The engine performs a “Constant scan” for Count(*) where as in the case of “SELECT *” the engine is trying to perform either Index/Cluster/Table scans. amikolaj When you query ‘select * from sometable’, SQL replaces * with the current schema of that table. With out a source for the schema, SQL throws an error. so when you query ‘select count(*)’, you are counting the one row. * is just a constant to SQL here. Check out the execution plan. Like the description states – ‘Scan an internal table of constants.’ You could do ‘select COUNT(‘my name is adam and this is my answer’)’ and get the same answer. Netra Acharya SELECT * Here, * represents all columns from a table. So it always looks for a table (As we know, there should be FROM clause before specifying table name). So, it throws an error whenever this condition is not satisfied. SELECT COUNT(*) Here, COUNT is a Function. So it is not mandetory to provide a table. Check it out this: DECLARE @cnt INT SET @cnt = COUNT(*) SELECT @cnt SET @cnt = COUNT(‘x’) SELECT @cnt Naveen Select 1 / Select ‘*’ will return 1/* as expected. Select Count(1)/Count(*) will return the count of result set of select statement. Count(1)/Count(*) will have one 1/* for each row in the result set of select statement. Select 1 or Select ‘*’ result set will contain only 1 result. so count is 1. Where as “Select *” is a sysntax which expects the table or equauivalent to table (table functions, etc..). It is like compilation error for that query. Ramesh Hi Friends, Count is an aggregate function and it expects the rows (list of records) for a specified single column or whole rows for *. So, when we use ‘select *’ it definitely give and error because ‘*’ is meant to have all the fields but there is not any table and without table it can only raise an error. So, in the case of ‘Select Count(*)’, there will be an error as a record in the count function so you will get the result as ’1'. Try using : Select COUNT(‘RAMESH’) and think there is an error ‘Must specify table to select from.’ in place of ‘RAMESH’ Pinal : If i am wrong then please clarify this. Sachin Nandanwar Any aggregate function expects a constant or a column name as an expression. DO NOT be confused with * in an aggregate function.The aggregate function does not treat it as a column name or a set of column names but a constant value, as * is a key word in SQL. You can replace any value instead of * for the COUNT function.Ex Select COUNT(5) will result as 1. The error resulting from select * is obvious it expects an object where it can extract the result set. I sincerely thank you all for wonderful conversation, I personally enjoyed it and I am sure all of you have the same feeling. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – SSMS: Top Object and Batch Execution Statistics Reports

- by Pinal Dave

The month of June till mid of July has been the fever of sports. First, it was Wimbledon Tennis and then the Soccer fever was all over. There is a huge number of fan followers and it is great to see the level at which people sometimes worship these sports. Being an Indian, I cannot forget to mention the India tour of England later part of July. Following these sports and as the events unfold to the finals, there are a number of ways the statisticians can slice and dice the numbers. Cue from soccer I can surely say there is a team performance against another team and then there is individual member fairs against a particular opponent. Such statistics give us a fair idea to how a team in the past or in the recent past has fared against each other, head-to-head stats during World cup and during other neutral venue games. All these statistics are just pointers. In reality, they don’t reflect the calibre of the current team because the individuals who performed in each of these games are totally different (Typical example being the Brazil Vs Germany semi-final match in FIFA 2014). So at times these numbers are misleading. It is worth investigating and get the next level information. Similar to these statistics, SQL Server Management studio is also equipped with a number of reports like a) Object Execution Statistics report and b) Batch Execution Statistics reports. As discussed in the example, the team scorecard is like the Batch Execution statistics and individual stats is like Object Level statistics. The analogy can be taken only this far, trust me there is no correlation between SQL Server functioning and playing sports – It is like I think about diet all the time except while I am eating. Performance – Batch Execution Statistics Let us view the first report which can be invoked from Server Node -> Reports -> Standard Reports -> Performance – Batch Execution Statistics. Most of the values that are displayed in this report come from the DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text(sql_handle). This report contains 3 distinctive sections as outline below. Section 1: This is a graphical bar graph representation of Average CPU Time, Average Logical reads and Average Logical Writes for individual batches. The Batch numbers are indicative and the details of individual batch is available in section 3 (detailed below). Section 2: This represents a Pie chart of all the batches by Total CPU Time (%) and Total Logical IO (%) by batches. This graphical representation tells us which batch consumed the highest CPU and IO since the server started, provided plan is available in the cache. Section 3: This is the section where we can find the SQL statements associated with each of the batch Numbers. This also gives us the details of Average CPU / Average Logical Reads and Average Logical Writes in the system for the given batch with object details. Expanding the rows, I will also get the # Executions and # Plans Generated for each of the queries. Performance – Object Execution Statistics The second report worth a look is Object Execution statistics. This is a similar report as the previous but turned on its head by SQL Server Objects. The report has 3 areas to look as above. Section 1 gives the Average CPU, Average IO bar charts for specific objects. The section 2 is a graphical representation of Total CPU by objects and Total Logical IO by objects. The final section details the various objects in detail with the Avg. CPU, IO and other details which are self-explanatory. At a high-level both the reports are based on queries on two DMVs (sys.dm_exec_query_stats and sys.dm_exec_sql_text) and it builds values based on calculations using columns in them: SELECT * FROM sys.dm_exec_query_stats s1 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 WHERE s2.objectid IS NOT NULL AND DB_NAME(s2.dbid) IS NOT NULL ORDER BY s1.sql_handle; This is one of the simplest form of reports and in future blogs we will look at more complex reports. I truly hope that these reports can give DBAs and developers a hint about what is the possible performance tuning area. As a closing point I must emphasize that all above reports pick up data from the plan cache. If a particular query has consumed a lot of resources earlier, but plan is not available in the cache, none of the above reports would show that bad query. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
SQL SERVER – Replace a Column Name in Multiple Stored Procedure all together

- by pinaldave

I receive a lot of emails every day. I try to answer each and every email and comments on Facebook and Twitter. I prefer communication on social media as this gives opportunities to others to read the questions and participate along with me. There is always some question which everyone likes to read and remember. Here is one of the questions which I received in email. I believe the same question will be there any many developers who are beginning with SQL Server. I decided to blog about it so everyone can read it and participate. “I am beginner in SQL Server. I have a very interesting situation and need your help. I am beginner to SQL Server and that is why I do not have access to the production server and I work entirely on the development server. The project I am working on is also in the infant stage as well. In product I had to create a multiple tables and every table had few columns. Later on I have written Stored Procedures using those tables. During a code review my manager has requested to change one of the column which I have used in the table. As per him the naming convention was not accurate. Now changing the columname in the table is not a big issue. I figured out that I can do it very quickly either using T-SQL script or SQL Server Management Studio. The real problem is that I have used this column in nearly 50+ stored procedure. This looks like a very mechanical task. I believe I can go and change it in nearly 50+ stored procedure but is there a better solution I can use. Someone suggested that I should just go ahead and find the text in system table and update it there. Is that safe solution? If not, what is your solution. In simple words, How to replace a column name in multiple stored procedure efficiently and quickly? Please help me here with keeping my experience and non-production server in mind.” Well, I found this question very interesting. Honestly I would have preferred if this question was asked on my social media handles (Facebook and Twitter) as I am very active there and quite often before I reach there other experts have already answered this question. Anyway I am now answering the same question on the blog so all of us can participate here and come up with an appropriate answer. Here is my answer - “My Friend, I do not advice to touch system table. Please do not go that route. It can be dangerous and not appropriate. The issue which you faced today is what I used to face in early career as well I still face it often. There are two sets of argument I have observed – there are people who see no value in the name of the object and name objects like obj1, obj2 etc. There are sets of people who carefully chose the name of the object where object name is self-explanatory and almost tells a story. I am not here to take any side in this blog post – so let me go to a quick solution for your problem. Note: Following should not be directly practiced on Production Server. It should be properly tested on development server and once it is validated they should be pushed to your production server with your existing deployment practice. The answer is here assuming you have regular stored procedures and you are working on the Development NON Production Server. Go to Server Note >> Databases >> DatabaseName >> Programmability >> Stored Procedure Now make sure that Object Explorer Details are open (if not open it by clicking F7). You will see the list of all the stored procedures there. Now you will see a list of all the stored procedures on the right side list. Select either all of them or the one which you believe are relevant to your query. Now… Right click on the stored procedures >> SELECT DROP and CREATE to >> Now select New Query Editor Window or Clipboard. Paste the complete script to a new window if you have selected Clipboard option. Now press Control+H which will bring up the Find and Replace Screen. In this screen insert the column to be replaced in the “Find What”box and new column name into “Replace With” box. Now execute the whole script. As we have selected DROP and CREATE to, it will created drop the old procedure and create the new one. Another method would do all the same procedure but instead of DROP and CREATE manually replace the CREATE word with ALTER world. There is a small advantage in doing this is that if due to any reason the error comes up which prevents the new stored procedure to be created you will have your old stored procedure in the system as it is. “ Well, this was my answer to the question which I have received. Do you see any other workaround or solution? Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Stored Procedure, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – SSIS Parameters in Parent-Child ETL Architectures – Notes from the Field #040

- by Pinal Dave

[Notes from Pinal]: SSIS is very well explored subject, however, there are so many interesting elements when we read, we learn something new. A similar concept has been Parent-Child ETL architecture’s relationship in SSIS. Linchpin People are database coaches and wellness experts for a data driven world. In this 40th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to understand SSIS Parameters in Parent-Child ETL Architectures. In this brief Notes from the Field post, I will review the use of SSIS parameters in parent-child ETL architectures. A very common design pattern used in SQL Server Integration Services is one I call the parent-child pattern. Simply put, this is a pattern in which packages are executed by other packages. An ETL infrastructure built using small, single-purpose packages is very often easier to develop, debug, and troubleshoot than large, monolithic packages. For a more in-depth look at parent-child architectures, check out my earlier blog post on this topic. When using the parent-child design pattern, you will frequently need to pass values from the calling (parent) package to the called (child) package. In older versions of SSIS, this process was possible but not necessarily simple. When using SSIS 2005 or 2008, or even when using SSIS 2012 or 2014 in package deployment mode, you would have to create package configurations to pass values from parent to child packages. Package configurations, while effective, were not the easiest tool to work with. Fortunately, starting with SSIS in SQL Server 2012, you can now use package parameters for this purpose. In the example I will use for this demonstration, I’ll create two packages: one intended for use as a child package, and the other configured to execute said child package. In the parent package I’m going to build a for each loop container in SSIS, and use package parameters to pass in a value – specifically, a ClientID – for each iteration of the loop. The child package will be executed from within the for each loop, and will create one output file for each client, with the source query and filename dependent on the ClientID received from the parent package. Configuring the Child and Parent Packages When you create a new package, you’ll see the Parameters tab at the package level. Clicking over to that tab allows you to add, edit, or delete package parameters. As shown above, the sample package has two parameters. Note that I’ve set the name, data type, and default value for each of these. Also note the column entitled Required: this allows me to specify whether the parameter value is optional (the default behavior) or required for package execution. In this example, I have one parameter that is required, and the other is not. Let’s shift over to the parent package briefly, and demonstrate how to supply values to these parameters in the child package. Using the execute package task, you can easily map variable values in the parent package to parameters in the child package. The execute package task in the parent package, shown above, has the variable vThisClient from the parent package mapped to the pClientID parameter shown earlier in the child package. Note that there is no value mapped to the child package parameter named pOutputFolder. Since this parameter has the Required property set to False, we don’t have to specify a value for it, which will cause that parameter to use the default value we supplied when designing the child pacakge. The last step in the parent package is to create the for each loop container I mentioned earlier, and place the execute package task inside it. I’m using an object variable to store the distinct client ID values, and I use that as the iterator for the loop (I describe how to do this more in depth here). For each iteration of the loop, a different client ID value will be passed into the child package parameter. The final step is to configure the child package to actually do something meaningful with the parameter values passed into it. In this case, I’ve modified the OleDB source query to use the pClientID value in the WHERE clause of the query to restrict results for each iteration to a single client’s data. Additionally, I’ll use both the pClientID and pOutputFolder parameters to dynamically build the output filename. As shown, the pClientID is used in the WHERE clause, so we only get the current client’s invoices for each iteration of the loop. For the flat file connection, I’m setting the Connection String property using an expression that engages both of the parameters for this package, as shown above. Parting Thoughts There are many uses for package parameters beyond a simple parent-child design pattern. For example, you can create standalone packages (those not intended to be used as a child package) and still use parameters. Parameter values may be supplied to a package directly at runtime by a SQL Server Agent job, through the command line (via dtexec.exe), or through T-SQL. Also, you can also have project parameters as well as package parameters. Project parameters work in much the same way as package parameters, but the parameters apply to all packages in a project, not just a single package. Conclusion Of the numerous advantages of using catalog deployment model in SSIS 2012 and beyond, package parameters are near the top of the list. Parameters allow you to easily share values from parent to child packages, enabling more dynamic behavior and better code encapsulation. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Planned and Unplanned Availablity Group Failovers – Notes from the Field #031

- by Pinal Dave

[Note from Pinal]: This is a new episode of Notes from the Fields series. AlwaysOn is a very complex subject and not everyone knows many things about this. The matter of the fact is there is very little information available on this subject online and not everyone knows everything about this. This is why when a very common question related to AlwaysOn comes, people get confused. In this episode of the Notes from the Field series database expert John Sterrett (Group Principal at Linchpin People) explains a very common issue DBAs and Developer faces in their career and is related to Planned and Unplanned Availablity Group Failovers. Linchpin People are database coaches and wellness experts for a data driven world. Read the experience of John in his own words. Whenever a disaster occurs it will be a stressful scenario regardless of how small or big the disaster is. This gets multiplied when it is your first time working with newer technology or the first time you are going through a disaster without a proper run book. Today, were going to help you establish a run book for creating a planned failover with availability groups. To make today’s session simple were going to have two instances of SQL Server 2012 included in an availability group and walk through the steps of doing an unplanned failover. We will focus on using the user interface and T-SQL to complete the failovers. We are going to use a two replica Availability Group where each replica is in another location. Therefore, we will be covering Asynchronous (non automatic failover) the following is a breakdown of our availability group utilized today. Seeing the following screen might be scary the first time you come across an unplanned failover. It looks like our test database used in this Availability Group is not functional and it currently isn’t. The database status is not synchronizing which makes sense because the primary replica went down so it couldn’t synchronize. With that said, we can still failover and make it functional while we troubleshoot why we lost our primary replica. To start we are going to right click on the availability group that needs to be restarted and select failover. This will bring up the following wizard, which will walk you through several steps needed to complete the failover using the graphical user interface provided with SQL Server Management Studio (SSMS). You are going to see warning messages simply because we are in Asynchronous commit mode and can not guarantee ‘no data loss’ when we do failover. Just incase you missed it; you get another screen warning you about potential data loss because we are in Asynchronous mode. Next we get to connect to the specific replica we want to become the primary replica after the failover occurs. In our case, we only have two replicas so this is trivial. In order to failover, it’s required to connect to the replica that will become primary. The following screen shows that the connection has been made successfully. Next, you will see the final summary screen. Once again, this reminds you that the failover action will cause data loss as were using Asynchronous commit mode due to the distance between instances used for disaster recovery. Finally, once the failover is completed you will see the following screen. If you followed along this long you might be wondering what T-SQL scripts are generated for clicking through all the sections of the wizard. If you have used Database Mirroring in the past you might be surprised. It’s not too different, which makes sense because the data is being replicated via SQL Server endpoints just like the good old database mirroring. Now were going to take a look at how to do a failover with just T-SQL. First, were going to need to open a new query window and run our query in SQLCMD mode. Just incase you haven’t used SQLCMD mode before we will show you how to enable it below. Now you can run the following statement. Notice, we connect to the replica we want to become primary after failover and specify to force failover to allow data loss. We can use the following script to failback over when our primary instance comes back online. -- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE. :Connect SQL2012PROD1 ALTER AVAILABILITY GROUP [AGSQL2] FORCE_FAILOVER_ALLOW_DATA_LOSS; GO Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Evolution of Big Data – Day 3 of 21

- by Pinal Dave

In yesterday’s blog post we answered what is the Big Data. Today we will understand why and how the evolution of Big Data has happened. Though the answer is very simple, I would like to tell it in the form of a history lesson. Data in Flat File In earlier days data was stored in the flat file and there was no structure in the flat file. If any data has to be retrieved from the flat file it was a project by itself. There was no possibility of retrieving the data efficiently and data integrity has been just a term discussed without any modeling or structure around. Database residing in the flat file had more issues than we would like to discuss in today’s world. It was more like a nightmare when there was any data processing involved in the application. Though, applications developed at that time were also not that advanced the need of the data was always there and there was always need of proper data management. Edgar F Codd and 12 Rules Edgar Frank Codd was a British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases. He presented 12 rules for the Relational Database and suddenly the chaotic world of the database seems to see discipline in the rules. Relational Database was a promising land for all the unstructured database users. Relational Database brought into the relationship between data as well improved the performance of the data retrieval. Database world had immediately seen a major transformation and every single vendors and database users suddenly started to adopt the relational database models. Relational Database Management Systems Since Edgar F Codd proposed 12 rules for the RBDMS there were many different vendors who started them to build applications and tools to support the relationship between database. This was indeed a learning curve for many of the developer who had never worked before with the modeling of the database. However, as time passed by pretty much everybody accepted the relationship of the database and started to evolve product which performs its best with the boundaries of the RDBMS concepts. This was the best era for the databases and it gave the world extreme experts as well as some of the best products. The Entity Relationship model was also evolved at the same time. In software engineering, an Entity–relationship model (ER model) is a data model for describing a database in an abstract way. Enormous Data Growth Well, everything was going fine with the RDBMS in the database world. As there were no major challenges the adoption of the RDBMS applications and tools was pretty much universal. There was a race at times to make the developer’s life much easier with the RDBMS management tools. Due to the extreme popularity and easy to use system pretty much every data was stored in the RDBMS system. New age applications were built and social media took the world by the storm. Every organizations was feeling pressure to provide the best experience for their users based the data they had with them. While this was all going on at the same time data was growing pretty much every organization and application. Data Warehousing The enormous data growth now presented a big challenge for the organizations who wanted to build intelligent systems based on the data and provide near real time superior user experience to their customers. Various organizations immediately start building data warehousing solutions where the data was stored and processed. The trend of the business intelligence becomes the need of everyday. Data was received from the transaction system and overnight was processed to build intelligent reports from it. Though this is a great solution it has its own set of challenges. The relational database model and data warehousing concepts are all built with keeping traditional relational database modeling in the mind and it still has many challenges when unstructured data was present. Interesting Challenge Every organization had expertise to manage structured data but the world had already changed to unstructured data. There was intelligence in the videos, photos, SMS, text, social media messages and various other data sources. All of these needed to now bring to a single platform and build a uniform system which does what businesses need. The way we do business has also been changed. There was a time when user only got the features what technology supported, however, now users ask for the feature and technology is built to support the same. The need of the real time intelligence from the fast paced data flow is now becoming a necessity. Large amount (Volume) of difference (Variety) of high speed data (Velocity) is the properties of the data. The traditional database system has limits to resolve the challenges this new kind of the data presents. Hence the need of the Big Data Science. We need innovation in how we handle and manage data. We need creative ways to capture data and present to users. Big Data is Reality! Tomorrow In tomorrow’s blog post we will try to answer discuss Basics of Big Data Architecture. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – The Story of a Lesser Known Startup Parameter in SQL Server – Guest Post by Balmukund Lakhani

- by Pinal Dave

This is a fantastic blog post from my dear friend Balmukund ( blog | twitter | facebook ). He had presented a fantastic session in our last UG and there were lots of requests from attendees that he blogs about it. Well, here is the blog post about the same very popular UG session. Let us read the entire blog post in the voice of the Balmukund himself. During my last session in SQL Bangalore User Group (Facebook) meeting, I was lucky enough to deliver a session on SQL Server Startup issue. The name of the session was “SQL Engine Starting Trouble – How to start?” From the feedback, I realized that one of the “not well known” startup parameter is “-m”. Okay, you might say “I know that this is used to start the SQL in single user mode”. But what you might not know is that you can pass a string with -m which has special meaning and use. I have used this parameter in my blog here but looks like not many of you have seen that. It happens most of the time when we want to start SQL Server in single user mode, someone else makes connection before you can. The only choice you have is to repeat same process again till you succeed. Some smart DBAs may disable the remote network protocols (TCP/IP and Named Pipes) of SQL Instance and allow only local connections to SQL. Once the activity is complete, our dear smart DBA has to remember to re-enable network protocols. Sometimes, it may be a local service or application getting connection to SQL before we can. There is a better way to deal with it. Yes, you have guessed it correctly: -m parameter which a string. Since I work with SQL Product Support team, I may know little more undocumented commands and parameters, but this is not an undocumented stuff. It’s already documented in books online. So in this blog, I am going to show a demo of its usage. As documentation shows, “Do not use this option as a security feature.” So please read this blog as knowledge enhancer and troubleshooting issues not security feature. In my laptop, I have a default instance of SQL Server 2012 and here is what we would in the configuration manager. Now, I would go ahead and stop SQL Service by selecting SQL Server (MSSQLServer) > Right Click > Stop. There are multiple ways to start SQL with startup parameter. 1) Use Net Start Command from command prompt Net Start MSSQLServer /mSQLCMD The above command is the simplest way to add startup parameter to SQL. This parameter would be cleared once we stop and start SQL. 2) Add Startup Parameter via configuration manager. Step is already listed here. We need to add -mSQLCMD If we compare 1 and 2, it’s clear that unless we modify startup parameter and remove -m, it would be in effect. 3) Start SQL Service via command line SQLServr.exe –mSQLCMD –s<InstanceName> Wait, what does SQLCMD mean with /m? It’s the instruction to SQL that start SQL Server in Single User Mode and allow only the application which is SQLCMD. Any other application would fail with Login Failed for User Error message. It would be important to note that string is case sensitive. This value should be picked up from application_name column from sys.dm_exec_sessions. I have made a connection using SQLCMD and as we can see it comes as upper case “SQLCMD”. If we want only management studio query windows to connect then we need to give -m” Microsoft SQL Server Management Studio – Query” as startup parameter. In below example, I have given it as SQLCMd (lower case d at the end) and we would notice that we would not be able to connect to SQL Instance. Above proves that parameter works as expected and it’s case sensitive. Error Log would show below information. How to get error log location? I have already blogged about it. Hope you have learned something new. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – SmallDateTime and Precision – A Continuous Confusion

- by pinaldave

Some kinds of confusion never go away. Here is one of the ancient confusing things in SQL. The precision of the SmallDateTime is one concept that confuses a lot of people, proven by the many messages I receive everyday relating to this subject. Let me start with the question: What is the precision of the SMALLDATETIME datatypes? What is your answer? Write it down on your notepad. Now if you do not want to continue reading the blog post, head to my previous blog post over here: SQL SERVER – Precision of SMALLDATETIME. A Social Media Question Since the increase of social media conversations, I noticed that the amount of the comments I receive on this blog is a bit staggering. I receive lots of questions on facebook, twitter or Google+. One of the very interesting questions yesterday was asked on Facebook by Raghavendra. I am re-organizing his script and asking all of the questions he has asked me. Let us see if we could help him with his question: CREATE TABLE #temp (name VARCHAR(100),registered smalldatetime) GO DECLARE @test smalldatetime SET @test=GETDATE() INSERT INTO #temp VALUES ('Value1',@test) INSERT INTO #temp VALUES ('Value2',@test) GO SELECT * FROM #temp ORDER BY registered DESC GO DROP TABLE #temp GO Now when the above script is ran, we will get the following result: Well, the expectation of the query was to have the following result. The row which was inserted last was expected to return as first row in result set as the ORDER BY descending. Side note: Because the requirement is to get the latest data, we can’t use any column other than smalldatetime column in order by. If we use name column in the order by, we will get an incorrect result as it can be any name. My Initial Reaction My initial reaction was as follows: 1) DataType DateTime2: If file precision of the column is expected from the column which store date and time, it should not be smalldatetime. The precision of the column smalldatetime is One Minute (Read Here) for finer precision use DateTime or DateTime2 data type. Here is the code which includes above suggestion: CREATE TABLE #temp (name VARCHAR(100), registered datetime2) GO DECLARE @test datetime2 SET @test=GETDATE() INSERT INTO #temp VALUES ('Value1',@test) INSERT INTO #temp VALUES ('Value2',@test) GO SELECT * FROM #temp ORDER BY registered DESC GO DROP TABLE #temp GO 2) Tie Breaker Identity: There are always possibilities that two rows were inserted at the same time. In that case, you may need a tie breaker. If you have an increasing identity column, you can use that as a tie breaker as well. CREATE TABLE #temp (ID INT IDENTITY(1,1), name VARCHAR(100),registered datetime2) GO DECLARE @test datetime2 SET @test=GETDATE() INSERT INTO #temp VALUES ('Value1',@test) INSERT INTO #temp VALUES ('Value2',@test) GO SELECT * FROM #temp ORDER BY ID DESC GO DROP TABLE #temp GO Those two were the quick suggestions I provided. It is not necessary that you should use both advices. It is possible that one can use only DATETIME datatype or Identity column can have datatype of BIGINT or have another tie breaker. An Alternate NO Solution In the facebook thread this was also discussed as one of the solutions: CREATE TABLE #temp (name VARCHAR(100),registered smalldatetime) GO DECLARE @test smalldatetime SET @test=GETDATE() INSERT INTO #temp VALUES ('Value1',@test) INSERT INTO #temp VALUES ('Value2',@test) GO SELECT name, registered, ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number" FROM #temp ORDER BY 3 DESC GO DROP TABLE #temp GO However, I believe it is not the solution and can be further misleading if used in a production server. Here is the example of why it is not a good solution: CREATE TABLE #temp (name VARCHAR(100) NOT NULL,registered smalldatetime) GO DECLARE @test smalldatetime SET @test=GETDATE() INSERT INTO #temp VALUES ('Value1',@test) INSERT INTO #temp VALUES ('Value2',@test) GO -- Before Index SELECT name, registered, ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number" FROM #temp ORDER BY 3 DESC GO -- Create Index ALTER TABLE #temp ADD CONSTRAINT [PK_#temp] PRIMARY KEY CLUSTERED (name DESC) GO -- After Index SELECT name, registered, ROW_NUMBER() OVER(ORDER BY registered DESC) AS "Row Number" FROM #temp ORDER BY 3 DESC GO DROP TABLE #temp GO Now let us examine the resultset. You will notice that an index which is created on the base table which is (indeed) schema change the table but can affect the resultset. As you can see, an index can change the resultset, so this method is not yet perfect to get the latest inserted resultset. No Schema Change Requirement After giving these two suggestions, I was waiting for the feedback of the asker. However, the requirement of the asker was there can’t be any schema change because the application was used by many other applications. I validated again, and of course, the requirement is no schema change at all. No addition of the column of change of datatypes of any other columns. There is no further help as well. This is indeed an interesting question. I personally can’t think of any solution which I could provide him given the requirement of no schema change. Can you think of any other solution to this? Need of Database Designer This question once again brings up another ancient question: “Do we need a database designer?” I often come across databases which are facing major performance problems or have redundant data. Normalization is often ignored when a database is built fast under a very tight deadline. Often I come across a database which has table with unnecessary columns and performance problems. While working as Developer Lead in my earlier jobs, I have seen developers adding columns to tables without anybody’s consent and retrieving them as SELECT *. There is a lot to discuss on this subject in detail, but for now, let’s discuss the question first. Do you have any suggestions for the above question? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: CodeProject, Developer Training, PostADay, SQL, SQL Authority, SQL DateTime, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – TechEd India 2012 – Content, Speakers and a Lots of Fun

- by pinaldave

TechEd is one event which every developers and IT professionals are looking forward to attend. It is opportunity of life time and no matter how many time one gets chance to engage with it, it is never enough. I still remember every single moment of every TechEd I have attended so far. We are less than 100 hours away from TechEd India 2012 event.This event is the one must attend event for every Technology Enthusiast. Fourth time in the row I am going to attend this event and I am equally excited as the first time of the event. There are going to be two very solid SQL Server track this time and I will be attending end of the end both the tracks. Here is my view on each of the 10 sessions. Each session is carefully crafted and leading exeprts from industry will present it. Day 1, March 21, 2012 T-SQL Rediscovered with SQL Server 2012 – This session is going to bring some of the lesser known enhancements that were brought with SQL Server 2012. When I learned that Jacob Sebastian is going to do this session my reaction to this is DEMO, DEMO and DEMO! Jacob spends hours and hours of his time preparing his session and this will be one of those session that I am confident will be delivered over and over through out the next many events. Catapult your data with SQL Server 2012 Integration Services – Praveen is expert story teller and one of the wizard when it is about SQL Server and business intelligence. He is surely going to mesmerize you with some interesting insights on SSIS performance too. Processing Big Data with SQL Server 2012 and Hadoop – There are three sessions on Big Data at TechEd India 2012. Stephen is going to deliver one of the session. Watching Stephen present is always joy and quite entertaining. He shares knowledge with his typical humor which captures ones attention. I wrote about what is BIG DATA in a blog post. SQL Server Misconceptions and Resolutions – I will be presenting this Session along with Vinod Kumar. READ MORE HERE. Securing with ContainedDB in SQL Server 2012 – Pranab is expert when it is about SQL Server and Security. I have seen him presenting and he is indeed very pleasant to watch. A dry subject like security, he makes it much lively. A Contained Database is a database which contains all the necessary settings and metadata, making database easily portable to another server. This database will contain all the necessary details and will not have to depend on any server where it is installed for anything. You can take this database and move it to another server without having any worries. Day 3, March 23, 2012 Peeling SQL Server like an Onion: Internals Demystified – Vinod Kumar has been writing about this extensively on his other blog post. In recent conversation he suggested that he will be creating very exclusive content for this presentation. I know Vinod for long time and have worked with him along many community activities. I am going to pay special attention to the details. I know Vinod has few give-away planned now for attending the session now only if he shares with us. Speed Up – Parallel Processes and unparalleled Performance – Performance tuning is my favorite subject. I will be discussing effect of parallelism on performance in this session. Here me out, there will be lots of quiz questions during this session and if you get the answers correct – you can win some really cool goodies – I Promise! READ MORE HERE. Keep your database available – AlwaysOn – Balmukund is like an army man. He is always ready to show and prove that he has coolest toys in terms of SQL Server and he knows how to keep them running AlwaysON. Availability groups, Listener, Clustering, Failover, Read-Only replica etc all will be demo’ed in this session. This is really heavy but very interesting content not to be missed. Lesser known facts about SQL Server Backup and Restore – Amit Banerjee – this name is known internationally for solving SQL Server problems in 140 characters. He has already blogged about this and this topic is going to be interesting. A successful restore strategy for applications is as good as their last good known backup. I have few difficult questions to ask to Amit and I am very sure that his unique style will entertain people. By the way, his one of the slide may give few in audience a funny heart attack. Top 5 reasons why you want SQL Server 2012 BI – Praveen plans to take a tour of some of the BI enhancements introduced in the new version. Business Insights with SQL Server is a critical building block and this version of SQL Server is no exception. For the matter of the fact, when I saw the demos he was going to show during this session, I felt like that I wish I can set up all of this on my machine. If you miss this session – you will miss one of the most informative session of the day. Also TechEd India 2012 has a Live streaming of some content and this can be watched here. The TechEd Team is planning to have some really good exclusive content in this channel as well. If you spot me, just do not hesitate to come by me and introduce yourself, I want to remember you! Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLServer, T SQL, Technology Tagged: TechEd, TechEdIn

Read the article
SQL SERVER – SSMS: Database Consistency History Report

- by Pinal Dave

Doctor and Database The last place I like to visit is always a hospital. With the monsoon season starting, intermittent rains, it has become sort of a routine to get a cycle of fever every other year (seriously I hate it). So when I visit my doctor, it is always interesting in the way he quizzes me. The routine question of – “How many days have you had this?”, “Is there any pattern?”, “Did you drench in rain?”, “Do you have any other symptom?” and so on. The idea here is that the doctor wants to find any anomaly or a pattern that will guide him to a viral or bacterial type. Most of the time they get it based on experience and sometimes after a battery of tests. So if there is consistent behavior to your problem, there is always a solution out. SQL Server has its way to find if the server data / files are in consistent state using the DBCC commands. Back to SQL Server In real life, Database consistency check is one of the critical operations a DBA generally doesn’t give much priority. Many readers of my blogs have asked many times, how do we know if the database is consistent? How do I read output of DBCC CHECKDB and find if everything is right or not? My common answer to all of them is – look at the bottom of checkdb (or checktable) output and look for below line. CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DatabaseName’. Above is a “good sign” because we are seeing zero allocation and zero consistency error. If you are seeing non-zero errors then there is some problem with the database. Sample output is shown as below: CHECKDB found 0 allocation errors and 2 consistency errors in database ‘DatabaseName’. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DatabaseName). If we see non-zero error then most of the time (not always) we get repair options depending on the level of corruption. There is risk involved with above option (repair_allow_data_loss), that is – we would lose the data. Sometimes the option would be repair_rebuild which is little safer. Though these options are available, it is important to find the root cause to the problem. In standard report, there is a report which can show the history of checkdb executed for the selected database. Since this is a database level report, we need to right click on database, click Reports, click Standard Reports and then choose “Database Consistency History” report. The information in this report is picked from default trace. If default trace is disabled or there is no checkdb run or information is not there in default trace (because it’s rolled over), we would get report like below. As we can see report says it very clearly: Currently, no execution history of CHECKDB is available or default trace is not enabled. To demonstrate, I have caused corruption in one of the database and did below steps. Run CheckDB so that errors are reported. Fix the corruption by losing the data using repair option Run CheckDB again to check if corruption is cleared. After that I have launched the report and below is what we would see. If you are lazy like me and don’t want to run the report manually for each database then below query would be handy to provide same report for all database. This query is runs behind the scenes by the report. All I have done is remove the filter for database name (at the last – highlighted). DECLARE @curr_tracefilename VARCHAR(500); DECLARE @base_tracefilename VARCHAR(500); DECLARE @indx INT; SELECT @curr_tracefilename = path FROM sys.traces WHERE is_default = 1; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SELECT @indx = PATINDEX('%\%', @curr_tracefilename) ; SET @curr_tracefilename = REVERSE(@curr_tracefilename); SET @base_tracefilename = LEFT( @curr_tracefilename,LEN(@curr_tracefilename) - @indx) + '\log.trc'; SELECT SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),36, PATINDEX('%executed%',TEXTData)-36) AS command , LoginName , StartTime , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%found%',TEXTData) +6,PATINDEX('%errors %',TEXTData)-PATINDEX('%found%',TEXTData)-6)) AS errors , CONVERT(INT,SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%repaired%',TEXTData) +9,PATINDEX('%errors.%',TEXTData)-PATINDEX('%repaired%',TEXTData)-9)) repaired , SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%time:%',TEXTData)+6,PATINDEX('%hours%',TEXTData)-PATINDEX('%time:%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%hours%',TEXTData) +6,PATINDEX('%minutes%',TEXTData)-PATINDEX('%hours%',TEXTData)-6)+':'+SUBSTRING(CONVERT(NVARCHAR(MAX),TEXTData),PATINDEX('%minutes%',TEXTData) +8,PATINDEX('%seconds.%',TEXTData)-PATINDEX('%minutes%',TEXTData)-8) AS time FROM::fn_trace_gettable( @base_tracefilename, DEFAULT) WHERE EventClass = 22 AND SUBSTRING(TEXTData,36,12) = 'DBCC CHECKDB' -- AND DatabaseName = @DatabaseName; Don’t get worried about the logic above. All it is doing is reading the trace files, parsing below entry and getting out information for underlined words. DBCC CHECKDB (CorruptedDatabase) executed by sa found 2 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds. Internal database snapshot has split point LSN = 00000029:00000030:0001 and first LSN = 00000029:00000020:0001. Hopefully now onwards you would run checkdb and understand the importance of it. As responsible DBAs I am sure you are already doing it, let me know how often do you actually run them on you production environment? Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Number-Crunching with SQL Server – Exceed the Functionality of Excel

- by Pinal Dave

Imagine this. Your users have developed an Excel spreadsheet that extracts data from your SQL Server database, manipulates that data through the use of Excel formulas and, possibly, some VBA code which is then used to calculate P&L, hedging requirements or even risk numbers. Management comes to you and tells you that they need to get rid of the spreadsheet and that the results of the spreadsheet calculations need to be persisted on the database. SQL Server has a very small set of functions for analyzing data. Excel has hundreds of functions for analyzing data, with many of them focused on specific financial and statistical calculations. Is it even remotely possible that you can use SQL Server to replace the complex calculations being done in a spreadsheet? Westclintech has developed a library of functions that match or exceed the functionality of Excel’s functions and contains many functions that are not available in EXCEL. Their XLeratorDB library of functions contains over 700 functions that can be incorporated into T-SQL statements. XLeratorDB takes advantage of the SQL CLR architecture introduced in SQL Server 2005. SQL CLR permits managed code to be compiled into the database and run alongside built-in SQL Server functions like COUNT or SUM. The Westclintech developers have taken advantage of this architecture to bring robust analytical functions to the database. In our hypothetical spreadsheet, let’s assume that our users are using the YIELD function and that the data are extracted from a table in our database called BONDS. Here’s what the spreadsheet might look like. We go to column G and see that it contains the following formula. Obviously, SQL Server does not offer a native YIELD function. However, with XLeratorDB we can replicate this calculation in SQL Server with the following statement: SELECT *, wct.YIELD(CAST(GETDATE() AS date),Maturity,Rate,Price,100,Frequency,Basis) AS YIELD FROM BONDS This produces the following result. This illustrates one of the best features about XLeratorDB; it is so easy to use. Since I knew that the spreadsheet was using the YIELD function I could use the same function with the same calling structure to do the calculation in SQL Server. I didn’t need to know anything at all about the mechanics of calculating the yield on a bond. It was pretty close to cut and paste. In fact, that’s one way to construct the SQL. Just copy the function call from the cell in the spreadsheet and paste it into SMS and change the cell references to column names. I built the SQL for this query by starting with this. SELECT * ,YIELD(TODAY(),B2,C2,D2,100,E2,F2) FROM BONDS I then changed the cell references to column names. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) ,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Finally, I replicated the TODAY() function using GETDATE() and added the schema name to the function name. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) --,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) ,wct.YIELD(GETDATE(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Then I am able to execute the statement returning the results seen above. The XLeratorDB libraries are heavy on financial, statistical, and mathematical functions. Where there is an analog to an Excel function, the XLeratorDB function uses the same naming conventions and calling structure as the Excel function, but there are also hundreds of additional functions for SQL Server that are not found in Excel. You can find the functions by opening Object Explorer in SQL Server Management Studio (SSMS) and expanding the Programmability folder under the database where the functions have been installed. The Functions folder expands to show 3 sub-folders: Table-valued Functions; Scalar-valued functions, Aggregate Functions, and System Functions. You can expand any of the first three folders to see the XLeratorDB functions. Since the wct.YIELD function is a scalar function, we will open the Scalar-valued Functions folder, scroll down to the wct.YIELD function and and click the plus sign (+) to display the input parameters. The functions are also Intellisense-enabled, with the input parameters displayed directly in the query tab. The Westclintech website contains documentation for all the functions including examples that can be copied directly into a query window and executed. There are also more one hundred articles on the site which go into more detail about how some of the functions work and demonstrate some of the extensive business processes that can be done in SQL Server using XLeratorDB functions and some T-SQL. XLeratorDB is organized into libraries: finance, statistics; math; strings; engineering; and financial options. There is also a windowing library for SQL Server 2005, 2008, and 2012 which provides functions for calculating things like running and moving averages (which were introduced in SQL Server 2012), FIFO inventory calculations, financial ratios and more, without having to use triangular joins. To get started you can download the XLeratorDB 15-day free trial from the Westclintech web site. It is a fully-functioning, unrestricted version of the software. If you need more than 15 days to evaluate the software, you can simply download another 15-day free trial. XLeratorDB is an easy and cost-effective way to start adding sophisticated data analysis to your SQL Server database without having to know anything more than T-SQL. Get XLeratorDB Today and Now! Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Excel

Read the article
SQL SERVER – SQL in Sixty Seconds – 5 Videos from Joes 2 Pros Series – SQL Exam Prep Series 70-433

- by pinaldave

Joes 2 Pros SQL Server Learning series is indeed fun. Joes 2 Pros series is written for beginners and who wants to build expertise for SQL Server programming and development from fundamental. In the beginning of the series author Rick Morelan is not shy to explain the simplest concept of how to open SQL Server Management Studio. Honestly the book starts with that much basic but as it progresses further Rick discussing about various advanced concepts from query tuning to Core Architecture. This five part series is written with keeping SQL Server Exam 70-433. Instead of just focusing on what will be there in exam, this series is focusing on learning the important concepts thoroughly. This book no way take short cut to explain any concepts and at times, will go beyond the topic at length. The best part is that all the books has many companion videos explaining the concepts and videos. Every Wednesday I like to post a video which explains something in quick few seconds. Today we will go over five videos which I posted in my earlier posts related to Joes 2 Pros series. Introduction to XML Data Type Methods – SQL in Sixty Seconds #015 The XML data type was first introduced with SQL Server 2005. This data type continues with SQL Server 2008 where expanded XML features are available, most notably is the power of the XQuery language to analyze and query the values contained in your XML instance. There are five XML data type methods available in SQL Server 2008: query() – Used to extract XML fragments from an XML data type. value() – Used to extract a single value from an XML document. exist() – Used to determine if a specified node exists. Returns 1 if yes and 0 if no. modify() – Updates XML data in an XML data type. node() – Shreds XML data into multiple rows (not covered in this blog post). [Detailed Blog Post] | [Quiz with Answer] Introduction to SQL Error Actions – SQL in Sixty Seconds #014 Most people believe that when SQL Server encounters an error severity level 11 or higher the remaining SQL statements will not get executed. In addition, people also believe that if any error severity level of 11 or higher is hit inside an explicit transaction, then the whole statement will fail as a unit. While both of these beliefs are true 99% of the time, they are not true in all cases. It is these outlying cases that frequently cause unexpected results in your SQL code. To understand how to achieve consistent results you need to know the four ways SQL Error Actions can react to error severity levels 11-16: Statement Termination – The statement with the procedure fails but the code keeps on running to the next statement. Transactions are not affected. Scope Abortion – The current procedure, function or batch is aborted and the next calling scope keeps running. That is, if Stored Procedure A calls B and C, and B fails, then nothing in B runs but A continues to call C. @@Error is set but the procedure does not have a return value. Batch Termination – The entire client call is terminated. XACT_ABORT – (ON = The entire client call is terminated.) or (OFF = SQL Server will choose how to handle all errors.) [Detailed Blog Post] | [Quiz with Answer] Introduction to Basics of a Query Hint – SQL in Sixty Seconds #013 Query hints specify that the indicated hints should be used throughout the query. Query hints affect all operators in the statement and are implemented using the OPTION clause. Cautionary Note: Because the SQL Server Query Optimizer typically selects the best execution plan for a query, it is highly recommended that hints be used as a last resort for experienced developers and database administrators to achieve the desired results. [Detailed Blog Post] | [Quiz with Answer] Introduction to Hierarchical Query – SQL in Sixty Seconds #012 A CTE can be thought of as a temporary result set and are similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. A CTE is generally considered to be more readable than a derived table and does not require the extra effort of declaring a Temp Table while providing the same benefits to the user. However; a CTE is more powerful than a derived table as it can also be self-referencing, or even referenced multiple times in the same query. A recursive CTE requires four elements in order to work properly: Anchor query (runs once and the results ‘seed’ the Recursive query) Recursive query (runs multiple times and is the criteria for the remaining results) UNION ALL statement to bind the Anchor and Recursive queries together. INNER JOIN statement to bind the Recursive query to the results of the CTE. [Detailed Blog Post] | [Quiz with Answer] Introduction to SQL Server Security – SQL in Sixty Seconds #011 Let’s get some basic definitions down first. Take the workplace example where “Tom” needs “Read” access to the “Financial Folder”. What are the Securable, Principal, and Permissions from that last sentence? A Securable is a resource that someone might want to access (like the Financial Folder). A Principal is anything that might want to gain access to the securable (like Tom). A Permission is the level of access a principal has to a securable (like Read). [Detailed Blog Post] | [Quiz with Answer] Please leave a comment explain which one was your favorite video as that will help me understand what works and what needs improvement. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology, Video

Read the article
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words - Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – ?Finding Out What Changed in a Deleted Database – Notes from the Field #041

- by Pinal Dave

[Note from Pinal]: This is a 41th episode of Notes from the Field series. The real world is full of challenges. When we are reading theory or book, we sometimes do not realize how real world reacts works and that is why we have the series notes from the field, which is extremely popular with developers and DBA. Let us talk about interesting problem of how to figure out what has changed in the DELETED database. Well, you think I am just throwing the words but in reality this kind of problems are making our DBA’s life interesting and in this blog post we have amazing story from Brian Kelley about the same subject. In this episode of the Notes from the Field series database expert Brian Kelley explains a how to find out what has changed in deleted database. Read the experience of Brian in his own words. Sometimes, one of the hardest questions to answer is, “What changed?” A similar question is, “Did anything change other than what we expected to change?” The First Place to Check – Schema Changes History Report: Pinal has recently written on the Schema Changes History report and its requirement for the Default Trace to be enabled. This is always the first place I look when I am trying to answer these questions. There are a couple of obvious limitations with the Schema Changes History report. First, while it reports what changed, when it changed, and who changed it, other than the base DDL operation (CREATE, ALTER, DELETE), it does not present what the changes actually were. This is not something covered by the default trace. Second, the default trace has a fixed size. When it hits that size, the changes begin to overwrite. As a result, if you wait too long, especially on a busy database server, you may find your changes rolled off. But the Database Has Been Deleted! Pinal cited another issue, and that’s the inability to run the Schema Changes History report if the database has been dropped. Thankfully, all is not lost. One thing to remember is that the Schema Changes History report is ultimately driven by the Default Trace. As you may have guess, it’s a trace, like any other database trace. And the Default Trace does write to disk. The trace files are written to the defined LOG directory for that SQL Server instance and have a prefix of log_: Therefore, you can read the trace files like any other. Tip: Copy the files to a working directory. Otherwise, you may occasionally receive a file in use error. With the Default Trace files, if you ask the question early enough, you can see the information for a deleted database just the same as any other database. Testing with a Deleted Database: Here’s a short script that will create a database, create a schema, create an object, and then drop the database. Without the database, you can’t do a standard Schema Changes History report. CREATE DATABASE DeleteMe; GO USE DeleteMe; GO CREATE SCHEMA Test AUTHORIZATION dbo; GO CREATE TABLE Test.Foo (FooID INT); GO USE MASTER; GO DROP DATABASE DeleteMe; GO This sets up the perfect situation where we can’t retrieve the information using the Schema Changes History report but where it’s still available. Finding the Information: I’ve sorted the columns so I can see the Event Subclass, the Start Time, the Database Name, the Object Name, and the Object Type at the front, but otherwise, I’m just looking at the trace files using SQL Profiler. As you can see, the information is definitely there: Therefore, even in the case of a dropped/deleted database, you can still determine who did what and when. You can even determine who dropped the database (loginame is captured). The key is to get the default trace files in a timely manner in order to extract the information. If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Security, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQLAuthority News – #SQLPASS 2012 Seattle Update – Memorylane 2009, 2010, 2011

- by pinaldave

Today is the first day of the SQLPASS 2012 and I will be soon posting SQL Server 2012 experience over here. Today when I landed in Seattle, I got the nostalgia feeling. I used to stay in the USA. I stayed here for more than 7 years – I studied here and I worked in USA. I had lots of friends in Seattle when I used to stay in the USA. I always wanted to visit Seattle because it is THE place. I remember once I purchased a ticket to travel to Seattle through Priceline (well it was the cheapest option and I was a student) but could not fly because of an interesting issue. I used to be Teaching Assistant of an advanced course and the professor asked me to build a pop-quiz for the course. I unfortunately had to cancel the trip. Before I returned to India – I pretty much covered every city existed in my list to must visit, except one – Seattle. It was so interesting that I never made it to Seattle even though I wanted to visit, when I was in USA. After that one time I never got a chance to travel to Seattle. After a few years I also returned to India for good. Once on Television I saw “Sleepless in Seattle” movie playing and I immediately changed the channel as it reminded me that I never made it to Seattle before. However, destiny has its own way to handle decisions. After I returned to India – I visited Seattle total of 5 times and this is my 6th visit to Seattle in less than 3 years. I was here for 3 previous SQLPASS events – 2009, 2010, and 2011 as well two Microsoft Most Valuable Professional Summit in 2009 and 2010. During these five trips I tried to catch up with all of my all friends but I realize that time has its own way of doing things. Many moved out of Seattle and many were too busy revive the old friendship but there were few who always make a point to meet me when I travel to the city. During the course of my visits I have made few fantastic new friends – Rick Morelan (Joes 2 Pros) and Greg Lynch. Every time I meet them I feel that I know them for years. I think city of Seattle has played very important part in our relationship that I got these fantastic friends. SQLPASS is the event where I find all of my SQL Friends and I look for this event for an entire year. This year’s my goal is to meet as many as new friends I can meet. If you are going to be at SQLPASS – FIND ME. I want to have a photo with you. I want to remember each name as I believe this is very important part of our life – making new friends and sustaining new friendship. Here are few of the pointers where you can find me. All Keynotes – Blogger’s Table Exhibition Booth Joes 2 Pros Booth #117 – Do not forget to stop by at the booth – I might have goodies for you – limited editions. Book Signing Events – Check details in tomorrow’s blog or stop by Booth #117 Evening Parties 6th Nov – Welcome Reception Evening Parties 7th Nov - Exhibitor Reception – Do not miss Booth #117 Evening Parties 8th Nov - Community Appreciation Party Additionally at few other locations – Embarcadero Booth In Coffee shops in Convention Center If you are SQLPASS – make sure that I find an opportunity to meet you at the event. Reserve a little time and lets have a coffee together. I will be continuously tweeting about my where about on twitter so let us stay connected on twitter. Here is my experience of my earlier experience of attending SQLPASS. SQLAuthority News – Book Signing Event – SQLPASS 2011 Event Log SQLAuthority News – Meeting SQL Friends – SQLPASS 2011 Event Log SQLAuthority News – Story of Seattle – SQLPASS 2011 Event Log SQLAuthority News – SQLPASS Nov 8-11, 2010-Seattle – An Alternative Look at Experience SQLAuthority News – Notes of Excellent Experience at SQL PASS 2009 Summit, Seattle Let us meet! Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL PASS, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority News, T SQL, Technology

Read the article
SQL SERVER – Beginning New Weekly Series – Memory Lane – #002

- by pinaldave

Here is the list of curetted articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2006 Query to Find ByteSize of All the Tables in Database This was my second blog post and today I do not remember what was the business need which has made me build this query. It was built for SQL Server 2000 and it will not directly run on SQL Server 2005 or later version now. It measured the byte size of the tables in the database. This can be done in many different ways as well for example SP_HELPDB as well SP_HELP. I wish to build similar script in 2005 and later version. 2007 This week I had completed my – 1 Year (365 blogs) and very first 1 Million Views. I was pretty excited at that time with this new achievement. SQL SERVER Versions, CodeNames, Year of Release When I started with SQL Server I did not know all the names correctly for each version and I often used to get confused with this. However, as time passed by I started to remember all the codename as well. In this blog post I have not included SQL Server 2012′s code name as it was not released at the time. SQL Server 2012′s code name is Denali. Here is the question for you – anyone know what is the internal name of the SQL Server’s next version? Searching String in Stored Procedure I have already started to work with 2005 by this time and I was personally converting each of my stored procedures to SQL Server 2005 compatible. As we were upgrading from SQL Server 2000 to SQL Server 2005 we had to search each of the stored procedures and make sure that we remove incompatible code from it. For example, syscolumns of SQL Server 2000 was now being replaced by sys.columns of SQL Server 2005. This stored procedure was pretty helpful at that time. Later on I build few additional versions of the same stored procedure. Version 1: This version finds the Stored Procedures related to Table Version 2: This is specific version which works with SQL Server 2005 and later version 2008 Clear Drop Down List of Recent Connection From SQL Server Management Studio It happens to all of us when we connected to some remote client server and we never ever have to connect to it again. However, it keeps on bothering us that the name shows up in the list all the time. In this blog post I covered a quick tip about how we can remove the same. I also wrote a small article about How to Check Database Integrity for all Databases and there was a funny question from a reader requesting T-SQL code to refresh databases. 2009 Stored Procedure are Compiled on First Run – SP is taking Longer to Run First Time A myth is quite prevailing in the industry that Stored Procedures are pre-compiled and they should always run faster. It is not true. Stored procedures are compiled on very first execution of it and that is the reason why it takes longer when it executes first time. In this blog post I had a great time discussing the same concept. If you do not agree with it, you are welcome to read this blog post. Removing Key Lookup – Seek Predicate – Predicate – An Interesting Observation Related to Datatypes Performance Tuning is an interesting concept and my personal favorite one. In many blog posts I have described how to do performance tuning and how to improve the performance of the queries. In this quick quick tip I have explained how one can remove the Key Lookup and improve performance. Here are very relevant articles on this subject: Article 1 | Article 2 | Article 3 2010 Recycle Error Log – Create New Log file without a Server Restart During one of the consulting assignments I noticed DBA restarting server to create new log file. This is absolutely not necessary and restarting server might have many other negative impacts. There is a common sp_cycle_errorlog which can do the same task efficiently and properly. Have you ever used this SP or feature? Additionally I had a great time presenting on SQL Server Best Practices in SharePoint Conference. 2011 SSMS 2012 Reset Keyboard Shortcuts to Default It is very much possible that we mix up various SQL Server shortcuts and at times we feel like resetting it to default. In SQL Server 2012 it is not easy to do it, there is a process to follow and I enjoyed blogging about it. Fundamentals of Columnstore Index Columnstore index is introduced in SQL Server 2012 and have been a very popular subject. It increases the speed of the server dramatically as well can be an extremely useful feature with Datawharehousing. However updating the columnstore index is not as simple as a simple UPDATE statement. Read in a detailed blog post about how Update works with Columnstore Index. Additionally, you can watch a Quick Video on this subject. SQL Server 2012 New Features I had decided to explore SQL Server 2012 features last year and went through pretty much every single concept introduced in separate blog posts. Here are two blog posts where I describe how SQL Server 2012 functions works. Introduction to CUME_DIST – Analytic Functions Introduction to FIRST _VALUE and LAST_VALUE – Analytic Functions OVER clause with FIRST_VALUE and LAST_VALUE – Analytic Functions I indeed enjoyed writing about SQL Server 2012 functions last year. Have you gone through all the new features which are introduced in SQL Server 2012? If not, it is still not late to go through them. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Big Data – Buzz Words: What is Hadoop – Day 6 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is NoSQL. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – Hadoop. What is Hadoop? Apache Hadoop is an open-source, free and Java based software framework offers a powerful distributed platform to store and manage Big Data. It is licensed under an Apache V2 license. It runs applications on large clusters of commodity hardware and it processes thousands of terabytes of data on thousands of the nodes. Hadoop is inspired from Google’s MapReduce and Google File System (GFS) papers. The major advantage of Hadoop framework is that it provides reliability and high availability. What are the core components of Hadoop? There are two major components of the Hadoop framework and both fo them does two of the important task for it. Hadoop MapReduce is the method to split a larger data problem into smaller chunk and distribute it to many different commodity servers. Each server have their own set of resources and they have processed them locally. Once the commodity server has processed the data they send it back collectively to main server. This is effectively a process where we process large data effectively and efficiently. (We will understand this in tomorrow’s blog post). Hadoop Distributed File System (HDFS) is a virtual file system. There is a big difference between any other file system and Hadoop. When we move a file on HDFS, it is automatically split into many small pieces. These small chunks of the file are replicated and stored on other servers (usually 3) for the fault tolerance or high availability. (We will understand this in the day after tomorrow’s blog post). Besides above two core components Hadoop project also contains following modules as well. Hadoop Common: Common utilities for the other Hadoop modules Hadoop Yarn: A framework for job scheduling and cluster resource management There are a few other projects (like Pig, Hive) related to above Hadoop as well which we will gradually explore in later blog posts. A Multi-node Hadoop Cluster Architecture Now let us quickly see the architecture of the a multi-node Hadoop cluster. A small Hadoop cluster includes a single master node and multiple worker or slave node. As discussed earlier, the entire cluster contains two layers. One of the layer of MapReduce Layer and another is of HDFC Layer. Each of these layer have its own relevant component. The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave or worker node consists of a DataNode and TaskTracker. It is also possible that slave node or worker node is only data or compute node. The matter of the fact that is the key feature of the Hadoop. In this introductory blog post we will stop here while describing the architecture of Hadoop. In a future blog post of this 31 day series we will explore various components of Hadoop Architecture in Detail. Why Use Hadoop? There are many advantages of using Hadoop. Let me quickly list them over here: Robust and Scalable – We can add new nodes as needed as well modify them. Affordable and Cost Effective – We do not need any special hardware for running Hadoop. We can just use commodity server. Adaptive and Flexible – Hadoop is built keeping in mind that it will handle structured and unstructured data. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically fails over to another node. Why Hadoop is named as Hadoop? In year 2005 Hadoop was created by Doug Cutting and Mike Cafarella while working at Yahoo. Doug Cutting named Hadoop after his son’s toy elephant. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – MapReduce. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Search Results

Search found 1456 results on 59 pages for 'authority'.

Page 29/59 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >

- by pinaldave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by pinaldave

- by pinaldave

- by Pinal Dave

< Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36 | Next Page >