Search Results

Search found 3636 results on 146 pages for 'dave smith'.

Page 23/146 | < Previous Page | 19 20 21 22 23 24 25 26 27 28 29 30  | Next Page >

  • Big Data – How to become a Data Scientist and Learn Data Science? – Day 19 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the analytics in Big Data Story. In this article we will understand how to become a Data Scientist for Big Data Story. Data Scientist is a new buzz word, everyone seems to be wanting to become Data Scientist. Let us go over a few key topics related to Data Scientist in this blog post. First of all we will understand what is a Data Scientist. In the new world of Big Data, I see pretty much everyone wants to become Data Scientist and there are lots of people I have already met who claims that they are Data Scientist. When I ask what is their role, I have got a wide variety of answers. What is Data Scientist? Data scientists are the experts who understand various aspects of the business and know how to strategies data to achieve the business goals. They should have a solid foundation of various data algorithms, modeling and statistics methodology. What do Data Scientists do? Data scientists understand the data very well. They just go beyond the regular data algorithms and builds interesting trends from available data. They innovate and resurrect the entire new meaning from the existing data. They are artists in disguise of computer analyst. They look at the data traditionally as well as explore various new ways to look at the data. Data Scientists do not wait to build their solutions from existing data. They think creatively, they think before the data has entered into the system. Data Scientists are visionary experts who understands the business needs and plan ahead of the time, this tremendously help to build solutions at rapid speed. Besides being data expert, the major quality of Data Scientists is “curiosity”. They always wonder about what more they can get from their existing data and how to get maximum out of future incoming data. Data Scientists do wonders with the data, which goes beyond the job descriptions of Data Analysist or Business Analysist. Skills Required for Data Scientists Here are few of the skills a Data Scientist must have. Expert level skills with statistical tools like SAS, Excel, R etc. Understanding Mathematical Models Hands-on with Visualization Tools like Tableau, PowerPivots, D3. j’s etc. Analytical skills to understand business needs Communication skills On the technology front any Data Scientists should know underlying technologies like (Hadoop, Cloudera) as well as their entire ecosystem (programming language, analysis and visualization tools etc.) . Remember that for becoming a successful Data Scientist one require have par excellent skills, just having a degree in a relevant education field will not suffice. Final Note Data Scientists is indeed very exciting job profile. As per research there are not enough Data Scientists in the world to handle the current data explosion. In near future Data is going to expand exponentially, and the need of the Data Scientists will increase along with it. It is indeed the job one should focus if you like data and science of statistics. Courtesy: emc Tomorrow In tomorrow’s blog post we will discuss about various Big Data Learning resources. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Developer Training – 6 Online Courses to Learn SQL Server, MySQL and Technology

    - by Pinal Dave
    Video courses are the next big thing and I am so happy that I have so far authored 6 different video courses with Pluralsight. Here is the list of the courses. I have listed all of my video courses over here. Note: If you click on the courses and it does not open, you need to login to Pluralsight with a valid username and password or sign up for a FREE trial. Please leave a comment with your favorite course in the comment section. Random 10 winners will get surprise gift via email. Bonus: If you list your favorite module from the course site. SQL Server Performance: Introduction to Query Tuning SQL Server performance tuning is an in-depth topic, and an art to master. A key component of overall application performance tuning is query tuning. Writing queries in an efficient manner, and making sure they execute in the most optimal way possible, is always a challenge. The basics revolve around the details of how SQL Server carries out query execution, so the optimizations explored in this course follow along the same lines. Click to View Course SQL Server Performance: Indexing Basics Indexes are the most crucial objects of the database. They are the first stop for any DBA and Developer when it is about performance tuning. There is a good side as well evil side of the indexes. To master the art of performance tuning one has to understand the fundamentals of the indexes and the best practices associated with the same. This course is for every DBA and Developer who deals with performance tuning and wants to use indexes to improve the performance of the server. Click to View Course SQL Server Questions and Answers This course is designed to help you better understand how to use SQL Server effectively. The course presents many of the common misconceptions about SQL Server, and then carefully debunks those misconceptions with clear explanations and short but compelling demos, showing you how SQL Server really works. This course is for anyone working with SQL Server databases who wants to improve her knowledge and understanding of this complex platform. Click to View Course MySQL Fundamentals MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack. This course covers the fundamentals of MySQL, including how to install MySQL as well as written basic data retrieval and data modification queries. Click to View Course Building a Successful Blog Expressing yourself is the most common behavior of humans. Blogging has made easy to express yourself. Just like a letter or book has a structure and formula, blogging also has structure and formula. In this introductory course on blogging we will go over a few of the basics of blogging and show the way to get started with blogging immediately. If you already have a blog, this course will be even more relevant as this will discuss many of the common questions and issue you face in your blogging routine. Click to View Course Introduction to ColdFusion ColdFusion is rapid web application development platform. In this course you will learn the basics of how to use ColdFusion platform and rapidly develop web sites. The course begins with learning basics of ColdFusion Markup Language and moves to common development language practices. From there we move to frequent database operations and advanced concepts of Forms, Sessions and Cookies. The last module sums up all the concepts covered in the course with sample application. Click to View Course Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, T SQL, Technology

    Read the article

  • SQL SERVER – Puzzle #1 – Querying Pattern Ranges and Wild Cards

    - by Pinal Dave
    Note: Read at the end of the blog post how you can get five Joes 2 Pros Book #1 and a surprise gift. I have been blogging for almost 7 years and every other day I receive questions about Querying Pattern Ranges. The most common way to solve the problem is to use Wild Cards. However, not everyone knows how to use wild card properly. SQL Queries 2012 Joes 2 Pros Volume 1 – The SQL Queries 2012 Hands-On Tutorial for Beginners Book On Amazon | Book On Flipkart Learn SQL Server get all the five parts combo kit Kit on Amazon | Kit on Flipkart Many people know wildcards are great for finding patterns in character data. There are also some special sequences with wildcards that can give you even more power. This series from SQL Queries 2012 Joes 2 Pros® Volume 1 will show you some of these cool tricks. All supporting files are available with a free download from the www.Joes2Pros.com web site. This example is from the SQL 2012 series Volume 1 in the file SQLQueries2012Vol1Chapter2.2Setup.sql. If you need help setting up then look in the “Free Videos” section on Joes2Pros under “Getting Started” called “How to install your labs” Querying Pattern Ranges The % wildcard character represents any number of characters of any length. Let’s find all first names that end in the letter ‘A’. By using the percentage ‘%’ sign with the letter ‘A’, we achieve this goal using the code sample below: SELECT * FROM Employee WHERE FirstName LIKE '%A' To find all FirstName values beginning with the letters ‘A’ or ‘B’ we can use two predicates in our WHERE clause, by separating them with the OR statement. Finding names beginning with an ‘A’ or ‘B’ is easy and this works fine until we want a larger range of letters as in the example below for ‘A’ thru ‘K’: SELECT * FROM Employee WHERE FirstName LIKE 'A%' OR FirstName LIKE 'B%' OR FirstName LIKE 'C%' OR FirstName LIKE 'D%' OR FirstName LIKE 'E%' OR FirstName LIKE 'F%' OR FirstName LIKE 'G%' OR FirstName LIKE 'H%' OR FirstName LIKE 'I%' OR FirstName LIKE 'J%' OR FirstName LIKE 'K%' The previous query does find FirstName values beginning with the letters ‘A’ thru ‘K’. However, when a query requires a large range of letters, the LIKE operator has an even better option. Since the first letter of the FirstName field can be ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’, ‘H’, ‘I’, ‘J’ or ‘K’, simply list all these choices inside a set of square brackets followed by the ‘%’ wildcard, as in the example below: SELECT * FROM Employee WHERE FirstName LIKE '[ABCDEFGHIJK]%' A more elegant example of this technique recognizes that all these letters are in a continuous range, so we really only need to list the first and last letter of the range inside the square brackets, followed by the ‘%’ wildcard allowing for any number of characters after the first letter in the range. Note: A predicate that uses a range will not work with the ‘=’ operator (equals sign). It will neither raise an error, nor produce a result set. --Bad query (will not error or return any records) SELECT * FROM Employee WHERE FirstName = '[A-K]%' Question: You want to find all first names that start with the letters A-M in your Customer table and end with the letter Z. Which SQL code would you use? a. SELECT * FROM Customer WHERE FirstName LIKE 'm%z' b. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z' c. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z' d. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z' e. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%' f. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z' g. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%' Contest Leave a valid answer before June 18, 2013 in the comment section. 5 winners will be selected from all the valid answers and will receive Joes 2 Pros Book #1. 1 Lucky person will get a surprise gift from Joes 2 Pros. The contest is open for all the countries where Amazon ships the book (USA, UK, Canada, India and many others). Special Note: Read all the options before you provide valid answer as there is a small trick hidden in answers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQLAuthority News – Advantages of Distance Learning

    - by Pinal Dave
    Distance education is extremely popular – almost overnight, it seems.  Almost everyone has taken an online course, or knows someone who has, or is considering joining an online school.  There are many advantages and disadvantages to attending an online school – but the same can be said of attending a physical school!  Let’s take a look at the top reasons to use distance education. 1) Flexibility.  Physical universities are usually willing to make some concessions to student – like night classes, study hours, and online networks.  However, nothing is going to beat the flexibility of distance education.  You can attend classes and take notes anytime, anywhere, wearing anything you’d like! 2) Affordability.  We don’t need to get into hard numbers to understand how an expensive university can be.  Students are taking on more and more debt just to get an education.  Many of these fees pay for room, board, and facilities.   Distance education cuts out all these costs, and makes attending school much more affordable for the average student. 3) Try before you buy.  Did you know that the average college student changes his or her major 10 times before they graduate?  You can imagine that this kind of indecision plays a huge part in WHEN you graduate – not being able to make up your mind can cost you big bucks if you have to stay in school for extra years!  Distance education allows you to take different classes from a wide range of disciplines.  Do you want to study forensic science or English literature?  Now you don’t have to pay for classes you can’t afford just to find out. 4) Pace yourself.  Some students struggle in a traditional classroom setting – classes can be taught too fast, too slow, or there are too many distractions.  Distance education allows mature students to set the pace themselves.  They can rewatch lectures they didn’t catch the first time, or go through classes quickly if they are already familiar with the material – cutting out the chance of burning out or getting bored. 5) Lifelong learning.  Maybe you already have a degree, but would like to learn more about your field, or a related field, or maybe even about something completely unrelated – just because you are curious!  Distance education allows you to learn whatever you want ,whenever you want (and yes, wearing anything you’d like!). 6) Attend whatever college you want.  Because of the popularity of distance education, physical campuses are getting in on the game by offering online courses – often just uploaded versions of classes already taught at their campus.  Ever wanted to attend Harvard, but knew you couldn’t get in?  Take a class online!  Of course, you probably should not attempt to lie and say you have a Harvard degree, but Ivy League colleges are prestigious because they are the best in their field – take advantage of the best by taking an online course! I am a big believer in continuing education, whether it is online courses, returning to school, or even take informal classes online.  Distance education can be a great way to accomplish these goals and become a lifelong learner. My friends at provides training through virtual classrooms for students who want to avoid travelling. Distance learning course allows IT aspirants to connect with trainers using the internet.  I encourage everyone to check it out! Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, T SQL, Technology

    Read the article

  • SQL Authority News – Secret Tool Box of Successful Bloggers: 52 Tips to Build a High Traffic Top Ranking Blog

    - by Pinal Dave
    When I started this blog, it was meant as a bookmark for myself for helpful tips and tricks.  Gradually, it grew into a blog that others were reading and commenting on.  While SQL and databases are my first love and the reason I started this blog, the side effect was that I discovered I loved writing.  I discovered a secret goal I didn’t even know I wanted – I wanted to become an author.  For a long time, writing this blog satisfied that urge.  Gradually, though, I wanted to see my name in print. 12th Book Over the past few years I have authored and co-authored a number of books – they are all based on my knowledge of SQL Server, and were meant to spread my years of experience into the world, to share what I have learned with my community.  I currently have elevan of these “manuals” available for sale.  As exciting as it was to see my name in print, I still felt that there was more I could do as an author. That is when I realized that I am more than just a SQL expert.  I have been writing this blog now for more than 10 years, and it grew from a personal bookmark to a thriving website with over 2 million views per month.  I thought to myself “I could write a book about how to create a successful blog!”  And that is exactly what I did.  I am extremely excited to share with all of you my new book – “Secret Toolbox of Successful Bloggers.” A Labor of Love This project has been a labor of love for me.  It started out as a series for this blog – I would post one article a week until I felt the topic had been covered.  I found that as I wrote, new topics kept popping up in my mind, and eventually this small blog series grew into a full book.  The blog series was large enough to last a whole year, so I definitely thought that it could be a full book.  Ideas on how to become a successful blogger were so frequent that, I will admit, I feel like there is so much I left out of this book.  I had a lot more to say than I originally thought! I am so excited to be sharing this book with all of you.  I am so passionate about this topic, and I feel like there are so many people who can benefit from this book.  I know that when I started this blog, I did not know what I was doing, and I would have loved a “helping hand” to tell what to do and what not to do.  If this book can act that way to any of my readers, I feel it is a success. Rules of Thumb If you are interested in the topic of becoming a blogger, as you read this book, keep in mind that it is suggestions only.  Blogging is so new to the world that while there are “rules of thumb” about what to do and what not to do, a map of steps (“first, do x, then do y”) is not going to work for every single blogger.  This book is meant to encourage new bloggers to put their content out there in the world, to be brave and create a community like the one I have here at SQL Authority.  I have gained so much from this community, I wanted to give something back, and this book is just one small part. I hope that everyone who reads this books finds at least one helpful tip, and that everyone can experience the joy of blogging.  That is the whole reason I wrote this book, and what I hope everyone takes away from it. Where Can You Get It? You can get the book from following URL: Kindle eBook | Print Book Reference: Pinal Dave (http://blog.SQLAuthority.com)Filed under: About Me, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, T SQL

    Read the article

  • SQL Authority News – FalafelCON 2014: 2 days with the Best Developers in the World

    - by Pinal Dave
    I love presenting at various forums on various technologies. I am extremely excited that I got invited to speak at Falafel Conference 2014 in San Francisco. I will present two technology sessions on SQL Server. If you are into web development or if you just want to attend a conference with the best of the industry speakers, this may be the right conference for you. What set apart this conference from other conference is technology presented as well as speakers. Usually one has to attend very expensive and high scale event when they have to hear good speakers. At this conference, you will find quite a many industry legends are available to present on the bleeding edge technology. Here are few of the reasons why I believe you should attend this conference: Choose from four tracks covering Web, Mobile development and testing, Sitefinity, and Automated Testing, or attend sessions from all four! Learn from the best developers and testers in the business in an intimate setting. Surround yourself with your peers and the opportunity to network Learn about the latest platforms and technologies including Kendo UI, AngularJS, ASP.NET MVC, WebAPI, and more! Here are the details for the sessions which I am going to present at Falafel Conference. Secrets of SQL Server: Database Worst Practices Abstract: Chances are you have heard, or even uttered, this expression. This demo-oriented session will show many examples where database professionals were dumbfounded by their own mistakes, and could even bring back memories of your own early DBA days. The goal of this session is to expose the small details that can be dangerous to the production environment and SQL Server as a whole, as well as talk about worst practices and how to avoid them. Shedding light on some of these perils and the tricks to avoid them may even save your current job. After attending this session, Developers will only need 60 seconds to improve performance of their database server in their SharePoint implementation. We will have a quiz during the session to keep the conversation alive. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. Additionally, all attendees of the session will have access to learning material presented in the session. The Unsung Hero Abstract: Slow Running Queries are the most common problem that developers face while working with SQL Server. While it is easy to blame the SQL Server for unsatisfactory performance, however the issue often persists with the way queries have been written, and how Indexes has been set up. The session will focus on the ways of identifying problems that slow down SQL Server, and Indexing tricks to fix them. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. Register Now! I have learned from the Falafel Team that they are running out of tickets and soon they will close the registration.  For next 10 days the price for the registration is only USD 149. Trust me, you can’t get such a world class training and networking opportunity at such a low price. Click to Register Here! Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority News, T SQL

    Read the article

  • SQL SERVER – How to Get SQL Server Restart Notification?

    - by Pinal Dave
    Few days back my friend called me to know if there is any tool which can be used to get restart notification about SQL in their environment. I told that SQL Server can do it by itself with some configurations. He was happy and surprised to know that he need not spend any extra money. In SQL Server, we can configure stored procedure(s) to run at start-up of SQL Server. This blog would give steps to achieve how to achieve it. There are many situations where this feature can be used. Below are few. Logging SQL Server startup timings Modify data in some table during startup (i.e. table in tempdb) Sending notification about SQL start. Step 1 – Enable ‘scan for startup procs’ This can be done either using T-SQL or User Interface of Management Studio. EXEC sys.sp_configure N'Show Advanced Options', N'1' GO RECONFIGURE WITH OVERRIDE GO EXEC sys.sp_configure N'scan for startup procs', N'1' GO RECONFIGURE WITH OVERRIDE GO Below is the interface to change the setting. We need to go to “Server” > “Properties” and use “Advanced” tab. “Scan for Startup Procs” is the parameter under “Miscellaneous” section as shown below. We need to make value as “True” and hit OK. Step 2 – Create stored procedure It’s important to note that the procedure is executed after recovery is finished for ALL databases. Here is a sample stored procedure. You can use your own logic in the procedure. CREATE PROCEDURE SQLStartupProc AS BEGIN CREATE TABLE ##ThisTableShouldAlwaysExists (AnyColumn INT) END Step 3 – Set Procedure to run at startup We need to use sp_procoption to mark the procedure to run at startup. Here is the code to let SQL know that this is startup proc. sp_procoption 'SQLStartupProc', 'startup', 'true' This can be used only for procedures in master database. Msg 15398, Level 11, State 1, Procedure sp_procoption, Line 89 Only objects in the master database owned by dbo can have the startup setting changed. We also need to remember that such procedure should not have any input/output parameter. Here is the error which would be raised. Msg 15399, Level 11, State 1, Procedure sp_procoption, Line 107 Could not change startup option because this option is restricted to objects that have no parameters. Verification Here is the query to find which procedures is marked as startup procedures. SELECT name FROM sys.objects WHERE OBJECTPROPERTY(OBJECT_ID, 'ExecIsStartup') = 1 Once this is done, I have restarted SQL instance and here is what we would see in SQL ERRORLOG Launched startup procedure 'SQLStartupProc'. This confirms that stored procedure is executed. You can also notice that this is done after all databases are recovered. Recovery is complete. This is an informational message only. No user action is required. After few days my friend again called me and asked – I want to turn this OFF? Use comments section and post the answer for him.  Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL

    Read the article

  • Big Data – ClustrixDB – Extreme Scale SQL Database with Real-time Analytics, Releases Software Download – NewSQL

    - by Pinal Dave
    There are so many things to learn and there is so little time we all have. As we have little time we need to be selective to learn whatever we learn. I believe I know quite a lot of things in SQL but I still do not know what is around SQL. I have started to learn about NewSQL recently. If you wonder what is NewSQL I encourage all of you to read my blog post about NewSQL over here Big Data – Buzz Words: What is NewSQL – Day 10 of 21. NewSQL databases are quickly becoming popular – providing the scale of NoSQL with the SQL features and transactions. As a part of learning NewSQL database, I have recently started to learn about ClustrixDB. ClustrixDB has been the most mature NewSQL database used by some of the largest internet sites in the world for over 3 years, with extensive SQL support. In addition to scale, it provides fast real-time analytics by bringing massively parallel processing (MPP), available only in warehousing databases, to the transactional database. The reason I am more intrigued about learning ClustrixDB is their recent announcement on Oct 31. ClustrixDB was only available as an appliance, but now with their software release on Oct 31, everyone can use it. It is now available as forever free for up to 12 cores with community support, and there is a 45 day trial for unlimited cluster sizes. With the forever free world, I am indeed interested in ClustrixDB now. I know that few of the leading eCommerce sites in the world uses them for their transactional database. Here are few of the details I have quickly noted for ClustrixDB. ClustrixDB allows user to: Scale by simply adding nodes to the cluster with a single command Run billions of transactions a day Run fast real-time analytics Achieve high-availability with recovery from node failure Manages itself Easily migrate from MySQL as it is nearly plug-and-play compatible, use MySQL drivers, tools and replication. While I was going through the documentation I realized that ClustrixDB also has extensive support for SQL features including complex queries involving joins on a dozen or more tables, aggregates, sorts, sub-queries. It also supports stored procedures, triggers, foreign keys, partitioned and temporary tables, and fully online schema changes. It is indeed a very matured product and SQL solution. Indeed Clusterix sound very promising solution, I decided to dig a bit deeper to understand who are current customers of the Clustrix as they exist in the industry for quite a few years. Their client list is indeed very interesting and here is my quick research about them. Twoo.com – Europe’s largest social discovery (dating) site runs 4.4 Billion Transactions a day with table sizes over a Terabyte, on a 168 core cluster. EngageBDR – Top 3 in the online advertising category uses ClustrixDB to serve 6.9 billion ads a day through real-time bidding platform. Their reports went from 4 hours to 15 seconds. NoMoreRack – Top 2 fastest growing e-commerce company in US used ClustrixDB for high availability and fast growth through Amazon cloud. MakeMyTrip – India’s leading travel site runs on ClustrixDB with two clusters running as multi-master in Chennai and Bangalore. Many enterprises such as AOL, CSC, Rakuten, Symantec use ClustrixDB when their applications need scale. I must accept that I am impressed with the information I have learned so far and now is the time to do some hand’s on experience with their product. I want to learn this technology so in future when it is about NewSQL, I know what I am talking about. Read more why Clustrix explains why you ClustrixDB might be the right database for you. Download ClustrixDB with me today and install it on your machine so in future when we discuss the technical aspects of it, we all are on the same page. The software can be downloaded here. Reference : Pinal Dave (http://blog.SQLAuthority.com)Filed under: Big Data, MySQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Clustrix

    Read the article

  • 256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

    - by Alan Smith
    For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour.  This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue   static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005>     define wooden texture { noise surface { ambient 0.2  diffuse 0.7  specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1  lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7  } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } }   viewpoint {     from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60     resolution 640, 480 aspect 1.6 image_format 0 }       light <-10, 30, 20> light <-10, 30, -20>   object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden }   object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome }   After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue }   In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000   As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72   Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: ·         On-Premise o   Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o   Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o   Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o   Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. ·         Windows Azure o   Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment.   The architecture of each worker role is shown below.   The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" >   <WorkerRole name="CloudRayWorkerRole" vmsize="Small">     <Imports>     </Imports>     <ConfigurationSettings>       <Setting name="DataConnectionString" />     </ConfigurationSettings>     <LocalResources>       <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" />     </LocalResources>   </WorkerRole> </ServiceDefinition>     The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1.       The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2.       PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3.       DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4.       The JPG file is uploaded from local storage to the images blob container. 5.       A message is placed on the images queue to indicate a new image is available for download. 6.       The job message is deleted from the job queue. 7.       The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() {     // Set environment variables     string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation);     string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation);       LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder");     string localStorageRootPath = rayStorage.RootPath;       JobQueue jobQueue = new JobQueue("renderjobs");     JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs");     CloudRayBlob sceneBlob = new CloudRayBlob("scenes");     CloudRayBlob imageBlob = new CloudRayBlob("images");     RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource();       Frames = 0;       while (true)     {         // Get the render job from the queue         CloudQueueMessage jobMsg = jobQueue.Get();           if (jobMsg != null)         {             // Get the file details             string sceneFile = jobMsg.AsString;             string tgaFile = sceneFile.Replace(".pi", ".tga");             string jpgFile = sceneFile.Replace(".pi", ".jpg");               string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile);             string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile);             string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile);               // Copy the scene file to local storage             sceneBlob.DownloadFile(sceneFilePath);               // Run the ray tracer.             string polyrayArguments =                 string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath);             Process polyRayProcess = new Process();             polyRayProcess.StartInfo.FileName =                 Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath);             polyRayProcess.StartInfo.Arguments = polyrayArguments;             polyRayProcess.Start();             polyRayProcess.WaitForExit();               // Convert the image             string dtaArguments =                 string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath));             Process dtaProcess = new Process();             dtaProcess.StartInfo.FileName =                 Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath);             dtaProcess.StartInfo.Arguments = dtaArguments;             dtaProcess.Start();             dtaProcess.WaitForExit();               // Upload the image to blob storage             imageBlob.UploadFile(jpgFilePath);               // Add a download job.             downloadQueue.Add(jpgFile);               // Delete the render job message             jobQueue.Delete(jobMsg);               Frames++;         }         else         {             Thread.Sleep(1000);         }           // Log the worker role activity.         roleLifecycleDataSource.Alive             ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames);     } }     Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage.   public class RoleLifecycle : TableServiceEntity {     public string ServerName { get; set; }     public string Status { get; set; }     public DateTime StartTime { get; set; }     public DateTime EndTime { get; set; }     public long SecondsRunning { get; set; }     public DateTime LastActiveTime { get; set; }     public int Frames { get; set; }     public string Comment { get; set; }       public RoleLifecycle()     {     }       public RoleLifecycle(string roleName)     {         PartitionKey = roleName;         RowKey = Utils.GetAscendingRowKey();         Status = "Started";         StartTime = DateTime.UtcNow;         LastActiveTime = StartTime;         EndTime = StartTime;         SecondsRunning = 0;         Frames = 0;     } }     A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*">   <Role name="CloudRayWorkerRole">     <Instances count="16" />     <ConfigurationSettings>       <Setting name="DataConnectionString"         value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." />     </ConfigurationSettings>   </Role> </ServiceConfiguration>     About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames.   Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation.   With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes.     At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances.   <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*">   <Role name="CloudRayWorkerRole">     <Instances count="256" />     <ConfigurationSettings>       <Setting name="DataConnectionString"         value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." />     </ConfigurationSettings>   </Role> </ServiceConfiguration>     Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames.   Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds.   We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes.   The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each.   Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles   The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. ·         Windows Azure can provide massive compute power, on demand, in a matter of minutes. ·         The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. ·         Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. ·         No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. ·         Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. ·         Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. ·         Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. ·         Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. ·         If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. ·         Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. ·         Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

    Read the article

  • Developer’s Life – Summary of Superhero Articles

    - by Pinal Dave
    Earlier this year, I wrote an article series where I talked about developer’s life and compared it with Superhero. I have got amazing response to this series and I have been receiving quite a lots of email suggesting that I should write more blog post about them. Currently I am not planning to write more blog post but I will soon continue another series. In this blog post, I have summarized the entire series. Let me know if you want me to write about any superhero. I will see what I can do about that hero. Developer’s Life – Every Developer is a Captain America Captain America was first created as a comic book character in the 1940’s as a way to boost morale during World War II.  Aimed at a children’s audience, his legacy faded away when the war ended.  However, he has recently has a major reboot to become a popular movie character that deals with modern issues. Developer’s Life – Every Developer is the Incredible Hulk The Incredible Hulk is possibly one of the scariest superheroes out there.  All superheroes are meant to be “out of this world” and awe-inspiring, but I think most people will agree with I say The Hulk takes this to the next level.  He is the result of an industrial accident, which is scary enough in it’s own right.  Plus, when mild-mannered Bruce Banner is angered, he goes completely out-of-control and transforms into a destructive monster that he cannot control and has no memories of. Developer’s Life – Every Developer is a Wonder Woman We have focused a lot lately on this “superhero series.”  I love fantasy books and movies, and I feel like there is a lot to be learned from them.  As I am writing this series, though, I have noticed that every super hero I write about is a man.  So today, I would like to talk about the major female super hero – Wonder Woman. Developer’s Life – Every Developer is a Harry Potter Harry Potter might not be a superhero in the traditional sense, but I believe he still has a lot to teach us and show us about life as a developer.  If you have been living under a rock for the last 17 years, you might not know that Harry Potter is the main character in an extremely popular series of books and movies documenting the education and tribulation of a young wizard (and his friends). Developer’s Life – Every Developer is Like Transformers Transformers may not be superheroes – they don’t wear capes, they don’t have amazing powers outside of their size and folding ability, they’re not even human (technically).  Part of their enduring popularity is that while we are enjoying over-the-top movies, we are learning about good leadership and strong personal skills. Developer’s Life – Every Developer is a Iron Man Iron Man is another superhero who is not naturally “super,” but relies on his brain (and money) to turn him into a fighting machine.  While traditional superheroes are still popular, a three-movie franchise and incorporation into the new Avengers series shows that Iron Man is popular enough on his own. Developer’s Life – Every Developer is a Sherlock Holmes I have been thinking a lot about how developers are like super heroes, and I have written two blog posts now comparing them to Spiderman and Superman.  I have a lot of love and respect for developers, and I hope that they are enjoying these articles, and others are learning a little bit about the profession.  There is another fictional character who, while not technically asuper hero, is very powerful, and I also think stands as a good example of a developer. That character is Sherlock Holmes.  Sherlock Holmes is a British detective, first made popular at the turn of the 19thcentury by author Sir Arthur Conan Doyle.  The original Sherlock Holmes was a brilliant detective who could solve the most mind-boggling crime through simple observations and deduction. Developer’s Life – Every Developer is a Chhota Bheem Chhota Bheem is a cartoon character that is extremely popular where I live.  He is my daughter’s favorite characters.  I like to say that children love Chhota Bheem more than their parents – it is lucky for us he is not real!  Children love Chhota Bheem because he is the absolute “good guy.”  He is smart, loyal, and strong.  He and his friends live in Dholakpur and fight off their many enemies – and always win – in every episode.  In each episode, they learn something about friendship, bravery, and being kind to others.  Chhota Bheem is a good role model for children, and I think that he is a good role model for developers are well. Developer’s Life – Every Developer is a Batman Batman is one of the darkest superheroes in the fantasy canon.  He does not come to his powers through any sort of magical coincidence or radioactive insect, but through a lot of psychological scarring caused by witnessing the death of his parents.  Despite his dark back story, he possesses a lot of admirable abilities that I feel bear comparison to developers. Developer’s Life – Every Developer is a Superman I enjoyed comparing developers to Spiderman so much, that I have decided to continue the trend and encourage some of my favorite people (developers) with another favorite superhero – Superman.  Superman is probably the most famous superhero – and one of the most inspiring. Developer’s Life – Every Developer is a Spiderman I have to admit, Spiderman is my favorite superhero.  The most recent movie recently was released in theaters, so it has been at the front of my mind for some time. Spiderman was my favorite superhero even before the latest movie came out, but of course I took my whole family to see the movie as soon as I could!  Every one of us loved it, including my daughter.  We all left the movie thinking how great it would be to be Spiderman.  So, with that in mind, I started thinking about how we are like Spiderman in our everyday lives, especially developers. I would like to know which Superhero is your favorite hero! Reference: Pinal Dave (http://blog.SQLAuthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Developer, Superhero

    Read the article

  • SQLAuthority News – Don’t Be Afraid To Fool The World – Video by John Sonmez

    - by Pinal Dave
    Sometime some words and statements grabs your attention and it is hard to stop thinking about that after a while. Something similar happened a few days ago when I read the twitter statement of my friend and Pluralsight author John Sonmez. He twitted few days ago very interesting statement. “I don’t know a single successful person, who doesn’t deep down think that have the world fooled. #fooltheworld” by John Sonmez. When I read it, I was extremely intrigued by this statement. I read it many times, I shared with my family and I just could not stop interpreting this statement. It was indeed fun to read it again and again and there are so many different meanings one can take away from the statement. I know John very well, he is a  wonderful person and have very positive energy for the life. I just had to request him to build a video around it. Right after 5 days of my request, John created a wonderful video around this subject. I watched it multiple times as it was a wonderful video. I am not going to write about what was in the video much as I suggest you to watch the video itself. Here is one of the personal stories I want to share which is absolutely relevant to this video. I think my story 100% resonant the story of John. A Real Story from My Past Three years ago, I submitted a session in one of the SharePoint conference as a SQL Server session. My session was accepted and I prepared it very well. I put more than 2 month’s time to prepare for the session and I was very excited to present the session. I reached to the event place traveling thousands of the miles and I was very much excited to present the session. However, there was a little mixed up in the session. There were multiple session which were similar to my session title. One of the other speakers also had proposed a database related session and was selected. When the material went to print the printing team got confused and by mistake swapped the sessions. The other speaker got Performance with SQL Server session and I had received Performance with SharePoint session. IT was indeed a big mixed up but now that is how it was in the event guide and it was marketed the same way everything in the event. A Big Mix Up I had to talk with the event organizer and we come to the conclusion that we all had good intention but things just got mixed up and now was the time when “The show must go on“. I had a great amount of hesitation to go and present the session as I had personally never worked with Sharepoint so close in my life and my session abstracted talked about SharePoint tricks in depth. Two hours before the session I took the help of one of my friend and installed the SharePoint on my box. He showed me a few things here and there but it was never a good enough time to learn everything which I wanted to learn. The Moments of Confidence I was very scared and nervous to go on the stage as a SharePoint was not something I felt comfortable. However, I decided to go on stage with confidence as a SharePoint expert. Though I did not know SharePoint at the best, I had confidence that whatever I know is correct and I will not misguide people. I had no intention to fool people but I had no intention to accept that I am a fool and you all wasted your time and money to dedicate your time to attend my session. I decided to be honest but at the same time decided to take the session beyond my expertise. The sixty minutes of the session went very fine and I was able to manage all the difficult question at a satisfactory level. When the session was over my feeling was that I would have not presented or talked any different if I had more knowledge of the SharePoint at that time. I think it was one of my best sessions and it was reflected in the session feedback as well. I was the best speaker across all the track and my session had highest ranking. I was delighted and I learned a very valuable lesson. I must go beyond my limits and knowledge. I must aim higher and work harder. I should not lie but I should have confidence that I have a good heart and I put 100% in my efforts.  Lessions Learned Since this incident I have learned a lot about SharePoint and I am now a regular speaker at various SharePoint conferences along with SQL Server sessions. I am motivated and I am not afraid. I know people have lots of expectation from me but I have learned not to judge myself before I do my best. I leave the judgement of my efforts to my audience. I do not take the burden of the feedback on me, even though I know my audience have expected from me. I know what I know and I put my best. I must go out, if I fail, I learn from my mistake but I must keep my progress trajectory very high. As John said in the video, sometime success is not something we can achieve 100% but we can keep on going near to it. As long as we do not lose our focus from our goal and do not deviate from our progress path, we are doing things right. Reference: Pinal Dave (http://blog.sqlauthority.com)  Filed under: About Me, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL SERVER – SSMS: Memory Usage By Memory Optimized Objects Report

    - by Pinal Dave
    At conferences and at speaking engagements at the local UG, there is one question that keeps on coming which I wish were never asked. The question around, “Why is SQL Server using up all the memory and not releasing even when idle?” Well, the answer can be long and with the release of SQL Server 2014, this got even more complicated. This release of SQL Server 2014 has the option of introducing In-Memory OLTP which is completely new concept and our dependency on memory has increased multifold. In reality, nothing much changes but we have memory optimized objects (Tables and Stored Procedures) additional which are residing completely in memory and improving performance. As a DBA, it is humanly impossible to get a hang of all the innovations and the new features introduced in the next version. So today’s blog is around the report added to SSMS which gives a high level view of this new feature addition. This reports is available only from SQL Server 2014 onwards because the feature was introduced in SQL Server 2014. Earlier versions of SQL Server Management Studio would not show the report in the list. If we try to launch the report on the database which is not having In-Memory File group defined, then we would see the message in report. To demonstrate, I have created new fresh database called MemoryOptimizedDB with no special file group. Here is the query used to identify whether a database has memory-optimized file group or not. SELECT TOP(1) 1 FROM sys.filegroups FG WHERE FG.[type] = 'FX' Once we add filegroup using below command, we would see different version of report. USE [master] GO ALTER DATABASE [MemoryOptimizedDB] ADD FILEGROUP [IMO_FG] CONTAINS MEMORY_OPTIMIZED_DATA GO The report is still empty because we have not defined any Memory Optimized table in the database.  Total allocated size is shown as 0 MB. Now, let’s add the folder location into the filegroup and also created few in-memory tables. We have used the nomenclature of IMO to denote “InMemory Optimized” objects. USE [master] GO ALTER DATABASE [MemoryOptimizedDB] ADD FILE ( NAME = N'MemoryOptimizedDB_IMO', FILENAME = N'E:\Program Files\Microsoft SQL Server\MSSQL12.SQL2014\MSSQL\DATA\MemoryOptimizedDB_IMO') TO FILEGROUP [IMO_FG] GO You may have to change the path based on your SQL Server configuration. Below is the script to create the table. USE MemoryOptimizedDB GO --Drop table if it already exists. IF OBJECT_ID('dbo.SQLAuthority','U') IS NOT NULL DROP TABLE dbo.SQLAuthority GO CREATE TABLE dbo.SQLAuthority ( ID INT IDENTITY NOT NULL, Name CHAR(500)  COLLATE Latin1_General_100_BIN2 NOT NULL DEFAULT 'Pinal', CONSTRAINT PK_SQLAuthority_ID PRIMARY KEY NONCLUSTERED (ID), INDEX hash_index_sample_memoryoptimizedtable_c2 HASH (Name) WITH (BUCKET_COUNT = 131072) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA) GO As soon as above script is executed, table and index both are created. If we run the report again, we would see something like below. Notice that table memory is zero but index is using memory. This is due to the fact that hash index needs memory to manage the buckets created. So even if table is empty, index would consume memory. More about the internals of how In-Memory indexes and tables work will be reserved for future posts. Now, use below script to populate the table with 10000 rows INSERT INTO SQLAuthority VALUES (DEFAULT) GO 10000 Here is the same report after inserting 1000 rows into our InMemory table.    There are total three sections in the whole report. Total Memory consumed by In-Memory Objects Pie chart showing memory distribution based on type of consumer – table, index and system. Details of memory usage by each table. The information about all three is taken from one single DMV, sys.dm_db_xtp_table_memory_stats This DMV contains memory usage statistics for both user and system In-Memory tables. If we query the DMV and look at data, we can easily notice that the system tables have negative object IDs.  So, to look at user table memory usage, below is the over-simplified version of query. USE MemoryOptimizedDB GO SELECT OBJECT_NAME(OBJECT_ID), * FROM sys.dm_db_xtp_table_memory_stats WHERE OBJECT_ID > 0 GO This report would help DBA to identify which in-memory object taking lot of memory which can be used as a pointer for designing solution. I am sure in future we will discuss at lengths the whole concept of In-Memory tables in detail over this blog. To read more about In-Memory OLTP, have a look at In-Memory OLTP Series at Balmukund’s Blog. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Memory, SQL Reports

    Read the article

  • SQL SERVER – SSMS: Backup and Restore Events Report

    - by Pinal Dave
    A DBA wears multiple hats and in fact does more than what an eye can see. One of the core task of a DBA is to take backups. This looks so trivial that most developers shrug this off as the only activity a DBA might be doing. I have huge respect for DBA’s all around the world because even if they seem cool with all the scripting, automation, maintenance works round the clock to keep the business working almost 365 days 24×7, their worth is knowing that one day when the systems / HDD crashes and you have an important delivery to make. So these backup tasks / maintenance jobs that have been done come handy and are no more trivial as they might seem to be as considered by many. So the important question like: “When was the last backup taken?”, “How much time did the last backup take?”, “What type of backup was taken last?” etc are tricky questions and this report lands answers to the same in a jiffy. So the SSMS report, we are talking can be used to find backups and restore operation done for the selected database. Whenever we perform any backup or restore operation, the information is stored in the msdb database. This report can utilize that information and provide information about the size, time taken and also the file location for those operations. Here is how this report can be launched.   Once we launch this report, we can see 4 major sections shown as listed below. Average Time Taken For Backup Operations Successful Backup Operations Backup Operation Errors Successful Restore Operations Let us look at each section next. Average Time Taken For Backup Operations Information shown in “Average Time Taken For Backup Operations” section is taken from a backupset table in the msdb database. Here is the query and the expanded version of that particular section USE msdb; SELECT (ROW_NUMBER() OVER (ORDER BY t1.TYPE))%2 AS l1 ,       1 AS l2 ,       1 AS l3 ,       t1.TYPE AS [type] ,       (AVG(DATEDIFF(ss,backup_start_date, backup_finish_date)))/60.0 AS AverageBackupDuration FROM backupset t1 INNER JOIN sys.databases t3 ON ( t1.database_name = t3.name) WHERE t3.name = N'AdventureWorks2014' GROUP BY t1.TYPE ORDER BY t1.TYPE On my small database the time taken for differential backup was less than a minute, hence the value of zero is displayed. This is an important piece of backup operation which might help you in planning maintenance windows. Successful Backup Operations Here is the expanded version of this section.   This information is derived from various backup tracking tables from msdb database.  Here is the simplified version of the query which can be used separately as well. SELECT * FROM sys.databases t1 INNER JOIN backupset t3 ON (t3.database_name = t1.name) LEFT OUTER JOIN backupmediaset t5 ON ( t3.media_set_id = t5.media_set_id) LEFT OUTER JOIN backupmediafamily t6 ON ( t6.media_set_id = t5.media_set_id) WHERE (t1.name = N'AdventureWorks2014') ORDER BY backup_start_date DESC,t3.backup_set_id,t6.physical_device_name; The report does some calculations to show the data in a more readable format. For example, the backup size is shown in KB, MB or GB. I have expanded first row by clicking on (+) on “Device type” column. That has shown me the path of the physical backup file. Personally looking at this section, the Backup Size, Device Type and Backup Name are critical and are worth a note. As mentioned in the previous section, this section also has the Duration embedded inside it. Backup Operation Errors This section of the report gets data from default trace. You might wonder how. One of the event which is tracked by default trace is “ErrorLog”. This means that whatever message is written to errorlog gets written to default trace file as well. Interestingly, whenever there is a backup failure, an error message is written to ERRORLOG and hence default trace. This section takes advantage of that and shows the information. We can read below message under this section, which confirms above logic. No backup operations errors occurred for (AdventureWorks2014) database in the recent past or default trace is not enabled. Successful Restore Operations This section may not be very useful in production server (do you perform a restore of database?) but might be useful in the development and log shipping secondary environment, where we might be interested to see restore operations for a particular database. Here is the expanded version of the section. To fill this section of the report, I have restored the same backups which were taken to populate earlier sections. Here is the simplified version of the query used to populate this output. USE msdb; SELECT * FROM restorehistory t1 LEFT OUTER JOIN restorefile t2 ON ( t1.restore_history_id = t2.restore_history_id) LEFT OUTER JOIN backupset t3 ON ( t1.backup_set_id = t3.backup_set_id) WHERE t1.destination_database_name = N'AdventureWorks2014' ORDER BY restore_date DESC,  t1.restore_history_id,t2.destination_phys_name Have you ever looked at the backup strategy of your key databases? Are they in sync and do we have scope for improvements? Then this is the report to analyze after a week or month of maintenance plans running in your database. Do chime in with what are the strategies you are using in your environments. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

    Read the article

  • Big Data – Buzz Words: Importance of Relational Database in Big Data World – Day 9 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned what is HDFS. In this article we will take a quick look at the importance of the Relational Database in Big Data world. A Big Question? Here are a few questions I often received since the beginning of the Big Data Series - Does the relational database have no space in the story of the Big Data? Does relational database is no longer relevant as Big Data is evolving? Is relational database not capable to handle Big Data? Is it true that one no longer has to learn about relational data if Big Data is the final destination? Well, every single time when I hear that one person wants to learn about Big Data and is no longer interested in learning about relational database, I find it as a bit far stretched. I am not here to give ambiguous answers of It Depends. I am personally very clear that one who is aspiring to become Big Data Scientist or Big Data Expert they should learn about relational database. NoSQL Movement The reason for the NoSQL Movement in recent time was because of the two important advantages of the NoSQL databases. Performance Flexible Schema In personal experience I have found that when I use NoSQL I have found both of the above listed advantages when I use NoSQL database. There are instances when I found relational database too much restrictive when my data is unstructured as well as they have in the datatype which my Relational Database does not support. It is the same case when I have found that NoSQL solution performing much better than relational databases. I must say that I am a big fan of NoSQL solutions in the recent times but I have also seen occasions and situations where relational database is still perfect fit even though the database is growing increasingly as well have all the symptoms of the big data. Situations in Relational Database Outperforms Adhoc reporting is the one of the most common scenarios where NoSQL is does not have optimal solution. For example reporting queries often needs to aggregate based on the columns which are not indexed as well are built while the report is running, in this kind of scenario NoSQL databases (document database stores, distributed key value stores) database often does not perform well. In the case of the ad-hoc reporting I have often found it is much easier to work with relational databases. SQL is the most popular computer language of all the time. I have been using it for almost over 10 years and I feel that I will be using it for a long time in future. There are plenty of the tools, connectors and awareness of the SQL language in the industry. Pretty much every programming language has a written drivers for the SQL language and most of the developers have learned this language during their school/college time. In many cases, writing query based on SQL is much easier than writing queries in NoSQL supported languages. I believe this is the current situation but in the future this situation can reverse when No SQL query languages are equally popular. ACID (Atomicity Consistency Isolation Durability) – Not all the NoSQL solutions offers ACID compliant language. There are always situations (for example banking transactions, eCommerce shopping carts etc.) where if there is no ACID the operations can be invalid as well database integrity can be at risk. Even though the data volume indeed qualify as a Big Data there are always operations in the application which absolutely needs ACID compliance matured language. The Mixed Bag I have often heard argument that all the big social media sites now a days have moved away from Relational Database. Actually this is not entirely true. While researching about Big Data and Relational Database, I have found that many of the popular social media sites uses Big Data solutions along with Relational Database. Many are using relational databases to deliver the results to end user on the run time and many still uses a relational database as their major backbone. Here are a few examples: Facebook uses MySQL to display the timeline. (Reference Link) Twitter uses MySQL. (Reference Link) Tumblr uses Sharded MySQL (Reference Link) Wikipedia uses MySQL for data storage. (Reference Link) There are many for prominent organizations which are running large scale applications uses relational database along with various Big Data frameworks to satisfy their various business needs. Summary I believe that RDBMS is like a vanilla ice cream. Everybody loves it and everybody has it. NoSQL and other solutions are like chocolate ice cream or custom ice cream – there is a huge base which loves them and wants them but not every ice cream maker can make it just right  for everyone’s taste. No matter how fancy an ice cream store is there is always plain vanilla ice cream available there. Just like the same, there are always cases and situations in the Big Data’s story where traditional relational database is the part of the whole story. In the real world scenarios there will be always the case when there will be need of the relational database concepts and its ideology. It is extremely important to accept relational database as one of the key components of the Big Data instead of treating it as a substandard technology. Ray of Hope – NewSQL In this module we discussed that there are places where we need ACID compliance from our Big Data application and NoSQL will not support that out of box. There is a new termed coined for the application/tool which supports most of the properties of the traditional RDBMS and supports Big Data infrastructure – NewSQL. Tomorrow In tomorrow’s blog post we will discuss about NewSQL. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Example of Performance Tuning for Advanced Users with DB Optimizer

    - by Pinal Dave
    Performance tuning is such a subject that everyone wants to master it. In beginning everybody is at a novice level and spend lots of time learning how to master the art of performance tuning. However, as we progress further the tuning of the system keeps on getting very difficult. I have understood in my early career there should be no need of ego in the technology field. There are always better solutions and better ideas out there and we should not resist them. Instead of resisting the change and new wave I personally adopt it. Here is a similar example, as I personally progress to the master level of performance tuning, I face that it is getting harder to come up with optimal solutions. In such scenarios I rely on various tools to teach me how I can do things better. Once I learn about tools, I am often able to come up with better solutions when I face the similar situation next time. A few days ago I had received a query where the user wanted to tune it further to get the maximum out of the performance. I have re-written the similar query with the help of AdventureWorks sample database. SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID INNER JOIN HumanResources.Shift s ON edh.ShiftID = s.ShiftID; User had similar query to above query was used in very critical report and wanted to get best out of the query. When I looked at the query – here were my initial thoughts Use only column in the select statements as much as you want in the application Let us look at the query pattern and data workload and find out the optimal index for it Before I give further solutions I was told by the user that they need all the columns from all the tables and creating index was not allowed in their system. He can only re-write queries or use hints to further tune this query. Now I was in the constraint box – I believe * was not a great idea but if they wanted all the columns, I believe we can’t do much besides using *. Additionally, if I cannot create a further index, I must come up with some creative way to write this query. I personally do not like to use hints in my application but there are cases when hints work out magically and gives optimal solutions. Finally, I decided to use Embarcadero’s DB Optimizer. It is a fantastic tool and very helpful when it is about performance tuning. I have previously explained how it works over here. First open DBOptimizer and open Tuning Job from File >> New >> Tuning Job. Once you open DBOptimizer Tuning Job follow the various steps indicates in the following diagram. Essentially we will take our original script and will paste that into Step 1: New SQL Text and right after that we will enable Step 2 for Generating Various cases, Step 3 for Detailed Analysis and Step 4 for Executing each generated case. Finally we will click on Analysis in Step 5 which will generate the report detailed analysis in the result pan. The detailed pan looks like. It generates various cases of T-SQL based on the original query. It applies various hints and available hints to the query and generate various execution plans of the query and displays them in the resultant. You can clearly notice that original query had a cost of 0.0841 and logical reads about 607 pages. Whereas various options which are just following it has different execution cost as well logical read. There are few cases where we have higher logical read and there are few cases where as we have very low logical read. If we pay attention the very next row to original query have Merge_Join_Query in description and have lowest execution cost value of 0.044 and have lowest Logical Reads of 29. This row contains the query which is the most optimal re-write of the original query. Let us double click over it. Here is the query: SELECT * FROM HumanResources.Employee e INNER JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID INNER JOIN HumanResources.Shift s ON edh.ShiftID = s.ShiftID OPTION (MERGE JOIN) If you notice above query have additional hint of Merge Join. With the help of this Merge Join query hint this query is now performing much better than before. The entire process takes less than 60 seconds. Please note that it the join hint Merge Join was optimal for this query but it is not necessary that the same hint will be helpful in all the queries. Additionally, if the workload or data pattern changes the query hint of merge join may be no more optimal join. In that case, we will have to redo the entire exercise once again. This is the reason I do not like to use hints in my queries and I discourage all of my users to use the same. However, if you look at this example, this is a great case where hints are optimizing the performance of the query. It is humanly not possible to test out various query hints and index options with the query to figure out which is the most optimal solution. Sometimes, we need to depend on the efficiency tools like DB Optimizer to guide us the way and select the best option from the suggestion provided. Let me know what you think of this article as well your experience with DB Optimizer. Please leave a comment. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Joins, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • SQL – Quick Start with Explorer Sections of NuoDB – Query NuoDB Database

    - by Pinal Dave
    This is the third post in the series of the blog posts I am writing about NuoDB. NuoDB is very innovative and easy-to-use product. I can clearly see how one can scale-out NuoDB with so much ease and confidence. In my very first blog post we discussed how we can install NuoDB (link), and in my second post I discussed how we can manage the NuoDB database transaction engines and storage managers with a few clicks (link). Note: You can Download NuoDB from here. In this post, we will learn how we can use the Explorer feature of NuoDB to do various SQL operations. NuoDB has a browser-based Explorer, which is very powerful and has many of the features any IDE would normally have. Let us see how it works in the following step-by-step tutorial. Let us go to the NuoDBNuoDB Console by typing the following URL in your browser: http://localhost:8080/ It will bring you to the QuickStart screen. Make sure that you have created the sample database. If you have not created sample database, click on Create Database and create it successfully. Now go to the NuoDB Explorer by clicking on the main tab, and it will ask you for your domain username and password. Enter the username as a domain and password as a bird. Alternatively you can also enter username as a quickstart and password as a quickstart. Once you enter the password you will be able to see the databases. In our example we have installed the Sample Database hence you will see the Test database in our Database Hierarchy screen. When you click on database it will ask for the database login. Note that Database Login is different from Domain login and you will have to enter your database login over here. In our case the database username is dba and password is goalie. Once you enter a valid username and password it will display your database. Further expand your database and you will notice various objects in your database. Once you explore various objects, select any database and click on Open. When you click on execute, it will display the SQL script to select the data from the table. The autogenerated script displays entire result set from the database. The NuoDB Explorer is very powerful and makes the life of developers very easy. If you click on List SQL Statements it will list all the available SQL statements right away in Query Editor. You can see the popup window in following image. Here is the cool thing for geeks. You can even click on Query Plan and it will display the text based query plan as well. In case of a SELECT, the query plan will be much simpler, however, when we write complex queries it will be very interesting. We can use the query plan tab for performance tuning of the database. Here is another feature, when we click on List Tables in NuoDB Explorer.  It lists all the available tables in the query editor. This is very helpful when we are writing a long complex query. Here is a relatively complex example I have built using Inner Join syntax. Right below I have displayed the Query Plan. The query plan displays all the little details related to the query. Well, we just wrote multi-table query and executed it against the NuoDB database. You can use the NuoDB Admin section and do various analyses of the query and its performance. NuoDB is a distributed database built on a patented emergent architecture with full support for SQL and ACID guarantees.  It allows you to add Transaction Engine processes to a running system to improve the performance of your system.  You can also add a second Storage Engine to your running system for redundancy purposes.  Conversely, you can shut down processes when you don’t need the extra database resources. NuoDB also provides developers and administrators with a single intuitive interface for centrally monitoring deployments. If you have read my blog posts and have not tried out NuoDB, I strongly suggest that you download it today and catch up with the learnings with me. Trust me though the product is very powerful, it is extremely easy to learn and use. Reference: Pinal Dave (http://blog.sqlauthority.com)   Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

    Read the article

  • Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service.  Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia:   Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence  they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21

    - by Pinal Dave
    Data is forever. Think about it – it is indeed true. Are you using any application as it is which was built 10 years ago? Are you using any piece of hardware which was built 10 years ago? The answer is most certainly No. However, if I ask you – are you using any data which were captured 50 years ago, the answer is most certainly Yes. For example, look at the history of our nation. I am from India and we have documented history which goes back as over 1000s of year. Well, just look at our birthday data – atleast we are using it till today. Data never gets old and it is going to stay there forever.  Application which interprets and analysis data got changed but the data remained in its purest format in most cases. As organizations have grown the data associated with them also grew exponentially and today there are lots of complexity to their data. Most of the big organizations have data in multiple applications and in different formats. The data is also spread out so much that it is hard to categorize with a single algorithm or logic. The mobile revolution which we are experimenting right now has completely changed how we capture the data and build intelligent systems.  Big organizations are indeed facing challenges to keep all the data on a platform which give them a  single consistent view of their data. This unique challenge to make sense of all the data coming in from different sources and deriving the useful actionable information out of is the revolution Big Data world is facing. Defining Big Data The 3Vs that define Big Data are Variety, Velocity and Volume. Volume We currently see the exponential growth in the data storage as the data is now more than text data. We can find data in the format of videos, musics and large images on our social media channels. It is very common to have Terabytes and Petabytes of the storage system for enterprises. As the database grows the applications and architecture built to support the data needs to be reevaluated quite often. Sometimes the same data is re-evaluated with multiple angles and even though the original data is the same the new found intelligence creates explosion of the data. The big volume indeed represents Big Data. Velocity The data growth and social media explosion have changed how we look at the data. There was a time when we used to believe that data of yesterday is recent. The matter of the fact newspapers is still following that logic. However, news channels and radios have changed how fast we receive the news. Today, people reply on social media to update them with the latest happening. On social media sometimes a few seconds old messages (a tweet, status updates etc.) is not something interests users. They often discard old messages and pay attention to recent updates. The data movement is now almost real time and the update window has reduced to fractions of the seconds. This high velocity data represent Big Data. Variety Data can be stored in multiple format. For example database, excel, csv, access or for the matter of the fact, it can be stored in a simple text file. Sometimes the data is not even in the traditional format as we assume, it may be in the form of video, SMS, pdf or something we might have not thought about it. It is the need of the organization to arrange it and make it meaningful. It will be easy to do so if we have data in the same format, however it is not the case most of the time. The real world have data in many different formats and that is the challenge we need to overcome with the Big Data. This variety of the data represent  represent Big Data. Big Data in Simple Words Big Data is not just about lots of data, it is actually a concept providing an opportunity to find new insight into your existing data as well guidelines to capture and analysis your future data. It makes any business more agile and robust so it can adapt and overcome business challenges. Tomorrow In tomorrow’s blog post we will try to answer discuss Evolution of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – An Efficiency Tool to Compare and Synchronize SQL Server Databases

    - by Pinal Dave
    There is no need to reinvent the wheel if it is already invented and if the wheel is already available at ease, there is no need to wait to grab it. Here is the similar situation. I came across a very interesting situation and I had to look for an efficient tool which can make my life easier and solve my business problem. Here is the scenario. One of the developers had deleted few rows from the very important mapping table of our development server (thankfully, it was not the production server). Though it was a development server, the entire development team had to stop working as the application started to crash on every page. Think about the lost of manpower and efficiency which we started to loose.  Pretty much every department had to stop working as our internal development application stopped working. Thankfully, we even take a backup of our development server and we had access to full backup of the entire database at 6 AM morning. We do not take as a frequent backup of development server as production server (naturally!). Even though we had a full backup, the solution was not to restore the database. Think about it, there were plenty of the other operations since the last good full backup and if we restore a full backup, we will pretty much overwrite on the top of the work done by developers since morning. Now, as restoring the full backup was not an option we decided to restore the same database on another server. Once we had restored our database to another server, the challenge was to compare the table from where the database was deleted. The mapping table from where the data were deleted contained over 5000 rows and it was humanly impossible to compare both the tables manually. Finally we decided to use efficiency tool dbForge Data Compare for SQL Server from DevArt. dbForge Data Compare for SQL Server is a powerful, fast and easy to use SQL compare tool, capable of using native SQL Server backups as metadata source. (FYI we Downloaded dbForge Data Compare) Once we discovered the product, we immediately downloaded the product and installed on our development server. After we installed the product, we were greeted with the following screen. We clicked on the New Data Comparision to start our new comparison project. It brought up following screen. Here is the best part of the product, we just had to enter our database connection username and password along with source and destination details and we are done. The entire process is very simple and self intuiting. The best part was that for the source, we can either select database or even backup. This was indeed fantastic feature. Think about this, if you have a very big database, it will take long time to restore on the server. Once it is restored, you will be able to work with it. However, when you are working with dbForge Data Compare it will accept database backup as your source or destination. Once I click on the execute it brought up following screen where it displayed an excellent summary of the data compare. It has dedicated tabs for the what is changing in what table as well had details of the changed data. The best part is that, once we had reviewed the change. We click on the Synchronize button in the menu bar and it brought up following screen. You can see that the screen has very simple straight forward but very powerful features. You can generate a script to synchronize from target to source or even from source to target. Additionally, the database is a very complicated world and there are extensive options to configure various database options on the next screen. We also have the option to either generate script or directly execute the script to target server. I like to play on the safe side and I generated the script for my synchronization and later on after review I deployed the scripts on the server. Well, my team and we were able to get going from our disaster in less than 10 minutes. There were few people in our team were indeed disappointed as they were thinking of going home early that day but in less than 10 minutes they had to get back to work. There are so many other features in  dbForge Data Compare for SQL Server, I am already planning to make this product company wide recommended product for Data Compare tool. Hats off to the team who have build this product. Here are few of the features salient features of the dbForge Data Compare for SQL Server Perform SQL Server database comparison to detect changes Compare SQL Server backups with live databases Analyze data differences between two databases Synchronize two databases that went out of sync Restore data of a particular table from the backup Generate data comparison reports in Excel and HTML formats Copy look-up data from development database to production Automate routine data synchronization tasks with command-line interface Go Ahead and Download the dbForge Data Compare for SQL Server right away. It is always a good idea to get familiar with the important tools before hand instead of learning it under pressure of disaster. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

    Read the article

  • SQL SERVER – Auditing and Profiling Database Made Easy with SQL Audit and Comply

    - by Pinal Dave
    Do you like auditing your database, or can you think of about a million other things you’d rather do?  Unfortunately, auditing is incredibly important.  As with tax audits, it is important to audit databases to ensure they are following all the rules, but they are also important for troubleshooting and security. There are several ways to audit SQL Server.  There is manual auditing, which is going through your database “by hand,” and obviously takes a long time and is quite inefficient.  SQL Server also provides programs to help you audit your systems.  Different administrators will have different opinions about best practices and which tools to use, and each one will be perfected for certain systems and certain users. Today, though, I would like to talk about Apex SQL Audit.  It is an auditing tool that acts like “track changes” in a word processing document.  It will log what has changed on the database, who made the changes, and what effects these changes have had (i.e. what objects were affected down the line).  All this information is logged, and can be easily viewed or printed for easy access. One of the best features of Apex is that it is so customizable (and easy to use!).  First, start Apex.  Then you can connect to the database you would like to monitor. Once you select your database, you can select which table you want to audit. You can customize right down to the field you’d like to audit, and then select which types of actions you’d like tracked – insert, delete, or update.  Repeat these steps for every database you want monitored. To create the logs, choose “Create triggers” in the menu.  The script written here will be what logs each insert, delete, and update function.  Press F5 to execute.  All this tracking information will be stored in AUDIT_LOG_DATA and AUDIT_LOG_TRANSACTIONS tables.  View these tables using ApexSQL Audit reports. These transaction logs can be extremely detailed – especially on very busy servers, where every move it traced.  Reading them can be overwhelming, to say the least.  Apex has tried to make things easier for the average DBA, though. You can read these tracking logs in Apex, and it will display data and objects that affect your server – even things that were happening on your server before you installed Apex! To read these logs, open Apex, and connect to that database you want to audit. Go to the Transaction Logs tab, and add the logs you want to read. To narrow down what results you want to see, you can use the Filter tab to choose time, operation type, name, users, and more. Click Open, and you can see the results in a grid (as shown below).  You can export these results to CSV, HTML, XML or SQL files and save on the hard disk. One of the advantages is that since there are no triggers here, there are no other processes that will affect SQL Server performance.  Using this method is also how to view history from your database that occurred before Apex was installed.  This type of tracking does require storage space for the data sources, as the database must be fully running, and the transaction logs must exist (things not stored in the transactions logs will not be recoverable). Apex can also replace SQL Server Profiler and SQL Server Traces – which are much more complex and error-prone – with its ApexSQL Comply.  It can do fault tolerant auditing, centralized reporting, and “who saw what” information in an easy-to-use interface.  The tracking settings can be altered by the user, or the default options will provide solutions to the most common auditing problems. To get started: open ApexSQL Comply, and selected Database Filter Settings to choose which database you’d like to audit.  You can select which tracking you’re like in Operation Types – DML, DDL, queries executed, execute statements, and more.  To get started, click Start Auditing. After this, every action will be stored in the central repository database (ApexSQLCrd).  You can view the audit and create a report (or view the standard default report) using a wizard. You can see how easy it is to use ApexSQL Comply.  You can easily set audits, including the type and time, and create customized reports.  Remote users can easily access the reports through the user interface (available online, as well), and security concerns are all taken care of by the program.  Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

    Read the article

  • SQL SERVER – Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2

    - by Pinal Dave
    This is the second part of the series Incremental Statistics. Here is the index of the complete series. What is Incremental Statistics? – Performance improvements in SQL Server 2014 – Part 1 Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2 DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3 In part 1 we have understood what is incremental statistics and now in this second part we will see a simple example of incremental statistics. This blog post is heavily inspired from my friend Balmukund’s must read blog post. If you have partitioned table and lots of data, this feature can be specifically very useful. Prerequisite Here are two things you must know before you start with the demonstrations. AdventureWorks – For the demonstration purpose I have installed AdventureWorks 2012 as an AdventureWorks 2014 in this demonstration. Partitions – You should know how partition works with databases. Setup Script Here is the setup script for creating Partition Function, Scheme, and the Table. We will populate the table based on the SalesOrderDetails table from AdventureWorks. -- Use Database USE AdventureWorks2014 GO -- Create Partition Function CREATE PARTITION FUNCTION IncrStatFn (INT) AS RANGE LEFT FOR VALUES (44000, 54000, 64000, 74000) GO -- Create Partition Scheme CREATE PARTITION SCHEME IncrStatSch AS PARTITION [IncrStatFn] TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]) GO -- Create Table Incremental_Statistics CREATE TABLE [IncrStatTab]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [ModifiedDate] [datetime] NOT NULL) ON IncrStatSch(SalesOrderID) GO -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]) SELECT     [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate] FROM       [Sales].[SalesOrderDetail] WHERE      SalesOrderID < 54000 GO Check Details Now we will check details in the partition table IncrStatSch. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO You will notice that only a few of the partition are filled up with data and remaining all the partitions are empty. Now we will create statistics on the Table on the column SalesOrderID. However, here we will keep adding one more keyword which is INCREMENTAL = ON. Please note this is the new keyword and feature added in SQL Server 2014. It did not exist in earlier versions. -- Create Statistics CREATE STATISTICS IncrStat ON [IncrStatTab] (SalesOrderID) WITH FULLSCAN, INCREMENTAL = ON GO Now we have successfully created statistics let us check the statistical histogram of the table. Now let us once again populate the table with more data. This time the data are entered into a different partition than earlier populated partition. -- Populate Table INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]) SELECT     [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate] FROM       [Sales].[SalesOrderDetail] WHERE      SalesOrderID > 54000 GO Let us check the status of the partition once again with following script. -- Check the partition SELECT * FROM sys.partitions WHERE OBJECT_ID = OBJECT_ID('IncrStatTab') GO Statistics Update Now here has the new feature come into action. Previously, if we have to update the statistics, we will have to FULLSCAN the entire table irrespective of which partition got the data. However, in SQL Server 2014 we can just specify which partition we want to update in terms of Statistics. Here is the script for the same. -- Update Statistics Manually UPDATE STATISTICS IncrStatTab (IncrStat) WITH RESAMPLE ON PARTITIONS(3, 4) GO Now let us check the statistics once again. -- Show Statistics DBCC SHOW_STATISTICS('IncrStatTab', IncrStat) WITH HISTOGRAM GO Upon examining statistics histogram, you will notice that now the distribution has changed and there is way more rows in the histogram. Summary The new feature of Incremental Statistics is indeed a boon for the scenario where there are partitions and statistics needs to be updated frequently on the partitions. In earlier version to update statistics one has to do FULLSCAN on the entire table which was wasting too many resources. With the new feature in SQL Server 2014, now only those partitions which are significantly changed can be specified in the script to update statistics. Cleanup You can clean up the database by executing following scripts. -- Clean up DROP TABLE [IncrStatTab] DROP PARTITION SCHEME [IncrStatSch] DROP PARTITION FUNCTION [IncrStatFn] GO Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Statistics, Statistics

    Read the article

  • SQL SERVER – Parsing SSIS Catalog Messages – Notes from the Field #030

    - by Pinal Dave
    [Note from Pinal]: This is a new episode of Notes from the Field series. SQL Server Integration Service (SSIS) is one of the most key essential part of the entire Business Intelligence (BI) story. It is a platform for data integration and workflow applications. The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data. In this episode of the Notes from the Field series I requested SSIS Expert Andy Leonard to discuss one of the most interesting concepts of SSIS Catalog Messages. There are plenty of interesting and useful information captured in the SSIS catalog and we will learn together how to explore the same. The SSIS Catalog captures a lot of cool information by default. Here’s a query I use to parse messages from the catalog.operation_messages table in the SSISDB database, where the logged messages are stored. This query is set up to parse a default message transmitted by the Lookup Transformation. It’s one of my favorite messages in the SSIS log because it gives me excellent information when I’m tuning SSIS data flows. The message reads similar to: Data Flow Task:Information: The Lookup processed 4485 rows in the cache. The processing time was 0.015 seconds. The cache used 1376895 bytes of memory. The query: USE SSISDB GO DECLARE @MessageSourceType INT = 60 DECLARE @StartOfIDString VARCHAR(100) = 'The Lookup processed ' DECLARE @ProcessingTimeString VARCHAR(100) = 'The processing time was ' DECLARE @CacheUsedString VARCHAR(100) = 'The cache used ' DECLARE @StartOfIDSearchString VARCHAR(100) = '%' + @StartOfIDString + '%' DECLARE @ProcessingTimeSearchString VARCHAR(100) = '%' + @ProcessingTimeString + '%' DECLARE @CacheUsedSearchString VARCHAR(100) = '%' + @CacheUsedString + '%' SELECT operation_id , SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1))) AS LookupRowsCount , SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1))) AS LookupProcessingTime , CASE WHEN (CONVERT(numeric(3,3),SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1))))) = 0 THEN 0 ELSE CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))) / CONVERT(numeric(3,3),SUBSTRING(MESSAGE, (PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@ProcessingTimeSearchString,MESSAGE) + LEN(@ProcessingTimeString) + 1)) - (PATINDEX(@ProcessingTimeSearchString, MESSAGE) + LEN(@ProcessingTimeString) + 1)))) END AS LookupRowsPerSecond , SUBSTRING(MESSAGE, (PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1)) - (PATINDEX(@CacheUsedSearchString, MESSAGE) + LEN(@CacheUsedString) + 1))) AS LookupBytesUsed ,CASE WHEN (CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))))= 0 THEN 0 ELSE CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@CacheUsedSearchString,MESSAGE) + LEN(@CacheUsedString) + 1)) - (PATINDEX(@CacheUsedSearchString, MESSAGE) + LEN(@CacheUsedString) + 1)))) / CONVERT(bigint,SUBSTRING(MESSAGE, (PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1), ((CHARINDEX(' ', MESSAGE, PATINDEX(@StartOfIDSearchString,MESSAGE) + LEN(@StartOfIDString) + 1)) - (PATINDEX(@StartOfIDSearchString, MESSAGE) + LEN(@StartOfIDString) + 1)))) END AS LookupBytesPerRow FROM [catalog].[operation_messages] WHERE message_source_type = @MessageSourceType AND MESSAGE LIKE @StartOfIDSearchString GO Note that you have to set some parameter values: @MessageSourceType [int] – represents the message source type value from the following results: Value     Description 10           Entry APIs, such as T-SQL and CLR Stored procedures 20           External process used to run package (ISServerExec.exe) 30           Package-level objects 40           Control Flow tasks 50           Control Flow containers 60           Data Flow task 70           Custom execution message Note: Taken from Reza Rad’s (excellent!) helper.MessageSourceType table found here. @StartOfIDString [VarChar(100)] – use this to uniquely identify the message field value you wish to parse. In this case, the string ‘The Lookup processed ‘ identifies all the Lookup Transformation messages I desire to parse. @ProcessingTimeString [VarChar(100)] – this parameter is message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Processing Time value. For this execution, I use the string ‘The processing time was ‘. @CacheUsedString [VarChar(100)] – this parameter is also message-specific. I use this parameter to specifically search the message field value for the beginning of the Lookup Cache  Used value. It returns the memory used, in bytes. For this execution, I use the string ‘The cache used ‘. The other parameters are built from variations of the parameters listed above. The query parses the values into text. The string values are converted to numeric values for ratio calculations; LookupRowsPerSecond and LookupBytesPerRow. Since ratios involve division, CASE statements check for denominators that equal 0. Here are the results in an SSMS grid: This is not the only way to retrieve this information. And much of the code lends itself to conversion to functions. If there is interest, I will share the functions in an upcoming post. If you want to get started with SSIS with the help of experts, read more over at Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Backup and Restore, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: SSIS

    Read the article

  • SQL SERVER – SSMS: Top Object and Batch Execution Statistics Reports

    - by Pinal Dave
    The month of June till mid of July has been the fever of sports. First, it was Wimbledon Tennis and then the Soccer fever was all over. There is a huge number of fan followers and it is great to see the level at which people sometimes worship these sports. Being an Indian, I cannot forget to mention the India tour of England later part of July. Following these sports and as the events unfold to the finals, there are a number of ways the statisticians can slice and dice the numbers. Cue from soccer I can surely say there is a team performance against another team and then there is individual member fairs against a particular opponent. Such statistics give us a fair idea to how a team in the past or in the recent past has fared against each other, head-to-head stats during World cup and during other neutral venue games. All these statistics are just pointers. In reality, they don’t reflect the calibre of the current team because the individuals who performed in each of these games are totally different (Typical example being the Brazil Vs Germany semi-final match in FIFA 2014). So at times these numbers are misleading. It is worth investigating and get the next level information. Similar to these statistics, SQL Server Management studio is also equipped with a number of reports like a) Object Execution Statistics report and b) Batch Execution Statistics reports. As discussed in the example, the team scorecard is like the Batch Execution statistics and individual stats is like Object Level statistics. The analogy can be taken only this far, trust me there is no correlation between SQL Server functioning and playing sports – It is like I think about diet all the time except while I am eating. Performance – Batch Execution Statistics Let us view the first report which can be invoked from Server Node -> Reports -> Standard Reports -> Performance – Batch Execution Statistics. Most of the values that are displayed in this report come from the DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text(sql_handle). This report contains 3 distinctive sections as outline below.   Section 1: This is a graphical bar graph representation of Average CPU Time, Average Logical reads and Average Logical Writes for individual batches. The Batch numbers are indicative and the details of individual batch is available in section 3 (detailed below). Section 2: This represents a Pie chart of all the batches by Total CPU Time (%) and Total Logical IO (%) by batches. This graphical representation tells us which batch consumed the highest CPU and IO since the server started, provided plan is available in the cache. Section 3: This is the section where we can find the SQL statements associated with each of the batch Numbers. This also gives us the details of Average CPU / Average Logical Reads and Average Logical Writes in the system for the given batch with object details. Expanding the rows, I will also get the # Executions and # Plans Generated for each of the queries. Performance – Object Execution Statistics The second report worth a look is Object Execution statistics. This is a similar report as the previous but turned on its head by SQL Server Objects. The report has 3 areas to look as above. Section 1 gives the Average CPU, Average IO bar charts for specific objects. The section 2 is a graphical representation of Total CPU by objects and Total Logical IO by objects. The final section details the various objects in detail with the Avg. CPU, IO and other details which are self-explanatory. At a high-level both the reports are based on queries on two DMVs (sys.dm_exec_query_stats and sys.dm_exec_sql_text) and it builds values based on calculations using columns in them: SELECT * FROM    sys.dm_exec_query_stats s1 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2 WHERE   s2.objectid IS NOT NULL AND DB_NAME(s2.dbid) IS NOT NULL ORDER BY  s1.sql_handle; This is one of the simplest form of reports and in future blogs we will look at more complex reports. I truly hope that these reports can give DBAs and developers a hint about what is the possible performance tuning area. As a closing point I must emphasize that all above reports pick up data from the plan cache. If a particular query has consumed a lot of resources earlier, but plan is not available in the cache, none of the above reports would show that bad query. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: SQL, SQL Authority, SQL Query, SQL Server, SQL Server Management Studio, SQL Tips and Tricks, T SQL Tagged: SQL Reports

    Read the article

< Previous Page | 19 20 21 22 23 24 25 26 27 28 29 30  | Next Page >