Search Results

Search found 86974 results on 3479 pages for 'visualsvn server'.

Page 593/3479 | < Previous Page | 589 590 591 592 593 594 595 596 597 598 599 600 | Next Page >

Big Data – Beginning Big Data – Day 1 of 21

- by Pinal Dave

What is Big Data? I want to learn Big Data. I have no clue where and how to start learning about it. Does Big Data really means data is big? What are the tools and software I need to know to learn Big Data? I often receive questions which I mentioned above. They are good questions and honestly when we search online, it is hard to find authoritative and authentic answers. I have been working with Big Data and NoSQL for a while and I have decided that I will attempt to discuss this subject over here in the blog. In the next 21 days we will understand what is so big about Big Data. Big Data – Big Thing! Big Data is becoming one of the most talked about technology trends nowadays. The real challenge with the big organization is to get maximum out of the data already available and predict what kind of data to collect in the future. How to take the existing data and make it meaningful that it provides us accurate insight in the past data is one of the key discussion points in many of the executive meetings in organizations. With the explosion of the data the challenge has gone to the next level and now a Big Data is becoming the reality in many organizations. Big Data – A Rubik’s Cube I like to compare big data with the Rubik’s cube. I believe they have many similarities. Just like a Rubik’s cube it has many different solutions. Let us visualize a Rubik’s cube solving challenge where there are many experts participating. If you take five Rubik’s cube and mix up the same way and give it to five different expert to solve it. It is quite possible that all the five people will solve the Rubik’s cube in fractions of the seconds but if you pay attention to the same closely, you will notice that even though the final outcome is the same, the route taken to solve the Rubik’s cube is not the same. Every expert will start at a different place and will try to resolve it with different methods. Some will solve one color first and others will solve another color first. Even though they follow the same kind of algorithm to solve the puzzle they will start and end at a different place and their moves will be different at many occasions. It is nearly impossible to have a exact same route taken by two experts. Big Market and Multiple Solutions Big Data is exactly like a Rubik’s cube – even though the goal of every organization and expert is same to get maximum out of the data, the route and the starting point are different for each organization and expert. As organizations are evaluating and architecting big data solutions they are also learning the ways and opportunities which are related to Big Data. There is not a single solution to big data as well there is not a single vendor which can claim to know all about Big Data. Honestly, Big Data is too big a concept and there are many players – different architectures, different vendors and different technology. What is Next? In this 31 days series we will be exploring many essential topics related to big data. I do not claim that you will be master of the subject after 31 days but I claim that I will be covering following topics in easy to understand language. Architecture of Big Data Big Data a Management and Implementation Different Technologies – Hadoop, Mapreduce Real World Conversations Best Practices Tomorrow In tomorrow’s blog post we will try to answer one of the very essential questions – What is Big Data? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
BIP BIServer Query Debug

- by Tim Dexter

With some help from Bryan, I have uncovered a way of being able to debug or at least log what BIServer is doing when BIP sends it a query request. This is not for those of you querying the database directly but if you are using the BIServer and its datamodel to fetch data for a BIP report. If you have written or used the query builder against BIServer and when you run the report it chokes with a cryptic message, that you have no clue about, read on. When BIP runs a piece of BIServer logical SQL to fetch data. It does not appear to validate it, it just passes it through, so what is BIServer doing on its end? As you may know, you are not writing regular physical sql its actually logical sql e.g. select Jobs."Job Title" as "Job Title", Employees."Last Name" as "Last Name", Employees.Salary as Salary, Locations."Department Name" as "Department Name", Locations."Country Name" as "Country Name", Locations."Region Name" as "Region Name" from HR.Locations Locations, HR.Employees Employees, HR.Jobs Jobs The tables might not even be a physical tables, we don't care, that's what the BIServer and its model are for. You have put all the effort into building the model, just go get me the data from where ever it might be. The BIServer takes the logical sql and uses its vast brain to work out what the physical SQL is, executes it and passes the result back to BIP. select distinct T32556.JOB_TITLE as c1, T32543.LAST_NAME as c2, T32543.SALARY as c3, T32537.DEPARTMENT_NAME as c4, T32532.COUNTRY_NAME as c5, T32577.REGION_NAME as c6 from JOBS T32556, REGIONS T32577, COUNTRIES T32532, LOCATIONS T32569, DEPARTMENTS T32537, EMPLOYEES T32543 where ( T32532.COUNTRY_ID = T32569.COUNTRY_ID and T32532.REGION_ID = T32577.REGION_ID and T32537.DEPARTMENT_ID = T32543.DEPARTMENT_ID and T32537.LOCATION_ID = T32569.LOCATION_ID and T32543.JOB_ID = T32556.JOB_ID ) Not a very tough example I know but you get the idea. How do I know what the BIServer is up to? How can I find out what the issue might be if BIServer chokes on my query? There are a couple of steps: In the Administrator tool you need to set the logging level for the Administrator user to something greater than the default '0'. '7' is going to give you the max. Just remember to take it back down after you have finished the debug. I needed to bounce my BIServer service Now here's the secret sauce. Prefix the following to your BIP query set variable LOGLEVEL = 7; Set the log level to that you have in the admin tool Now run your BIP report. With the prefix in place; BIServer will write to the NQQuery.log file. This is located in the ./OracleBI/server/Log directory. In there you are going to find the complete process the BIServer has gone through to try and get the data back for you A quick note, if the BIServer can, its going to hit that great BIEE cache to get your data and you may not see the full log. IF this is the case. Get inot hte Administration page (via the browser login) and clear out your BIP report cursor. Then re-run. This will hopefully help out if you are trying to debug that annoying BIP report that will not run or is getting some strange data. Don't forget to turn that logging level back down once you are done. This will avoid the DBA screaming at you for sucking up all the disk space on the system.

Read the article
Problem with apt-get update: failed to fetch error

- by user171447

I run an Ubuntu Server 12.04.3 LTS. Today, I wanted to update it, but I did not managed it (yes...), however upgrading worked well. I don't want you to solve my problem but it would be greatful if you could give me some hints. I googled hours, I fould a lot of this kind of errors, but not exactly this. Here is the output of apt-get update: Hit http://filepile.fastit.net precise Release.gpg Hit http://filepile.fastit.net precise Release Hit http://filepile.fastit.net precise/main amd64 Packages Hit http://filepile.fastit.net precise/restricted amd64 Packages Hit http://filepile.fastit.net precise/universe amd64 Packages Hit http://filepile.fastit.net precise/multiverse amd64 Packages Hit http://filepile.fastit.net precise/main i386 Packages Hit http://filepile.fastit.net precise/restricted i386 Packages Hit http://filepile.fastit.net precise/universe i386 Packages Hit http://filepile.fastit.net precise/multiverse i386 Packages Ign http://filepile.fastit.net precise/main TranslationIndex Ign http://filepile.fastit.net precise/multiverse TranslationIndex Ign http://filepile.fastit.net precise/restricted TranslationIndex Ign http://filepile.fastit.net precise/universe TranslationIndex Ign http://filepile.fastit.net precise/main Translation-en_GB Ign http://filepile.fastit.net precise/main Translation-en Ign http://filepile.fastit.net precise/main Translation-en_GB.UTF-8 Hit http://archive.canonical.com precise Release.gpg Ign http://filepile.fastit.net precise/multiverse Translation-en_GB Ign http://filepile.fastit.net precise/multiverse Translation-en Ign http://filepile.fastit.net precise/multiverse Translation-en_GB.UTF-8 Ign http://filepile.fastit.net precise/restricted Translation-en_GB Ign http://filepile.fastit.net precise/restricted Translation-en Ign http://filepile.fastit.net precise/restricted Translation-en_GB.UTF-8 Ign http://filepile.fastit.net precise/universe Translation-en_GB Ign http://filepile.fastit.net precise/universe Translation-en Ign http://filepile.fastit.net precise/universe Translation-en_GB.UTF-8 Hit http://archive.ubuntu.com precise Release.gpg Hit http://archive.canonical.com precise Release Hit http://archive.ubuntu.com precise Release Hit http://archive.ubuntu.com precise/main Sources Hit http://archive.ubuntu.com precise/restricted Sources Hit http://archive.ubuntu.com precise/main i386 Packages Hit http://archive.ubuntu.com precise/restricted i386 Packages Hit http://archive.ubuntu.com precise/multiverse i386 Packages Hit http://archive.ubuntu.com precise/main TranslationIndex Hit http://archive.ubuntu.com precise/multiverse TranslationIndex Hit http://archive.ubuntu.com precise/restricted TranslationIndex Hit http://archive.ubuntu.com precise/main Translation-en_GB Hit http://archive.ubuntu.com precise/main Translation-en Hit http://archive.ubuntu.com precise/multiverse Translation-en_GB Hit http://archive.ubuntu.com precise/multiverse Translation-en Hit http://archive.ubuntu.com precise/restricted Translation-en_GB Hit http://archive.ubuntu.com precise/restricted Translation-en Hit http://fr.archive.ubuntu.com precise-security Release.gpg Hit http://fr.archive.ubuntu.com precise-updates Release.gpg Hit http://fr.archive.ubuntu.com precise-security Release Hit http://fr.archive.ubuntu.com precise-updates Release Hit http://fr.archive.ubuntu.com precise-security/main amd64 Packages Hit http://fr.archive.ubuntu.com precise-security/restricted amd64 Packages Hit http://fr.archive.ubuntu.com precise-security/universe amd64 Packages Hit http://fr.archive.ubuntu.com precise-security/multiverse amd64 Packages :W: Failed to fetch http://archive.canonical.com/dists/precise/Release Unable to find expected entry 'main/binary-amd64/Packages' in Release file (Wrong sources.list entry or malformed file) E: Some index files failed to download. They have been ignored, or old ones used instead. Hit http://fr.archive.ubuntu.com precise-security/main i386 Packages Hit http://fr.archive.ubuntu.com precise-security/restricted i386 Packages Hit http://fr.archive.ubuntu.com precise-security/universe i386 Packages Hit http://fr.archive.ubuntu.com precise-security/multiverse i386 Packages Hit http://fr.archive.ubuntu.com precise-security/main TranslationIndex Hit http://fr.archive.ubuntu.com precise-security/multiverse TranslationIndex Hit http://fr.archive.ubuntu.com precise-security/restricted TranslationIndex Hit http://fr.archive.ubuntu.com precise-security/universe TranslationIndex Hit http://fr.archive.ubuntu.com precise-updates/main amd64 Packages Hit http://fr.archive.ubuntu.com precise-updates/restricted amd64 Packages Hit http://fr.archive.ubuntu.com precise-updates/universe amd64 Packages Hit http://fr.archive.ubuntu.com precise-updates/multiverse amd64 Packages Hit http://fr.archive.ubuntu.com precise-updates/main i386 Packages Hit http://fr.archive.ubuntu.com precise-updates/restricted i386 Packages Hit http://fr.archive.ubuntu.com precise-updates/universe i386 Packages Hit http://fr.archive.ubuntu.com precise-updates/multiverse i386 Packages Hit http://fr.archive.ubuntu.com precise-updates/main TranslationIndex Hit http://fr.archive.ubuntu.com precise-updates/multiverse TranslationIndex Hit http://fr.archive.ubuntu.com precise-updates/restricted TranslationIndex Hit http://fr.archive.ubuntu.com precise-updates/universe TranslationIndex Hit http://fr.archive.ubuntu.com precise-security/main Translation-en Hit http://fr.archive.ubuntu.com precise-security/multiverse Translation-en Hit http://fr.archive.ubuntu.com precise-security/restricted Translation-en Hit http://fr.archive.ubuntu.com precise-security/universe Translation-en Hit http://fr.archive.ubuntu.com precise-updates/main Translation-en_GB Hit http://fr.archive.ubuntu.com precise-updates/main Translation-en Hit http://fr.archive.ubuntu.com precise-updates/multiverse Translation-en_GB Hit http://fr.archive.ubuntu.com precise-updates/multiverse Translation-en Hit http://fr.archive.ubuntu.com precise-updates/restricted Translation-en_GB Hit http://fr.archive.ubuntu.com precise-updates/restricted Translation-en Hit http://fr.archive.ubuntu.com precise-updates/universe Translation-en_GB Hit http://fr.archive.ubuntu.com precise-updates/universe Translation-en And here is my /etc/apt/sources.list: ###### Ubuntu Main Repos deb http://filepile.fastit.net/ubuntu/ precise main restricted universe multiverse # deb http://de.archive.ubuntu.com/ubuntu/ precise main restricted universe multiverse deb http://archive.canonical.com/ precise main restricted universe multiverse ###### Ubuntu Update Repos deb http://fr.archive.ubuntu.com/ubuntu/ precise-security main restricted universe multiverse deb http://fr.archive.ubuntu.com/ubuntu/ precise-updates main restricted universe multiverse deb http://archive.ubuntu.com/ubuntu/ precise main restricted multiverse deb-src http://archive.ubuntu.com/ubuntu precise main restricted multiverse Thanks for your help!

Read the article
Archiving SQLHelp tweets

- by jamiet

#SQLHelp is a Twitter hashtag that can be used by any Twitter user to get help from the SQL Server community. I think its fair to say that in its first year of being it has proved to be a very useful resource however Kendra Little (@kendra_little) made a very salient point yesterday when she tweeted: Is there a way to search the archives of #sqlhelp Trying to remember answer to a question I know I saw a couple months ago http://twitter.com/#!/Kendra_Little/status/15538234184441856 This highlights an inherent problem with Twitter’s search capability – it simply does not reach far enough back in time. I have made steps to remedy that situation by putting into place two initiatives to archive Tweets that contain the #sqlhelp hashtag. The Archivist http://archivist.visitmix.com/ is a free service that, quite simply, archives a history of tweets that contain a given search term by periodically polling Twitter’s search service with that search term and subsequently displaying a dashboard providing an aggregate view of those tweets for things like tweet volume over time, top users and top words (Archivist FAQ). I have set up an archive on The Archivist for “sqlhelp” which you can view at http://archivist.visitmix.com/jamiet/7. Here is a screenshot of the SQLHelp dashboard 36 minutes after I set it up: There is lots of good information in there, including the fact that Jonathan Kehayias (@SQLSarg) is the most active SQLHelp tweeter (I suspect as an answerer rather than a questioner ) and that SSIS has proven to be a rather (ahem) popular subject!! Datasift The Archivist has its uses though for our purposes it has a couple of downsides. For starters you cannot search through an archive (which is what Kendra was after) and nor can you export the contents of the archive for offline analysis. For those functions we need something a bit more heavyweight and for that I present to you Datasift. Datasift is a tool (currently an alpha release) that allows you to search for tweets and provide them through an object called a Datasift stream. That sounds very similar to normal Twitter search though it has one distinct advantage that other Twitter search tools do not – Datasift has access to Twitter’s Streaming API (aka the Twitter Firehose). In addition it has access to a lot of other rather nice features: It provides the Datasift API that allows you to consume the output of a Datasift stream in your tool of choice (bring on my favourite ultimate mashup tool J ) It has a query language (called Filtered Stream Definition Language – FSDL for short) A Datasift stream can consume (and filter) other Datasift streams Datasift can (and does) consume services other than Twitter If I refer to Datasift as “ETL for tweets” then you may get some sort of idea what it is all about. Just as I did with The Archivist I have set up a publicly available Datasift stream for “sqlhelp” at http://datasift.net/stream/1581/sqlhelp. Here is the FSDL query that provides the data: twitter.text contains "sqlhelp" Pretty simple eh? At the current time it provides little more than a rudimentary dashboard but as Datasift is currently an alpha release I think this may be worth keeping an eye on. The real value though is the ability to consume the output of a stream via Datasift’s RESTful API, observe: http://api.datasift.net/stream.xml?stream_identifier=c7015255f07e982afdeebdf1ae6e3c0d&username=jamiet&api_key=XXXXXXX (Note that an api_key is required during the alpha period so, given that I’m not supplying my api_key, this URI will not work for you) Just to prove that a Datasift stream can indeed consume data from another stream I have set up a second stream that further filters the first one for tweets containing “SSIS”. That one is at http://datasift.net/stream/1586/ssis-sqlhelp and here is the FSDL query: rule "414c9845685ff8d2548999cf3162e897" and (interaction.content contains "ssis") When Datasift moves beyond alpha I’ll re-assess how useful this is going to be and post a follow-up blog. @Jamiet

Read the article
SQLAuthority News – History of the Database – 5 Years of Blogging at SQLAuthority

- by pinaldave

Don’t miss the Contest:Participate in 5th Anniversary Contest Today is this blog’s birthday, and I want to do a fun, informative blog post. Five years ago this day I started this blog. Intention – my personal web blog. I wrote this blog for me and still today whatever I learn I share here. I don’t want to wander too far off topic, though, so I will write about two of my favorite things – history and databases. And what better way to cover these two topics than to talk about the history of databases. If you want to be technical, databases as we know them today only date back to the late 1960’s and early 1970’s, when computers began to keep records and store memories. But the idea of memory storage didn’t just appear 40 years ago – there was a history behind wanting to keep these records. In fact, the written word originated as a way to keep records – ancient man didn’t decide they suddenly wanted to read novels, they needed a way to keep track of the harvest, of their flocks, and of the tributes paid to the local lord. And that is how writing and the database began. You could consider the cave paintings from 17,0000 years ago at Lascaux, France, or the clay token from the ancient Sumerians in 8,000 BC to be the first instances of record keeping – and thus databases. If you prefer, you can consider the advent of written language to be the first database. Many historians believe the first written language appeared in the 37th century BC, with Egyptian hieroglyphics. The ancient Sumerians, not to be outdone, also created their own written language within a few hundred years. Databases could be more closely described as collections of information, in which case the Sumerians win the prize for the first archive. A collection of 20,000 stone tablets was unearthed in 1964 near the modern day city Tell Mardikh, in Syria. This ancient database is from 2,500 BC, and appears to be a sort of law library where apprentice-scribes copied important documents. Further archaeological digs hope to uncover the palace library, and thus an even larger database. Of course, the most famous ancient database would have to be the Royal Library of Alexandria, the great collection of records and wisdom in ancient Egypt. It was created by Ptolemy I, and existed from 300 BC through 30 AD, when Julius Caesar effectively erased the hard drives when he accidentally set fire to it. As any programmer knows who has forgotten to hit “save” or has experienced a sudden power outage, thousands of hours of work was lost in a single instant. Databases existed in very similar conditions up until recently. Cuneiform tablets gave way to papyrus, which led to vellum, and eventually modern paper and the printing press. Someday the databases we rely on so much today will become another chapter in the history of record keeping. Who knows what the databases of tomorrow will look like! Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Database, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Database – Beginning with Cloud Database As A Service

- by Pinal Dave

I love my weekend projects. Everybody does different activities in their weekend – like traveling, reading or just nothing. Every weekend I try to do something creative and different in the database world. The goal is I learn something new and if I enjoy my learning experience I share with the world. This weekend, I decided to explore Cloud Database As A Service – Morpheus. In my career I have managed many databases in the cloud and I have good experience in managing them. I should highlight that today’s applications use multiple databases from SQL for transactions and analytics, NoSQL for documents, In-Memory for caching to Indexing for search. Provisioning and deploying these databases often require extensive expertise and time. Often these databases are also not deployed on the same infrastructure and can create unnecessary latency between the application layer and the databases. Not to mention the different quality of service based on the infrastructure and the service provider where they are deployed. Moreover, there are additional problems that I have experienced with traditional database setup when hosted in the cloud: Database provisioning & orchestration Slow speed due to hardware issues Poor Monitoring Tools High network latency Now if you have a great software and expert network engineer, you can continuously work on above problems and overcome them. However, not every organization have the luxury to have top notch experts in the field. Now above issues are related to infrastructure, but there are a few more problems which are related to software/application as well. Here are the top three things which can be problems if you do not have application expert: Replication and Clustering Simple provisioning of the hard drive space Automatic Sharding Well, Morpheus looks like a product build by experts who have faced similar situation in the past. The product pretty much addresses all the pain points of developers and database administrators. What is different about Morpheus is that it offers a variety of databases from MySQL, MongoDB, ElasticSearch to Reddis as a service. Thus users can pick and chose any combination of these databases. All of them can be provisioned in a matter of minutes with a simple and intuitive point and click user interface. The Morpheus cloud is built on Solid State Drives (SSD) and is designed for high-speed database transactions. In addition it offers a direct link to Amazon Web Services to minimize latency between the application layer and the databases. Here are the few steps on how one can get started with Morpheus. Follow along with me. First go to http://www.gomorpheus.com and register for a new and free account. Step 1: Signup It is very simple to signup for Morpheus. Step 2: Select your database I use MySQL for my daily routine, so I have selected MySQL. Upon clicking on the big red button to add Instance, it prompted a dialogue of creating a new instance. Step 3: Create User Now we just have to create a user in our portal which we will use to connect to a database hosted at Morpheus. Click on your database instance and it will bring you to User Screen. Over here you will notice once again a big red button to create a new user. I created a user with my first name. Step 4: Configure your MySQL client I used MySQL workbench and connected to MySQL instance, which I had created with an IP address and user. That’s it! You are connecting to MySQL instance. Now you can create your objects just like you would create on your local box. You will have all the features of the Morpheus when you are working with your database. Dashboard While working with Morpheus, I was most impressed with its dashboard. In future blog posts, I will write more about this feature. Also with Morpheus you use the same process for provisioning and connecting with other databases: MongoDB, ElasticSearch and Reddis. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: MySQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Beginning Big Data Series Next Month in 21 Parts

- by Pinal Dave

Big Data is the next big thing. There was a time when we used to talk in terms of MB and GB of the data. However, the industry is changing and we are now moving to a conversation where we discuss about data in Petabyte, Exabyte and Zettabyte. It seems that the world is now talking about increased Volume of the data. In simple world we all think that Big Data is nothing but plenty of volume. In reality Big Data is much more than just a huge volume of the data. When talking about the data we need to understand about variety and volume along with volume. Though Big data look like a simple concept, it is extremely complex subject when we attempt to start learning the same. My Journey I have recently presented on Big Data in quite a few organizations and I have received quite a few questions during this roadshow event. I have collected all the questions which I have received and decided to post about them on the blog. In the month of October 2013, on every weekday we will be learning something new about Big Data. Every day I will share a concept/question and in the same blog post we will learn the answer of the same. Big Data – Plenty of Questions I received quite a few questions during my road trip. Here are few of the questions. I want to learn Big Data – where should I start? Do I need to know SQL to learn Big Data? What is Hadoop? There are so many organizations talking about Big Data, and every one has a different approach. How to start with big Data? Do I need to know Java to learn about Big Data? What is different between various NoSQL languages. I will attempt to answer most of the questions during the month long series in the next month. Big Data – Big Subject Big Data is a very big subject and I no way claim that I will be covering every single big data concept in this series. However, I promise that I will be indeed sharing lots of basic concepts which are revolving around Big Data. We will discuss from fundamentals about Big Data and continue further learning about it. I will attempt to cover the concept so simple that many of you might have wondered about it but afraid to ask. Your Role! During this series next month, I need your one help. Please keep on posting questions you might have related to big data as blog post comments and on Facebook Page. I will monitor them closely and will try to answer them as well during this series. Now make sure that you do not miss any single blog post in this series as every blog post will be linked to each other. You can subscribe to my feed or like my Facebook page or subscribe via email (by entering email in the blog post). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Big Data, PostADay, SQL, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Operational Databases Supporting Big Data – Key-Value Pair Databases and Document Databases – Day 13 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Relational Database and NoSQL database in the Big Data Story. In this article we will understand the role of Key-Value Pair Databases and Document Databases Supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (Yesterday’s post) NoSQL Databases (Yesterday’s post) Key-Value Pair Databases (This post) Document Databases (This post) Columnar Databases (Tomorrow’s post) Graph Databases (Tomorrow’s post) Spatial Databases (Tomorrow’s post) Key Value Pair Databases Key Value Pair Databases are also known as KVP databases. A key is a field name and attribute, an identifier. The content of that field is its value, the data that is being identified and stored. They have a very simple implementation of NoSQL database concepts. They do not have schema hence they are very flexible as well as scalable. The disadvantages of Key Value Pair (KVP) database are that they do not follow ACID (Atomicity, Consistency, Isolation, Durability) properties. Additionally, it will require data architects to plan for data placement, replication as well as high availability. In KVP databases the data is stored as strings. Here is a simple example of how Key Value Database will look like: Key Value Name Pinal Dave Color Blue Twitter @pinaldave Name Nupur Dave Movie The Hero As the number of users grow in Key Value Pair databases it starts getting difficult to manage the entire database. As there is no specific schema or rules associated with the database, there are chances that database grows exponentially as well. It is very crucial to select the right Key Value Pair Database which offers an additional set of tools to manage the data and provides finer control over various business aspects of the same. Riak Rick is one of the most popular Key Value Database. It is known for its scalability and performance in high volume and velocity database. Additionally, it implements a mechanism for collection key and values which further helps to build manageable system. We will further discuss Riak in future blog posts. Key Value Databases are a good choice for social media, communities, caching layers for connecting other databases. In simpler words, whenever we required flexibility of the data storage keeping scalability in mind – KVP databases are good options to consider. Document Database There are two different kinds of document databases. 1) Full document Content (web pages, word docs etc) and 2) Storing Document Components for storage. The second types of the document database we are talking about over here. They use Javascript Object Notation (JSON) and Binary JSON for the structure of the documents. JSON is very easy to understand language and it is very easy to write for applications. There are two major structures of JSON used for Document Database – 1) Name Value Pairs and 2) Ordered List. MongoDB and CouchDB are two of the most popular Open Source NonRelational Document Database. MongoDB MongoDB databases are called collections. Each collection is build of documents and each document is composed of fields. MongoDB collections can be indexed for optimal performance. MongoDB ecosystem is highly available, supports query services as well as MapReduce. It is often used in high volume content management system. CouchDB CouchDB databases are composed of documents which consists fields and attachments (known as description). It supports ACID properties. The main attraction points of CouchDB are that it will continue to operate even though network connectivity is sketchy. Due to this nature CouchDB prefers local data storage. Document Database is a good choice of the database when users have to generate dynamic reports from elements which are changing very frequently. A good example of document usages is in real time analytics in social networking or content management system. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL – Step by Step Guide to Download and Install NuoDB – Getting Started with NuoDB

- by Pinal Dave

Let us take a look at the application you own at your business. If you pay attention to the underlying database for that application you will be amazed. Every successful business these days processes way more data than they used to process before. The number of transactions and the amount of data is growing at an exponential rate. Every single day there is way more data to process than before. Big data is no longer a concept; it is now turning into reality. If you look around there are so many different big data solutions and it can be a quite difficult task to figure out where to begin. Personally, I have been experimenting with a lot of different solutions which allow my database to scale immediately without much hassle while maintaining optimal database performance. There are for sure some solutions out there, but for many I even have to learn their specific language and there is a lot of new exploration to do. Honestly, what I prefer is a product, which works with the language I know (SQL) and follows all the RDBMS concepts which I am familiar with (ACID etc.). NuoDB is one such solution. It is an operational NewSQL database built on a patented emergent architecture with full support for SQL and ACID guarantees. In this blog post, I will explore how one can download and install NuoDB database. Step 1: Follow me and go to the NuoDB download page. Simply fill out the form, accept the online license agreement, and you will be taken directly to a page where you can select any platform you prefer to install NuoDB. In my example below, I select the Windows 64-bit platform as it is one of the most popular NuoDB platforms. (You can also run NuoDB on Amazon Web Services but I prefer to install it on my local machine for the purposes of this blog). Step 2: Once you have downloaded the NuoDB installer, double click on it to install it on the Windows platform. Here is the enlarged the icon of the installer. Step 3: Follow the wizard installation, as it is pretty straight forward and easy to do so. I have selected all the options to install as the overall installation is very simple and it does not take up much space. I have installed it on my C drive but you can select your preferred drive. It is quite possible that if you do not have 64 bit Java, it will throw following error. If you face following error, I suggest you to download 64-bit Java from here. Make sure that you download 64-bit Java from following link: http://java.com/en/download/manual.jsp If already have Java 64-bit installed, you can continue with the installation as described in following image. Otherwise, install Java and start from with Step 1. As in my case, I already have 64-bit Java installed – and you won’t believe me when I say that the entire installation of NuoDB only took me around 90 seconds. Click on Finish to end to exit the installation. Step 4: Once the installation is successful, NuoDB will automatically open the following two tabs – Console and DevCenter — in your preferred browser. On the Console tab you can explore various components of the NuoDB solution, e.g. QuickStart, Admin, Explorer, Storefront and Samples. We will see various components and their usage in future blog posts. If you follow these steps in this post, which I have followed to install NuoDB, you will agree that the installation of NuoDB is extremely smooth and it was indeed a pleasure to install a database product with such ease. If you have installed other database products in the past, you will absolutely agree with me. So download NuoDB and install it today, and in tomorrow’s blog post I will take the installation to the next level. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

Read the article
Big Data – Basics of Big Data Architecture – Day 4 of 21

- by Pinal Dave

In yesterday’s blog post we understood how Big Data evolution happened. Today we will understand basics of the Big Data Architecture. Big Data Cycle Just like every other database related applications, bit data project have its development cycle. Though three Vs (link) for sure plays an important role in deciding the architecture of the Big Data projects. Just like every other project Big Data project also goes to similar phases of the data capturing, transforming, integrating, analyzing and building actionable reporting on the top of the data. While the process looks almost same but due to the nature of the data the architecture is often totally different. Here are few of the question which everyone should ask before going ahead with Big Data architecture. Questions to Ask How big is your total database? What is your requirement of the reporting in terms of time – real time, semi real time or at frequent interval? How important is the data availability and what is the plan for disaster recovery? What are the plans for network and physical security of the data? What platform will be the driving force behind data and what are different service level agreements for the infrastructure? This are just basic questions but based on your application and business need you should come up with the custom list of the question to ask. As I mentioned earlier this question may look quite simple but the answer will not be simple. When we are talking about Big Data implementation there are many other important aspects which we have to consider when we decide to go for the architecture. Building Blocks of Big Data Architecture It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data Solution in a single blog post, however, we can discuss the basic building blocks of big data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture works. Above image gives good overview of how in Big Data Architecture various components are associated with each other. In Big Data various different data sources are part of the architecture hence extract, transform and integration are one of the most essential layers of the architecture. Most of the data is stored in relational as well as non relational data marts and data warehousing solutions. As per the business need various data are processed as well converted to proper reports and visualizations for end users. Just like software the hardware is almost the most important part of the Big Data Architecture. In the big data architecture hardware infrastructure is extremely important and failure over instances as well as redundant physical infrastructure is usually implemented. NoSQL in Data Management NoSQL is a very famous buzz word and it really means Not Relational SQL or Not Only SQL. This is because in Big Data Architecture the data is in any format. It can be unstructured, relational or in any other format or from any other data source. To bring all the data together relational technology is not enough, hence new tools, architecture and other algorithms are invented which takes care of all the kind of data. This is collectively called NoSQL. Tomorrow Next four days we will answer the Buzz Words – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Basics of Big Data Analytics – Day 18 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the various components in Big Data Story. In this article we will understand what are the various analytics tasks we try to achieve with the Big Data and the list of the important tools in Big Data Story. When you have plenty of the data around you what is the first thing which comes to your mind? “What do all these data means?” Exactly – the same thought comes to my mind as well. I always wanted to know what all the data means and what meaningful information I can receive out of it. Most of the Big Data projects are built to retrieve various intelligence all this data contains within it. Let us take example of Facebook. When I look at my friends list of Facebook, I always want to ask many questions such as - On which date my maximum friends have a birthday? What is the most favorite film of my most of the friends so I can talk about it and engage them? What is the most liked placed to travel my friends? Which is the most disliked cousin for my friends in India and USA so when they travel, I do not take them there. There are many more questions I can think of. This illustrates that how important it is to have analysis of Big Data. Here are few of the kind of analysis listed which you can use with Big Data. Slicing and Dicing: This means breaking down your data into smaller set and understanding them one set at a time. This also helps to present various information in a variety of different user digestible ways. For example if you have data related to movies, you can use different slide and dice data in various formats like actors, movie length etc. Real Time Monitoring: This is very crucial in social media when there are any events happening and you wanted to measure the impact at the time when the event is happening. For example, if you are using twitter when there is a football match, you can watch what fans are talking about football match on twitter when the event is happening. Anomaly Predication and Modeling: If the business is running normal it is alright but if there are signs of trouble, everyone wants to know them early on the hand. Big Data analysis of various patterns can be very much helpful to predict future. Though it may not be always accurate but certain hints and signals can be very helpful. For example, lots of data can help conclude that if there is lots of rain it can increase the sell of umbrella. Text and Unstructured Data Analysis: unstructured data are now getting norm in the new world and they are a big part of the Big Data revolution. It is very important that we Extract, Transform and Load the unstructured data and make meaningful data out of it. For example, analysis of lots of images, one can predict that people like to use certain colors in certain months in their cloths. Big Data Analytics Solutions There are many different Big Data Analystics Solutions out in the market. It is impossible to list all of them so I will list a few of them over here. Tableau – This has to be one of the most popular visualization tools out in the big data market. SAS – A high performance analytics and infrastructure company IBM and Oracle – They have a range of tools for Big Data Analysis Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Data Scientist. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Trying to install Team viewer on Ubuntu 12.04

- by Teknikk

I recently got Ubuntu installed on my server, I wanted to install TeamViewer so i could easy manage the virtual machines, However, I get errors when installing it from App store?, And I also get errors, but more detailed on the terminal. Error output: tek@tek-G53SW:~/Download$ sudo dpkg -i ipts teamviewer_linux_x64.deb dpkg: error processing ipts (--install): cannot access archive: No such file or directory (Reading database ... 142115 files and directories currently installed.) Preparing to replace teamviewer7 7.0.9360 (using teamviewer_linux_x64.deb) ... Unpacking replacement teamviewer7 ... dpkg: dependency problems prevent configuration of teamviewer7: teamviewer7 depends on libc6-i386 (>= 2.7); however: Package libc6-i386 is not installed. teamviewer7 depends on lib32asound2; however: Package lib32asound2 is not installed. teamviewer7 depends on lib32z1; however: Package lib32z1 is not installed. teamviewer7 depends on ia32-libs; however: Package ia32-libs is not installed. dpkg: error processing teamviewer7 (--install): dependency problems - leaving unconfigured Errors were encountered while processing: ipts teamviewer7 I tried to install it manually, but with no luck, I heard some others has this problems. I am running Ubuntu 12.04 x64. Error @ sudo apt-get install libc6-i386 lib32asound2 lib32z1 ia32-libs : tek@tek-G53SW:~/Download$ sudo apt-get install libc6-i386 lib32asound2 lib32z1 ia32-libs Reading package lists... Done Building dependency tree Reading state information... Done You might want to run 'apt-get -f install' to correct these: The following packages have unmet dependencies: ia32-libs : Depends: ia32-libs-multiarch E: Unmet dependencies. Try 'apt-get -f install' with no packages (or specify a solution). tek@tek-G53SW:~/Download$ More errors tek@tek-G53SW:~/Download$ sudo apt-get -f install [sudo] password for tek: Reading package lists... Done Building dependency tree Reading state information... Done Correcting dependencies... Done The following packages will be REMOVED: teamviewer7 0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded. 1 not fully installed or removed. After this operation, 81.9 MB disk space will be freed. Do you want to continue [Y/n]? y (Reading database ... 142441 files and directories currently installed.) Removing teamviewer7 ... tek@tek-G53SW:~/Download$ sudo apt-get install libc6-i386 lib32asound2 lib32z1 ia32-libs Reading package lists... Done Building dependency tree Reading state information... Done lib32z1 is already the newest version. libc6-i386 is already the newest version. lib32asound2 is already the newest version. Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: ia32-libs : Depends: ia32-libs-multiarch E: Unable to correct problems, you have held broken packages. tek@tek-G53SW:~/Download$ sudo apt-get install ia32-libs-multiarch Reading package lists... Done Building dependency tree Reading state information... Done Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: ia32-libs-multiarch:i386 : Depends: gstreamer0.10-plugins-good:i386 but it is not going to be installed Depends: gtk2-engines:i386 but it is not going to be installed Depends: gtk2-engines-murrine:i386 but it is not going to be installed Depends: gtk2-engines-pixbuf:i386 but it is not going to be installed Depends: gtk2-engines-oxygen:i386 but it is not going to be installed Depends: ibus-gtk:i386 but it is not going to be installed Depends: libcanberra-gtk-module:i386 but it is not going to be installed Depends: libcups2:i386 but it is not going to be installed Depends: libcupsimage2:i386 but it is not going to be installed Depends: libfontconfig1:i386 but it is not going to be installed Depends: libgail-common:i386 but it is not going to be installed Depends: libgphoto2-2:i386 but it is not going to be installed Depends: libgtk2.0-0:i386 but it is not going to be installed Depends: libnss3:i386 but it is not going to be installed Depends: libqt4-opengl:i386 but it is not going to be installed Depends: libqt4-qt3support:i386 but it is not going to be installed Depends: libqt4-scripttools:i386 but it is not going to be installed Depends: libqt4-svg:i386 but it is not going to be installed Depends: libqtgui4:i386 but it is not going to be installed Depends: libqtwebkit4:i386 but it is not going to be installed Depends: librsvg2-common:i386 but it is not going to be installed Depends: libsane:i386 but it is not going to be installed E: Unable to correct problems, you have held broken packages. tek@tek-G53SW:~/Download$

Read the article
SQL – Business Intelligence: Derive Data or Information?

- by Pinal Dave

We all know the value of information in our lives. Whether it’s a personal decision or a business initiated one, people need it. But the question is: who is to make the distinction between data and information? We all come across a whole lot of data daily, that may be significant or not. We filter what’s required and forget about the rest. Information is filtered and distilled data. Filtering and distillation can also alter its actual meaning and natural state. Therefore, in this blog we discover some ways to ensure that we’re using business intelligence derived from the right information for making critical management decisions. Four key questions managers must ask themselves before making a decision: 1. Am I working with data or information? 2. What is it’s context? 3. How recent is it? 4. How was it derived or what is the source? The first question is probably the most important. You must know what you’re dealing with here. If you see use of adjectives and conclusions drawn, it’s information. Not raw data. You very next concern must be whether this is guised to present a particular viewpoint or perspective. It makes a lot of difference if you take a decision based on someone’s propaganda to distort real facts. Therefore, the context and the intentions of the distillation process must be clear to you. The next consideration is whether data is recent enough to hold any value. Since it has a very short shelf life, you must ensure that its context and value is not lost out of time. The last and the most important consideration is how was it derived in the first place. The observer effect is what calls the shots here. The source can change the context to a great extent if the collection methodology and purpose is not clear. Gathering intelligence for decision making requires users to be keen observers and not take the information provided on its face value alone. These probing questions will allow you to make sure that you’re working with clean and accurate data devoid of any influence or manipulations. Only then can you be sure of deriving true business intelligence for your organization. BI technology is also a great way to ensure accuracy of reports. SQL BI Platform provides advanced tools and techniques for all your BI needs and concerns. Koenig Solutions offers this course along with a host of other Business Intelligence and IT courses on all latest technologies available in the market today. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Can't get bonding and bridging to work for KVM

- by user9546

Hi everyone. I can't for the life of me get bonding and bridging to work for the KVM setup I'm building. I'm using a fresh install (not an upgrade) of Ubuntu Server 10.10. I have 4 NICs on the same subnet (two intended for each of my two VMs). I'm trying to achieve the setup that Uthark describes here. But following his guidelines didn't work for me. My eth0 and eth1 did not come up, and "brctl show" showed that br0 didn't have any interfaces (the bond). I assumed it didn't work because he's using 10.4, and this article says there's a recent change in bonding: [I can't post more than one hyperlink per post because I'm a newbie.] I had to use this article to get my interfaces to work at all on the same subnet, which is why I have the post-up lines on some of my interfaces: [I can't post more than one hyperlink per post because I'm a newbie.] I installed ifenslave and ethtool. I also created /etc/modprobe.d/aliases.conf with the following content: alias bond0 bonding options bonding mode=6 miimon=100 downdelay=200 updelay=200 And I included "bonding" in /etc/modules So, after several approaches, here is my latest interfaces file: auto lo iface lo inet loopback auto eth5 iface eth5 inet manual auto br5 iface br5 inet static post-up /sbin/ip rule add from [network].79 lookup 10 post-up /sbin/ip route add table 10 default via [network].1 src [network].79 dev br5 address [network].79 netmask 255.255.255.0 network [network].0 broadcast [network].255 gateway [network].1 bridge_ports eth5 bridge_stp off bridge_fd 0 bridge_maxwait 0 auto eth2 iface eth2 inet manual auto br2 iface br2 inet static post-up /sbin/ip rule add from [network].78 lookup 11 post-up /sbin/ip route add table 11 default via [network].1 src [network].78 dev br2 address [network].78 netmask 255.255.255.0 network [network].0 broadcast [network].255 gateway [network].1 bridge_ports eth2 bridge_stp off bridge_fd 0 bridge_maxwait 0 iface eth0 inet manual iface eth1 inet manual auto bond0 iface bond0 inet static bond_miimon 100 bond_mode balance-alb up /sbin/ifenslave bond0 eth0 eth1 down /sbin/ifenslave -d bond0 eth0 eth1 auto br0 iface br0 inet static address [network].60 netmask 255.255.255.0 network [network].0 broadcast [network].255 gateway [network].1 bridge_ports bond0 eth2, eth5, br2, and br5 all seem to be working fine. The only other thing I could find that looked suspicious is an error regarding bonding in /var/log/messages: kernel: [ 3.828684] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details. even though there is a bond-miimon line in /etc/network/interfaces (if that's what they're talking about). Also, the bond seems to go in and out of promiscuous mode several times on boot: Jan 20 14:19:02 kvmhost kernel: [ 3.902378] device bond0 entered promiscuous mode Jan 20 14:19:02 kvmhost kernel: [ 3.902390] device bond0 left promiscuous mode Jan 20 14:19:02 kvmhost kernel: [ 3.902393] device bond0 entered promiscuous mode Jan 20 14:19:02 kvmhost kernel: [ 3.902397] device bond0 left promiscuous mode Jan 20 14:19:03 kvmhost kernel: [ 4.998990] device bond0 entered promiscuous mode Jan 20 14:19:03 kvmhost kernel: [ 4.999005] device bond0 left promiscuous mode Jan 20 14:19:03 kvmhost kernel: [ 4.999008] device bond0 entered promiscuous mode Jan 20 14:19:03 kvmhost kernel: [ 4.999012] device bond0 left promiscuous mode Any advice would be greatly appreciated. It seems that this must be possible, based on other posts, but I can't see what I'm doing wrong. Thanks.

Read the article
Big Data – How to become a Data Scientist and Learn Data Science? – Day 19 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the analytics in Big Data Story. In this article we will understand how to become a Data Scientist for Big Data Story. Data Scientist is a new buzz word, everyone seems to be wanting to become Data Scientist. Let us go over a few key topics related to Data Scientist in this blog post. First of all we will understand what is a Data Scientist. In the new world of Big Data, I see pretty much everyone wants to become Data Scientist and there are lots of people I have already met who claims that they are Data Scientist. When I ask what is their role, I have got a wide variety of answers. What is Data Scientist? Data scientists are the experts who understand various aspects of the business and know how to strategies data to achieve the business goals. They should have a solid foundation of various data algorithms, modeling and statistics methodology. What do Data Scientists do? Data scientists understand the data very well. They just go beyond the regular data algorithms and builds interesting trends from available data. They innovate and resurrect the entire new meaning from the existing data. They are artists in disguise of computer analyst. They look at the data traditionally as well as explore various new ways to look at the data. Data Scientists do not wait to build their solutions from existing data. They think creatively, they think before the data has entered into the system. Data Scientists are visionary experts who understands the business needs and plan ahead of the time, this tremendously help to build solutions at rapid speed. Besides being data expert, the major quality of Data Scientists is “curiosity”. They always wonder about what more they can get from their existing data and how to get maximum out of future incoming data. Data Scientists do wonders with the data, which goes beyond the job descriptions of Data Analysist or Business Analysist. Skills Required for Data Scientists Here are few of the skills a Data Scientist must have. Expert level skills with statistical tools like SAS, Excel, R etc. Understanding Mathematical Models Hands-on with Visualization Tools like Tableau, PowerPivots, D3. j’s etc. Analytical skills to understand business needs Communication skills On the technology front any Data Scientists should know underlying technologies like (Hadoop, Cloudera) as well as their entire ecosystem (programming language, analysis and visualization tools etc.) . Remember that for becoming a successful Data Scientist one require have par excellent skills, just having a degree in a relevant education field will not suffice. Final Note Data Scientists is indeed very exciting job profile. As per research there are not enough Data Scientists in the world to handle the current data explosion. In near future Data is going to expand exponentially, and the need of the Data Scientists will increase along with it. It is indeed the job one should focus if you like data and science of statistics. Courtesy: emc Tomorrow In tomorrow’s blog post we will discuss about various Big Data Learning resources. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQLAuthority News – Stay Connected and Social Media

- by pinaldave

I think I have finally gotten back my faith in social media. If you are following my blog I am sure you are aware of my views on social media – SQLAuthority News – Social Media Confusion – Twitter, FaceBook, LinkedIn and Me. I was not happy about how social media was evolving. Whenever I go to Twitter, LinkedIn or Facebook, I noticed the same updates everywhere. I just thought I was wasting my time doing the same thing everywhere. I strongly believe that there is no dictator on internet. Nobody has authority over others, everybody can express their ideas as long as it is not violating others privacy and it is not morally wrong. I have decided that instead of trying to improve the world, I should change myself and adjust my needs. Here are few things I have done to relieve my social media confusion. Twitter I un-followed people who were taking up my time with too many updates. I un-followed people who hardly updated at all. I did not follow anybody else’s list, as I have no control over who other people follow. I follow not only serious SQL people but some fun stuff as well. I removed all my friends who were on Facebook and repeating the same updates on Twitter. I engage with them on Facebook. I followed people who are very conversational on Twitter. I let anybody follow me. I update all my blog posts through at least five tweets online. I decided to re-tweet at least five of my favorite tweets of the day, this way I force myself to remain active in the community. Follow me on Twitter! LinkedIn I updated my career and professional info on LinkedIn. I keep my LinkedIn profile updated with my latest jobs and career news. I let anybody connect with me on LinkedIn. I specify my email address in my profile, keeping it easy for those who want to add me. I read all the profile related updates of my connections – it is very valuable to know who is where and what changes are happening. I do not add my personal tweets or comments in LinkedIn profile. I just keep it professional. Link with me at LinkedIn Facebook I use Facebook only for personal friends. I visit all of my friends at regular intervals and make sure that they are really my friends. I often remove my friends from my Twitter list who are sending duplicate updates. I upload my family photos as well as family updates on Facebook, making sure that only my approved friends are able to read my updates. I keep my Facebook very personal and I often chat with my friends on Facebook chat. I am no longer confused about social media and I think I am using it appropriately. As I said, one cannot decide for others how to use social media, you can only decide for yourself. I have finally found my peace with social media. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Professional Development – Difference Between Bio, CV and Resume

- by Pinal Dave

Applying for work can be very stressful – you want to put your best foot forward, and it can be very hard to sell yourself to a potential employer while highlighting your best characteristics and answering questions. On top of that, some jobs require different application materials – a biography (or bio), a curriculum vitae (or CV), or a resume. These things seem so interchangeable, so what is the difference? Let’s start with the one most of us have heard of – the resume. A resume is a summary of your job and education history. If you have ever applied for a job, you will have used a resume. The ability to write a good resume that highlights your best characteristics and emphasizes your qualifications for a specific job is a skill that will take you a long way in the world. For such an essential skill, unfortunately it is one that many people struggle with. RESUME So let’s discuss what makes a great resume. First, make sure that your name and contact information are at the top, in large print (slightly larger font than the rest of the text, size 14 or 16 if the rest is size 12, for example). You need to make sure that if you catch the recruiter’s attention and they know how to get a hold of you. As for qualifications, be quick and to the point. Make your job title and the company the headline, and include your skills, accomplishments, and qualifications as bullet points. Use good action verbs, like “finished,” “arranged,” “solved,” and “completed.” Include hard numbers – don’t just say you “changed the filing system,” say that you “revolutionized the storage of over 250 files in less than five days.” Doesn’t that sentence sound much more powerful? Curriculum Vitae (CV) Now let’s talk about curriculum vitae, or “CVs”. A CV is more like an expanded resume. The same rules are still true: put your name front and center, keep your contact info up to date, and summarize your skills with bullet points. However, CVs are often required in more technical fields – like science, engineering, and computer science. This means that you need to really highlight your education and technical skills. Difference between Resume and CV Resumes are expected to be one or two pages long – CVs can be as many pages as necessary. If you are one of those people lucky enough to feel limited by the size constraint of resumes, a CV is for you! On a CV you can expand on your projects, highlight really exciting accomplishments, and include more educational experience – including GPA and test scores from the GRE or MCAT (as applicable). You can also include awards, associations, teaching and research experience, and certifications. A CV is a place to really expand on all your experience and how great you will be in this particular position. Biography (Bio) Chances are, you already know what a bio is, and you have even read a few of them. Think about the one or two paragraphs that every author includes in the back flap of a book. Think about the sentences under a blogger’s photo on every “About Me” page. That is a bio. It is a way to quickly highlight your life experiences. It is essentially the way you would introduce yourself at a party. Where a bio is required for a job, chances are they won’t want to know about where you were born and how many pets you have, though. This is a way to summarize your entire job history in quick-to-read format – and sometimes during a job hunt, being able to get to the point and grab the recruiter’s interest is the best way to get your foot in the door. Think of a bio as your entire resume put into words. Most bios have a standard format. In paragraph one, talk about your most recent position and accomplishments there, specifically how they relate to the job you are applying for. If you have teaching or research experience, training experience, certifications, or management experience, talk about them in paragraph two. Paragraph three and four are for highlighting publications, education, certifications, associations, etc. To wrap up your bio, provide your contact info and availability (dates and times). Where to use What? For most positions, you will know exactly what kind of application to use, because the job announcement will state what materials are needed – resume, CV, bio, cover letter, skill set, etc. If there is any confusion, choose whatever the industry standard is (CV for technical fields, resume for everything else) or choose which of your documents is the strongest. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: About Me, PostADay, Professional Development, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
How Visual Studio 2010 and Team Foundation Server enable Compliance

- by Martin Hinshelwood

One of the things that makes Team Foundation Server (TFS) the most powerful Application Lifecycle Management (ALM) platform is the traceability it provides to those that use it. This traceability is crucial to enable many companies to adhere to many of the Compliance regulations to which they are bound (e.g. CFR 21 Part 11 or Sarbanes–Oxley.) From something as simple as relating Tasks to Check-in’s or being able to see the top 10 files in your codebase that are causing the most Bugs, to identifying which Bugs and Requirements are in which Release. All that information is available and more in TFS. Although all of this tradability is available within TFS you do need to understand that it is not for free. Well… I say that, but if you are using TFS properly you will have this information with no additional work except for firing up the reporting. Using Visual Studio ALM and Team Foundation Server you can relate every line of code changes all the way up to requirements and back down through Test Cases to the Test Results. Figure: The only thing missing is Build In order to build the relationship model below we need to examine how each of the relationships get there. Each member of your team from programmer to tester and Business Analyst to Business have their roll to play to knit this together. Figure: The relationships required to make this work can get a little confusing If Build is added to this to relate Work Items to Builds and with knowledge of which builds are in which environments you can easily identify what is contained within a Release. Figure: How are things progressing Along with the ability to produce the progress and trend reports the tractability that is built into TFS can be used to fulfil most audit requirements out of the box, and augmented to fulfil the rest. In order to understand the relationships, lets look at each of the important Artifacts and how they are associated with each other… Requirements – The root of all knowledge Requirements are the thing that the business cares about delivering. These could be derived as User Stories or Business Requirements Documents (BRD’s) but they should be what the Business asks for. Requirements can be related to many of the Artifacts in TFS, so lets look at the model: Figure: If the centre of the world was a requirement We can track which releases Requirements were scheduled in, but this can change over time as more details come to light. Figure: Who edited the Requirement and when There is also the ability to query Work Items based on the History of changed that were made to it. This is particularly important with Requirements. It might not be enough to say what Requirements were completed in a given but also to know which Requirements were ever assigned to a particular release. Figure: Some magic required, but result still achieved As an augmentation to this it is also possible to run a query that shows results from the past, just as if we had a time machine. You can take any Query in the system and add a “Asof” clause at the end to query historical data in the operational store for TFS. select <fields> from WorkItems [where <condition>] [order by <fields>] [asof <date>] Figure: Work Item Query Language (WIQL) format In order to achieve this you do need to save the query as a *.wiql file to your local computer and edit it in notepad, but one imported into TFS you run it any time you want. Figure: Saving Queries locally can be useful All of these Audit features are available throughout the Work Item Tracking (WIT) system within TFS. Tasks – Where the real work gets done Tasks are the work horse of the development team, but they only as useful as Excel if you do not relate them properly to other Artifacts. Figure: The Task Work Item Type has its own relationships Requirements should be broken down into Tasks that the development team work from to build what is required by the business. This may be done by a small dedicated group or by everyone that will be working on the software team but however it happens all of the Tasks create should be a Child of a Requirement Work Item Type. Figure: Tasks are related to the Requirement Tasks should be used to track the day-to-day activities of the team working to complete the software and as such they should be kept simple and short lest developers think they are more trouble than they are worth. Figure: Task Work Item Type has a narrower purpose Although the Task Work Item Type describes the work that will be done the actual development work involves making changes to files that are under Source Control. These changes are bundled together in a single atomic unit called a Changeset which is committed to TFS in a single operation. During this operation developers can associate Work Item with the Changeset. Figure: Tasks are associated with Changesets Changesets – Who wrote this crap Changesets themselves are just an inventory of the changes that were made to a number of files to complete a Task. Figure: Changesets are linked by Tasks and Builds Figure: Changesets tell us what happened to the files in Version Control Although comments can be changed after the fact, the inventory and Work Item associations are permanent which allows us to Audit all the way down to the individual change level. Figure: On Check-in you can resolve a Task which automatically associates it Because of this we can view the history on any file within the system and see how many changes have been made and what Changesets they belong to. Figure: Changes are tracked at the File level What would be even more powerful would be if we could view these changes super imposed over the top of the lines of code. Some people call this a blame tool because it is commonly used to find out which of the developers introduced a bug, but it can also be used as another method of Auditing changes to the system. Figure: Annotate shows the lines the Annotate functionality allows us to visualise the relationship between the individual lines of code and the Changesets. In addition to this you can create a Label and apply it to a version of your version control. The problem with Label’s is that they can be changed after they have been created with no tractability. This makes them practically useless for any sort of compliance audit. So what do you use? Branches – And why we need them Branches are a really powerful tool for development and release management, but they are most important for audits. Figure: One way to Audit releases The R1.0 branch can be created from the Label that the Build creates on the R1 line when a Release build was created. It can be created as soon as the Build has been signed of for release. However it is still possible that someone changed the Label between this time and its creation. Another better method can be to explicitly link the Build output to the Build. Builds – Lets tie some more of this together Builds are the glue that helps us enable the next level of tractability by tying everything together. Figure: The dashed pieces are not out of the box but can be enabled When the Build is called and starts it looks at what it has been asked to build and determines what code it is going to get and build. Figure: The folder identifies what changes are included in the build The Build sets a Label on the Source with the same name as the Build, but the Build itself also includes the latest Changeset ID that it will be building. At the end of the Build the Build Agent identifies the new Changesets it is building by looking at the Check-ins that have occurred since the last Build. Figure: What changes have been made since the last successful Build It will then use that information to identify the Work Items that are associated with all of the Changesets Changesets are associated with Build and change the “Integrated In” field of those Work Items . Figure: Find all of the Work Items to associate with The “Integrated In” field of all of the Work Items identified by the Build Agent as being integrated into the completed Build are updated to reflect the Build number that successfully integrated that change. Figure: Now we know which Work Items were completed in a build Now that we can link a single line of code changed all the way back through the Task that initiated the action to the Requirement that started the whole thing and back down to the Build that contains the finished Requirement. But how do we know wither that Requirement has been fully tested or even meets the original Requirements? Test Cases – How we know we are done The only way we can know wither a Requirement has been completed to the required specification is to Test that Requirement. In TFS there is a Work Item type called a Test Case Test Cases enable two scenarios. The first scenario is the ability to track and validate Acceptance Criteria in the form of a Test Case. If you agree with the Business a set of goals that must be met for a Requirement to be accepted by them it makes it both difficult for them to reject a Requirement when it passes all of the tests, but also provides a level of tractability and validation for audit that a feature has been built and tested to order. Figure: You can have many Acceptance Criteria for a single Requirement It is crucial for this to work that someone from the Business has to sign-off on the Test Case moving from the “Design” to “Ready” states. The Second is the ability to associate an MS Test test with the Test Case thereby tracking the automated test. This is useful in the circumstance when you want to Track a test and the test results of a Unit Test designed to test the existence of and then re-existence of a a Bug. Figure: Associating a Test Case with an automated Test Although it is possible it may not make sense to track the execution of every Unit Test in your system, there are many Integration and Regression tests that may be automated that it would make sense to track in this way. Bug – Lets not have regressions In order to know wither a Bug in the application has been fixed and to make sure that it does not reoccur it needs to be tracked. Figure: Bugs are the centre of their own world If the fix to a Bug is big enough to require that it is broken down into Tasks then it is probably a Requirement. You can associate a check-in with a Bug and have it tracked against a Build. You would also have one or more Test Cases to prove the fix for the Bug. Figure: Bugs have many associations This allows you to track Bugs / Defects in your system effectively and report on them. Change Request – I am not a feature In the CMMI Process template Change Requests can also be easily tracked through the system. In some cases it can be very important to track Change Requests separately as an Auditor may want to know what was changed and who authorised it. Again and similar to Bugs, if the Change Request is big enough that it would require to be broken down into Tasks it is in reality a new feature and should be tracked as a Requirement. Figure: Make sure your Change Requests only Affect Requirements and not rewrite them Conclusion Visual Studio 2010 and Team Foundation Server together provide an exceptional Application Lifecycle Management platform that can help your team comply with even the harshest of Compliance requirements while still enabling them to be Agile. Most Audits are heavy on required documentation but most of that information is captured for you as long a you do it right. You don’t even need every team member to understand it all as each of the Artifacts are relevant to a different type of team member. Business Analysts manage Requirements and Change Requests Programmers manage Tasks and check-in against Change Requests and Bugs Testers manage Bugs and Test Cases Build Masters manage Builds Although there is some crossover there are still rolls or “hats” that are worn. Do you thing this is all achievable? Have I missed anything that you think should be there?

Read the article
SQL Table stored as a Heap - the dangers within

- by MikeD

Nearly all of the time I create a table, I include a primary key, and often that PK is implemented as a clustered index. Those two don't always have to go together, but in my world they almost always do. On a recent project, I was working on a data warehouse and a set of SSIS packages to import data from an OLTP database into my data warehouse. The data I was importing from the business database into the warehouse was mostly new rows, sometimes updates to existing rows, and sometimes deletes. I decided to use the MERGE statement to implement the insert, update or delete in the data warehouse, I found it quite performant to have a stored procedure that extracted all the new, updated, and deleted rows from the source database and dump it into a working table in my data warehouse, then run a stored proc in the warehouse that was the MERGE statement that took the rows from the working table and updated the real fact table. Use Warehouse CREATE TABLE Integration.MergePolicy (PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date, Operation varchar(5)) CREATE TABLE fact.Policy (PolicyKey int identity primary key, PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date) CREATE PROC Integration.MergePolicy as begin begin tran Merge fact.Policy as tgtUsing Integration.MergePolicy as SrcOn (tgt.PolicyId = Src.PolicyId) When not matched by Target then Insert (PolicyId, PolicyTypeKey, Premium, Deductible, EffectiveDate)values (src.PolicyId, src.PolicyTypeKey, src.Premium, src.Deductible, src.EffectiveDate) When matched and src.Operation = 'U' then Update set PolicyTypeKey = src.PolicyTypeKey,Premium = src.Premium,Deductible = src.Deductible,EffectiveDate = src.EffectiveDate When matched and src.Operation = 'D' then Delete ;delete from Integration.WorkPolicy commit end Notice that my worktable (Integration.MergePolicy) doesn't have any primary key or clustered index. I didn't think this would be a problem, since it was relatively small table and was empty after each time I ran the stored proc. For one of the work tables, during the initial loads of the warehouse, it was getting about 1.5 million rows inserted, processed, then deleted. Also, because of a bug in the extraction process, the same 1.5 million rows (plus a few hundred more each time) was getting inserted, processed, and deleted. This was being sone on a fairly hefty server that was otherwise unused, and no one was paying any attention to the time it was taking. This week I received a backup of this database and loaded it on my laptop to troubleshoot the problem, and of course it took a good ten minutes or more to run the process. However, what seemed strange to me was that after I fixed the problem and happened to run the merge sproc when the work table was completely empty, it still took almost ten minutes to complete. I immediately looked back at the MERGE statement to see if I had some sort of outer join that meant it would be scanning the target table (which had about 2 million rows in it), then turned on the execution plan output to see what was happening under the hood. Running the stored procedure again took a long time, and the plan output didn't show me much - 55% on the MERGE statement, and 45% on the DELETE statement, and table scans on the work table in both places. I was surprised at the relative cost of the DELETE statement, because there were really 0 rows to delete, but I was expecting to see the table scans. (I was beginning now to suspect that my problem was because the work table was being stored as a heap.) Then I turned on STATS_IO and ran the sproc again. The output was quite interesting.Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Policy'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'MergePolicy'. Scan count 1, logical reads 433276, physical reads 60, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. I've reproduced the above from memory, the details aren't exact, but the essential bit was the very high number of logical reads on the table stored as a heap. Even just doing a SELECT Count(*) from Integration.MergePolicy incurred that sort of output, even though the result was always 0. I suppose I should research more on the allocation and deallocation of pages to tables stored as a heap, but I haven't, and my original assumption that a table stored as a heap with no rows would only need to read one page to answer any query was definitely proven wrong. It's likely that some sort of physical defragmentation of the table may have cleaned that up, but it seemed that the easiest answer was to put a clustered index on the table. After doing so, the execution plan showed a cluster index scan, and the IO stats showed only a single page read. (I aborted my first attempt at adding a clustered index on the table because it was taking too long - instead I ran TRUNCATE TABLE Integration.MergePolicy first and added the clustered index, both of which took very little time). I suspect I may not have noticed this if I had used TRUNCATE TABLE Integration.MergePolicy instead of DELETE FROM Integration.MergePolicy, since I'm guessing that the truncate operation does some rather quick releasing of pages allocated to the heap table. In the future, I will likely be much more careful to have a clustered index on every table I use, even the working tables. Mike

Read the article
SQL Saturday 27 (Portland, Oregon)

- by BuckWoody

I’m sitting in the Seattle airport, waiting for my flight to Silicon Valley California for the SQL Server 2008 R2 Launch Event. By some quirk of nature, they are asking me to Emcee the event – but that’s another post entirely. I’m reflecting on the SQL Saturday 27 event that was just held in Portland, Oregon this last Saturday. These are not Microsoft-sponsored events – it’s truly the community at work. Think of a big user-group meeting – I mean REALLY big – held in a central location, like at a college (as ours was) or some larger, inexpensive venue like that. Everyone there is volunteering – it’s my own money and time to drive several hours to a hotel for the night, feed myself and present. It’s their own time and money for the folks that organize the event – unless a vendor or two steps in to help. It’s their own time and money for the attendees to drive a long way, spend the night and their Saturday to listen to the speakers. Why do all this? Because everybody benefits. Every speaker learns something new, meets new people, and reaches a new audience. Every volunteer does the same. And the attendees? Well, it’s pretty obvious what they get. A 7Am to 10PM extravaganza of knowledge from every corner of the product. In fact, this year the Portland group hooked up with the CodeCamp folks and held a combined event. We had over 850 people, and I had everyone from data professionals to developers in my sessions. So I’ll take this opportunity to do two things: to say “thank you” to all of the folks who attended, from those who spoke to those who worked and those who came to listen, and to challenge you to attend the next SQL Saturday anywhere near you. You can find the list here: http://www.sqlsaturday.com/. Don’t see anything in your area? Start one! The PASS folks have a package that will show you how. Sure, it’s a big job, but the key is to get as many people helping you as possible. Even if you have only a few dozen folks show up the first time, no worries. The first events I presented at had about 20 in the room. But not this week. See you at the Launch Event if you’re near the San Francisco area tomorrow, and see you at the Redmond SQL Saturday and TechEd if not. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SQLAuthority News – Advantages of Distance Learning

- by Pinal Dave

Distance education is extremely popular – almost overnight, it seems. Almost everyone has taken an online course, or knows someone who has, or is considering joining an online school. There are many advantages and disadvantages to attending an online school – but the same can be said of attending a physical school! Let’s take a look at the top reasons to use distance education. 1) Flexibility. Physical universities are usually willing to make some concessions to student – like night classes, study hours, and online networks. However, nothing is going to beat the flexibility of distance education. You can attend classes and take notes anytime, anywhere, wearing anything you’d like! 2) Affordability. We don’t need to get into hard numbers to understand how an expensive university can be. Students are taking on more and more debt just to get an education. Many of these fees pay for room, board, and facilities. Distance education cuts out all these costs, and makes attending school much more affordable for the average student. 3) Try before you buy. Did you know that the average college student changes his or her major 10 times before they graduate? You can imagine that this kind of indecision plays a huge part in WHEN you graduate – not being able to make up your mind can cost you big bucks if you have to stay in school for extra years! Distance education allows you to take different classes from a wide range of disciplines. Do you want to study forensic science or English literature? Now you don’t have to pay for classes you can’t afford just to find out. 4) Pace yourself. Some students struggle in a traditional classroom setting – classes can be taught too fast, too slow, or there are too many distractions. Distance education allows mature students to set the pace themselves. They can rewatch lectures they didn’t catch the first time, or go through classes quickly if they are already familiar with the material – cutting out the chance of burning out or getting bored. 5) Lifelong learning. Maybe you already have a degree, but would like to learn more about your field, or a related field, or maybe even about something completely unrelated – just because you are curious! Distance education allows you to learn whatever you want ,whenever you want (and yes, wearing anything you’d like!). 6) Attend whatever college you want. Because of the popularity of distance education, physical campuses are getting in on the game by offering online courses – often just uploaded versions of classes already taught at their campus. Ever wanted to attend Harvard, but knew you couldn’t get in? Take a class online! Of course, you probably should not attempt to lie and say you have a Harvard degree, but Ivy League colleges are prestigious because they are the best in their field – take advantage of the best by taking an online course! I am a big believer in continuing education, whether it is online courses, returning to school, or even take informal classes online. Distance education can be a great way to accomplish these goals and become a lifelong learner. My friends at provides training through virtual classrooms for students who want to avoid travelling. Distance learning course allows IT aspirants to connect with trainers using the internet. I encourage everyone to check it out! Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Training, T SQL, Technology

Read the article
Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Search Results

Search found 86974 results on 3479 pages for 'visualsvn server'.

Page 593/3479 | < Previous Page | 589 590 591 592 593 594 595 596 597 598 599 600 | Next Page >

- by Pinal Dave

- by Tim Dexter

- by user171447

- by jamiet

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Pinal Dave

- by Teknikk

- by Pinal Dave

- by user9546

- by Pinal Dave

- by pinaldave

- by Pinal Dave

- by Pinal Dave

- by Martin Hinshelwood

- by MikeD

- by BuckWoody

- by Pinal Dave

- by Pinal Dave

< Previous Page | 589 590 591 592 593 594 595 596 597 598 599 600 | Next Page >