Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 7/2359 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

Abstract Data Type and Data Structure

- by mark075

It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

Read the article
More Value From Data Using Data Mining Presentation

Here is a presentation I gave at the SQLBits conference in September which was recorded by Microsoft. Usually I speak about SSIS but on this particular event I thought people would like to hear something different from me. Microsoft are making a big play for making Data Mining more accessible to everyone and not just boffins. In this presentation I give an overview of data mining and then do some demonstrations using the excellent Excel Add-Ins available from Microsoft SQL Server 2008 SQL Server 2005 I hope you enjoy this presentation http://go.microsoft.com/?linkid=9633764

Read the article
Updates to Stairway to Integration Services

- by andyleonard

The Stairway to integration Services has been updated! I added content to Step 1 to provide more detail about creating a first SSIS project and corrected a typo in Step 2 that referred to an older name for the Step 1 article. I also made the corrected Step 1 article name a link to help. Thanks to Steve Jones ( blog | @way0utwest ) for all his hard work editing and corralling trifling authors. :{>...(read more)

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Presenting Designing an SSIS Execution Framework to Steel City SQL 18 Jan 2011!

- by andyleonard

I'm honored to present Designing an SSIS Execution Framework (Level 300) to Steel City SQL - the Birmingham Alabama chapter of PASS - on 18 Jan 2011! The meeting starts at 6:00 PM 18 Jan 2011 and will be held at: New Horizons Computer Learning Center 601 Beacon Pkwy. West Suite 106 Birmingham, Alabama, 35209 ( Map for directions ) Abstract In this “demo-tastic” presentation, SSIS trainer, author, and consultant Andy Leonard explains the what, why, and how of an SSIS framework that delivers metadata-driven...(read more)

Read the article
Data Storage Options

- by Kenneth

When I was working as a website designer/engineer I primarily used databases for storage of much of my dynamic data. It was very easy and convenient to use this method and seemed like a standard practice from my research on the matter. I'm now working on shifting away from websites and into desktop applications. What are the best practices for data storage for desktop applications? I ask because I have noticed that most programs I use on a personal level don't appear to use a database for data storage unless its embedded in the program. (I'm not thinking of an application like a word processor where it makes sense to have data stored in individual files as defined by the user. Rather I'm thinking of something more along the lines of a calendar application which would need to store dates and event info and such where accessing that information would be much easier if stored in a database... at least as far as my experience would indicate.) Thanks for the input!

Read the article
What is a Data Warehouse?

Typically Data Warehouses are considered to be non-volatile in comparison to traditional databasesdue to the fact that data within the warehouse does not change that often. In addition, Data Warehouses typically represent data through the use of Multidimensional Conceptual Views that allow data to be extracted based on the view and the current position within the view. Common Data Warehouse Traits Relatively Non-volatile Data Supports Data Extraction and Analysis Optimized for Data Retrieval and Analysis Multidimensional Views of Data Flexible Reporting Multi User Support Generic Dimensionality Transparent Accessible Unlimited Dimensions of Data Unlimited Aggregation levels of Data Normally, Data Warehouses are much larger then there traditional database counterparts due to the fact that they store the basis data along with derived data via Multidimensional Conceptual Views. As companies store larger and larger amounts of data, they will need a way to effectively and accurately extract analysis information that can be used to aide in formulating current and future business decisions. This process can be done currently through data mining within a Data Warehouse. Data Warehouses provide access to data derived through complex analysis, knowledge discovery and decision making. Secondly, they support the demands for high performance in regards to analyzing an organization’s existing and current data. Data Warehouses provide support for an organization’s data and acquired business knowledge. Within a Data Warehouse multiple types of operations/sub systems are supported. Common Data Warehouse Sub Systems Online Analytical Processing (OLAP) Decision –Support Systems (DSS) Online Transaction Processing (OLTP)

Read the article
How is the gimbal locked problem solved using accumulative matrix transformations

- by Luke San Antonio

I am reading the online "Learning Modern 3D Graphics Programming" book by Jason L. McKesson As of now, I am up to the gimbal lock problem and how to solve it using quaternions. However right here, at the Quaternions page. Part of the problem is that we are trying to store an orientation as a series of 3 accumulated axial rotations. Orientations are orientations, not rotations. And orientations are certainly not a series of rotations. So we need to treat the orientation of the ship as an orientation, as a specific quantity. I guess this is the first spot I start to get confused, the reason is because I don't see the dramatic difference between orientations and rotations. I also don't understand why an orientation cannot be represented by a series of rotations... Also: The first thought towards this end would be to keep the orientation as a matrix. When the time comes to modify the orientation, we simply apply a transformation to this matrix, storing the result as the new current orientation. This means that every yaw, pitch, and roll applied to the current orientation will be relative to that current orientation. Which is precisely what we need. If the user applies a positive yaw, you want that yaw to rotate them relative to where they are current pointing, not relative to some fixed coordinate system. The concept, I understand, however I don't understand how if accumulating matrix transformations is a solution to this problem, how the code given in the previous page isn't just that. Here's the code: void display() { glClearColor(0.0f, 0.0f, 0.0f, 0.0f); glClearDepth(1.0f); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); glutil::MatrixStack currMatrix; currMatrix.Translate(glm::vec3(0.0f, 0.0f, -200.0f)); currMatrix.RotateX(g_angles.fAngleX); DrawGimbal(currMatrix, GIMBAL_X_AXIS, glm::vec4(0.4f, 0.4f, 1.0f, 1.0f)); currMatrix.RotateY(g_angles.fAngleY); DrawGimbal(currMatrix, GIMBAL_Y_AXIS, glm::vec4(0.0f, 1.0f, 0.0f, 1.0f)); currMatrix.RotateZ(g_angles.fAngleZ); DrawGimbal(currMatrix, GIMBAL_Z_AXIS, glm::vec4(1.0f, 0.3f, 0.3f, 1.0f)); glUseProgram(theProgram); currMatrix.Scale(3.0, 3.0, 3.0); currMatrix.RotateX(-90); //Set the base color for this object. glUniform4f(baseColorUnif, 1.0, 1.0, 1.0, 1.0); glUniformMatrix4fv(modelToCameraMatrixUnif, 1, GL_FALSE, glm::value_ptr(currMatrix.Top())); g_pObject->Render("tint"); glUseProgram(0); glutSwapBuffers(); } To my understanding, isn't what he is doing (modifying a matrix on a stack) considered accumulating matrices, since the author combined all the individual rotation transformations into one matrix which is being stored on the top of the stack. My understanding of a matrix is that they are used to take a point which is relative to an origin (let's say... the model), and make it relative to another origin (the camera). I'm pretty sure this is a safe definition, however I feel like there is something missing which is blocking me from understanding this gimbal lock problem. One thing that doesn't make sense to me is: If a matrix determines the difference relative between two "spaces," how come a rotation around the Y axis for, let's say, roll, doesn't put the point in "roll space" which can then be transformed once again in relation to this roll... In other words shouldn't any further transformations to this point be in relation to this new "roll space" and therefore not have the rotation be relative to the previous "model space" which is causing the gimbal lock. That's why gimbal lock occurs right? It's because we are rotating the object around set X, Y, and Z axes rather than rotating the object around it's own, relative axes. Or am I wrong? Since apparently this code I linked in isn't an accumulation of matrix transformations can you please give an example of a solution using this method. So in summary: What is the difference between a rotation and an orientation? Why is the code linked in not an example of accumulation of matrix transformations? What is the real, specific purpose of a matrix, if I had it wrong? How could a solution to the gimbal lock problem be implemented using accumulation of matrix transformations? Also, as a bonus: Why are the transformations after the rotation still relative to "model space?" Another bonus: Am I wrong in the assumption that after a transformation, further transformations will occur relative to the current? Also, if it wasn't implied, I am using OpenGL, GLSL, C++, and GLM, so examples and explanations in terms of these are greatly appreciated, if not necessary. The more the detail the better! Thanks in advance...

Read the article
How Do You Actually Model Data?

Since the 1970’s Developers, Analysts and DBAs have been able to represent concepts and relations in the form of data through the use of generic symbols. But what is data modeling? The first time I actually heard this term I could not understand why anyone would want to display a computer on a fashion show runway. Hey, what do you expect? At that time I was a freshman in community college, and obviously this was a long time ago. I have since had the chance to learn what data modeling truly is through using it. Data modeling is a process of breaking down information and/or requirements in to common categories called objects. Once objects start being defined then relationships start to form based on dependencies found amongst other existing objects. Currently, there are several tools on the market that help data designer actually map out objects and their relationships through the use of symbols and lines. These diagrams allow for designs to be review from several perspectives so that designers can ensure that they have the optimal data design for their project and that the design is flexible enough to allow for potential changes and/or extension in the future. Additionally these basic models can always be further refined to show different levels of details depending on the target audience through the use of three different types of models. Conceptual Data Model(CDM)Conceptual Data Models include all key entities and relationships giving a viewer a high level understanding of attributes. Conceptual data model are created by gathering and analyzing information from various sources pertaining to a project during the typical planning phase of a project. Logical Data Model (LDM)Logical Data Models are conceptual data models that have been expanded to include implementation details pertaining to the data that it will store. Additionally, this model typically represents an origination’s business requirements and business rules by defining various attribute data types and relationships regarding each entity. This additional information can be directly translated to the Physical Data Model which reduces the actual time need to implement it. Physical Data Model(PDMs)Physical Data Model are transformed Logical Data Models that include the necessary tables, columns, relationships, database properties for the creation of a database. This model also allows for considerations regarding performance, indexing and denormalization that are applied through database rules, data integrity. Further expanding on why we actually use models in modern application/database development can be seen in the benefits that data modeling provides for data modelers and projects themselves, Benefits of Data Modeling according to Applied Information Science Abstraction that allows data designers remove concepts and ideas form hard facts in the form of data. This gives the data designers the ability to express general concepts and/or ideas in a generic form through the use of symbols to represent data items and the relationships between the items. Transparency through the use of data models allows complex ideas to be translated in to simple symbols so that the concept can be understood by all viewpoints and limits the amount of confusion and misunderstanding. Effectiveness in regards to tuning a model for acceptable performance while maintaining affordable operational costs. In addition it allows systems to be built on a solid foundation in terms of data. I shudder at the thought of a world without data modeling, think about it? Data is everywhere in our lives. Data modeling allows for optimizing a design for performance and the reduction of duplication. If one was to design a database without data modeling then I would think that the first things to get impacted would be database performance due to poorly designed database and there would be greater chances of unnecessary data duplication that would also play in to the excessive query times because unneeded records would need to be processed. You could say that a data designer designing a database is like a box of chocolates. You will never know what kind of database you will get until after it is built.

Read the article
Want to Learn SQL Server 2012?

- by andyleonard

Or SSIS 2012? SSRS 2012? SSAS 2012? There’s no substitute for getting your hands on the product, in my opinion. I can hear you thinking, “But Andy, I can’t afford to purchase a copy of SQL Server 2012.” Are you sure? What if I told you that you can get a full-feature version of SQL Server 2012 Enterprise Edition for $50? Well, you cannot… it’s actually less than $50! SQL Server 2012 Developer Edition is available at Amazon on the day of this writing for $41.24USD. That’s about the price of eight...(read more)

Read the article
Data Structures usage and motivational aspects

- by Aubergine

For long student life I was always wondering why there are so many of them yet there seems to be lack of usage at all in many of them. The opinion didn't really change when I got a job. We have brilliant books on what they are and their complexities, but I never encounter resources which would actually give a good hint of practical usage. I perfectly understand that I have to look at problem , analyse required operations, look for data structure that does them efficiently. However in practice I never do that, not because of human laziness syndrome, but because when it comes to work I acknowledge time priority over self-development. Over time I thought that when I would be better developer I will automatically use more of them - that didn't happen at all or maybe I just didn't. Then I found that the colleagues usually in the same plate as me - knowing more or less some three of data structures and being totally happy about it and refusing to discuss this matter further with me, coming back to conversations about 'cool new languages' 'libraries that do jobs for you' and the joy to work under scrumban etc. I am stuck with ArrayLists, Arrays and SortedMap , which no matter what I do always suffice or either I tweak them to be capable of fulfilling my task. Yes, it might be inefficient but do we really have to care if Intel increases performance over years no matter if we improve our skills? Does new Xeon or IBM machines really care what we use? What if I like build things, but I am not particularly excited whether it is n log(n) or just n? Over twenty years the processing power increased enormously, which gives us freedom of not being critical about which one to use? On top of that new more optimized languages appear which support multiple cores more efficiently. To be more specific: I would like to find motivational material on complex real areas/cases of possible effective usages of data structures. I would be really grateful if you would provide relevant resources. There is similar question ,but in the end the links again mostly describe or do dumb example(vehicles, students or holy grail quest - yes, very relevant) them and people keep referring to the "scenario decides the data structure to use". I want to know these complex scenarios to be able to identify similarities to my scenario and then use them. The complex scenarios where it really matters and not necessarily of quantitive nature. It seems that data structures only concern is efficiency and nothing else? There seems to be no particular convenience for developer in use one over another. (only when I found scientific resources on why exactly simple carbohydrates are evil I stopped eating sugar and candies completely replacing it with less harmful fruits - I hope you can see the analogy)

Read the article
Unleash AutoVue on Your Unmanaged Data

- by [email protected]

Over the years, I've spoken to hundreds of customers who use AutoVue to collaborate on their "managed" data stored in content management systems, product lifecycle management systems, etc. via our many integrations. Through these conversations I've also learned a harsh reality - we will never fully move away from unmanaged data (desktops, file servers, emails, etc). If you use AutoVue today you already know that even if your primary use is viewing content stored in a content management system, you can still open files stored locally on your computer. But did you know that AutoVue actually has - built-in - a great solution for viewing, printing and redlining your data stored on file servers? Using the 'Server protocol' you can point AutoVue directly to a top-level location on any networked file server and provide your users with a link or shortcut to access an interface similar to the sample page shown below. Many customers link to pages just like this one from their internal company intranets. Through this webpage, users can easily search and browse through file server data with a 'click-and-view' interface to find the specific image, document, drawing or model they're looking for. Any markups created on a document will be accessible to everyone else viewing that document and of course real-time collaboration is supported as well. Customers on maintenance can consult the AutoVue Admin guide or My Oracle Support Doc ID 753018.1 for an introduction to the server protocol. Contact your local AutoVue Solutions Consultant for help setting up the sample shown above.

Read the article
SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

- by pinaldave

Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

Read the article
Consolidate Data in Private Clouds, But Consider Security and Regulatory Issues

- by Troy Kitch

The January 13 webcast Security and Compliance for Private Cloud Consolidation will provide attendees with an overview of private cloud computing based on Oracle's Maximum Availability Architecture and how security and regulatory compliance affects implementations. Many organizations are taking advantage of Oracle's Maximum Availability Architecture to drive down the cost of IT by deploying private cloud computing environments that can support downtime and utilization spikes without idle redundancy. With two-thirds of sensitive and regulated data in organizations' databases private cloud database consolidation means organizations must be more concerned than ever about protecting their information and addressing new regulatory challenges. Join us for this webcast to learn about greater risks and increased threats to private cloud data and how Oracle Database Security Solutions can assist in securely consolidating data and meet compliance requirements. Register Now.

Read the article
Validating Data Using Data Annotation Attributes in ASP.NET MVC

- by bipinjoshi

The data entered by the end user in various form fields must be validated before it is saved in the database. Developers often use validation HTML helpers provided by ASP.NET MVC to perform the input validations. Additionally, you can also use data annotation attributes from the System.ComponentModel.DataAnnotations namespace to perform validations at the model level. Data annotation attributes are attached to the properties of the model class and enforce some validation criteria. They are capable of performing validation on the server side as well as on the client side. This article discusses the basics of using these attributes in an ASP.NET MVC application.http://www.bipinjoshi.net/articles/0a53f05f-b58c-47b1-a544-f032f5cfca58.aspx

Read the article
Replication Services as ETL extraction tool

- by jorg

In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool: “Snapshot Replication can also be useful in BI environments, if you don’t need a near real-time copy of the database, you can choose to use this form of replication. Next to an alternative for Transactional Replication it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copies data from one or more source systems to a staging database that figures as source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically so I think it offers enough features here. I don’t know how the performance will be and if it really works as good for this purpose as I expect, but I want to try this out soon!” Well I have tried it out and I must say it worked well. I was able to let replication services do work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following: Configure snapshot replication for some Adventure Works tables, this was quite simple and straightforward. Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out of the box functionality. While configuring the snapshot replication two SQL Agent Jobs are created, one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous which means that if you execute them they immediately report back if the job started successfully or not, they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it. Configure snapshot replication The first step is to create a publication on the database you want to replicate. Connect to SQL Server Management Studio and right-click Replication, choose for New.. Publication… The New Publication Wizard appears, click Next Choose your “source” database and click Next Choose Snapshot publication and click Next You can now select tables and other objects that you want to publish Expand Tables and select the tables that are needed in your ETL process In the next screen you can add filters on the selected tables which can be very useful. Think about selecting only the last x days of data for example. Its possible to filter out rows and/or columns. In this example I did not apply any filters. Schedule the Snapshot Agent to run at a desired time, by doing this a SQL Agent Job is created which we need to execute from a SSIS package later on. Next you need to set the Security Settings for the Snapshot Agent. Click on the Security Settings button. In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines! On the next screen choose to create the publication at the end of the wizard Give the publication a name (SnapshotTest) and complete the wizard The publication is created and the articles (tables in this case) are added Now the publication is created successfully its time to create a new subscription for this publication. Expand the Replication folder in SSMS and right click Local Subscriptions, choose New Subscriptions The New Subscription Wizard appears Select the publisher on which you just created your publication and select the database and publication (SnapshotTest) You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor. Of course we need a database for the subscription and fortunately the Wizard can create it for you. Choose for New database Give the database the desired name, set the desired options and click OK You can now add multiple SQL Server Subscribers which is not necessary in this case but can be very useful. You now need to set the security settings for the Distribution Agent. Click on the …. button Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here Click Next Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on. Initialize the subscription at first synchronization Select the first box to create the subscription when finishing this wizard Complete the wizard by clicking Finish The subscription will be created In SSMS you see a new database is created, the subscriber. There are no tables or other objects in the database available yet because the replication jobs did not ran yet. Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot: Rename this job to “CreateSnapshot” Now search for the job that distributes the snapshot: Rename this job to “DistributeSnapshot” Create an SSIS package that executes the snapshot replication We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started the package needs to wait until its finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent Jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back if the job started succesfully or not, it does not report if the job actually completed with success or failure. This is because these jobs are asynchronous. The SSIS package I’ve created does the following: It runs the CreateSnapshot job It checks every 5 seconds if the job is completed with a for loop When the CreateSnapshot job is completed it starts the DistributeSnapshot job And again it waits until the snapshot is delivered before the package will finish successfully Quite simple and the package is ready to use as standalone extract mechanism. After executing the package the replicated tables are added to the subscriber database and are filled with data: Download the SSIS package here (SSIS 2008) Conclusion In this example I only replicated 5 tables, I could create a SSIS package that does the same in approximately the same amount of time. But if I replicated all the 70+ AdventureWorks tables I would save a lot of time and boring work! With replication services you also benefit from the feature that schema changes are applied automatically which means your entire extract phase wont break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good. Disadvantages of using snapshot replication as extraction tool is the limitation on source systems. You can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will invoke a lot of tables think about replication services, it would save you a lot of time and thanks to the Extract SSIS package I’ve created you can perfectly fit it in your usual SSIS ETL process.

Read the article
Move data from others user accounts in my user account

- by user118136

I had problems with compiz setting and I make multiple accounts, now I want to transfer my information from all deleted users in my current account, some data I can not copy because I am not right to read, I type in terminal "sudo nautilus" and I get the permission for read, but the copied data is available only for superusers and I must charge the permissions for each file and each folder. How I can copy the information with out the superuser rights OR how I can charge the permissions for selected folder and all files and folders included in it?

Read the article
SSIS Catalog: How to use environment in every type of package execution

- by Kevin Shyr

Here is a good blog on how to create a SSIS Catalog and setting up environments. http://sqlblog.com/blogs/jamie_thomson/archive/2010/11/13/ssis-server-catalogs-environments-environment-variables-in-ssis-in-denali.aspx Here I will summarize 3 ways I know so far to execute a package while using variables set up in SSIS Catalog environment. First way, we have SSIS project having reference to environment, and having one of the project parameter using a value set up in the environment called "Development". With this set up, you are limited to calling the packages by right-clicking on the packages in the SSIS catalog list and select Execute, but you are free to choose absolute or relative path of the environment. The following screenshot shows the 2 available paths to your SSIS environments. Personally, I use absolute path because of Option 3, just to keep everything simple for myself. The second option is to call through SQL Job. This does require you to configure your project to already reference an environment and use its variable. When a job step is set up, the configuration part will require you to select that reference again. This is more useful when you want to automate the same package that needs to be run in different environments. The third option is the most important to me as I have a SSIS framework that calls hundreds of packages. The main part of the stored procedure is in this post (http://geekswithblogs.net/LifeLongTechie/archive/2012/11/14/time-to-stop-using-ldquoexecute-package-taskrdquondash-a-way-to.aspx). But the top part had to be modified to include the logic to use environment reference. CREATE PROCEDURE [AUDIT].[LaunchPackageExecutionInSSISCatalog] @PackageName NVARCHAR(255) , @ProjectFolder NVARCHAR(255) , @ProjectName NVARCHAR(255) , @AuditKey INT , @DisableNotification BIT , @PackageExecutionLogID INT , @EnvironmentName NVARCHAR(128) = NULL , @Use32BitRunTime BIT = FALSE AS BEGIN TRY DECLARE @execution_id BIGINT = 0; -- Create a package execution IF @EnvironmentName IS NULL BEGIN EXEC [SSISDB].[catalog].[create_execution] @package_name=@PackageName, @execution_id=@execution_id OUTPUT, @folder_name=@ProjectFolder, @project_name=@ProjectName, @use32bitruntime=@Use32BitRunTime; END ELSE BEGIN DECLARE @EnvironmentID AS INT SELECT @EnvironmentID = [reference_id] FROM SSISDB.[internal].[environment_references] WITH(NOLOCK) WHERE [environment_name] = @EnvironmentName AND [environment_folder_name] = @ProjectFolder EXEC [SSISDB].[catalog].[create_execution] @package_name=@PackageName, @execution_id=@execution_id OUTPUT, @folder_name=@ProjectFolder, @project_name=@ProjectName, @reference_id=@EnvironmentID, @use32bitruntime=@Use32BitRunTime; END

Read the article
Did you know documentation is built-in to usp_ssiscatalog?

- by jamiet

I am still working apace on updates to my open source project SSISReportingPack, specifically I am working on improvements to usp_ssiscatalog which is a stored procedure that eases the querying and exploration of the data in the SSIS Catalog. In this blog post I want to share a titbit of information about usp_ssiscatalog, that all the actions that you can take when you execute usp_ssiscatalog are documented within the stored procedure itself. For example if you simply execute EXEC usp_ssiscatalog @action='exec' in SSMS then switch over to the messages tab you will see some information about the action: OK, that’s kinda cool. But what if you only want to see the documentation and don’t actually want any action to take place. Well you can do that too using the @show_docs_only parameter like so: EXEC dbo.usp_ssiscatalog @a='exec',@show_docs_only=1; That will only show the documentation. Wanna read all of the documentation? That’s simply: EXEC dbo.usp_ssiscatalog @a='exec',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='execs',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='configure',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_created',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_running',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_canceled',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_failed',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_pending',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_ended_unexpectedly',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_succeeded',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_stopping',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_completed',@show_docs_only=1; I hope that comes in useful for you sometime. Have fun exploring the documentation on usp_ssiscatalog. If you think the documentation can be improved please do let me know. @jamiet

Read the article
Did you know documentation is built-in to usp_ssiscatalog?

- by jamiet

I am still working apace on updates to my open source project SSISReportingPack, specifically I am working on improvements to usp_ssiscatalog which is a stored procedure that eases the querying and exploration of the data in the SSIS Catalog. In this blog post I want to share a titbit of information about usp_ssiscatalog, that all the actions that you can take when you execute usp_ssiscatalog are documented within the stored procedure itself. For example if you simply execute EXEC usp_ssiscatalog @action='exec' in SSMS then switch over to the messages tab you will see some information about the action: OK, that’s kinda cool. But what if you only want to see the documentation and don’t actually want any action to take place. Well you can do that too using the @show_docs_only parameter like so: EXEC dbo.usp_ssiscatalog @a='exec',@show_docs_only=1; That will only show the documentation. Wanna read all of the documentation? That’s simply: EXEC dbo.usp_ssiscatalog @a='exec',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='execs',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='configure',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_created',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_running',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_canceled',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_failed',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_pending',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_ended_unexpectedly',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_succeeded',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_stopping',@show_docs_only=1; EXEC dbo.usp_ssiscatalog @a='exec_completed',@show_docs_only=1; I hope that comes in useful for you sometime. Have fun exploring the documentation on usp_ssiscatalog. If you think the documentation can be improved please do let me know. @jamiet

Read the article
SSIS - Bulk Update at Database Field Level

- by Adam

Hello, Here's our mission: Receive files from clients. Each file contains anywhere from 1 to 1,000,000 records. Records are loaded to a staging area and business-rule validation is applied. Valid records are then pumped into an OLTP database in a batch fashion, with the following rules: If record does not exist (we have a key, so this isn't an issue), create it. If record exists, optionally update each database field. The decision is made based on one of 3 factors...I don't believe it's important what those factors are. Our main problem is finding an efficient method of optionally updating the data at a field level. This is applicable across ~12 different database tables, with anywhere from 10 to 150 fields in each table (original DB design leaves much to be desired, but it is what it is). Our first attempt has been to introduce a table that mirrors the staging environment (1 field in staging for each system field) and contains a masking flag. The value of the masking flag represents the 3 factors. We've then put an UPDATE similar to... UPDATE OLTPTable1 SET Field1 = CASE WHEN Mask.Field1 = 0 THEN Staging.Field1 WHEN Mask.Field1 = 1 THEN COALESCE( Staging.Field1 , OLTPTable1.Field1 ) WHEN Mask.Field1 = 2 THEN COALESCE( OLTPTable1.Field1 , Staging.Field1 ) ... As you can imagine, the performance is rather horrendous. Has anyone tackled a similar requirement? We're a MS shop using a Windows Service to launch SSIS packages that handle the data processing. Unfortunately, we're pretty much novices at this stuff.

Read the article
Distributed and/or Parallel SSIS processing

- by Jeff

Background: Our company hosts SaaS DSS applications, where clients provide us data Daily and/or Weekly, which we process & merge into their existing database. During business hours, load in the servers are pretty minimal as it's mostly users running simple pre-defined queries via the website, or running drill-through reports that mostly hit the SSAS OLAP cube. I manage the IT Operations Team, and so far this has presented an interesting "scaling" issue for us. For our daily-refreshed clients, the server is only "busy" for about 4-6 hrs at night. For our weekly-refresh clients, the server is only "busy" for maybe 8-10 hrs per week! We've done our best to use some simple methods of distributing the load by spreading the daily clients evenly among the servers such that we're not trying to process daily clients back-to-back over night. But long-term this scaling strategy creates two notable issues. First, it's going to consume a pretty immense amount of hardware that sits idle for large periods of time. Second, it takes significant Production Support over-head to basically "schedule" the ETL such that they don't over-lap, and move clients/schedules around if they out-grow the resources on a particular server or allocated time-slot. As the title would imply, one option we've tried is running multiple SSIS packages in parallel, but in most cases this has yielded VERY inconsistent results. The most common failures are DTExec, SQL, and SSAS fighting for physical memory and throwing out-of-memory errors, and ETLs running 3,4,5x longer than expected. So from my practical experience thus far, it seems like running multiple ETL packages on the same hardware isn't a good idea, but I can't be the first person that doesn't want to scale multiple ETLs around manual scheduling, and sequential processing. One option we've considered is virtualizing the servers, which obviously doesn't give you any additional resources, but moves the resource contention onto the hypervisor, which (from my experience) seems to manage simultaneous CPU/RAM/Disk I/O a little more gracefully than letting DTExec, SQL, and SSAS battle it out within Windows. Question to the forum: So my question to the forum is, are we missing something obvious here? Are there tools out there that can help manage running multiple SSIS packages on the same hardware? Would it be more "efficient" in terms of parallel execution if instead of running DTExec, SQL, and SSAS same machine (with every machine running that configuration), we run in pairs of three machines with SSIS running on one machine, SQL on another, and SSAS on a third? Obviously that would only make sense if we could process more than the three ETL we were able to process on the machine independently. Another option we've considered is completely re-architecting our SSIS package to have one "master" package for all clients that attempts to intelligently chose a server based off how "busy" it already is in terms of CPU/Memory/Disk utilization, but that would be a herculean effort, and seems like we're trying to reinvent something that you would think someone would sell (although I haven't had any luck finding it). So in summary, are we missing an obvious solution for this, and does anyone know if any tools (for free or for purchase, doesn't matter) that facilitate running multiple SSIS ETL packages in parallel and on multiple servers? (What I would call a "queue & node based" system, but that's not an official term). Ultimately VMWare's Distributed Resource Scheduler addresses this as you simply run a consistent number of clients per VM that you know will never conflict scheduleing-wise, then leave it up to VMWare to move the VMs around to balance out hardware usage. I'm definitely not against using VMWare to do this, but since we're a 100% Microsoft app stack, it seems like -someone- out there would have solved this problem at the application layer instead of the hypervisor layer by checking on resource utilization at the OS, SQL, SSAS levels. I'm open to ANY discussion on this, and remember no suggestion is too crazy or radical! :-) Right now, VMWare is the only option we've found to get away from "manually" balancing our resources, so any suggestions that leave us on a pure Microsoft stack would be great. Thanks guys, Jeff

Read the article
SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

- by pinaldave

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Data Mining Resources

- by Dejan Sarka

There are many different types of analyses, each one with its own pros and cons. Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only. If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models. For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process. With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. Subject matter experts and IT professionals need to understand business problem thoroughly. The development might be sometimes even more complex than the development of OLAP cubes. Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events. Books Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there. Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book. Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools. Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both, business and technical perspective. Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation. Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining. Jiawei Han and Micheline Kamber: Data Mining Concepts and Techniques – in-depth explanation of the most popular data mining algorithms. Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more fro a business perspective. Paolo Guidici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general. Forthcoming presentations I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC: Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects. Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel. Sinergija 2013, Belgrade, Serbia Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel. SQL Rally Amsterdam, Netherlands Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel. Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal. Courses Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. OK, now you know: no more excuses, start learning data mining, get the most out of your data

Read the article
Master Data Services Employees Sample Model

- by Davide Mauri

I’ve been playing with Master Data Services quite a lot in those last days and I’m also monitoring the web for all available resources on it. Today I’ve found this freshly released sample available on MSDN Code Gallery: SQL Server Master Data Services Employee Sample Model http://code.msdn.microsoft.com/SSMDSEmployeeSample This sample shows how Recursive Hierarchies can be modeled in order to represent a typical organizational chart scenario where a self-relationship exists on the Employee entity. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 7/2359 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

- by mark075

- by andyleonard

- by Pinal Dave

- by andyleonard

- by Kenneth

- by Luke San Antonio

- by andyleonard

- by Aubergine

- by [email protected]

- by pinaldave

- by Troy Kitch

- by bipinjoshi

- by jorg

- by user118136

- by Kevin Shyr

- by jamiet

- by jamiet

- by Adam

- by Jeff

- by pinaldave

- by Dejan Sarka

- by Davide Mauri

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >