Search Results

Search found 109730 results on 4390 pages for 'microsoft sql server'.

Page 142/4390 | < Previous Page | 138 139 140 141 142 143 144 145 146 147 148 149  | Next Page >

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • Continuous Integration for SQL Server Part II – Integration Testing

    - by Ben Rees
    My previous post, on setting up Continuous Integration for SQL Server databases using GitHub, Bamboo and Red Gate’s tools, covered the first two parts of a simple Database Continuous Delivery process: Putting your database in to a source control system, and, Running a continuous integration process, each time changes are checked in. However there is, of course, a lot more to to Continuous Delivery than that. Specifically, in addition to the above: Putting some actual integration tests in to the CI process (otherwise, they don’t really do much, do they!?), Deploying the database changes with a managed, automated approach, Monitoring what you’ve just put live, to make sure you haven’t broken anything. This post will detail how to set up a very simple pipeline for implementing the first of these (continuous integration testing). NB: A lot of the setup in this post is built on top of the configuration from before, so it might be difficult to implement this post without running through part I first. There’ll then be a third post on automated database deployment followed by a final post dealing with the last item – monitoring changes on the live system. In the previous post, I used a mixture of Red Gate products and other 3rd party software – GitHub and Atlassian Bamboo specifically. This was partly because I believe most people work in an heterogeneous environment, using software from different vendors to suit their purposes and I wanted to show how this could work for this process. For example, you could easily substitute Atlassian’s BitBucket or Stash for GitHub, depending on your needs, or use an alternative CI server such as TeamCity, TFS or Jenkins. However, in this, post, I’ll be mostly using Red Gate products only (other than tSQLt). I would do this, firstly because I work for Red Gate. However, I also think that in the area of Database Delivery processes, nobody else has the offerings to implement this process fully – so I didn’t have any choice!   Background on Continuous Delivery For me, a great source of information on what makes a proper Continuous Delivery process is the Jez Humble and David Farley classic: Continuous Delivery – Reliable Software Releases through Build, Test, and Deployment Automation This book is not of course, primarily about databases, and the process I outline here and in the previous article is a gross simplification of what Jez and David describe (not least because it’s that much harder for databases!). However, a lot of the principles that they describe can be equally applied to database development and, I would argue, should be. As I say however, what I describe here is a very simple version of what would be required for a full production process. A couple of useful resources on handling some of these complexities can be found in the following two references: Refactoring Databases – Evolutionary Database Design, by Scott J Ambler and Pramod J. Sadalage Versioning Databases – Branching and Merging, by Scott Allen In particular, I don’t deal at all with the issues of multiple branches and merging of those branches, an issue made particularly acute by the use of GitHub. The other point worth making is that, in the words of Martin Fowler: Continuous Delivery is about keeping your application in a state where it is always able to deploy into production.   I.e. we are not talking about continuously delivery updates to the production database every time someone checks in an amendment to a stored procedure. That is possible (and what Martin calls Continuous Deployment). However, again, that’s more than I describe in this article. And I doubt I need to remind DBAs or Developers to Proceed with Caution!   Integration Testing Back to something practical. The next stage, building on our set up from the previous article, is to add in some integration tests to the process. As I say, the CI process, though interesting, isn’t enormously useful without some sort of test process running. For this we’ll use the tSQLt framework, an open source framework designed specifically for running SQL Server tests. tSQLt is part of Red Gate’s SQL Test found on http://www.red-gate.com/products/sql-development/sql-test/ or can be downloaded separately from www.tsqlt.org - though I’ll provide a step-by-step guide below for setting this up. Getting tSQLt set up via SQL Test Click on the link http://www.red-gate.com/products/sql-development/sql-test/ and click on the blue Download button to download the Red Gate SQL Test product, if not already installed. Follow the install process for SQL Test to install the SQL Server Management Studio (SSMS) plugin on to your machine, if not already installed. Open SSMS. You should now see SQL Test under the Tools menu:   Clicking this link will give you the basic SQL Test dialogue: As yet, though we’ve installed the SQL Test product we haven’t yet installed the tSQLt test framework on to any particular database. To do this, we need to add our RedGateApp database using this dialogue, by clicking on the + Add Database to SQL Test… link, selecting the RedGateApp database and clicking the Add Database link:   In the next screen, SQL Test describes what will be installed on the database for the tSQLt framework. Also in this dialogue, uncheck the “Add SQL Cop tests” option (shown below). SQL Cop is a great set of pre-defined tests that work within the tSQLt framework to check the general health of your SQL Server database. However, we won’t be using them in this particular simple example: Once you’ve clicked on the OK button, the changes described in the dialogue will be made to your database. Some of these are shown in the left-hand-side below: We’ve now installed the framework. However, we haven’t actually created any tests, so this will be the next step. But, before we proceed, we’ve made an update to our database so should, again check this in to source control, adding comments as required:   Also worth a quick check that your build still runs with the new additions!: (And a quick check of the RedGateAppCI database shows that the changes have been made).   Creating and Testing a Unit Test There are, of course, a lot of very interesting unit tests that you could and should set up for a database. The great thing about the tSQLt framework is that you can write these in SQL. The example I’m going to use here is pretty Mickey Mouse – our database table is going to include some email addresses as reference data and I want to check whether these are all in a correct email format. Nothing clever but it illustrates the process and hopefully shows the method by which more interesting tests could be set up. Adding Reference Data to our Database To start, I want to add some reference data to my database, and have this source controlled (as well as the schema). First of all I need to add some data in to my solitary table – this can be done a number of ways, but I’ll do this in SSMS for simplicity: I then add some reference data to my table: Currently this reference data just exists in the database. For proper integration testing, this needs to form part of the source-controlled version of the database – and so needs to be added to the Git repository. This can be done via SQL Source Control, though first a Primary Key needs to be added to the table. Right click the table, select Design, then right-click on the first “id” row. Then click on “Set Primary Key”: NB: once this change is made, click Save to save the change to the table. Then, to source control this reference data, right click on the table (dbo.Email) and selecting the following option:   In the next screen, link the data in the Email table, by selecting it from the list and clicking “save and close”: We should at this point re-commit the changes (both the addition of the Primary Key, and the data) to the Git repo. NB: From here on, I won’t show screenshots for the GitHub side of things – it’s the same each time: whenever a change is made in SQL Source Control and committed to your local folder, you then need to sync this in the GitHub Windows client (as this is where the build server, Bamboo is taking it from). An interesting point to note here, when these changes are committed in SQL Source Control (right-click database and select “Commit Changes to Source Control..”): The display gives a warning about possibly needing a migration script for the “Add Primary Key” step of the changes. This isn’t actually necessary in this case, but this mechanism would allow you to create override scripts to replace the default change scripts created by the SQL Compare engine (which runs underneath SQL Source Control). Ignoring this message (!), we add a comment and commit the changes to Git. I then sync these, run a build (or the build gets run automatically), and check that the data is being deployed over to the target RedGateAppCI database:   Creating and Running the Test As I mention, the test I’m going to use here is a very simple one - are the email addresses in my reference table valid? This isn’t of course, a full test of email validation (I expect the email addresses I’ve chosen here aren’t really the those of the Fab Four) – but just a very basic check of format used. I’ve taken the relevant SQL from this Stack Overflow article. In SSMS select “SQL Test” from the Tools menu, then click on + New Test: In the next screen, give your new test a name, and also enter a name in the Test Class box (test classes are schemas that help you keep things organised). Also check that the database in which the test is going to be created is correct – RedGateApp in this example: Click “Create Test”. After closing a couple of subsequent dialogues, you’ll see a dummy script for the test, that needs filling in:   We now need to define the SQL for our test. As mentioned before, tSQLt allows you to write your unit tests in T-SQL, and the code I’m going to use here is as below. This needs to be copied and pasted in to the query window, to replace the default given by tSQLt: –  Basic email check test ALTER PROCEDURE [MyChecks].[test Check Email Addresses] AS BEGIN SET NOCOUNT ON         Declare @Output VarChar(max)     Set @Output = ”       SELECT  @Output = @Output + Email +Char(13) + Char(10) FROM dbo.Email WHERE email NOT LIKE ‘%_@__%.__%’       If @Output > ”         Begin             Set @Output = Char(13) + Char(10)                           + @Output             EXEC tSQLt.Fail@Output         End   END;   Once this script is entered, hit execute to add the Stored Procedure to the database. Before committing the test to source control,  it’s worth just checking that it works! For a positive test, click on “SQL Test” from the Tools menu, then click Run Tests. You should see output like the following: - a green tick to indicate success! But of course, what we also need to do is test that this is actually doing something by showing a failed test. Edit one of the email addresses in your table to an incorrect format: Now, re-run the same SQL Test as before and you’ll see the following: Great – we now know that our test is really doing something! You’ll also see a useful error message at the bottom of SSMS: (leave the email address as invalid for now, for the next steps). The next stage is to check this new test in to source control again, by right-clicking on the database and checking in the changes with a commit message (and not forgetting to sync in the GitHub client):   Checking that the Tests are Running as Integration Tests After the changes above are made, and after a build has run on Bamboo (manual or automatic), looking at the Stored Procedures for the RedGateAppCI, the SPROC for the new test has been moved over to the database. However this is not exactly what we were after. We didn’t want to just copy objects from one database to another, but actually run the tests as part of the build/integration test process. I.e. we’re continuously checking any changes we make (in this case, to the reference data emails), to ensure we’re not breaking a test that we’ve set up. The behaviour we want to see is that, if we check in static data that is incorrect (as we did in step 9 above) and we have the tSQLt test set up, then our build in Bamboo should fail. However, re-running the build shows the following: - sadly, a successful build! To make sure the tSQLt tests are run as part of the integration test, we need to amend a switch in the Red Gate CI config file. First, navigate to file sqlCI.targets in your working folder: Edit this document, make the following change, save the document, then commit and sync this change in the GitHub client: <!-- tSQLt tests --> <!-- Optional --> <!-- To run tSQLt tests in source control for the database, enter true. --> <enableTsqlt>true</enableTsqlt> Now, if we re-run the build in Bamboo (NB: I’ve moved to a new server here, hence different address and build number): - superb, a broken build!! The error message isn’t great here, so to get more detailed info, click on the full build log link on this page (below the fold). The interesting part of the log shown is towards the bottom. Pulling out this part:   21-Jun-2013 11:35:19 Build FAILED. 21-Jun-2013 11:35:19 21-Jun-2013 11:35:19 "C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj" (default target) (1) -> 21-Jun-2013 11:35:19 (sqlCI target) -> 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: RedGate.Deploy.SqlServerDbPackage.Shared.Exceptions.InvalidSqlException: Test Case Summary: 1 test case(s) executed, 0 succeeded, 1 failed, 0 errored. [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: [MyChecks].[test Check Email Addresses] failed: [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: ringo.starr@beatles [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: +----------------------+ [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj] 21-Jun-2013 11:35:19 EXEC : sqlCI error occurred: |Test Execution Summary| [C:\Users\Administrator\bamboo-home\xml-data\build-dir\RGA-RGP-JOB1\sqlCI.proj]   As a final check, we should make sure that, if we now fix this error, the build succeeds. So in SSMS, I’m going to correct the invalid email address, then check this change in to SQL Source Control (with a comment), commit to GitHub, and re-run the build:   This should have fixed the build: It worked! Summary This has been a very quick run through the implementation of CI for databases, including tSQLt tests to test whether your database updates are working. The next post in this series will focus on automated deployment – we’ve tested our database changes, how can we now deploy these to target sites?  

    Read the article

  • Change Tracking

    - by Ricardo Peres
    You may recall my last post on Change Data Control. This time I am going to talk about other option for tracking changes to tables on SQL Server: Change Tracking. The main differences between the two are: Change Tracking works with SQL Server 2008 Express Change Tracking does not require SQL Server Agent to be running Change Tracking does not keep the old values in case of an UPDATE or DELETE Change Data Capture uses an asynchronous process, so there is no overhead on each operation Change Data Capture requires more storage and processing Here's some code that illustrates it's usage: -- for demonstrative purposes, table Post of database Blog only contains two columns, PostId and Title -- enable change tracking for database Blog, for 2 days ALTER DATABASE Blog SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON); -- enable change tracking for table Post ALTER TABLE Post ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); -- see current records on table Post SELECT * FROM Post SELECT * FROM sys.sysobjects WHERE name = 'Post' SELECT * FROM sys.sysdatabases WHERE name = 'Blog' -- confirm that table Post and database Blog are being change tracked SELECT * FROM sys.change_tracking_tables SELECT * FROM sys.change_tracking_databases -- see current version for table Post SELECT p.PostId, p.Title, c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT FROM Post AS p CROSS APPLY CHANGETABLE(VERSION Post, (PostId), (p.PostId)) AS c; -- update post UPDATE Post SET Title = 'First Post Title Changed' WHERE Title = 'First Post Title'; -- see current version for table Post SELECT p.PostId, p.Title, c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT FROM Post AS p CROSS APPLY CHANGETABLE(VERSION Post, (PostId), (p.PostId)) AS c; -- see changes since version 0 (initial) SELECT p.Title, c.PostId, SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION, SYS_CHANGE_COLUMNS, SYS_CHANGE_CONTEXT FROM CHANGETABLE(CHANGES Post, 0) AS c LEFT OUTER JOIN Post AS p ON p.PostId = c.PostId; -- is column Title of table Post changed since version 0? SELECT CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Post'), 'Title', 'ColumnId'), (SELECT SYS_CHANGE_COLUMNS FROM CHANGETABLE(CHANGES Post, 0) AS c)) -- get current version SELECT CHANGE_TRACKING_CURRENT_VERSION() -- disable change tracking for table Post ALTER TABLE Post DISABLE CHANGE_TRACKING; -- disable change tracking for database Blog ALTER DATABASE Blog SET CHANGE_TRACKING = OFF; You can read about the differences between the two options here. Choose the one that best suits your needs! SyntaxHighlighter.config.clipboardSwf = 'http://alexgorbatchev.com/pub/sh/2.0.320/scripts/clipboard.swf'; SyntaxHighlighter.brushes.CSharp.aliases = ['c#', 'c-sharp', 'csharp']; SyntaxHighlighter.brushes.Xml.aliases = ['xml']; SyntaxHighlighter.all();

    Read the article

  • Using Find/Replace with regular expressions inside a SSIS package

    - by jamiet
    Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts">   <DTS:PackageParameters>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="InterfaceHistory_ID: used for Lineage"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}"       DTS:ObjectName="ETLIfcHist_ID">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property>     </DTS:PackageParameter>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="Some other description"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}"       DTS:ObjectName="SomeOtherObjectName">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">SomeOtherValue</DTS:Property>     </DTS:PackageParameter>   </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="InterfaceHistory_ID: used for Lineage"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}"       DTS:ObjectName="ETLIfcHist_ID">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property>     </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

    Read the article

  • sql server doesn’t exist or access denied

    - by kareemsaad
    I had Win7 in my pc and I installed 2vmware .One of them (VM) had Win XP and I installed on It SQL 2000 and visual studio 2008.and other I installed Win XP and I installed on it SQL 2005 and visual studio 2008. and when I run SQL2000 this error appear sql server doesn't exist or access denied Pleas verify sql server is running ........

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • Oral Tradition Check: Two Hundred Meanings for "NULL" in SQL

    - by Thomas L Holaday
    Two decades ago, the topic of "NULL" came up in conversation with a scholarly colleague. As I remember it, he said that C.J.Date, in an essay critical of commercial implementations of SQL, had listed over two hundred meanings for NULL. To my regret, I did not persist the details; but finding that list has since been on my Bucket List. Has anyone else heard this legend? Was it perhaps not Date, but another critic of commercial implementations of SQL?

    Read the article

  • Hosting a web application on discountasp.net using sql ce 5

    - by David Stanley
    I am hoping that someone may have experience with this, since the discountasp site is very lacking in straightforward answers. I am building a lightweight web application and have decided to have sql ce as the database for it. Two questions regarding this: Do i need to get an actual database hosted as well as the site, in order for it to work? Do you know if discountasp supports the use of sql ce (not with webmatrix or any cms builds, completely custom)? If they don't, do you have any experience/recommendations with getting this done?

    Read the article

  • DBMS agnostic - What to name the COUNT column from a SQL Query

    - by cyberkiwi
    I have trouble naming the COUNT() column from SQL queries and will swap between various variants _Count [Count] (sql, or "count" or backticks for MySQL etc) C Cnt CountSomething (where "something" is the field being counted, or "CountAll") NoOfRows RowCount etc Has anyone come up with any name that you are happy with and always use without hesitation? This is bothering me because after joining SO just recently, my answers have shown this tendency of flip-flopping with no consistency. I need to get this sorted. Please help. (While we're at it, what do you use for SUM etc?) Note: Before you close this question, consider that this one was not: What's the best name for a non-mutating “add” method on an immutable collection?

    Read the article

  • Windows 8 with LiveID login authenticates as Guest to remote SQl Server

    - by Tim Long
    I have a network where several users are using Office Accounting 2009 in multi-user client/server mode. OA is built on SQL Server. One PC acts as the 'server' and has the SQl Server instance, the others have only the application installed and no SQL instance, all of the apps connect remotely to the SQL instance on the 'server'. I'm using the term 'server' loosely here, it is just a normal workstation that happens to be designated as the server and runs the SQL instance. There is no NT domain, all user accounts are local accounts. The way that OA works in multi-user mode is that each user is required to have a local account with the same username and password on both the client and 'server' PCs. This has been working well, no along comes Windows 8. I use my 'Microsoft Account' aka LiveID to log into Windows 8. Office Accounting runs fine and attempts to connect to the database, but fails, 'you do not have permission to perform this operation'. In the SQL logs, I get this error: 2012-10-28 17:54:01.32 Logon Error: 18456, Severity: 14, State: 11. 2012-10-28 17:54:01.32 Logon Login failed for user 'SERVER\Guest'. Reason: Token-based server access validation failed with an infrastructure SERVER is the hostname of the server. So it seems to be authenticating as 'Guest'?? To verify this, I enabled the Guest account on the 'server' PC and then added Guest as an allowed user within Office Accounting (this simply creates the user in SQL and gives it an appropriate database role). Sure enough, My Windows 8 PC was then able to connect to the database when using Office Accounting. Clearly, having users authenticate as 'Guest' stinks from a security and auditing standpoint. So what I need are some ideas for how to work around this. I've tried switching the Windows 8 PC to a 'local account' and that works too, but requires giving up significant functionality on the Windows 8 PC. What I really need is a way to force the Windows 8 PC to use a specific set of credentials when connecting to the remote SQL instance. Office Accounting takes the logged in username, which is my LiveID and doesn't correspond to any Windows user name. Anyone solved this issue?

    Read the article

  • SQL Server Add Primary Key

    - by Derek D.
    Adding a primary key can be done either after a table is created, or at the same a table is created. It is important to note, that by default a primary key is clustered. This may or may not be the preferred method of creation. For more information on clustered vs non [...]

    Read the article

  • Peforming an Audit for SQL Server 2008

    - by Nai
    Hi all, Do you guys have any good step by step type links for performing an SQL Server 2008 Performance Audit? I know Brad McGehee has written extensively on this but for SQL Server 2005 over at http://www.sql-server-performance.com. But are any such articles for SQL Server 2008? Thanks!

    Read the article

  • SQL 2008 R2 Clustering options

    - by JacksOrBetter99
    I am looking to setup SQL 2008 R2 clustering on Windows Server 2008 R2. Can someone give me some options available for installing SQL Server clustering or best practices? I thought SQL had clustering built in, but after doing research, it looks like you first have to install Windows clustering and then install SQL on top of that.

    Read the article

  • need example sql transaction procedures for sales tracking or financial database [closed]

    - by fa1c0n3r
    hi, i am making a database for an accounting/sales type system similar to a car sales database and would like to make some transactions for the following real world actions salesman creates new product shipped onto floor (itempk, car make, year, price).   salesman changes price.   salesman creates sale entry for product sold (salespk, itemforeignkey, price sold, salesman).   salesman cancels item for removed product.   salesman cancels sale for cancelled sale    the examples i have found online are too generic...like this is a transaction... i would like something resembling what i am trying to do to understand it.  anybody have some good similar or related sql examples i can look at to design these? do people use transactions for sales databases?  or if you have done this kind of sql transaction before could you make an outline for how these could be made?  thanks  my thread so far on stack overflow... http://stackoverflow.com/q/4975484/613799

    Read the article

  • How to upgrade to SQL 2008

    - by picflight
    What is the difference between SQL2008 and SQL2008 R2? Should I unintall SQL 2005 and install SQL 2008 Web Edition? Or Should I upgrade the SQL 2005 to SQL 2008 Web Edition? I will also need to make sure the Logins are transferred over as many of my web applications have a Login on SQL2005 server.

    Read the article

  • How to convert lots of database file from MSSQL 2000 to MSSQL 2005?

    - by Tech
    Hi all, I am moving the SQL Server from MSSQL 2000 to MSSQL 2005, and I found the article in the web like this: http://www.aspfree.com/c/a/MS-SQL-Server/Moving-Data-from-SQL-Server-2000-to-SQL-Server-2005/ It works, but the problem is, it only move database one by one. Because I have so many database, is there any easy way to do so? or is there provides any batches / untitlty allow me to do so? thz u.

    Read the article

  • PASS Summit for SQL Starters

    - by Davide Mauri
    I’ve received a buch of emails from PASS Summit “First Timers” that are also somehow new to SQL Server (for “somehow” I mean people with less than 6 month experience but with some basic knowledge of SQL Server engine) or are catching up from SQL Server 2000. The common question regards the session one should not miss to have a broad view of the entire SQL Server platform have some insight into some specific areas of SQL Server Given that I’m on (semi-)vacantion and that I have more free time (not true, I have to prepare slides & demos for several conferences, PASS Summit  - Building the Agile Data Warehouse with SQL Server 2012 - and PASS 24H - Agile Data Warehousing with SQL Server 2012 - among them…but let’s pretend it to be true), I’ve decided to make a post to answer to this common questions. Of course this is my personal point of view and given the fact that the number and quality of session that will be delivered at PASS Summit is so high that is very difficoult to make a choice, fell free to jump into the discussion and leave your feedback or – even better – answer with another post. I’m sure it will be very helpful to all the SQL Server beginners out there. I’ve imposed to myself to choose 6 session at maximum for each Track. Why 6? Because it’s the maximum number of session you can follow in one day, and given that all the session will be on the Summit DVD, they are the answer to the following question: “If I have one day to spend in training, which session I should watch?”. Of course a Summit is not like a Course so a lot of very basics concept of well-established technologies won’t be found here. Analysis Services, Integration Services, MDX are not part of the Summit this time (at least for the basic part of them). Enough with that, let’s start with the session list ideal to have a good Overview of all the SQL Server Platform: Geospatial Data Types in SQL Server 2012 Inside Unstructured Data: SQL Server 2012 FileTable and Semantic Search XQuery and XML in SQL Server: Common Problems and Best Practice Solutions Microsoft's Big Play for Big Data Dashboards: When to Choose Which MSBI Tool Microsoft BI End-User Tools 360° for what concern Database Development, I recommend the following sessions Understanding Transaction Isolation Levels What to Look for in Execution Plans Improve Query Performance by Fixing Bad Parameter Sniffing A Window into Your Data: Using SQL Window Functions Practical Uses and Optimization of New T-SQL Features in SQL Server 2012 Taking MERGE Beyond the Basics For Business Intelligence Information Delivery Analyzing SSAS Data with Excel Building Compelling Power View Reports Managed Self-Service BI PowerPivot 101  SharePoint for Business Intelligence The Best Microsoft BI Tools You've Never Heard Of and for Business Intelligence Architecture & Development BI Power Hour Building a Tabular Model Database Enterprise Information Management: Bringing Together SSIS, DQS, and MDS SSIS Design Patterns Storing Columnstore Indexes Hadoop and Its Ecosystem Components in Action Beside the listed sessions, First Timers should also take a look the the page PASS set up for them: http://www.sqlpass.org/summit/2012/Connect/FirstTimers.aspx See you at PASS Summit!

    Read the article

  • SQL SERVER Get Latest SQL Query for Sessions DMV

    In recent SQL Training I was asked, how can one figure out what was the last SQL Statement executed in sessions. The query for this is very simple. It uses two DMVs and created following quick script for the same. SELECT session_id, TEXT FROM sys.dm_exec_connections CROSS APPLYsys.dm_exec_sql_text(most_recent_sql_handle) AS ST While working with DMVs if you [...]...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • SQL Insert Into Statement

    - by Derek Dieter
    The “insert into” statement is used in order to insert data into an existing table. The syntax for this is fairly simple. In the first section of the statement, you specify the table name and column names in which you are inserting data into. The second part is where the source of [...]

    Read the article

  • SQL Express 2008 R2 on Amazon EC2 instance: tons of free memory, poor performance

    - by gravyface
    The old SQL Express 2005 was running on a low-end single Xeon CPU Dell server, RAID 5 7200 disks, 2 GB RAM (SBS 2003). I have not done any baseline measurements on the old physical server, but the Web app is used by half a dozen people (maybe 2 concurrently), so I figured "how bad can an Amazon EC2 instance be?". It's pretty horrible: a difference of 8 seconds of load time on one screen. First of all, I'm not a SQL guru, but here's what I've tried: Had a Small Instance, now running a c1.medium (High Cpu Medium) Windows 2008 32-bit R2 EBS-backed instance running IIS 7.5 and SQL Express 2008 R2. No noticeable improvement. Changed Page File from fixed 256 to Automatic. Setup a Striped Mirror from within Disk Management with two attached 1 GB EBS volumes. Moved database and transaction log, left everything else on the boot EBS volume. No noticeable change. Looked at memory, ~1000 MB of physical memory free (1.7 GB total). Changed SQL instance to use a minimum of 1024 RAM; restarted server, no change in memory usage. SQL still only using ~28MB of RAM(!). So I'm thinking: this database is tiny (28MB), why isn't the whole thing cached in RAM? Surely that would speed up performance. The transaction log is 241 MB. Seems kind of large in comparison -- has this not been committed? Is it a cause of performance degradation? I recall something about Recovery Models and log sizes somewhere in my travels, but not positive. Another thing: the old server was running SQL Express 2005. Not sure if that has any impact, but I tried changing the compatibility level from SQL 2000 to 2008, but that had no effect. Anyways, what else can I try here? Seems ridiculous to throw more virtual hardware at this thing. I know I/O is going to be rough on EBS volumes, but surely others are successfully running small .NET/SQL apps on reasonably priced instances?

    Read the article

  • Row Versioning Concurrency in SQL Server

    The optimistic concurrency model assumes that several concurrent transactions can usually complete without interfering with each other, and therefore do not require draconian locking on the resources they access. SQL Server 2005, and later, implements a form of this model called row versioning concurrency. It works by remembering the value of the data at the start of the transaction and checking that no other transaction has modified it before committing. If this optimism is justified for the pattern of activity within a database, it can improve performance by greatly reducing blocking. Kalen Delaney explains how it works in SQL Server.

    Read the article

  • Determining Azure SQL Database requirements

    - by Gerald
    I'm looking into moving an SQL Server database project to the cloud using Azure SQL Database. I'm just wondering what metrics I can use from SQL Server to help determine what my needs will be on Azure. The size of the database is around 150GB, so I understand what my needs are in terms of storage, I'm just not sure what metrics I can use to translate my database usage to the DTU benchmark metrics that the various service tiers on Azure SQL use.

    Read the article

  • SQL Server Express 2008

    - by robblot
    I am going to install Visual Studio 2010 Premium. I have SQL Server 2008 Express already installed on my laptop along with Visual Studio 2008 Professional. Do I have to uninstall SQL Server 2008 Express or will it be configured to operate in the 2010 Environment?

    Read the article

  • ERP system written in SQL Server, Trigger job not running

    - by user333809
    We are a manufacturing plant that runs off an ERP system written in SQL Server. I have never worked with SQL and therefore do not know the language. What I do know is that a trigger job that was running and updating data for us, is now NO longer running. Is anyone familiar enough to answer any questions about this for me??? Any help would certainly be appreciated! Thanks, Jana

    Read the article

< Previous Page | 138 139 140 141 142 143 144 145 146 147 148 149  | Next Page >