sql server 2008r2 - Page 147

SQL Server: Find Values that don't exist in a table

- by MacAnthony

I have a set of values, and I would like to know which of them do not currently exist in a table. I know I can find out which ones do exist with:

    SELECT * FROM Table WHERE column1 IN (x,x,x,x,x)

The set is the list of values I am checking against. Is there a way to find out which values in that set do not exist in column1? Basically, I'm looking for the inverse of the SQL statement above. This is for a report, so all I need returned is the values that don't exist. I could do this (and have done it) with a left join after putting the values in another table, but the values I check are always different, and I was hoping to find a solution that doesn't involve clearing a table and inserting data first. I'm trying to find a better solution, if one exists.
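
One way to avoid a scratch table, sketched here under some assumptions, is to turn the list into an inline row set and subtract the column from it. The sketch uses SQLite through Python's sqlite3 module only so that it is self-contained; the table name items, the helper missing_values and the sample data are made up for illustration. Assuming SQL Server 2008 or later, the same idea could be written as SELECT v FROM (VALUES (1),(2)) AS t(v) EXCEPT SELECT column1 FROM Table.

```python
import sqlite3


def missing_values(conn, candidates):
    """Return the candidates that do not occur in items.column1 (hypothetical table)."""
    # One "(?)" row per candidate, so the list becomes an inline row set.
    placeholders = ", ".join("(?)" for _ in candidates)
    sql = (
        f"WITH wanted(val) AS (VALUES {placeholders}) "
        "SELECT val FROM wanted "
        "EXCEPT SELECT column1 FROM items "
        "ORDER BY val"
    )
    return [row[0] for row in conn.execute(sql, list(candidates))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (column1 INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (3,), (5,)])

missing = missing_values(conn, [1, 2, 3, 4])
print(missing)
```

The helper expects at least one candidate, because an empty VALUES list is not valid SQL.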

Select rows with same column A but different column B

- by Flip Booth

    ID          Zip        Room
    ----------- ---------- ------
    317         94087      S105
    318         94087      L603
    1739        94404-1801 L603
    1823        94401-2129 L603
    1824        94401-2129 L603
    2135        94404-1801 L603
    2268        95136-1459 S604
    2269        95136-1459 S604
    3704        92673-6417 L402
    4479        93454-9670 L402
    4480        93454-9670 L402
    4782        92395-4681 L402
    4783        92395-4681 L402
    4852        92886-4411 L402
    4853        92886-4411 L402
    4959        92673-6417 L402
    5153        91773-4028 L402
    5202        91773-4028 L402
    5211        91765-2959 L402
    5212        91765-2959 L402
    5388        92336-0605 L402
    5392        92336-0605 L402
    5727        92870      L402
    5728        92870      L402
    5831        92557      L402
    5916        92557      L402

How do I select the IDs that have the same Zip but a different Room? For the table above, I want the result to be:

    ID          Zip        Room
    ----------- ---------- ------
    317         94087      S105
    318         94087      L603

Using SQL Server 2008.
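
A possible approach, sketched here rather than taken from the original thread, is to find the zips that map to more than one distinct room and then return every row for those zips. The sketch runs on SQLite via Python's sqlite3 so it can be executed as-is; the table name rooms is an assumption, and only a few of the rows above are loaded. The GROUP BY ... HAVING COUNT(DISTINCT ...) query itself is standard SQL and should carry over to SQL Server.

```python
import sqlite3

# A subset of the rows from the question; the table name "rooms" is hypothetical.
SAMPLE_ROWS = [
    (317, "94087", "S105"),
    (318, "94087", "L603"),
    (1739, "94404-1801", "L603"),
    (2135, "94404-1801", "L603"),
    (2268, "95136-1459", "S604"),
    (2269, "95136-1459", "S604"),
    (5727, "92870", "L402"),
    (5728, "92870", "L402"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER, zip TEXT, room TEXT)")
conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)", SAMPLE_ROWS)

# Keep only rows whose zip is shared by at least two different rooms.
QUERY = """
SELECT r.id, r.zip, r.room
FROM rooms AS r
WHERE r.zip IN (
    SELECT zip
    FROM rooms
    GROUP BY zip
    HAVING COUNT(DISTINCT room) > 1
)
ORDER BY r.id
"""

result = conn.execute(QUERY).fetchall()
print(result)
```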

sql server - how to execute the second half of 'or' only when the first one fails

- by fn79

Suppose I have a table with the following records:

    value              text
    company/about      about Us
    company            company
    company/contactus  company contact

I have a very simple query in SQL Server, shown below, and I am having a problem with the 'or' condition. In the query I am trying to find the text for the value 'company/about'. Only if that is not found do I want to run the other side of the 'or'. As written, the query returns two records:

    value          text
    company/about  about Us
    company        company

Query:

    select * from tbl
    where value='company/about'
       or value=substring('company/about',0,charindex('/','company/about'))

How can I modify the query so that the result set looks like this?

    value          text
    company/about  about Us
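
One way to express "prefer the exact match, fall back to the parent" is to fetch both candidates, rank the exact match first and keep only the top row. The following is a minimal sketch of that idea on SQLite through Python's sqlite3; the helper name lookup is hypothetical, and LIMIT 1 is SQLite syntax (on SQL Server the equivalent would presumably be SELECT TOP 1 with the same ORDER BY).

```python
import sqlite3


def lookup(conn, path):
    """Return the row for `path`, or for its first segment if `path` is absent."""
    parent = path.split("/", 1)[0]  # 'company/about' -> 'company'
    sql = (
        "SELECT value, text FROM tbl "
        "WHERE value IN (?, ?) "
        # Exact match sorts first, the fallback second; keep one row only.
        "ORDER BY CASE WHEN value = ? THEN 0 ELSE 1 END "
        "LIMIT 1"
    )
    return conn.execute(sql, (path, parent, path)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (value TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        ("company/about", "about Us"),
        ("company", "company"),
        ("company/contactus", "company contact"),
    ],
)

exact = lookup(conn, "company/about")
fallback = lookup(conn, "company/missing")  # 'company/missing' is not in the table
print(exact, fallback)
```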

Creating audit triggers in SQL Server

- by Mike C.

I need to implement change tracking on two tables in my SQL Server 2005 database. I need to audit additions, deletions, updates (with detail on what was updated). I was planning on using a trigger to do this, but after poking around on Google I found that it was incredibly easy to do this incorrectly, and I wanted to avoid that on the get-go. Can anybody post an example of an update trigger that accomplishes this successfully and in an elegant manner? I am hoping to end up with an audit table with the following structure: ID LogDate TableName TransactionType (update/insert/delete) RecordID FieldName OldValue NewValue ... but I am open for suggestions. Thanks!

Simplifying CASE WHEN SQL statement

- by kateroh

I'm trying to improve the following CASE statement so that it calculates the difference only once. I use it to avoid negative numbers:

    SELECT (CASE WHEN ((SELECT 100 - (SELECT COUNT(CustomerId) FROM Customers)) > 0)
                 THEN (SELECT 100 - (SELECT COUNT(CustomerId) FROM Customers))
                 ELSE (0)
            END)

This not only looks stupid, but is also not thread-safe. I tried the following, but I get the error message "Invalid column name 'diff'.":

    SELECT (CASE WHEN ((SELECT 100 - (SELECT COUNT(CustomerId) FROM Customers) as diff) > 0)
                 THEN (diff)
                 ELSE (0)
            END)

How can this be simplified? Is there a built-in SQL function that already does this job?

EDIT: Sorry, I forgot to mention that the select statement is inside a view declaration, so I can't declare variables.
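
One option that needs no variables, and so should also be usable inside a view, is to compute the difference once in a derived table and apply the CASE to its alias. This is a sketch of that idea, run on SQLite via Python's sqlite3 so it is executable here; the helper name remaining_slots and the sample rows are assumptions. The SELECT itself uses only a derived table and CASE, which SQL Server also supports.

```python
import sqlite3


def remaining_slots(conn):
    """Return 100 minus the customer count, clamped at zero."""
    sql = (
        "SELECT CASE WHEN d.diff > 0 THEN d.diff ELSE 0 END "
        # The count is evaluated once, inside the derived table d.
        "FROM (SELECT 100 - COUNT(CustomerId) AS diff FROM Customers) AS d"
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY)")

# Three customers: the difference is positive and returned unchanged.
conn.executemany("INSERT INTO Customers VALUES (?)", [(1,), (2,), (3,)])
few = remaining_slots(conn)

# 123 customers in total: the difference is negative and clamped to 0.
conn.executemany("INSERT INTO Customers VALUES (?)", [(i,) for i in range(4, 124)])
many = remaining_slots(conn)

print(few, many)
```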

How to check for dates between two dates using SQL?

- by rajeeshmenoth

How can I reject dates without saving them in the database? For example, the database has two columns, from_date and to_date:

    From date: 25/08/2014
    To date:   29/08/2014

Problem: the dates above are saved in the two fields from_date and to_date (a room reservation booking). The next reservation must not use any date between 25/08/2014 and 29/08/2014. The dates in between are not saved in the database; only the from date and the to date are saved. How can I block the dates in between using SQL?
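
Because only the two end dates are stored, the check can be phrased as an overlap test: a new booking clashes with an existing one when it starts on or before the existing to_date and ends on or after the existing from_date. The sketch below illustrates that test on SQLite through Python's sqlite3. The table name reservations, the helper is_available and the ISO date strings (which compare correctly as text) are assumptions for the example, not details from the question.

```python
import sqlite3


def is_available(conn, new_from, new_to):
    """True if no stored reservation overlaps the inclusive range new_from..new_to."""
    sql = (
        "SELECT COUNT(*) FROM reservations "
        # Overlap: existing.from_date <= new_to AND existing.to_date >= new_from
        "WHERE from_date <= ? AND to_date >= ?"
    )
    clashes = conn.execute(sql, (new_to, new_from)).fetchone()[0]
    return clashes == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (from_date TEXT, to_date TEXT)")
# The booking from the question: 25/08/2014 to 29/08/2014, stored as ISO dates.
conn.execute("INSERT INTO reservations VALUES ('2014-08-25', '2014-08-29')")

inside = is_available(conn, "2014-08-27", "2014-08-28")     # within the booking
touching = is_available(conn, "2014-08-20", "2014-08-25")   # ends on the first booked day
after = is_available(conn, "2014-08-30", "2014-09-02")      # starts after the booking
print(inside, touching, after)
```

An application would run such a check before inserting the new row, ideally in the same transaction as the insert.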

Optimizing TSQL code

- by adopilot

My job is to maintain an application that makes heavy use of SQL Server (MSSQL 2005). Until now, the middle-tier server has stored T-SQL code in XML and sent dynamic T-SQL queries without using stored procedures. Since I am able to change those XML queries, I want to migrate most of my queries to stored procedures. The question is the following: most of my queries have the same WHERE conditions against one table. Sample:

    Select ..... from .... where ....
      and (a.vrsta_id = @vrsta_id or @vrsta_id = 0)
      and (a.podvrsta_id = @podvrsta_id or @podvrsta_id = 0)
      and (a.podgrupa_2 = @podgrupa2_id or @podgrupa2_id = 0)
      and (
            (a.id in (select art_id from osobina_veze
                      where podosobina_id in (select ado from dbo.fn_ado_param_int(@podosobina))
                      group by art_id
                      having count(art_id) = @podosobina_count))
            or ('0' = @podosobina)
          )

They also have the same WHERE conditions on another table. How should I organize my code? What is the proper way? Should I:

    make a table-valued function that I will use in all queries,
    or use #temp tables and simply inner join my query to them each time the proc executes,
    or use a #temp table filled by a table-valued function,
    or leave all queries with this large WHERE clause and hope that the indexes do their job,
    or use WITH (statement)?

Shrinking the transaction log of a mirrored SQL Server 2005 database

- by Peter Di Cecco

I've been looking all over the internet and I can't find an acceptable solution to my problem. I'm wondering if there even is a solution without a compromise...

I'm not a DBA, but I'm a one-man team working on a huge web site with no extra funding for extra bodies, so I'm doing the best I can. Our backup plan sucks, and I'm having a really hard time improving it.

Currently, there are two servers running SQL Server 2005. I have a mirrored database (no witness) that seems to be working well. I do a full backup at noon and at midnight. These get backed up to tape by our service provider nightly, and I burn the backup files to DVD weekly to keep old records on hand. Eventually I'd like to switch to log shipping, since mirroring seems kind of pointless without a witness server.

The issue is that the transaction log is growing non-stop. From the research I've done, it seems that I can't truncate the log file of a mirrored database. So how do I stop the file from growing!?

Based on this web page, I tried this:

    USE dbname
    GO
    CHECKPOINT
    GO
    BACKUP LOG dbname TO DISK='NULL' WITH NOFORMAT, INIT, NAME = N'dbnameLog Backup', SKIP, NOREWIND, NOUNLOAD
    GO
    DBCC SHRINKFILE('dbname_Log', 2048)
    GO

But that didn't work. Everything else I've found says I need to disable the mirror before running the backup log command in order for it to work.

My Question (TL;DR)

How can I shrink my transaction log file without disabling the mirror?

SQL Developer Quick Tip: Reordering Columns

- by thatjeffsmith

Do you find yourself always scrolling and scrolling and scrolling to get to the column you want to see when looking at a table or view’s data? Don’t do that! Instead, just right-click on the column headers, select ‘Columns’, and reorder as desired.

Access the Manage Columns dialog

Then move up the columns you want to see first…

Put them in the order you want; it won’t affect the database.

Now I see the data I want to see, when I want to see it, with no scrolling.

This will only change how the data is displayed for you, and SQL Developer will remember this ordering until you ‘Delete Persisted Settings…’

What IS Remembered Via These ‘Persisted Settings?’

    Column Widths
    Column Sorts
    Column Positions
    Find/Highlights

This means if you manipulate one of these settings, SQL Developer will remember them the next time you open the tool and go to that table or view.

Don’t know what I mean by ‘Find/Highlight?’

Find and highlight values in a grid with Ctrl+F

SQL Server 2008 Best Practices Analyzer - keep an eye for the release

- by ssqa.net

What practice do you classify as a best practice? The answer is that it's not rocket science; you don't need any specific formula to satisfy the need! OK, what if a tool could follow those common best practices & perform ... read from here ....(read more)

SSIS - XML Source Script

- by simonsabin

The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping, but it becomes very messy and won't perform. What other options do you have?

The challenge with XML processing is to avoid needing a huge amount of memory. I remember using the early versions of BizTalk, which loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach.

For flexibility, however, you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know it's ugly code to write. That brings me on to LINQ. There is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XmlReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming.

Your code would look like this. We create an XDocument and then enumerate over a set of anonymous types we generate from our LINQ statement:

    XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");

    foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")
                           from order in customer.Elements("Orders").Elements("Order")
                           select new { Account = customer.Attribute("AccountNumber").Value,
                                        OrderDate = order.Attribute("OrderDate").Value }))
    {
        Output0Buffer.AddRow();
        Output0Buffer.AccountNumber = xdata.Account;
        Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);
    }

As I said, the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE, Mike Taulty: http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. They show how you can combine LINQ and the XmlReader to get a semi-streaming approach. I took what he did and implemented it in SSIS.

What I found odd was that when I ran it I got different numbers between the streamed and non-streamed versions. I found the cause was a little bug in Mike's code that causes the pointer in the XmlReader to progress past the start of the element and thus skip elements. The streamed version looks like this:

    foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml", "Customer")
                           from order in customer.Elements("Orders").Elements("Order")
                           select new { Account = customer.Attribute("AccountNumber").Value,
                                        OrderDate = order.Attribute("OrderDate").Value }))
    {
        Output0Buffer.AddRow();
        Output0Buffer.AccountNumber = xdata.Account;
        Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);
    }

The two look very similar, and they are; the key difference is the method we are calling, StreamReader. This method is what gives us streaming. It returns an enumerable list of elements, and because of the way LINQ works this results in the data being streamed in.

    static IEnumerable<XElement> StreamReader(String filename, string elementName)
    {
        using (XmlReader xr = XmlReader.Create(filename))
        {
            xr.MoveToContent();
            while (xr.Read()) //Reads the first element
            {
                while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)
                {
                    XElement node = (XElement)XElement.ReadFrom(xr);
                    yield return node;
                }
            }
            xr.Close();
        }
    }

This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element, and then the inner while loop checks whether the current element is the type we want. If not, we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub-elements into an XElement. This is what is returned and can be consumed by the LINQ statement.

Essentially, once one element has been read we need to check whether we are still on the same element type and name (the inner loop). This was Mike's mistake: if we called .Read again we would advance the XmlReader beyond the start of the element, and so the ReadFrom method wouldn't work.

So with the code above you can use whatever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.
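
For readers who want to experiment with the same semi-streaming idea outside SSIS, here is a rough Python analogue; it is not the author's code and not a translation of the .NET API. It assumes the standard library's xml.etree.ElementTree.iterparse in place of XmlReader plus XElement.ReadFrom, and the helper name stream_elements and the sample document are invented for illustration (only the element and attribute names come from the post).

```python
import io
import xml.etree.ElementTree as ET


def stream_elements(source, element_name):
    """Yield each complete element named element_name as the parser reaches its end tag."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == element_name:
            yield elem
            # Drop the children once the caller is done, to keep memory low.
            elem.clear()


# Hypothetical sample data using the element/attribute names from the post.
SAMPLE = b"""<OrderInterface>
  <Customer AccountNumber="A1">
    <Orders><Order OrderDate="2010-01-05"/><Order OrderDate="2010-02-07"/></Orders>
  </Customer>
  <Customer AccountNumber="B2">
    <Orders><Order OrderDate="2010-03-09"/></Orders>
  </Customer>
</OrderInterface>"""

# Flatten customers and their orders into (account, order date) rows.
rows = [
    (customer.get("AccountNumber"), order.get("OrderDate"))
    for customer in stream_elements(io.BytesIO(SAMPLE), "Customer")
    for order in customer.findall("./Orders/Order")
]
print(rows)
```

As with the C# version, each Customer is materialised in full before it is handed to the query, so this is semi-streaming: memory use is bounded by the largest Customer element rather than by the whole file.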

Trace Flag 610 – When should you use it?

- by simonsabin

Thanks to Marcel van der Holst for providing this great information on the use of Trace Flag 610. This trace flag can be used to have minimal logging into a b tree (i.e. clustered table or an index on a heap) that already has data. It is a trace flag because in testing they found some scenarios where it didn’t perform as well. Marcel explains why below. “ TF610 can be used to get minimal logging in a non-empty B-Tree. The idea is that when you insert a large amount of data, you don't want to...(read more)

Joins in LINQ to SQL

- by rajbk

The following post shows how to write different types of joins in LINQ to SQL. I am using the Northwind database and LINQ to SQL for these examples.

    NorthwindDataContext dataContext = new NorthwindDataContext();

Inner Join

    var q1 = from c in dataContext.Customers
             join o in dataContext.Orders on c.CustomerID equals o.CustomerID
             select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate };

    SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]
    FROM [dbo].[Customers] AS [t0]
    INNER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID]

Left Join

    var q2 = from c in dataContext.Customers
             join o in dataContext.Orders on c.CustomerID equals o.CustomerID into g
             from a in g.DefaultIfEmpty()
             select new { c.CustomerID, c.ContactName, a.OrderID, a.OrderDate };

    SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID] AS [OrderID], [t1].[OrderDate] AS [OrderDate]
    FROM [dbo].[Customers] AS [t0]
    LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID]

Inner Join on multiple

    // We mark our anonymous type properties as a and b, otherwise
    // we get the compiler error "Type inference failed in the call to 'Join'"
    var q3 = from c in dataContext.Customers
             join o in dataContext.Orders
                 on new { a = c.CustomerID, b = c.Country } equals new { a = o.CustomerID, b = "USA" }
             select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate };

    SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]
    FROM [dbo].[Customers] AS [t0]
    INNER JOIN [dbo].[Orders] AS [t1] ON ([t0].[CustomerID] = [t1].[CustomerID]) AND ([t0].[Country] = @p0)

Inner Join on multiple with ‘OR’ clause

    var q4 = from c in dataContext.Customers
             from o in dataContext.Orders.Where(a => a.CustomerID == c.CustomerID || c.Country == "USA")
             select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate };

    SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]
    FROM [dbo].[Customers] AS [t0], [dbo].[Orders] AS [t1]
    WHERE ([t1].[CustomerID] = [t0].[CustomerID]) OR ([t0].[Country] = @p0)

Left Join on multiple with ‘OR’ clause

    var q5 = from c in dataContext.Customers
             from o in dataContext.Orders.Where(a => a.CustomerID == c.CustomerID || c.Country == "USA").DefaultIfEmpty()
             select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate };

    SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID] AS [OrderID], [t1].[OrderDate] AS [OrderDate]
    FROM [dbo].[Customers] AS [t0]
    LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON ([t1].[CustomerID] = [t0].[CustomerID]) OR ([t0].[Country] = @p0)

Chris Date on "SQL and Relational Theory - How to Write Accurate SQL Code"

- by MartinBell

The importance of relational theory......(read more)

SQL Sharding and SQL Azure…

- by Dave Noderer

Herve Roggero has just published a paper that outlines patterns for scaling using SQL Azure and the Blue Syntax (he and Scott Klein’s company) sharding api. You can find the paper at: http://www.bluesyntax.net/files/EnzoFramework.pdf Herve and Scott have also just released an Apress book Pro SQL Azure. The idea of being able to split (shard) database operations automatically and control them from a web based management console is very appealing. These ideas have been talked about for a long time and implemented in thousands of very custom ways that have been costly, complicated and fragile. Now, there is light at the end of the tunnel. Scaling database access will become easier and move into the mainstream of application development. The main cost is using an api whenever accessing the database. The api will direct the query to the correct database(s) which may be located locally or in the cloud. It is inevitable that the api will change in the future, perhaps incorporated into a Microsoft offering. Even if this is the case, your application has now been architected to utilize these patterns and details of the actual api will be less important. Herve does a great job of laying out the concepts which every developer and architect should be familiar with!

Antivirus Configuration for dedicated SQL and dedicated IIS Servers

- by Wayne Arthurton

Our corporate standard is McAfee Enterprise; unfortunately this is non-negotiable. On two types of servers I'm responsible for, SQL and Web, we have noticed major performance issues with the corporate standard setup:

    Max scan time 45sec
    One policy for all processes
    Scan ALL files on write, read and open for backup
    Heuristics: Find unknown programs, trojans and macros
    Detect unwanted programs
    Exclude: EVT, LDF, LOG, MDF, VMD, , windows file protection)

This of course still causes major slowdowns. IIS .NET recompiles are slow, especially with SharePoint, as are SQL backups and restores, SQL Analysis Services, Integration Services and the temp data from them.

From time to time I have looked for best practices on setting up McAfee for SQL, SQL Analysis Services, SQL Integration Services, Visual Studio, SharePoint, and .NET web servers in general.

How do people set up McAfee Enterprise on their corporate servers, keeping security intact while affecting performance as little as possible? Has anyone run across white papers on these setups? Obviously some are case by case, but there must be some best practices out there somewhere.

Formatting Keywords to UPPERCASE In Oracle SQL Developer

- by thatjeffsmith

I received this question from a customer today, and it took me more than a few minutes to remember where this preference was located in SQL Developer. This tells me that the topic is ripe for blogging.

How do I go FROM:

    select * from scott.emp where ename like '%JEFF%'

TO

    SELECT * FROM scott.emp WHERE ename LIKE '%JEFF%'

It’s all in the formatting

You need to access the formatting preferences under the Tools menu. It takes a bit of navigating to get there, so bear with me:

    Tools
    Database
    SQL Formatter
    Oracle Formatting
    Click ‘Edit’ on the profile
    Other
    Case change: ‘Keywords Uppercase’

It’s easy to find once you know where to look?

You can tell it to leave the case alone, upper everything, upper only the keywords, or lower everything.

Accessing the Formatter Options

We allow separate formatting options for different RDBMS. You need to make sure you’re accessing the ‘Oracle Formatting’ page in the preferences. You can then choose to edit the default options OR you can do what I have done: save the defaults as a new set of options. I’ve called my profile ‘JeffCustom.’ I can now switch back and forth between different sets of formatting options.

You need to hit the ‘Edit’ button to get to the formatting options editor. A good number of people seem to miss this.

Select your profile, then hit the ‘Edit’ button

MDX Studio download #mdx #ssas

- by Marco Russo (SQLBI)

Short version: the latest available version of MDX Studio can be downloaded from http://www.sqlbi.com/tools/mdx-studio/

Long version: Last week Stacia Misner tweeted that the online version of MDX Studio was no longer available. It was hosted on http://mdx.mosha.com. That was sad news, and it is also not good that nobody is maintaining the desktop version of MDX Studio. The latest release is 0.4.14 and, as I am writing, it is still available on a SkyDrive link provided by Mosha Pasumansky, who wrote MDX Studio. Mosha no longer works at Microsoft, and the entire BI community hopes that somebody will continue his work on this product. Unfortunately, it cannot be published on CodePlex because of some IP restrictions.

Only bad news? Well, I hope not. The first piece of good news is that MDX Studio also works with Analysis Services 2012 in Multidimensional mode. The second is that, after having checked that we could do so, we created a page on the SQLBI web site for downloading the latest available release of MDX Studio. I hope it will be necessary to update it in the future; for now it is just a way to make this precious tool easier to find and download, and to guarantee that it will not disappear if the SkyDrive currently used to host the download is discontinued, as happened to the MDX Studio online version.

Now a question to the BI community: I know that there was some tutorial content available on MDX Studio. I’d like to gather it and put it all in a single place. If you have such content, please contact me directly by writing to marco (dot) russo (at) sqlbi [dot] com. Thanks!

Connecting to SQL database using SQLCMD

- by kaleidoscope

As we all know, there are a number of ways you can connect to your SQL Azure database. One of the quick options for connecting to SQL Server is SQLCMD.

To start the SQLCMD utility and connect to a named instance of SQL Server

1. Open a Command Prompt window, and type sqlcmd -S myServer\instanceName. Replace myServer\instanceName with the name of the computer and the instance of SQL Server that you want to connect to.
2. Press ENTER.

The sqlcmd prompt (1>) indicates that you are connected to the specified instance of SQL Server.

SQL Management Studio offers the facility to use SQLCMD from within SQL scripts by using SQLCMD Mode.

How to: Enable SQLCMD mode in the Transact-SQL Editor

(About how to start the editor, see How to: Start the Transact-SQL Editor.)

To toggle SQLCMD mode from the Data menu

1. Open the query in the Transact-SQL editor.
2. On the Data menu, point to Transact-SQL Editor, and click SQLCMD Mode.

To toggle SQLCMD mode from the toolbar

1. Open the query in the Transact-SQL editor.
2. On the Transact-SQL Editor toolbar, click SQLCMD Mode.

To toggle SQLCMD mode from the shortcut menu

1. Open the query in the Transact-SQL editor.
2. Right-click anywhere in the editor window, and then click SQLCMD Mode.

For more information, follow the link below:

http://msdn.microsoft.com/en-us/library/ms170207.aspx

Geeta, G

Can we have Linked Servers when using NTLM?

- by BlueRaja

I don't have access to the Active Directory settings, nor do I have access to change anything on the linked server. From everything I've read, it seems like this means I cannot use Kerberos - which is a big problem, because I don't know how to use a linked server without it. Is there any way to connect to a linked server without Kerberos? Exact problem description When I connect to the linked server while sitting in front of my server, it works fine; but when I try to connect to the linked server from any other computer (delegating through my server), it gives the error: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. (Microsoft SQL Server, Error: 18456) It seems that this is the "double-hop problem," and the usual solution is to enable Kerberos, which requires access to AD and the linked server. I get the same error when I set security to "Be made using the login's current security context," and I can't use "Be made using this security context" because that appears to use SQL-authentication (which is not enabled on the linked server) instead of NTLM

Can’t connect to SQL Server 2008 - looks like Shared Memory problem

- by user38556

I am unable to connect to my local instance of SQL Server 2008 Express using SQL Server Management Studio. I believe the problem is related to a change I made to the connection protocols. Before the error occurred, I had Shared Memory enabled and Named Pipes and TCP/IP disabled. I then enabled both Named Pipes and TCP/IP, and this is when I started experiencing the problem.

When I try to connect to the server with SSMS (with either my SQL Server sysadmin login or with Windows authentication), I get the following error message:

    A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233)

Why is it returning a Named Pipes error? Why would it not just use Shared Memory, as this has a higher priority in the list of connection protocols? It seems like it is not listening on Shared Memory for some reason?

When I set Named Pipes to enabled and try to connect, I get the same error message. My Windows account does not have administrator privileges on my computer; perhaps this is making a difference in some way (as some of the discussions in this post about a "SuperSocketNetLib\Lpc" registry key seem to suggest).

I have tried restarting the SQL Server service, by the way, and have also had someone log on to the machine with an admin account to restart the SQL Server service. Still no luck.

Server side rules in Outlook 2007

- by AngryHacker

I am using Outlook 2007 with an Exchange Server. I am trying to set up a server-side rule, but regardless of what I do, it always sets up a client-only rule. I am trying to copy a message to a folder if it comes from a certain person and the subject contains certain words. What am I missing?

Update: It seems that setting the rule to mark the message read-only forces the rule to be client-side. Any way to get around that?

SQL server queries are really slow only on first run

- by JoelFan

Somewhat strange problem... when I start my .NET app for the first time after rebooting my machine, the SQL Server queries are really slow... when I pause the debugger, I notice that it's hanging on getting the response from the query. This only happens when connecting to a remote SQL Server (2008)... if I connect to one on my local machine, it's fine. Also, if I restart the app, it works fast, even against the remote SQL Server, and subsequent runs are also fine. The only problem is when I connect to a remote SQL Server for the first time after rebooting my machine.

What's more, I have even noticed this exact same behavior with a 3rd-party app (also .NET) that also connects to a remote SQL Server.

Another piece of info... this has only started happening since I upgraded my machine from XP to Win7 (64 bit). Also, other developers on my team who upgraded to Win7 are seeing the same behavior (both with the app we're developing and the 3rd-party .NET app).

(copied from http://stackoverflow.com/questions/2014814/sql-server-queries-are-really-slow-only-on-first-run )

SSMS Tools Pack now supports Denali CTP1

- by AaronBertrand

Earlier today, Mladen Prajdic ( blog | twitter ) released an updated version of his SSMS Tools Pack (v.1.9.4), a free add-in for Management Studio that provides a ton of helpful functionality that isn't available with the native tools. I'm really glad this happened, because I've installed Denali on all of my VMs and have been using it for most of my work, and I've been missing some of the little things the tool adds. In addition to adding Denali support, Mladen also fixed a handful of minor bugs...(read more)

The Data Scientist

- by BuckWoody

A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it.

In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets.

The term Data Scientist (as far as I can make out this early in its use) refers to someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly.

I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas.

Persistence

The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all?

This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care for and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization.

Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to.

Processing

Once the decision is made to store the data, the next set of decisions is based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage.

In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one.

Presentation

This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that is the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understand why the statistical formula works the way it does.

And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist.

Why I’m excited

I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data.

So watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop, to allow me to explore. And I’m happy to share what I learn along the way.

Search Results

Search found 98454 results on 3939 pages for 'sql server 2008r2'.
