Search Results

Search found 37012 results on 1481 pages for 'sql query'.

Page 139/1481 | < Previous Page | 135 136 137 138 139 140 141 142 143 144 145 146  | Next Page >

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • Connecting MS SQL using freetds and unixodbc: isql - no default driver specified

    - by Dejan
    I am trying to connect to the MS SQL database using freetds and unixodbc. I have read various guides how to do it, but no one works fine for me. When I try to connect to the database using isql tool, I get the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect Have anybody already successfully established the connection to the MS SQL database using freetds and unixodbc on Ubuntu 12.04? I would really appreciate some help. Below is the procedure I used to configure the freetds and unixodbc. Thanks for your help in advance! Procedure First, I have installed the following packages sudo apt-get unixodbc unixodbc-dev freetds-dev tdsodbc and configured freetds as follows: --- /etc/freetds/freetds.conf --- [TS] host = SERVER port = 1433 tds version = 7.0 client charset = UTF-8 Using tsql tool I can successfully connect to the database by executing tsql -S TS -U username -P password As I need an odbc connection I configured odbcinst.ini as follows: --- /etc/odbcinst.ini --- [FreeTDS] Description = FreeTDS Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so FileUsage = 1 CPTimeout = CPResuse = client charset = utf-8 and odbc.ini as follows: --- /etc/odbc.ini --- [TS] Description = "test" Driver = FreeTDS Servername = SERVER Server = SERVER Port = 1433 Database = DBNAME Trace = No Trying to connect to the database using isql tool with such a configuration results the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect

    Read the article

  • SQL in the City (Charlotte) Wrap Up

    - by drsql
    Ok, it has been quite a while since the event, two weeks and a day to be exact, but I needed a rest before hitting Windows Live Writer again. Speaking is exhausting, traveling is exhausting, and well, I replaced my laptop and had to get all of my software back together. (Between Windows 8.1 sync features, Dropbox and Skydrive, it has never been easier…but I digress.) There are plenty of great vendors out there, but one of my favorites has always been Red-Gate. I have written half of a book with them,...(read more)

    Read the article

  • SSMS Tools Pack now supports Denali CTP1

    - by AaronBertrand
    Earlier today, Mladen Prajdic ( blog | twitter ) released an updated version of his SSMS Tools Pack (v.1.9.4), a free add-in for Management Studio that provides a ton of helpful functionality that isn't available with the native tools. I'm really glad this happened, because I've installed Denali on all of my VMs and have been using it for most of my work, and I've been missing some of the little things the tool adds. In addition to adding Denali support, Mladen also fixed a handful of minor bugs...(read more)

    Read the article

  • Smart defaults [SSDT]

    - by jamiet
    I’ve just discovered a new, somewhat hidden, feature in SSDT that I didn’t know about and figured it would be worth highlighting here because I’ll bet not many others know it either; the feature is called Smart Defaults. It gets around the problem of adding a NOT NULLable column to an existing table that has got data in it – previous to SSDT you would need to define a DEFAULT constraint however it does feel rather cumbersome to create an object purely for the purpose of pushing through a deployment – that’s the situation that Smart Defaults is meant to alleviate. The Smart Defaults option exists in the advanced section of a Publish Profile file: The description of the setting is “Automatically provides a default value when updating a table that contains data with a column that does not allow null values”, in other words checking that option will cause SSDT to insert an arbitrary default value into your newly created NON NULLable column. In case you’re wondering how it does it, here’s how: SSDT creates a DEFAULT CONSTRAINT at the same time as the column is created and then immediately removes that constraint: ALTER TABLE [dbo].[T1]    ADD [C1] INT NOT NULL,         CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b] DEFAULT 0 FOR [C1];ALTER TABLE [dbo].[T1] DROP CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b]; You can then update the value as appropriate in a Post-Deployment script. Pretty cool! On the downside, you can only specify this option for the whole project, not for an individual table or even an individual column – I’m not sure that I’d want to turn this on for an entire project as it could hide problems that a failed deployment would highlight, in other words smart defaults could be seen to be “papering over the cracks”. If you think that should be improved go and vote (and leave a comment) at [SSDT] Allow us to specify Smart defaults per table or even per column. @Jamiet

    Read the article

  • T-SQL Tuesday #005: On Technical Reporting

    - by Adam Machanic
    Reports. They're supposed to look nice. They're supposed to be a method by which people can get vital information into their heads. And that's obvious, right? So obvious that you're undoubtedly getting ready to close this tab and go find something better to do with your life. "Why is Adam wasting my time with this garbage?" Because apparently, it's not obvious. In the world of reporting we have a number of different types of reports: business reports, status reports, analytical reports, dashboards,...(read more)

    Read the article

  • Always use dtexec.exe to test performance of your dataflows. No exceptions.

    - by jamiet
    Earlier this evening I posted a blog post entitled Investigation: Can different combinations of components effect Dataflow performance? where I compared the performance of three different dataflows all working to the same overall goal. I wanted to make one last point related to the results but I thought it warranted a blog post all of its own. Here is a screenshot of one of the dataflows that I was testing: Pretty complicated I’m sure you’ll agree. Now, when I executed this dataflow in the test it was executing in ~19seconds however in that case I was executing using the command-line tool dtexec. I also tried executing inside the BIDS development environment and in that case it took much longer – 139seconds. That’s more than seven times as long. The point I want to make is very simple. If you are testing your dataflows for performance please use dtexec. Nothing else will suffice. @Jamiet

    Read the article

  • SQL Saturday Birmingham #328 Database Design Precon In One Week

    - by drsql
    On September 22, I will be doing my "How to Design a Relational Database" pre-conference session in Birmingham, Alabama. You can see the abstract here if you are interested, and you can sign up there too, naturally. At just $100, which includes a free ebook copy of my database design book, it is a great bargain and I totally promise it will be a little over 7 hours of talking about and designing databases, which will certainly be better than what you do on a normal work day, even a Friday....(read more)

    Read the article

  • Does multiple files in SQL Server when using RAID help reduce conflicts in growth and file-locking?

    - by Dr Giles M
    I've been reading around and get the impression that if you are using RAID then using multiple SQL Server files within a filegroup won't yeild any more improvements, and the benefits are purely administrative (if you started to run out of space or wanted to partition off data into managable chunks for backups/balancing the data around your big server room). However, being a reasonably savvy software person, it's not unthinkable to hypothesise that, even for smaller databases that SQL Server will perform growth and locking operations (for writes) on a LOGICAL file basis, so even if you are using RAID, it seems to make sense to have multiple files in a file group to balance I/O, or does the time taken to reconstruct the data from distributed filegroups outweigh the benefits of reduced locking? I'm also aware that the behaviour and benefits may be different for tables/indeces/log. Is there a good site that distinguishes the benefits of multiple files when RAID is already in place?

    Read the article

  • Smart defaults [SSDT]

    - by jamiet
    I’ve just discovered a new, somewhat hidden, feature in SSDT that I didn’t know about and figured it would be worth highlighting here because I’ll bet not many others know it either; the feature is called Smart Defaults. It gets around the problem of adding a NOT NULLable column to an existing table that has got data in it – previous to SSDT you would need to define a DEFAULT constraint however it does feel rather cumbersome to create an object purely for the purpose of pushing through a deployment – that’s the situation that Smart Defaults is meant to alleviate. The Smart Defaults option exists in the advanced section of a Publish Profile file: The description of the setting is “Automatically provides a default value when updating a table that contains data with a column that does not allow null values”, in other words checking that option will cause SSDT to insert an arbitrary default value into your newly created NON NULLable column. In case you’re wondering how it does it, here’s how: SSDT creates a DEFAULT CONSTRAINT at the same time as the column is created and then immediately removes that constraint: ALTER TABLE [dbo].[T1]    ADD [C1] INT NOT NULL,         CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b] DEFAULT 0 FOR [C1];ALTER TABLE [dbo].[T1] DROP CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b]; You can then update the value as appropriate in a Post-Deployment script. Pretty cool! On the downside, you can only specify this option for the whole project, not for an individual table or even an individual column – I’m not sure that I’d want to turn this on for an entire project as it could hide problems that a failed deployment would highlight, in other words smart defaults could be seen to be “papering over the cracks”. If you think that should be improved go and vote (and leave a comment) at [SSDT] Allow us to specify Smart defaults per table or even per column. @Jamiet

    Read the article

  • Is the SAN dying???

    - by RickHeiges
    Is the SAN dying? The reason that I ask this question is that MSFT has unleashed technologies this year that point in that direction Always ON Availability Groups shuns shared storage Windows 2012 has Storage Replication Technology that does not require a SAN Windows 2012 has Hyper-V Replica Technology that does not require a SAN PDW v2 continues to reinforce the approach to avoid shared storage I'm not saying that SAN technology does not have its place or does not have benefits inherent to the beast....(read more)

    Read the article

  • Defaults for Exporting Data in Oracle SQL Developer

    - by thatjeffsmith
    I was testing a reported bug in SQL Developer today – so the bug I was looking for wasn’t there (YES!) but I found a different one (NO!) – and I was getting frustrated by having to check the same boxes over and over again. What I wanted was INSERT STATEMENTS to the CLIPBOARD. Not what I want! I’m always doing the same thing, over and over again. And I never go to FILE – that’s too permanent for my type of work. I either want stuff to the clipboard or to the worksheet. Surely there’s a way to tell SQL Developer how to behave? Oh yeah, check the preferences So you can set the defaults for this dialog. Go to: Tools – Preferences – Database – Utilities – Export Now I will always start with ‘INSERT’ and ‘Clipboard’ – woohoo! Now, I can also go INTO the preferences for each of the different formats to save me a few more clicks. I prefer pointy hats (^) for my delimiters, don’t you? So, spend a few minutes and set each of these to what you’re normally doing and save yourself a bunch of time going forward.

    Read the article

  • SQL Server 2008 R2 Service Pack 2 CTP is available

    - by AaronBertrand
    You can download the Service Pack 2 CTP from the following URL: http://www.microsoft.com/en-us/download/details.aspx?id=29848 The build # is 10.50.3720. This service pack contains all of the fixes from Service Pack 1 & Cumulative Updates 1 through 5, and a couple of other minor fixes (a couple of SSRS bugs and a bug about an ALTER TABLE batch not being cached correctly). It does not include fixes from Service Pack 1 Cumulative Update #6, which I mentioned recently . You should *NOT* install this...(read more)

    Read the article

  • Oracle Database 12c By Example – SQL Developer and Multitenant

    - by thatjeffsmith
    As you may have heard, Oracle Database 12c is now available. In addition to the binaries and docs going out, we also published a few new Oracle By Example (OBE) chapters. You can find those links here on our product page. Do you know who found these, practically the minute they were published? An enterprising DBA-extraordinaire who was just happening to be presenting at the ODTUG KScope13 conference in New Orleans. He thought it would be a good idea to download the new software over a hotel WIFI, install and create a new multitenant database, watch a few OBEs, and then demo that live for his ‘SQL Developer for DBAs‘ session. Pretty crazy, right? Well, he did it, and I was there to watch. Way cool. You can listen to @leight0nn tell his story in his own words via this ODTUG interview with @oraclenered. In case you’re too giddy to sit through the video, I’ll give you a preview – he succesfully cloned a pluggable database in about a minute with only a couple of clicks using Oracle SQL Developer 3.2.20.09 while connected to a 12c database.

    Read the article

  • SQL Server Add Primary Key

    - by Derek D.
    Adding a primary key can be done either after a table is created, or at the same a table is created. It is important to note, that by default a primary key is clustered. This may or may not be the preferred method of creation. For more information on clustered vs non [...]

    Read the article

  • SQL Server 2000 need to prevent logons whilst performing a backup for a side by side migration

    - by pigeon
    I'm looking for a way to prevent logons from occurring in order to take a full backup of a Database to migrate from its current SQL Server 2000 instance to a new SQL 2005 instance. A friend of mine suggested running a script which would put the DB into a rollback state. Not being a DBA my DDL is very poor and running a script that I don't understand may not be the best idea. One option which might be easier is to simply detach and copy, to the new server. Any suggestions would be greatly appreciated.

    Read the article

  • High Performance SQL Views Using WITH(NOLOCK)

    - by gt0084e1
    Every now and then you find a simple way to make everything much faster. We often find customers creating data warehouses or OLAP cubes even though they have a relatively small amount of data (a few gigs) compared to their server memory. If you have more server memory than the size of your database or working set, nearly any aggregate query should run in a second or less. In some situations there may be high traffic on from the transactional application and SQL server may wait for several other queries to run before giving you your results. The purpose of this is make sure you don’t get two versions of the truth. In an ATM system, you want to give the bank balance after the withdrawal, not before or you may get a very unhappy customer. So by default databases are rightly very conservative about this kind of thing. Unfortunately this split-second precision comes at a cost. The performance of the query may not be acceptable by today’s standards because the database has to maintain locks on the server. Fortunately, SQL Server gives you a simple way to ask for the current version of the data without the pending transactions. To better facilitate reporting, you can create a view that includes these directives. CREATE VIEW CategoriesAndProducts AS SELECT * FROM dbo.Categories WITH(NOLOCK) INNER JOIN dbo.Products WITH(NOLOCK) ON dbo.Categories.CategoryID = dbo.Products.CategoryID In some cases quires that are taking minutes end up taking seconds. Much easier than moving the data to a separate database and it’s still pretty much real time give or take a few milliseconds. You’ve been warned not to use this for bank balances though. More from Data Stream

    Read the article

  • A different interface for the Sql Server Reporting Service?

    - by AngryHacker
    I have a SQL Server 2005 SQL Reporting Services implementation. It seems that the only way to actually access the reports is for the users to use Internet Explorer. The web page uses an ActiveX control to do its printing (and probably other functions as well). Does SSRS have a different way to access its functionality via the web browser? Like maybe Java or HTML based? If so, how do I actually turn it on? The reason I am asking is because the security is being tightened and ActiveX controls will be banished, thus the users won't be able to print.

    Read the article

  • SQL Server Manageability Series: how to change the default path of .cache files of a data collector? #sql #mdw #dba

    - by ssqa.net
    How to change the default path of .cache files of a data collector after the Management Data Warehouse (MDW has been setup? This was the question asked by one of the DBAs in a client's place, instantly I enquired that were there any folder specified while setting up the MDW and obvious answer was no as there were left default. This means all the .CACHE files are stored under %C\TEMP directory which may post out of disk space problem on the server where the MDW is setup to collect. Going back...(read more)

    Read the article

  • Q&amp;A: Will my favourite ORM Foo work with SQL Azure?

    - by Eric Nelson
    short answer: Quite probably, as SQL Azure is very similar to SQL Server longer answer: Object Relational Mappers (ORMs) that work with SQL Server are likely but not guaranteed to work with SQL Azure. The differences between the RDBMS versions are small – but may cause problems, for example in tools used to create the mapping between objects and tables or in generated SQL from the ORM which expects “certain things” :-) More specifically: ADO.NET Entity Framework / LINQ to Entities can be used with SQL Azure, but the Visual Studio designer does not currently work. You will need to point the designer at a version of your database running of SQL Server to create the mapping, then change the connection details to run against SQL Azure. LINQ to SQL has similar issues to ADO.NET Entity Framework above NHibernate can be used against SQL Azure DevExpress XPO supports SQL Azure from version 9.3 DataObjects.Net supports SQL Azure Open Access from Telerik works “seamlessly”  - their words not mine :-) The list above is by no means comprehensive – please leave a comment with details of other ORMs that work (or do not work) with SQL Azure. Related Links: General guidelines and limitations of SQL Azure SQL Azure vs SQL Server

    Read the article

  • A little on speaking and evaluations...

    - by AaronBertrand
    Buck Woody ( blog | twitter ) just published a great post on session evaluations , and a lot of his points hit home for me. The premise is that the evaluations are not really meant for the attendee or the event organizers, but so that the speaker can get better and make the next session better. In light of this, at least in my opinion, the existing evaluation forms (and the way attendees tend to fill them out) do not achieve this at all. It may be a little more work for events to generate a more...(read more)

    Read the article

  • Denali CTP3 - Semantic Search 2 (Lots of documents)

    - by sqlartist
    Hi again, I thought I would improve on the previous post by actually putting a decent about of content into the Filetable - this time I used the opensource DMOZ Health document repository which contains 5,880 files inside 220 folders. The files are all html and are pretty small in size. The entire document collection is about 120Mb unzipped and 30Mb zipped. If any one is interested in testing this collection drop me a note and I will upload the dmoz_health repository archive to Skydrive. This time...(read more)

    Read the article

  • T-SQL Tuesday #19: Blind Spots

    - by merrillaldrich
    A while ago I wrote a post, Visualize Disaster , prompted by a real incident we had at my office. Fortunately we came through it OK from a business point of view, but I took away an important lesson: it’s very easy, whether your organization and your team is savvy about disaster recovery or not, to have significant blind spots with regard to recovery in the face of some large, unexpected outage. We have very clear direction and decent budgets to work with, and the safety and recoverability of applications...(read more)

    Read the article

< Previous Page | 135 136 137 138 139 140 141 142 143 144 145 146  | Next Page >