Search Results

Search found 1024 results on 41 pages for 'dba'.

Page 6/41 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • Free SQL Server training? Now you’re talking.

    - by Fatherjack
    SQL Server user groups are everywhere, literally all over the globe there are SQL Server professionals meeting on a regular basis, sharing ideas, solving problems, learning about how to do new stuff and new ways to do old stuff and it’s all for free. I don’t have detailed figures but of all the SQL Server professionals there are only a small number of them attend these user groups. Those people are the people that are taking the time and making then effort to make themselves better at their chosen trade, more employable and having a good time. For free. I don’t know why but there are many people that don’t seem to want to be the best they can be. Some of you enlightened people that do already attend could be doing more though. Have you ever spoken at  your group? Not just in the break while you have a mouthful of pizza and a drink in your hand but had the attention of the whole group listen to you speak. It doesn’t need to be a full hour, it doesn’t need to be some obscure deeply technical demonstration of SQL Server internals, just a few minutes on something that you do that might help other people with their daily work. A neat process that helps you get from Problem A to Solution B. There is no need to get concerned that becoming a speaker means that you suddenly have to know more than anyone else in the room. This is you talking about something that you experienced. What you did, what you would repeat, what you might do differently next time. No one in the audience can pick you up on a technicality. If someone comes out with a great idea that you hadn’t thought of, say “That’s a great idea, I didn’t think of that while we had the problem on our hands. I’ll try to remember that for next time”. If someone is looking to show you up for picking the wrong decision (and this, in my experience, is very uncommon indeed) then you simply give a reply like “Well, at the time we chose that option. Perhaps another time then we would tackle things differently but we were happy with how our solution worked”. It’s sharing things like this that makes user groups have a real value, talking about how you coped with or averted a disaster, a handy little section of code or using a tool in a particular way that you take for granted that might, just might, be something that other people haven’t thought of that solves a problem or saves some time for them. At the next meeting you might get the same benefit from a different person and so it goes on. As individuals benefits so the community benefits. For free. Things I encourage you to do; If you are a chapter or user group leader; encourage someone from your group who has never spoken before to start speaking. If you are a chapter or user group attendee that hasn’t spoken before; speak for at least 5 minutes on something related to SQL Server at any group meeting. If you don’t currently attend a user group; please go along to you nearest one when they are meeting next and invest in yourself and your future. UK user group details are here: http://sqlsouthwest.co.uk/national_ug.htm , PASS chapters outside the UK are found via http://www.sqlpass.org/PASSChapters/LocalChapters.aspx. If you are unsure of how you might achieve any of these things then get in touch with me*, I’ll give you specific advice on getting started on any of the above points and help you prove to yourself what you are capable of. SQL Community – be part of it and make it better. Let me know how you get on in the comments.

    Read the article

  • Troubleshooting High-CPU Utilization for SQL Server

    - by Susantha Bathige
    The objective of this FAQ is to outline the basic steps in troubleshooting high CPU utilization on  a server hosting a SQL Server instance. The first and the most common step if you suspect high CPU utilization (or are alerted for it) is to login to the physical server and check the Windows Task Manager. The Performance tab will show the high utilization as shown below: Next, we need to determine which process is responsible for the high CPU consumption. The Processes tab of the Task Manager will show this information: Note that to see all processes you should select Show processes from all user. In this case, SQL Server (sqlserver.exe) is consuming 99% of the CPU (a normal benchmark for max CPU utilization is about 50-60%). Next we examine the scheduler data. Scheduler is a component of SQLOS which evenly distributes load amongst CPUs. The query below returns the important columns for CPU troubleshooting. Note – if your server is under severe stress and you are unable to login to SSMS, you can use another machine’s SSMS to login to the server through DAC – Dedicated Administrator Connection (see http://msdn.microsoft.com/en-us/library/ms189595.aspx for details on using DAC) SELECT scheduler_id ,cpu_id ,status ,runnable_tasks_count ,active_workers_count ,load_factor ,yield_count FROM sys.dm_os_schedulers WHERE scheduler_id See below for the BOL definitions for the above columns. scheduler_id – ID of the scheduler. All schedulers that are used to run regular queries have ID numbers less than 1048576. Those schedulers that have IDs greater than or equal to 1048576 are used internally by SQL Server, such as the dedicated administrator connection scheduler. cpu_id – ID of the CPU with which this scheduler is associated. status – Indicates the status of the scheduler. runnable_tasks_count – Number of workers, with tasks assigned to them that are waiting to be scheduled on the runnable queue. active_workers_count – Number of workers that are active. An active worker is never preemptive, must have an associated task, and is either running, runnable, or suspended. current_tasks_count - Number of current tasks that are associated with this scheduler. load_factor – Internal value that indicates the perceived load on this scheduler. yield_count – Internal value that is used to indicate progress on this scheduler.                                                                 Now to interpret the above data. There are four schedulers and each assigned to a different CPU. All the CPUs are ready to accept user queries as they all are ONLINE. There are 294 active tasks in the output as per the current_tasks_count column. This count indicates how many activities currently associated with the schedulers. When a  task is complete, this number is decremented. The 294 is quite a high figure and indicates all four schedulers are extremely busy. When a task is enqueued, the load_factor  value is incremented. This value is used to determine whether a new task should be put on this scheduler or another scheduler. The new task will be allocated to less loaded scheduler by SQLOS. The very high value of this column indicates all the schedulers have a high load. There are 268 runnable tasks which mean all these tasks are assigned a worker and waiting to be scheduled on the runnable queue.   The next step is  to identify which queries are demanding a lot of CPU time. The below query is useful for this purpose (note, in its current form,  it only shows the top 10 records). SELECT TOP 10 st.text  ,st.dbid  ,st.objectid  ,qs.total_worker_time  ,qs.last_worker_time  ,qp.query_plan FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp ORDER BY qs.total_worker_time DESC This query as total_worker_time as the measure of CPU load and is in descending order of the  total_worker_time to show the most expensive queries and their plans at the top:      Note the BOL definitions for the important columns: total_worker_time - Total amount of CPU time, in microseconds, that was consumed by executions of this plan since it was compiled. last_worker_time - CPU time, in microseconds, that was consumed the last time the plan was executed.   I re-ran the same query again after few seconds and was returned the below output. After few seconds the SP dbo.TestProc1 is shown in fourth place and once again the last_worker_time is the highest. This means the procedure TestProc1 consumes a CPU time continuously each time it executes.      In this case, the primary cause for high CPU utilization was a stored procedure. You can view the execution plan by clicking on query_plan column to investigate why this is causing a high CPU load. I have used SQL Server 2008 (SP1) to test all the queries used in this article.

    Read the article

  • Backup Meta-Data

    - by BuckWoody
    I'm working on a PowerShell script to show me the trending durations of my backup activities. The first thing I need is the data, so I looked at the Standard Reports in SQL Server Management Studio, and found a report that suited my needs, so I pulled out the script that it runs and modified it to this T-SQL Script. A few words here - you need to be in the MSDB database for this to run, and you can add a WHERE clause to limit to a database, timeframe, type of backup, whatever. For that matter, I won't use all of the data in this query in my PowerShell script, but it gives me lots of avenues to graph: SELECT distinct t1.name AS 'DatabaseName' ,(datediff( ss,  t3.backup_start_date, t3.backup_finish_date)) AS 'DurationInSeconds' ,t3.user_name AS 'UserResponsible' ,t3.name AS backup_name ,t3.description ,t3.backup_start_date ,t3.backup_finish_date ,CASE WHEN t3.type = 'D' THEN 'Database' WHEN t3.type = 'L' THEN 'Log' WHEN t3.type = 'F' THEN 'FileOrFilegroup' WHEN t3.type = 'G' THEN 'DifferentialFile' WHEN t3.type = 'P' THEN 'Partial' WHEN t3.type = 'Q' THEN 'DifferentialPartial' END AS 'BackupType' ,t3.backup_size AS 'BackupSizeKB' ,t6.physical_device_name ,CASE WHEN t6.device_type = 2 THEN 'Disk' WHEN t6.device_type = 102 THEN 'Disk' WHEN t6.device_type = 5 THEN 'Tape' WHEN t6.device_type = 105 THEN 'Tape' END AS 'DeviceType' ,t3.recovery_model  FROM sys.databases t1 INNER JOIN backupset t3 ON (t3.database_name = t1.name )  LEFT OUTER JOIN backupmediaset t5 ON ( t3.media_set_id = t5.media_set_id ) LEFT OUTER JOIN backupmediafamily t6 ON ( t6.media_set_id = t5.media_set_id ) ORDER BY backup_start_date DESC I'll munge this into my Excel PowerShell chart script tomorrow. Script Disclaimer, for people who need to be told this sort of thing: Never trust any script, including those that you find here, until you understand exactly what it does and how it will act on your systems. Always check the script on a test system or Virtual Machine, not a production system. Yes, there are always multiple ways to do things, and this script may not work in every situation, for everything. It’s just a script, people. All scripts on this site are performed by a professional stunt driver on a closed course. Your mileage may vary. Void where prohibited. Offer good for a limited time only. Keep out of reach of small children. Do not operate heavy machinery while using this script. If you experience blurry vision, indigestion or diarrhea during the operation of this script, see a physician immediately. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Backup Meta-Data

    - by BuckWoody
    I'm working on a PowerShell script to show me the trending durations of my backup activities. The first thing I need is the data, so I looked at the Standard Reports in SQL Server Management Studio, and found a report that suited my needs, so I pulled out the script that it runs and modified it to this T-SQL Script. A few words here - you need to be in the MSDB database for this to run, and you can add a WHERE clause to limit to a database, timeframe, type of backup, whatever. For that matter, I won't use all of the data in this query in my PowerShell script, but it gives me lots of avenues to graph: SELECT distinct t1.name AS 'DatabaseName' ,(datediff( ss,  t3.backup_start_date, t3.backup_finish_date)) AS 'DurationInSeconds' ,t3.user_name AS 'UserResponsible' ,t3.name AS backup_name ,t3.description ,t3.backup_start_date ,t3.backup_finish_date ,CASE WHEN t3.type = 'D' THEN 'Database' WHEN t3.type = 'L' THEN 'Log' WHEN t3.type = 'F' THEN 'FileOrFilegroup' WHEN t3.type = 'G' THEN 'DifferentialFile' WHEN t3.type = 'P' THEN 'Partial' WHEN t3.type = 'Q' THEN 'DifferentialPartial' END AS 'BackupType' ,t3.backup_size AS 'BackupSizeKB' ,t6.physical_device_name ,CASE WHEN t6.device_type = 2 THEN 'Disk' WHEN t6.device_type = 102 THEN 'Disk' WHEN t6.device_type = 5 THEN 'Tape' WHEN t6.device_type = 105 THEN 'Tape' END AS 'DeviceType' ,t3.recovery_model  FROM sys.databases t1 INNER JOIN backupset t3 ON (t3.database_name = t1.name )  LEFT OUTER JOIN backupmediaset t5 ON ( t3.media_set_id = t5.media_set_id ) LEFT OUTER JOIN backupmediafamily t6 ON ( t6.media_set_id = t5.media_set_id ) ORDER BY backup_start_date DESC I'll munge this into my Excel PowerShell chart script tomorrow. Script Disclaimer, for people who need to be told this sort of thing: Never trust any script, including those that you find here, until you understand exactly what it does and how it will act on your systems. Always check the script on a test system or Virtual Machine, not a production system. Yes, there are always multiple ways to do things, and this script may not work in every situation, for everything. It’s just a script, people. All scripts on this site are performed by a professional stunt driver on a closed course. Your mileage may vary. Void where prohibited. Offer good for a limited time only. Keep out of reach of small children. Do not operate heavy machinery while using this script. If you experience blurry vision, indigestion or diarrhea during the operation of this script, see a physician immediately. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • PowerShell PowerPack Download

    - by BuckWoody
    I read Jeffery Hicks’ article in this month’s Redmond Magazine on a new add-in for Windows PowerShell 2.0. It’s called the PowerShell Pack and it has a some great new features that I plan to put into place on my production systems as soon as I finished learning and testing them. You can download the pack here if you have PowerShell 2.0. I’m having a lot of fun with it, and I’ll blog about what I’m learning here in the near future, but you should check it out. The only issue I have with it right now is that you have to load a module and then use get-help to find out what it does, because I haven’t found a lot of other documentation so far. The most interesting modules for me are the ones that can run a command elevated (in PSUserTools), the task scheduling commands (in TaskScheduler) and the file system checks and tools (in FileSystem). There’s also a way to create simple Graphical User Interface panels (in ). I plan to string all these together to install a management set of tools on my SQL Server Express Instances, giving the user “task buttons” to backup or restore a database, add or delete users and so on. Yes, I’ll be careful, and yes, I’ll make sure the user is allowed to do that. For now, I’m testing the download, but I thought I would share what I’m up to. If you have PowerShell 2.0 and you download the pack, let me know how you use it. Script Disclaimer, for people who need to be told this sort of thing: Never trust any script, including those that you find here, until you understand exactly what it does and how it will act on your systems. Always check the script on a test system or Virtual Machine, not a production system. Yes, there are always multiple ways to do things, and this script may not work in every situation, for everything. It’s just a script, people. All scripts on this site are performed by a professional stunt driver on a closed course. Your mileage may vary. Void where prohibited. Offer good for a limited time only. Keep out of reach of small children. Do not operate heavy machinery while using this script. If you experience blurry vision, indigestion or diarrhea during the operation of this script, see a physician immediately. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Starting this week: Dublin, Maidenhead, and London

    - by KKline
    This might be most most overcommitted four-week period of time ever in my life. I’m tired just thinking about it! Not only am I traveling internationally and speaking over the next few weeks, I’m also helping on two book projects, learning some new applications from Quest Software, and helping on a small Transact-SQL refactoring project. Swag on hand? I’ve got a special printing of 500 video training DVDs for this trip: SQL Server Training on DMVs Performance Monitor and Wait Events Plus, I’ll have...(read more)

    Read the article

  • Open the SQL Server Error Log with PowerShell

    - by BuckWoody
    Using the Server Management Objects (SMO) library, you don’t even need to have the SQL Server 2008 PowerShell Provider to read the SQL Server Error Logs – in fact, you can use regular old everyday PowerShell. Keep in mind you will need the SMO libraries – which can be installed separately or by installing the Client Tools from the SQL Server install media. You could search for errors, store a result as a variable, or act on the returned values in some other way. Replace the Machine Name with your server and Instance Name with your instance, but leave the quotes, to make this work on your system: [reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") $machineName = "UNIVAC" $instanceName = "Production" $sqlServer = new-object ("Microsoft.SqlServer.Management.Smo.Server") "$machineName\$instanceName" $sqlServer.ReadErrorLog() Want to search for something specific, like the word “Error”? Replace the last line with this: $sqlServer.ReadErrorLog() | where {$_.Text -like "Error*"} Script Disclaimer, for people who need to be told this sort of thing: Never trust any script, including those that you find here, until you understand exactly what it does and how it will act on your systems. Always check the script on a test system or Virtual Machine, not a production system. Yes, there are always multiple ways to do things, and this script may not work in every situation, for everything. It’s just a script, people. All scripts on this site are performed by a professional stunt driver on a closed course. Your mileage may vary. Void where prohibited. Offer good for a limited time only. Keep out of reach of small children. Do not operate heavy machinery while using this script. If you experience blurry vision, indigestion or diarrhea during the operation of this script, see a physician immediately. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Importing data from text file to specific columns using BULK INSERT

    - by Dinesh Asanka
    Bulk insert is much faster than using other techniques such as  SSIS. However, when you are using bulk insert you can’t insert to specific columns. If, for example, there are five columns in a table you should have five values for each record in the text file you are importing from. This is an issue when you are expecting default values to be inserted into tables. Let us say you have table as below: In this table, you are expecting ID, Status and CreatedDate to be updated automatically, so your text file may only have   FirstName  LastName  values as below: Dinesh,Asanka Saman,Liyanage Ruwan,Silva Susantha,Bathige Jude,Peires Sanjeewa,Jayawickrama If you use bulk insert to this table like follows, You will be returned an error: Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (ID). To avoid this you will need to create a view with the columns you are expecting to fill and use bulk insert against it. If you check the table now, you will see table with values in the text file and the default values.

    Read the article

  • Why "Tailoring" Your Resume Is Bad

    - by Mike C
    I was just writing a response to a comment on my "Sell Yourself!" presentation ( http://sqlblog.com/blogs/michael_coles/archive/2010/12/05/sell-yourself-presentation.aspx#comments ), and it started getting a little lengthy so I decided to turn it into a blog post. The "Sell Yourself!" post got a couple of very good comments on the blog, and quite a few more comments offline. I think I'll start this one with a great exchange from the movie "The Princess Bride": Vizzini: HE DIDN'T FALL? INCONCEIVABLE....(read more)

    Read the article

  • New T-SQL Features in SQL Server 2011

    - by Divya Agrawal
    SQL Server 2011 (or Denali) CTP is now available and can be downloaded at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=6a04f16f-f6be-4f92-9c92-f7e5677d91f9&displaylang=en SQL Server 2011 has several major enhancements including a new look for SSMS. SSMS is now   similar to Visual Studio   with greatly improved Intellisense support. This article we will focus on the T-SQL Enhancements in SQL Server 2011. The main [...]

    Read the article

  • Implementing Database Settings Using Policy Based Management

    - by Ashish Kumar Mehta
    Introduction Database Administrators have always had a tough time to ensuring that all the SQL Servers administered by them are configured according to the policies and standards of organization. Using SQL Server’s  Policy Based Management feature DBAs can now manage one or more instances of SQL Server 2008 and check for policy compliance issues. In this article we will utilize Policy Based Management (aka Declarative Management Framework or DMF) feature of SQL Server to implement and verify database settings on all production databases. It is best practice to enforce the below settings on each Production database. However, it can be tedious to go through each database and then check whether the below database settings are implemented across databases. In this article I will explain it to you how to utilize the Policy Based Management Feature of SQL Server 2008 to create a policy to verify these settings on all databases and in cases of non-complaince how to bring them back into complaince. Database setting to enforce on each user database : Auto Close and Auto Shrink Properties of database set to False Auto Create Statistics and Auto Update Statistics set to True Compatibility Level of all the user database set as 100 Page Verify set as CHECKSUM Recovery Model of all user database set to Full Restrict Access set as MULTI_USER Configure a Policy to Verify Database Settings 1. Connect to SQL Server 2008 Instance using SQL Server Management Studio 2. In the Object Explorer, Click on Management > Policy Management and you will be able to see Policies, Conditions & Facets as child nodes 3. Right click Policies and then select New Policy…. from the drop down list as shown in the snippet below to open the  Create New Policy Popup window. 4. In the Create New Policy popup window you need to provide the name of the policy as “Implementing and Verify Database Settings for Production Databases” and then click the drop down list under Check Condition. As highlighted in the snippet below click on the New Condition… option to open up the Create New Condition window. 5. In the Create New Condition popup window you need to provide the name of the condition as “Verify and Change Database Settings”. In the Facet drop down list you need to choose the Facet as Database Options as shown in the snippet below. Under Expression you need to select Field value as @AutoClose and then choose Operator value as ‘ = ‘ and finally choose Value as False. Now that you have successfully added the first field you can now go ahead and add rest of the fields as shown in the snippet below. Once you have successfully added all the above shown fields of Database Options Facet, click OK to save the changes and to return to the parent Create New Policy – Implementing and Verify Database Settings for Production Database windows where you will see that the newly created condition “Verify and Change Database Settings” is selected by default. Continues…

    Read the article

  • Where is the SQL Azure Development Environment

    - by BuckWoody
    Recently I posted an entry explaining that you can develop in Windows Azure without having to connect to the main service on the Internet, using the Software Development Kit (SDK) which installs two emulators - one for compute and the other for storage. That brought up the question of the same kind of thing for SQL Azure. The short answer is that there isn’t one. While we’ll make the development experience for all versions of SQL Server, including SQL Azure more easy to write against, you can simply treat it as another edition of SQL Server. For instance, many of us use the SQL Server Developer Edition - which in versions up to 2008 is actually the Enterprise Edition - to develop our code. We might write that code against all kinds of environments, from SQL Express through Enterprise Edition. We know which features work on a certain edition, what T-SQL it supports and so on, and develop accordingly. We then test on the actual platform to ensure the code runs as expected. You can simply fold SQL Azure into that same development process. When you’re ready to deploy, if you’re using SQL Server Management Studio 2008 R2 or higher, you can script out the database when you’re done as a SQL Azure script (with change notifications where needed) by selecting the right “Engine Type” on the scripting panel: (Thanks to David Robinson for pointing this out and my co-worker Rick Shahid for the screen-shot - saved me firing up a VM this morning!) Will all this change? Will SSMS, “Data Dude” and other tools change to include SQL Azure? Well, I don’t have a specific roadmap for those tools, but we’re making big investments on Windows Azure and SQL Azure, so I can say that as time goes on, it will get easier. For now, make sure you know what features are and are not included in SQL Azure, and what T-SQL is supported. Here are a couple of references to help: General Guidelines and Limitations: http://msdn.microsoft.com/en-us/library/ee336245.aspx Transact-SQL Supported by SQL Azure: http://msdn.microsoft.com/en-us/library/ee336250.aspx SQL Azure Learning Plan: http://blogs.msdn.com/b/buckwoody/archive/2010/12/13/windows-azure-learning-plan-sql-azure.aspx

    Read the article

  • Where is the SQL Azure Development Environment

    - by BuckWoody
    Recently I posted an entry explaining that you can develop in Windows Azure without having to connect to the main service on the Internet, using the Software Development Kit (SDK) which installs two emulators - one for compute and the other for storage. That brought up the question of the same kind of thing for SQL Azure. The short answer is that there isn’t one. While we’ll make the development experience for all versions of SQL Server, including SQL Azure more easy to write against, you can simply treat it as another edition of SQL Server. For instance, many of us use the SQL Server Developer Edition - which in versions up to 2008 is actually the Enterprise Edition - to develop our code. We might write that code against all kinds of environments, from SQL Express through Enterprise Edition. We know which features work on a certain edition, what T-SQL it supports and so on, and develop accordingly. We then test on the actual platform to ensure the code runs as expected. You can simply fold SQL Azure into that same development process. When you’re ready to deploy, if you’re using SQL Server Management Studio 2008 R2 or higher, you can script out the database when you’re done as a SQL Azure script (with change notifications where needed) by selecting the right “Engine Type” on the scripting panel: (Thanks to David Robinson for pointing this out and my co-worker Rick Shahid for the screen-shot - saved me firing up a VM this morning!) Will all this change? Will SSMS, “Data Dude” and other tools change to include SQL Azure? Well, I don’t have a specific roadmap for those tools, but we’re making big investments on Windows Azure and SQL Azure, so I can say that as time goes on, it will get easier. For now, make sure you know what features are and are not included in SQL Azure, and what T-SQL is supported. Here are a couple of references to help: General Guidelines and Limitations: http://msdn.microsoft.com/en-us/library/ee336245.aspx Transact-SQL Supported by SQL Azure: http://msdn.microsoft.com/en-us/library/ee336250.aspx SQL Azure Learning Plan: http://blogs.msdn.com/b/buckwoody/archive/2010/12/13/windows-azure-learning-plan-sql-azure.aspx

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • System Variables, Stored Procedures or Functions for Meta Data

    - by BuckWoody
    Whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options. If you want to know “dynamic” data, such as how much memory or CPU is consumed or what a particular query is doing, you should be using the Dynamic Management Views (DMVs) that you can read about here: http://msdn.microsoft.com/en-us/library/ms188754.aspx  But if you’re looking for how much memory is installed on the server, the version of the Instance, the drive letters of the backups and so on, you have other choices. The first of these are system variables. You access these with a SELECT statement, and they are useful when you need a discrete value for use, say in another query or to put into a table. You can read more about those here: http://msdn.microsoft.com/en-us/library/ms173823.aspx You also have a few stored procedures you can use. These often bring back a lot more data, pre-formatted for the screen. You access these with the EXECUTE syntax. It is a bit more difficult to take the data they return and get a single value or place the results in another table, but it is possible. You can read more about those here: http://msdn.microsoft.com/en-us/library/ms187961.aspx Yet another option is to use a system function, which you access with a SELECT statement, which also brings back a discrete value that you can use in a test or to place in another table. You can read about those here: http://msdn.microsoft.com/en-us/library/ms187812.aspx  By the way, many of these constructs simply query from tables in the master or msdb databases for the Instance or the system tables in a user database. You can get much of the information there as well, and there are even system views in each database to show you the meta-data dealing with structure – more on that here: http://msdn.microsoft.com/en-us/library/ms186778.aspx  Some of these choices are the only way to get at a certain piece of data. But others overlap – you can use one or the other, they both come back with the same data. So, like many Microsoft products, you have multiple ways to do the same thing. And that’s OK – just research what each is used for and how it’s intended to be used, and you’ll be able to select (pun intended) the right choice. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Open the SQL Server Error Log with PowerShell

    - by BuckWoody
    Using the Server Management Objects (SMO) library, you don’t even need to have the SQL Server 2008 PowerShell Provider to read the SQL Server Error Logs – in fact, you can use regular old everyday PowerShell. Keep in mind you will need the SMO libraries – which can be installed separately or by installing the Client Tools from the SQL Server install media. You could search for errors, store a result as a variable, or act on the returned values in some other way. Replace the Machine Name with your server and Instance Name with your instance, but leave the quotes, to make this work on your system: [reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") $machineName = "UNIVAC" $instanceName = "Production" $sqlServer = new-object ("Microsoft.SqlServer.Management.Smo.Server") "$machineName\$instanceName" $sqlServer.ReadErrorLog() Want to search for something specific, like the word “Error”? Replace the last line with this: $sqlServer.ReadErrorLog() | where {$_.Text -like "Error*"} Script Disclaimer, for people who need to be told this sort of thing: Never trust any script, including those that you find here, until you understand exactly what it does and how it will act on your systems. Always check the script on a test system or Virtual Machine, not a production system. Yes, there are always multiple ways to do things, and this script may not work in every situation, for everything. It’s just a script, people. All scripts on this site are performed by a professional stunt driver on a closed course. Your mileage may vary. Void where prohibited. Offer good for a limited time only. Keep out of reach of small children. Do not operate heavy machinery while using this script. If you experience blurry vision, indigestion or diarrhea during the operation of this script, see a physician immediately. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Whether to use UNION or OR in SQL Server Queries

    - by Dinesh Asanka
    Recently I came across with an article on DB2 about using Union instead of OR. So I thought of carrying out a research on SQL Server on what scenarios UNION is optimal in and which scenarios OR would be best. I will analyze this with a few scenarios using samples taken  from the AdventureWorks database Sales.SalesOrderDetail table. Scenario 1: Selecting all columns So we are going to select all columns and you have a non-clustered index on the ProductID column. --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR ProductID =709 OR ProductID =998 OR ProductID =875 OR ProductID =976 OR ProductID =874 --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 709 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 998 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 875 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 976 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 874 So query 1 is using OR and the later is using UNION. Let us analyze the execution plans for these queries. Query 1 Query 2 As expected Query 1 will use Clustered Index Scan but Query 2, uses all sorts of things. In this case, since it is using multiple CPUs you might have CX_PACKET waits as well. Let’s look at the profiler results for these two queries: CPU Reads Duration Row Counts OR 78 1252 389 3854 UNION 250 7495 660 3854 You can see from the above table the UNION query is not performing well as the  OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used. Scenario 2: Non-Clustered and Clustered Index Columns only --Query 1 : OR SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR ProductID =709 OR ProductID =998 OR ProductID =875 OR ProductID =976 OR ProductID =874 GO --Query 2 : UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 709 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 998 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 875 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 976 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 874 GO So this time, we will be selecting only index columns, which means these queries will avoid a data page lookup. As in the previous case we will analyze the execution plans: Query 1 Query 2 Again, Query 2 is more complex than Query 1. Let us look at the profile analysis: CPU Reads Duration Row Counts OR 0 24 208 3854 UNION 0 38 193 3854 In this analyzis, there is only slight difference between OR and UNION. Scenario 3: Selecting all columns for different fields Up to now, we were using only one column (ProductID) in the where clause.  What if we have two columns for where clauses and let us assume both are covered by non-clustered indexes? --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR CarrierTrackingNumber LIKE 'D0B8%' --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber  LIKE 'D0B8%' Query 1 Query 2: As we can see, the query plan for the second query has improved. Let us see the profiler results. CPU Reads Duration Row Counts OR 47 1278 443 1228 UNION 31 1334 400 1228 So in this case too, there is little difference between OR and UNION. Scenario 4: Selecting Clustered index columns for different fields Now let us go only with clustered indexes: --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR CarrierTrackingNumber LIKE 'D0B8%' --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber  LIKE 'D0B8%' Query 1 Query 2 Now both execution plans are almost identical except is an additional Stream Aggregate is used in the first query. This means UNION has advantage over OR in this scenario. Let us see profiler results for these queries again. CPU Reads Duration Row Counts OR 0 319 366 1228 UNION 0 50 193 1228 Now see the differences, in this scenario UNION has somewhat of an advantage over OR. Conclusion Using UNION or OR depends on the scenario you are faced with. So you need to do your analyzing before selecting the appropriate method. Also, above the four scenarios are not all an exhaustive list of scenarios, I selected those for the broad description purposes only.

    Read the article

  • Third Party Applications and Other Acts of Violence Against Your SQL Server

    - by KKline
    I just got finished reading a great blog post from my buddy, Thomas LaRock ( t | b ), in which he describes a useful personal policy he used to track changes made to his SQL Servers when installing third-party products. Note that I'm talking about line-of-business applications here - your inventory management systems and help desk ticketing apps. I'm not talking about monitoring and tuning applications since they, by their very nature, need a different sort of access to your back-end server resources....(read more)

    Read the article

  • Back up a single table in SQL Server

    - by BuckWoody
    SQL Server doesn’t have an easy way to take a table backup, so I often use the bcp (Bulk Copy Program) to accomplish the same goal. I’ve mentioned this before, and someone told me when they tried it they couldn’t restore the table – ah the dangers of telling people half the information! I should have mentioned that you need to have a “format file” ready if the table does not exist at the destination. In my case I already had the table, in this person’s case they did not. The format file can be used to rebuild that table structure before the data is bcp’d in, and you can read more about it here: http://msdn.microsoft.com/en-us/library/ms191516.aspx There’s another way to back up a table, and that’s to create a Filegroup and place the table there. Then you can take a Filegroup backup to back up a single table. Of course, there are other methods of moving a single table’s data in an out, including SQL Server Integration Services and even the older Data Transformation Services, or simply by using hte SQLCMD or PowerShell utilities to run a query and just save the output to a file. In fact, these days I’m using a PowerShell script to build INSERT statements from that query. That could also easily be modified to create the table structure (or modify one if needed) quite easily. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Check if Database Exists

    - by Derek Dieter
    In creating a database you also need to check whether or not the database already exists. In order to do so, simply use the ‘if exists’ method and select the name of the database from sysdatabases.IF NOT EXISTS (SELECT name FROM master.dbo.sysdatabases WHERE name = N'SQLServerPlanet') CREATE DATABASE [SQLServerPlanet]The code below will drop an [...]

    Read the article

  • Monitor SQL Server Replication Jobs

    - by Yaniv Etrogi
    The Replication infrastructure in SQL Server is implemented using SQL Server Agent to execute the various components involved in the form of a job (e.g. LogReader agent job, Distribution agent job, Merge agent job) SQL Server jobs execute a binary executable file which is basically C++ code. You can download all the scripts for this article here SQL Server Job Schedules By default each of job has only one schedule that is set to Start automatically when SQL Server Agent starts. This schedule ensures that when ever the SQL Server Agent service is started all the replication components are also put into action. This is OK and makes sense but there is one problem with this default configuration that needs improvement  -  if for any reason one of the components fails it remains down in a stopped state.   Unless you monitor the status of each component you will typically get to know about such a failure from a customer complaint as a result of missing data or data that is not up to date at the subscriber level. Furthermore, having any of these components in a stopped state can lead to more severe problems if not corrected within a short time. The action required to improve on this default settings is in fact very simple. Adding a second schedule that is set as a Daily Reoccurring schedule which runs every 1 minute does the trick. SQL Server Agent’s scheduler module knows how to handle overlapping schedules so if the job is already being executed by another schedule it will not get executed again at the same time. So, in the event of a failure the failed job remains down for at most 60 seconds. Many DBAs are not aware of this capability and so search for more complex solutions such as having an additional dedicated job running an external code in VBS or another scripting language that detects replication jobs in a stopped state and starts them but there is no need to seek such external solutions when what is needed can be accomplished by T-SQL code. SQL Server Jobs Status In addition to the 1 minute schedule we also want to ensure that key components in the replication are enabled so I can search for those components by their Category, and set their status to enabled in case they are disabled, by executing the stored procedure MonitorEnableReplicationAgents. The jobs that I typically have handled are listed below but you may want to extend this, so below is the query to return all jobs along with their category. SELECT category_id, name FROM msdb.dbo.syscategories ORDER BY category_id; Distribution Cleanup LogReader Agent Distribution Agent Snapshot Agent Jobs By default when a publication is created, a snapshot agent job also gets created with a daily schedule. I see more organizations where the snapshot agent job does not need to be executed automatically by the SQL Server Agent  scheduler than organizations who   need a new snapshot generated automatically. To assure this setting is in place I created the stored procedure MonitorSnapshotAgentsSchedules which disables snapshot agent jobs and also deletes the job schedule. It is worth mentioning that when the publication property immediate_sync is turned off then the snapshot files are not created when the Snapshot agent is executed by the job. You control this property when the publication is created with a parameter called @immediate_sync passed to sp_addpublication and for an existing publication you can use sp_changepublication. Implementation The scripts assume the existence of a database named PerfDB. Steps: Run the scripts to create the stored procedures in the PerfDB database. Create a job that executes the stored procedures every hour. -- Verify that the 1_Minute schedule exists. EXEC PerfDB.dbo.MonitorReplicationAgentsSchedules @CategoryId = 10; /* Distribution */ EXEC PerfDB.dbo.MonitorReplicationAgentsSchedules @CategoryId = 13; /* LogReader */ -- Verify all replication agents are enabled. EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 10; /* Distribution */ EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 13; /* LogReader */ EXEC PerfDB.dbo.MonitorEnableReplicationAgents @CategoryId = 11; /* Distribution clean up */ -- Verify that Snapshot agents are disabled and have no schedule EXEC PerfDB.dbo.MonitorSnapshotAgentsSchedules; Want to read more of about replication? Check at my replication posts at my blog.

    Read the article

  • Manage and Monitor Identity Ranges in SQL Server Transactional Replication

    - by Yaniv Etrogi
    Problem When using transactional replication to replicate data in a one way topology from a publisher to a read-only subscriber(s) there is no need to manage identity ranges. However, when using  transactional replication to replicate data in a two way replication topology - between two or more servers there is a need to manage identity ranges in order to prevent a situation where an INSERT commands fails on a PRIMARY KEY violation error  due to the replicated row being inserted having a value for the identity column which already exists at the destination database. Solution There are two ways to address this situation: Assign a range of identity values per each server. Work with parallel identity values. The first method requires some maintenance while the second method does not and so the scripts provided with this article are very useful for anyone using the first method. I will explore this in more detail later in the article. In the first solution set server1 to work in the range of 1 to 1,000,000,000 and server2 to work in the range of 1,000,000,001 to 2,000,000,000.  The ranges are set and defined using the DBCC CHECKIDENT command and when the ranges in this example are well maintained you meet the goal of preventing the INSERT commands to fall due to a PRIMARY KEY violation. The first insert at server1 will get the identity value of 1, the second insert will get the value of 2 and so on while on server2 the first insert will get the identity value of 1000000001, the second insert 1000000002 and so on thus avoiding a conflict. Be aware that when a row is inserted the identity value (seed) is generated as part of the insert command at each server and the inserted row is replicated. The replicated row includes the identity column’s value so the data remains consistent across all servers but you will be able to tell on what server the original insert took place due the range that  the identity value belongs to. In the second solution you do not manage ranges but enforce a situation in which identity values can never get overlapped by setting the first identity value (seed) and the increment property one time only during the CREATE TABLE command of each table. So a table on server1 looks like this: CREATE TABLE T1 (  c1 int NOT NULL IDENTITY(1, 5) PRIMARY KEY CLUSTERED ,c2 int NOT NULL ); And a table on server2 looks like this: CREATE TABLE T1(  c1 int NOT NULL IDENTITY(2, 5) PRIMARY KEY CLUSTERED ,c2 int NOT NULL ); When these two tables are inserted the results of the identity values look like this: Server1:  1, 6, 11, 16, 21, 26… Server2:  2, 7, 12, 17, 22, 27… This assures no identity values conflicts while leaving a room for 3 additional servers to participate in this same environment. You can go up to 9 servers using this method by setting an increment value of 9 instead of 5 as I used in this example. Continues…

    Read the article

  • Altering a Column Which has a Default Constraint

    - by Dinesh Asanka
    Setting up a default column is a common task for  developers.  But, are we naming those default constraints explicitly? In the below  table creation, for the column, sys_DateTime the default value Getdate() will be allocated. CREATE TABLE SampleTable (ID int identity(1,1), Sys_DateTime Datetime DEFAULT getdate() ) We can check the relevant information from the system catalogs from following query. SELECT sc.name TableName, dc.name DefaultName, dc.definition, OBJECT_NAME(dc.parent_object_id) TableName, dc.is_system_named  FROM sys.default_constraints dc INNER JOIN sys.columns sc ON dc.parent_object_id = sc.object_id AND dc.parent_column_id = sc.column_id and results would be: Most of the above columns are self-explanatory. The last column, is_system_named, is to identify whether the default name was given by the system. As you know, in the above case, since we didn’t provide  any default name, the  system will generate a default name for you. But the problem with these names is that they can differ from environment to environment.  If example if I create this table in different table the default name could be DF__SampleTab__Sys_D__7E6CC920 Now let us create another default and explicitly name it: CREATE TABLE SampleTable2 (ID int identity(1,1), Sys_DateTime Datetime )   ALTER TABLE SampleTable2 ADD CONSTRAINT DF_sys_DateTime_Getdate DEFAULT( Getdate()) FOR Sys_DateTime If we run the previous query again we will be returned the below output. And you can see that last created default name has 0 for is_system_named. Now let us say I want to change the data type of the sys_DateTime column to something else: ALTER TABLE SampleTable2 ALTER COLUMN Sys_DateTime Date This will generate the below error: Msg 5074, Level 16, State 1, Line 1 The object ‘DF_sys_DateTime_Getdate’ is dependent on column ‘Sys_DateTime’. Msg 4922, Level 16, State 9, Line 1 ALTER TABLE ALTER COLUMN Sys_DateTime failed because one or more objects access this column. This means, you need to drop the default constraint before altering it: ALTER TABLE [dbo].[SampleTable2] DROP CONSTRAINT [DF_sys_DateTime_Getdate] ALTER TABLE SampleTable2 ALTER COLUMN Sys_DateTime Date   ALTER TABLE [dbo].[SampleTable2] ADD CONSTRAINT [DF_sys_DateTime_Getdate] DEFAULT (getdate()) FOR [Sys_DateTime] If you have a system named default constraint that can differ from environment to environment and so you cannot drop it as before, you can use the below code template: DECLARE @defaultname VARCHAR(255) DECLARE @executesql VARCHAR(1000)   SELECT @defaultname = dc.name FROM sys.default_constraints dc INNER JOIN sys.columns sc ON dc.parent_object_id = sc.object_id AND dc.parent_column_id = sc.column_id WHERE OBJECT_NAME (parent_object_id) = 'SampleTable' AND sc.name ='Sys_DateTime' SET @executesql = 'ALTER TABLE SampleTable DROP CONSTRAINT ' + @defaultname EXEC( @executesql) ALTER TABLE SampleTable ALTER COLUMN Sys_DateTime Date ALTER TABLE [dbo].[SampleTable] ADD DEFAULT (Getdate()) FOR [Sys_DateTime]

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >