Search Results

Search found 79588 results on 3184 pages for 'sql data storage'.

Page 68/3184 | < Previous Page | 64 65 66 67 68 69 70 71 72 73 74 75  | Next Page >

  • New Big Data Appliance Security Features

    - by mgubar
    The Oracle Big Data Appliance (BDA) is an engineered system for big data processing.  It greatly simplifies the deployment of an optimized Hadoop Cluster – whether that cluster is used for batch or real-time processing.  The vast majority of BDA customers are integrating the appliance with their Oracle Databases and they have certain expectations – especially around security.  Oracle Database customers have benefited from a rich set of security features:  encryption, redaction, data masking, database firewall, label based access control – and much, much more.  They want similar capabilities with their Hadoop cluster.    Unfortunately, Hadoop wasn’t developed with security in mind.  By default, a Hadoop cluster is insecure – the antithesis of an Oracle Database.  Some critical security features have been implemented – but even those capabilities are arduous to setup and configure.  Oracle believes that a key element of an optimized appliance is that its data should be secure.  Therefore, by default the BDA delivers the “AAA of security”: authentication, authorization and auditing. Security Starts at Authentication A successful security strategy is predicated on strong authentication – for both users and software services.  Consider the default configuration for a newly installed Oracle Database; it’s been a long time since you had a legitimate chance at accessing the database using the credentials “system/manager” or “scott/tiger”.  The default Oracle Database policy is to lock accounts thereby restricting access; administrators must consciously grant access to users. Default Authentication in Hadoop By default, a Hadoop cluster fails the authentication test. For example, it is easy for a malicious user to masquerade as any other user on the system.  Consider the following scenario that illustrates how a user can access any data on a Hadoop cluster by masquerading as a more privileged user.  In our scenario, the Hadoop cluster contains sensitive salary information in the file /user/hrdata/salaries.txt.  When logged in as the hr user, you can see the following files.  Notice, we’re using the Hadoop command line utilities for accessing the data: $ hadoop fs -ls /user/hrdataFound 1 items-rw-r--r--   1 oracle supergroup         70 2013-10-31 10:38 /user/hrdata/salaries.txt$ hadoop fs -cat /user/hrdata/salaries.txtTom Brady,11000000Tom Hanks,5000000Bob Smith,250000Oprah,300000000 User DrEvil has access to the cluster – and can see that there is an interesting folder called “hrdata”.  $ hadoop fs -ls /user Found 1 items drwx------   - hr supergroup          0 2013-10-31 10:38 /user/hrdata However, DrEvil cannot view the contents of the folder due to lack of access privileges: $ hadoop fs -ls /user/hrdata ls: Permission denied: user=drevil, access=READ_EXECUTE, inode="/user/hrdata":oracle:supergroup:drwx------ Accessing this data will not be a problem for DrEvil. He knows that the hr user owns the data by looking at the folder’s ACLs. To overcome this challenge, he will simply masquerade as the hr user. On his local machine, he adds the hr user, assigns that user a password, and then accesses the data on the Hadoop cluster: $ sudo useradd hr $ sudo passwd $ su hr $ hadoop fs -cat /user/hrdata/salaries.txt Tom Brady,11000000 Tom Hanks,5000000 Bob Smith,250000 Oprah,300000000 Hadoop has not authenticated the user; it trusts that the identity that has been presented is indeed the hr user. Therefore, sensitive data has been easily compromised. Clearly, the default security policy is inappropriate and dangerous to many organizations storing critical data in HDFS. Big Data Appliance Provides Secure Authentication The BDA provides secure authentication to the Hadoop cluster by default – preventing the type of masquerading described above. It accomplishes this thru Kerberos integration. Figure 1: Kerberos Integration The Key Distribution Center (KDC) is a server that has two components: an authentication server and a ticket granting service. The authentication server validates the identity of the user and service. Once authenticated, a client must request a ticket from the ticket granting service – allowing it to access the BDA’s NameNode, JobTracker, etc. At installation, you simply point the BDA to an external KDC or automatically install a highly available KDC on the BDA itself. Kerberos will then provide strong authentication for not just the end user – but also for important Hadoop services running on the appliance. You can now guarantee that users are who they claim to be – and rogue services (like fake data nodes) are not added to the system. It is common for organizations to want to leverage existing LDAP servers for common user and group management. Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository. This simplifies the deployment and administration of the secure environment. Authorize Access to Sensitive Data Kerberos-based authentication ensures secure access to the system and the establishment of a trusted identity – a prerequisite for any authorization scheme. Once this identity is established, you need to authorize access to the data. HDFS will authorize access to files using ACLs with the authorization specification applied using classic Linux-style commands like chmod and chown (e.g. hadoop fs -chown oracle:oracle /user/hrdata changes the ownership of the /user/hrdata folder to oracle). Authorization is applied at the user or group level – utilizing group membership found in the Linux environment (i.e. /etc/group) or in the LDAP server. For SQL-based data stores – like Hive and Impala – finer grained access control is required. Access to databases, tables, columns, etc. must be controlled. And, you want to leverage roles to facilitate administration. Apache Sentry is a new project that delivers fine grained access control; both Cloudera and Oracle are the project’s founding members. Sentry satisfies the following three authorization requirements: Secure Authorization:  the ability to control access to data and/or privileges on data for authenticated users. Fine-Grained Authorization:  the ability to give users access to a subset of the data (e.g. column) in a database Role-Based Authorization:  the ability to create/apply template-based privileges based on functional roles. With Sentry, “all”, “select” or “insert” privileges are granted to an object. The descendants of that object automatically inherit that privilege. A collection of privileges across many objects may be aggregated into a role – and users/groups are then assigned that role. This leads to simplified administration of security across the system. Figure 2: Object Hierarchy – granting a privilege on the database object will be inherited by its tables and views. Sentry is currently used by both Hive and Impala – but it is a framework that other data sources can leverage when offering fine-grained authorization. For example, one can expect Sentry to deliver authorization capabilities to Cloudera Search in the near future. Audit Hadoop Cluster Activity Auditing is a critical component to a secure system and is oftentimes required for SOX, PCI and other regulations. The BDA integrates with Oracle Audit Vault and Database Firewall – tracking different types of activity taking place on the cluster: Figure 3: Monitored Hadoop services. At the lowest level, every operation that accesses data in HDFS is captured. The HDFS audit log identifies the user who accessed the file, the time that file was accessed, the type of access (read, write, delete, list, etc.) and whether or not that file access was successful. The other auditing features include: MapReduce:  correlate the MapReduce job that accessed the file Oozie:  describes who ran what as part of a workflow Hive:  captures changes were made to the Hive metadata The audit data is captured in the Audit Vault Server – which integrates audit activity from a variety of sources, adding databases (Oracle, DB2, SQL Server) and operating systems to activity from the BDA. Figure 4: Consolidated audit data across the enterprise.  Once the data is in the Audit Vault server, you can leverage a rich set of prebuilt and custom reports to monitor all the activity in the enterprise. In addition, alerts may be defined to trigger violations of audit policies. Conclusion Security cannot be considered an afterthought in big data deployments. Across most organizations, Hadoop is managing sensitive data that must be protected; it is not simply crunching publicly available information used for search applications. The BDA provides a strong security foundation – ensuring users are only allowed to view authorized data and that data access is audited in a consolidated framework.

    Read the article

  • How to present a stable data model in a public API that allows internal data structures to be changed without breaking the public view of the data?

    - by Max Palmer
    I am in the process of developing an application that allows users to write C# scripts. These scripts allow users to call selected methods and to access and manipulate data in a document. This works well, however, in the development version, scripts access the document's (internal) data structures directly. This means that if we were to change the internal data model/structure, there is a good chance that someone's script will no longer compile. We obviously want to prevent this breaking change from happening, but still want to allow the user to write sensible C# code (whilst not restricting how we develop our internal data model as a result). We therefore need to decouple our scripting API and its data structures from our internal methods and data structures. We've a few ideas as to how we might allow the user to access a what is effectively a stable public version of the document's internal data*, but I wanted to throw the question out there to someone who might have some real experience of this problem. NB our internal document's data structure is quite complex and it could be quite difficult to wrap. We know we want to expose as little as possible in our public API, especially as once it's out there, it's out there for good. Can anyone help? How do scripting languages / APIs decouple their public API and data structures from their internal data structures? Is there no real alternative to having to write a complex interaction layer? If we need to do this, what's a good approach or pattern for wrapping complex data structures that include nested objects, including collections? I've looked at the API facade pattern, which looks like it's trying to address these kinds of issues, but are there alternatives? *One idea is to build a data facade that is kept stable across versions of our application. The facade exposes a set of facade data objects that are used in the script code. These maintain backwards compatibility and wrap access to our internal document's data model.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • The blocking nature of aggregates

    - by Rob Farley
    I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here.  In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

    Read the article

  • Automating deployments with the SQL Compare command line

    - by Jonathan Hickford
    In my previous article, “Five Tips to Get Your Organisation Releasing Software Frequently” I looked at how teams can automate processes to speed up release frequency. In this post, I’m looking specifically at automating deployments using the SQL Compare command line. SQL Compare compares SQL Server schemas and deploys the differences. It works very effectively in scenarios where only one deployment target is required – source and target databases are specified, compared, and a change script is automatically generated and applied. But if multiple targets exist, and pressure to increase the frequency of releases builds, this solution quickly becomes unwieldy.   This is where SQL Compare’s command line comes into its own. I’ve put together a PowerShell script that loops through the Servers table and pulls out the server and database, these are then passed to sqlcompare.exe to be used as target parameters. In the example the source database is a scripts folder, a folder structure of scripted-out database objects used by both SQL Source Control and SQL Compare. The script can easily be adapted to use schema snapshots.     -- Create a DeploymentTargets database and a Servers table CREATE DATABASE DeploymentTargets GO USE DeploymentTargets GO CREATE TABLE [dbo].[Servers]( [id] [int] IDENTITY(1,1) NOT NULL, [serverName] [nvarchar](50) NULL, [environment] [nvarchar](50) NULL, [databaseName] [nvarchar](50) NULL, CONSTRAINT [PK_Servers] PRIMARY KEY CLUSTERED ([id] ASC) ) GO -- Now insert your target server and database details INSERT INTO dbo.Servers ( serverName , environment , databaseName) VALUES ( N'myserverinstance' , N'myenvironment1' , N'mydb1') INSERT INTO dbo.Servers ( serverName , environment , databaseName) VALUES ( N'myserverinstance' , N'myenvironment2' , N'mydb2') Here’s the PowerShell script you can adapt for yourself as well. # We're holding the server names and database names that we want to deploy to in a database table. # We need to connect to that server to read these details $serverName = "" $databaseName = "DeploymentTargets" $authentication = "Integrated Security=SSPI" #$authentication = "User Id=xxx;PWD=xxx" # If you are using database authentication instead of Windows authentication. # Path to the scripts folder we want to deploy to the databases $scriptsPath = "SimpleTalk" # Path to SQLCompare.exe $SQLComparePath = "C:\Program Files (x86)\Red Gate\SQL Compare 10\sqlcompare.exe" # Create SQL connection string, and connection $ServerConnectionString = "Data Source=$serverName;Initial Catalog=$databaseName;$authentication" $ServerConnection = new-object system.data.SqlClient.SqlConnection($ServerConnectionString); # Create a Dataset to hold the DataTable $dataSet = new-object "System.Data.DataSet" "ServerList" # Create a query $query = "SET NOCOUNT ON;" $query += "SELECT serverName, environment, databaseName " $query += "FROM dbo.Servers; " # Create a DataAdapter to populate the DataSet with the results $dataAdapter = new-object "System.Data.SqlClient.SqlDataAdapter" ($query, $ServerConnection) $dataAdapter.Fill($dataSet) | Out-Null # Close the connection $ServerConnection.Close() # Populate the DataTable $dataTable = new-object "System.Data.DataTable" "Servers" $dataTable = $dataSet.Tables[0] #For every row in the DataTable $dataTable | FOREACH-OBJECT { "Server Name: $($_.serverName)" "Database Name: $($_.databaseName)" "Environment: $($_.environment)" # Compare the scripts folder to the database and synchronize the database to match # NB. Have set SQL Compare to abort on medium level warnings. $arguments = @("/scripts1:$($scriptsPath)", "/server2:$($_.serverName)", "/database2:$($_.databaseName)", "/AbortOnWarnings:Medium") # + @("/sync" ) # Commented out the 'sync' parameter for safety, write-host $arguments & $SQLComparePath $arguments "Exit Code: $LASTEXITCODE" # Some interesting variations # Check that every database matches a folder. # For example this might be a pre-deployment step to validate everything is at the same baseline state. # Or a post deployment script to validate the deployment worked. # An exit code of 0 means the databases are identical. # # $arguments = @("/scripts1:$($scriptsPath)", "/server2:$($_.serverName)", "/database2:$($_.databaseName)", "/Assertidentical") # Generate a report of the difference between the folder and each database. Generate a SQL update script for each database. # For example use this after the above to generate upgrade scripts for each database # Examine the warnings and the HTML diff report to understand how the script will change objects # #$arguments = @("/scripts1:$($scriptsPath)", "/server2:$($_.serverName)", "/database2:$($_.databaseName)", "/ScriptFile:update_$($_.environment+"_"+$_.databaseName).sql", "/report:update_$($_.environment+"_"+$_.databaseName).html" , "/reportType:Interactive", "/showWarnings", "/include:Identical") } It’s worth noting that the above example generates the deployment scripts dynamically. This approach should be problem-free for the vast majority of changes, but it is still good practice to review and test a pre-generated deployment script prior to deployment. An alternative approach would be to pre-generate a single deployment script using SQL Compare, and run this en masse to multiple targets programmatically using sqlcmd, or using a tool like SQL Multi Script.  You can use the /ScriptFile, /report, and /showWarnings flags to generate change scripts, difference reports and any warnings.  See the commented out example in the PowerShell: #$arguments = @("/scripts1:$($scriptsPath)", "/server2:$($_.serverName)", "/database2:$($_.databaseName)", "/ScriptFile:update_$($_.environment+"_"+$_.databaseName).sql", "/report:update_$($_.environment+"_"+$_.databaseName).html" , "/reportType:Interactive", "/showWarnings", "/include:Identical") There is a drawback of running a pre-generated deployment script; it assumes that a given database target hasn’t drifted from its expected state. Often there are (rightly or wrongly) many individuals within an organization who have permissions to alter the production database, and changes can therefore be made outside of the prescribed development processes. The consequence is that at deployment time, the applied script has been validated against a target that no longer represents reality. The solution here would be to add a check for drift prior to running the deployment script. This is achieved by using sqlcompare.exe to compare the target against the expected schema snapshot using the /Assertidentical flag. Should this return any differences (sqlcompare.exe Exit Code 79), a drift report is outputted instead of executing the deployment script.  See the commented out example. # $arguments = @("/scripts1:$($scriptsPath)", "/server2:$($_.serverName)", "/database2:$($_.databaseName)", "/Assertidentical") Any checks and processes that should be undertaken prior to a manual deployment, should also be happen during an automated deployment. You might think about triggering backups prior to deployment – even better, automate the verification of the backup too.   You can use SQL Compare’s command line interface along with PowerShell to automate multiple actions and checks that you need in your deployment process. Automation is a practical solution where multiple targets and a higher release cadence come into play. As we know, with great power comes great responsibility – responsibility to ensure that the necessary checks are made so deployments remain trouble-free.  (The code sample supplied in this post automates the simple dynamic deployment case – if you are considering more advanced automation, e.g. the drift checks, script generation, deploying to large numbers of targets and backup/verification, please email me at [email protected] for further script samples or if you have further questions)

    Read the article

  • Heroku Postgres: A New SQL Database-as-a-Service

    Idera, a Houston-based company known worldwide for its SQL Server solutions in the realms of backup and recovery, performance monitoring, auditing, security, and more, recently announced that it had won five of SQL Server Magazine's 2011 Community Choice Awards. SQL Server Magazine, a publication produced by Penton Media, offers SQL Server users, both beginning and advanced, a host of hands-on information delivered by SQL Server experts. The magazine presented Idera with 2011 Community Choice Awards for five separate products which will only serve to boost the already strong reputation of it...

    Read the article

  • Idera Releases SQL Diagnostic Manager v7.1

    Idera recently beefed up its portfolio of SQL Server management and administration tools with the release of SQL diagnostic manager 7.1. Idera has enhanced SQL diagnostic manager's already impressive set of features in version 7.1 with new additions that should appeal to database administrators. The release is another example of Idera's dedication to growing its portfolio of SQL Server solutions that has allowed the Microsoft Managed Partner to expand its client base to over 10,000 customers worldwide. The highlights of SQL diagnostic manager 7.1's new features begin with an impressive Serve...

    Read the article

  • How to Schedule Backups with SQL Server Express

    - by The Official Microsoft IIS Site
    Microsoft’s SQL Server Express is a fantastic product for anyone needing a relational database on a limited budget. By limited budget I’m talking free. Yes SQL Server Express is free but it comes with a few limitations such as only utilizing 1 GB of RAM, databases are limited to 10 GB, and it does not include SQL Profiler. For low volume sites that do not need enterprise level capabilities, this is a compelling solution. Here is a complete SQL Server feature comparison of all the SQL Server...(read more)

    Read the article

  • Querying Visual Studio project files using T-SQL and Powershell

    - by jamiet
    Earlier today I had a need to get some information out of a Visual Studio project file and in this blog post I’m going to share a couple of ways of going about that because I’m pretty sure I won’t be the only person that ever wants to do this. The specific problem I was trying to solve was finding out how many objects in my database project (i.e. in my .dbproj file) had any warnings suppressed but the techniques discussed below will work pretty well for any Visual Studio project file because every such file is simply an XML document, hence it can be queried by anything that can query XML documents. Ever heard the phrase “when all you’ve got is hammer everything looks like a nail”? Well that’s me with querying stuff – if I can write SQL then I’m writing SQL. Here’s a little noddy database project I put together for demo purposes: Two views and a stored procedure, nothing fancy. I suppressed warnings for [View1] & [Procedure1] and hence the pertinent part my project file looks like this:   <ItemGroup>    <Build Include="Schema Objects\Schemas\dbo\Views\View1.view.sql">      <SubType>Code</SubType>      <SuppressWarnings>4151,3276</SuppressWarnings>    </Build>    <Build Include="Schema Objects\Schemas\dbo\Views\View2.view.sql">      <SubType>Code</SubType>    </Build>    <Build Include="Schema Objects\Schemas\dbo\Programmability\Stored Procedures\Procedure1.proc.sql">      <SubType>Code</SubType>      <SuppressWarnings>4151</SuppressWarnings>    </Build>  </ItemGroup>  <ItemGroup> Note the <SuppressWarnings> elements – those are the bits of information that I am after. With a lot of help from folks on the SQL Server XML forum  I came up with the following query that nailed what I was after. It reads the contents of the .dbproj file into a variable of type XML and then shreds it using T-SQL’s XML data type methods: DECLARE @xml XML; SELECT @xml = CAST(pkgblob.BulkColumn AS XML) FROM   OPENROWSET(BULK 'C:\temp\QueryingProjectFileDemo\QueryingProjectFileDemo.dbproj' -- <-Change this path!                    ,single_blob) AS pkgblob                    ;WITH XMLNAMESPACES( 'http://schemas.microsoft.com/developer/msbuild/2003' AS ns) SELECT  REVERSE(SUBSTRING(REVERSE(ObjectPath),0,CHARINDEX('\',REVERSE(ObjectPath)))) AS [ObjectName]        ,[SuppressedWarnings] FROM   (        SELECT  build.query('.') AS [_node]        ,       build.value('ns:SuppressWarnings[1]','nvarchar(100)') AS [SuppressedWarnings]        ,       build.value('@Include','nvarchar(1000)') AS [ObjectPath]        FROM    @xml.nodes('//ns:Build[ns:SuppressWarnings]') AS R(build)        )q And here’s the output: And that’s it – an easy way of discovering which warnings have been suppressed and for which objects in your database projects. I won’t bother going over the code as it is fairly self-explanatory – peruse it at your leisure.   Once I had the SQL above I figured I’d share it around a little in case it was ever useful to anyone else; hence I’m writing this blog post and I also posted it on the Visual Studio Database Development Tools forum at FYI: Discover which objects have had warnings suppressed. Luckily Kevin Goode saw the thread and he posted a different solution to the same problem, one that uses Powershell. The advantage of Kevin’s Powershell approach is that it is easy to analyse many .dbproj files at the same time. Below is Kevin’s code which I have tweaked ever so slightly so that it produces the same results as my SQL script (I just want any object that had had a warning suppressed whereas Kevin was querying specifically for warning 4151):   cd 'C:\Temp\QueryingProjectFileDemo\' cls $projects = ls -r -i *.dbproj Foreach($project in $projects) { $xml = new-object System.Xml.XmlDocument $xml.set_PreserveWhiteSpace( $true ) $xml.Load($project) #$xpath = @{Start="/e:Project/e:ItemGroup/e:Build[e:SuppressWarnings=4151]/@Include"} #$xpath = @{Start="/e:Project/e:ItemGroup/e:Build[contains(e:SuppressWarnings,'4151')]/@Include"} $xpath = @{Start="/e:Project/e:ItemGroup/e:Build[e:SuppressWarnings]/@Include"} $ns = @{ e = "http://schemas.microsoft.com/developer/msbuild/2003" } $xml | Select-Xml -XPath $xpath.Start -Namespace $ns |Select -Expand Node | Select -expand Value } and here’s the output: Nice reusable Powershell and SQL scripts – not bad for an evening’s work. Thank you to Kevin for allowing me to share his code. Don’t forget that these techniques can easily be adapted to query any Visual Studio project file, they’re only XML documents after all! Doubtless many people out there already have code for doing this but nonetheless here is another offering to the great script library in the sky. Have fun! @Jamiet

    Read the article

  • Top 10 Transact-SQL Statements a SQL Server DBA Should Know

    Microsoft SQL Server is a feature rich database management system product, with an enormous number of T-SQL commands. With each feature supporting its own list of commands, it can be difficult to remember them all. MAK shares his top 10 T-SQL statements that a DBA should know. Join SQL Backup’s 35,000+ customers to compress and strengthen your backups "SQL Backup will be a REAL boost to any DBA lucky enough to use it." Jonathan Allen. Download a free trial now.

    Read the article

  • SQL Server 2012 RTM Available!

    - by Davide Mauri
    SQL Server 2012 is available for download! http://www.microsoft.com/sqlserver/en/us/default.aspx The Evaluation version is available here: http://www.microsoft.com/download/en/details.aspx?id=29066 and along with the SQL Server 2012 RTM there’s also the Feature Pack available: http://www.microsoft.com/download/en/details.aspx?id=29065 The Feature Pack is rich of useful and interesting stuff, something needed by some feature, like the Semantic Language Statistics Database some other a very good (I would say needed) download if you use certain technologies, like MDS or Data Mining. Btw, for Data Mining also the updated Excel Addin has been released and it’s available in the Feature Pack. As if this would not be enough, also the SQL Server Data Tools IDE has been released in RTM: http://msdn.microsoft.com/en-us/data/hh297027 Remember that SQL Server Data Tool is completely free and can be used with SQL Server 2005 and after. Happy downloading!

    Read the article

  • SQL Server and the XML Data Type : Data Manipulation

    The introduction of the xml data type, with its own set of methods for processing xml data, made it possible for SQL Server developers to create columns and variables of the type xml. Deanna Dicken examines the modify() method, which provides for data manipulation of the XML data stored in the xml data type via XML DML statements. Too many SQL Servers to keep up with?Download a free trial of SQL Response to monitor your SQL Servers in just one intuitive interface."The monitoringin SQL Response is excellent." Mike Towery.

    Read the article

  • Delphi Application using COMMIT and ROLLBACK for Multiple SQL Updates

    - by Matt
    Is it possible to use the SQL BEGIN TRANSACTION, COMMIT TRANSACTION, ROLLBACK TRANSACTION when embedding SQL Queries into an application with mutiple calls to the SQL for Table Updates. For example I have the following code: Q.SQL.ADD(<UPDATE A RECORD>); Q.ExecSQL; Q.Close; Q.SQL.Clear; Q.SQL.ADD(<Select Some Data>); Q.Open; Set Some Variables Q.Close; Q.SQL.Clear; Q.SQL.ADD(<UPDATE A RECORD>); Q.ExecSQL; What I would like to do is if the second update fails I want to roll back the first transaction. If I set a unique notation for the BEGIN, COMMIT, ROLLBACK so as to specify what is being committed or rolled back, is it feasible. i.e. before the first Update specify BEGIN TRANSACTION_A then after the last update specify COMMIT TRANSACTION_A I hope that makes sense. If I was doing this in a SQL Stored Procedure then I would be able to specify this at the start and end of the procedure, but I have had to break the code down into manageable chunks due to process blocks and deadlocks on a heavy loaded SQL Server.

    Read the article

  • Easiest way to retrofit retry logic on LINQ to SQL migration to SQL Azure

    - by Pat James
    I have a couple of existing ASP .NET web forms and MVC applications that currently use LINQ to SQL with a SQL Server 2008 Express database on a Windows VPS: one VPS for both IIS and SQL. I am starting to outgrow the VPS's ability to effectively host both SQL and IIS and am getting ready to split them up. I am considering migrating the database to SQL Azure and keeping IIS on the VPS. After doing initial research it sounds like implementing retry logic in the data access layer is a must-do when adopting SQL Azure. I suspect this is even more critical to implement in my situation where IIS will be on a VPS outside of the Azure infrastructure. I am looking for pointers on how to do this with the least effort and impact on my existing code base. Is there a good retry pattern that can be applied once at the LINQ to SQL data access layer, as opposed to having to wrap all of my LINQ to SQL operations in try/catch/wait/retry logic?

    Read the article

  • Minimum privileges to read SQL Jobs using SQL SMO

    - by Gustavo Cavalcanti
    I wrote an application to use SQL SMO to find all SQL Servers, databases, jobs and job outcomes. This application is executed through a scheduled task using a local service account. This service account is local to the application server only and is not present in any SQL Server to be inspected. I am having problems getting information on job and job outcomes when connecting to the servers using a user with dbReader rights on system tables. If we set the user to be sysadmin on the server it all works fine. My question to you is: What are the minimum privileges a local SQL Server user needs to have in order to connect to the server and inspect jobs/job outcomes using the SQL SMO API? I connect to each SQL Server by doing the following: var conn = new ServerConnection { LoginSecure = false, ApplicationName = "SQL Inspector", ServerInstance = serverInstanceName, ConnectAsUser = false, Login = user, Password = password }; var smoServer = new Server (conn); I read the jobs by reading smoServer.JobServer.Jobs and read the JobSteps property on each of these jobs. The variable server is of type Microsoft.SqlServer.Management.Smo.Server. user/password are of the user found in each SQL Server to be inspected. If "user" is SysAdmin on the SQL Server to be inspected all works ok, as well as if we set ConnectAsUser to true and execute the scheduled task using my own credentials, which grants me SysAdmin privileges on SQL Server per my Active Directory membership. Thanks!

    Read the article

  • Why Cornell University Chose Oracle Data Masking

    - by Troy Kitch
    One of the eight Ivy League schools, Cornell University found itself in the unfortunate position of having to inform over 45,000 University community members that their personal information had been breached when a laptop was stolen. To ensure this wouldn’t happen again, Cornell took steps to ensure that data used for non-production purposes is de-identified with Oracle Data Masking. A recent podcast highlights why organizations like Cornell are choosing Oracle Data Masking to irreversibly de-identify production data for use in non-production environments. Organizations often copy production data, that contains sensitive information, into non-production environments so they can test applications and systems using “real world” information. Data in non-production has increasingly become a target of cyber criminals and can be lost or stolen due to weak security controls and unmonitored access. Similar to production environments, data breaches in non-production environments can cost millions of dollars to remediate and cause irreparable harm to reputation and brand. Cornell’s applications and databases help carry out the administrative and academic mission of the university. They are running Oracle PeopleSoft Campus Solutions that include highly sensitive faculty, student, alumni, and prospective student data. This data is supported and accessed by a diverse set of developers and functional staff distributed across the university. Several years ago, Cornell experienced a data breach when an employee’s laptop was stolen.  Centrally stored backup information indicated there was sensitive data on the laptop. With no way of knowing what the criminal intended, the university had to spend significant resources reviewing data, setting up service centers to handle constituent concerns, and provide free credit checks and identity theft protection services—all of which cost money and took time away from other projects. To avoid this issue in the future Cornell came up with several options; one of which was to sanitize the testing and training environments. “The project management team was brought in and they developed a project plan and implementation schedule; part of which was to evaluate competing products in the market-space and figure out which one would work best for us.  In the end we chose Oracle’s solution based on its architecture and its functionality.” – Tony Damiani, Database Administration and Business Intelligence, Cornell University The key goals of the project were to mask the elements that were identifiable as sensitive in a consistent and efficient manner, but still support all the previous activities in the non-production environments. Tony concludes,  “What we saw was a very minimal impact on performance. The masking process added an additional three hours to our refresh window, but it was well worth that time to secure the environment and remove the sensitive data. I think some other key points you can keep in mind here is that there was zero impact on the production environment. Oracle Data Masking works in non-production environments only. Additionally, the risk of exposure has been significantly reduced and the impact to business was minimal.” With Oracle Data Masking organizations like Cornell can: Make application data securely available in non-production environments Prevent application developers and testers from seeing production data Use an extensible template library and policies for data masking automation Gain the benefits of referential integrity so that applications continue to work Listen to the podcast to hear the complete interview.  Learn more about Oracle Data Masking by registering to watch this SANS Institute Webcast and view this short demo.

    Read the article

  • Removing Barriers to Create Effective Data Models

    After years of creating and maintaining data models, I have started to notice common barriers that decrease the accuracy and usefulness of models. In my opinion, the main causes of these barriers are the lack of knowledge and communication from within a company. The lack of knowledge in regards to data models or data modeling can take many forms. Company Culture Knowledge Whether documented or undocumented, existing business rules of a company can affect how data is modeled. For example, if a company only allows 1 assigned person per customer to be able to manipulate a customer’s record then then a data model that includes an associated table that joins customers and employee’s would be unneeded because that would allow for the possibility of multiple employees to handle a customer because of the potential for a many to many relationship between Customers and Employees. Technical Knowledge Depending on the data modeler’s proficiency in modeling data they can inadvertently cause issues and/or complications with a design without even noticing. It is important that companies share data modeling responsibilities so that the models are developed from multiple perspectives of a system, company and the original problem.  In addition, the tools that a company selects to create data models can also affect the accuracy of the model if designer are not familiar with the tools or the tools are too complex to use for the designer. Existing System Knowledge In order for a data modeler to model data for an existing system so that new changes can be applied to a system then they need to at least know the basic concepts of a system so that they can work within it. This will promote reusability of data and prevent the chance of duplicating data. Project Knowledge This should be pretty obvious, but it is very hard to create an accurate data model without knowing what data needs to be modeled. I have always found it strange that I have been asked to start modeling data prior to a client formalizing any requirements. Usually when this happens I have to make several iterations to a model, and the client still does not know exactly what they want.  In addition additional issues can arise when certain stakeholders of a project are not consulted prior to the design or after the project is over because it can cause miss understandings and confusion by the end user as well as possibly not solving the original problem for which a project is intended to solve. One common thread between each type of knowledge is that they can all be avoided through the use of good communication. For example, if a modeler is new to a company then they should ask older employees about any business specific rules that may be documented or undocumented that must be applied to projects in general. Furthermore, if a modeler is not really familiar with a specific data modeling software then they need to speak up and ask for help form other employees or their manager. This will not only help the modeler in the project, but also help them in future projects that they do for the company. Additionally, if a project is not clearly defined prior to a data modeler being assigned the modeling project then it is their responsibility to communicate with the other stakeholders to clarify any part of a project that is unclear so that the data model that is created is accurately aligned with a project.

    Read the article

  • How to control order of assignment for new identity column in SQL Server?

    - by alpav
    I have a table with CreateDate datetime field default(getdate()) that does not have any identity column. I would like to add identity(1,1) field that would reflect same order of existing records as CreateDate field (order by would give same results). How can I do that ? I guess if I create clustered key on CreateDate field and then add identity column it will work (not sure if it's guaranteed), is there a good/better way ? I am interested in SQL Server 2005, but I guess the answer will be the same for SQL Server 2008, SQL Server 2000.

    Read the article

  • Where should the partitioning column go in the primary key on SQL Server?

    - by Bialecki
    Using SQL Server 2005 and 2008. I've got a potentially very large table (potentially hundreds of millions of rows) consisting of the following columns: CREATE TABLE ( date SMALLDATETIME, id BIGINT, value FLOAT ) which is being partitioned on column date in daily partitions. The question then is should the primary key be on date, id or value, id? I can imagine that SQL Server is smart enough to know that it's already partitioning on date and therefore, if I'm always querying for whole chunks of days, then I can have it second in the primary key. Or I can imagine that SQL Server will need that column to be first in the primary key to get the benefit of partitioning. Can anyone lend some insight into which way the table should be keyed?

    Read the article

  • Does anyone have a backup strategy for SQL Azure databases?

    - by Pete
    I'm using SQL Azure on a project and it works great. The problem is that the usual backup features do not exist. I have exported the database a couple of times using SQLAzureMW ( http://sqlazuremw.codeplex.com/ ) but this tool is now choking trying to download the database data with bcp. In any case, it's not as nice a solution as SQL Server backups. Is anyone aware of a commercial or open source tool, or other technique, for making reliable backups of SQL Azure databases? This is really a showstopper.

    Read the article

  • What is the cost in bytes for the overhead of a sql_variant column in SQL Server?

    - by Elan
    I have a table, which contains many columns of float data type with 15 digit precision. Each column consumes 8 bytes of storage. Most of the time the data does not require this amount of precision and could be stored as a real data type. In many cases the value can be 0, in which case I could get away with storing a single byte. My goal here is to optimize space storage requirements, which is an issue I am facing working with a SQL Express 4GB database size limit. If byte, real and float data types are stored in a sql_variant column there is obviously some overhead involved in storing these values. What is the cost of this overhead? I would then need to evaluate whether I would actually end up in significant space savings (or not) switching to using sql_variant column data types. Thanks, Elan

    Read the article

  • What is the Sql Server equivalent for Oracle's DBMS_ASSERT?

    - by dotNetYum
    DBMS_ASSERT is one of the keys to prevent SQL injection attacks in Oracle. I tried a cursory search...is there any SQL Server 2005/2008 equivalent for this functionality? I am looking for a specific implementation that has a counterpart of all the respective Oracle package members of DBMS_ASSERT. NOOP SIMPLE_SQL_NAME QUALIFIED_SQL_NAME SCHEMA_NAME I know the best-practices of preventing injection...bind variables...being one of them. But,in this question I am specifically looking for a good way to sanitize input...in scenarios where bind-variables were not used. Do you have any specific implemetations? Is there a library that actually is a SQL Server Port of the Oracle package?

    Read the article

  • OData.org updated, gives clues about future SQL Azure enhancements

    - by jamiet
    The OData website at http://www.odata.org/home has been updated today to provide a much more engaging page than the previous sterile attempt. Moreover its now chockful of information about the progress of OData including a blog, a list of products that produce OData feeds plus some live OData feeds that you can hit up today, a list of OData-compliant clients and an FAQ. Most interestingly SQL Azure is listed as a producer of OData feeds: If you have a SQL Azure database account you can easily expose an OData feed through a simple configuration portal. You can select authenticated or anonymous access and expose different OData views according to permissions granted to the specified SQL Azure database user. A preview of this upcoming SQL Azure service is available at https://www.sqlazurelabs.com/OData.aspx and it enables you to select one of your existing SQL Azure databases and, with a few clicks, turn it into an OData feed. It looks as though SQL Azure will soon be added to the stable of products that natively support OData, good news indeed. @Jamiet Related blog posts: OData gunning for ubiquity across Microsoft products Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • SQL Server 2008 R2 transactional replication over VPN

    - by enashnash
    I'm having difficulty setting up replication over a VPN. I have a SQL Server 2008 R2, Enterprise Edition database on a Windows 2008 R2 Server. SQL Server is running on a non-standard port. I have set it up so that it is acting as its own distributor and have configured a publisher on this server. It is set as an updatable transational publication (yes, this is necessary). On this server, I have Routing and Remote Access enabled in order to be able to establish VPN connections. It is configured with a static IP address pool, of which the first in the range is always assigned to the server. I have assigned a test user a static address within this range (I don't know if this is necessary or not). All clients will be 2008 R2 versions, but could be SQL Express or standalone developer instances of the full product. I can establish a VPN connection from the client without problems and can see that the correct IP addresses are allocated. After connecting to the database to test that I can establish a connection, I realised that I needed to be able to connect to the database using the server name rather than an IP address - required for replication - which wouldn't work initially. I created an entry in the hosts file for the server on the client using the NETBIOS name of the server, and now I can connect to the server, from the client, using the SERVER\INSTANCE, PORT syntax, over the VPN. As it is the default instance on the server, I can also connect with simply SERVER, PORT syntax. After all that, I still get the following dreaded error: SQL Server replication requires the actual server name to make a connection to the server. Connections through a server alias, IP address, or any other alternate name are not supported. Specify the actual server name, 'SERVER\INSTANCE'. (Replication.Utilities). What have I missed? How do I get this to work? TIA

    Read the article

< Previous Page | 64 65 66 67 68 69 70 71 72 73 74 75  | Next Page >