Search Results

Search found 37012 results on 1481 pages for 'sql query'.

Page 169/1481 | < Previous Page | 165 166 167 168 169 170 171 172 173 174 175 176  | Next Page >

  • PostgreSQL - fetch the row which has the Max value for a column

    - by Joshua Berry
    I'm dealing with a Postgres table (called "lives") that contains records with columns for time_stamp, usr_id, transaction_id, and lives_remaining. I need a query that will give me the most recent lives_remaining total for each usr_id There are multiple users (distinct usr_id's) time_stamp is not a unique identifier: sometimes user events (one by row in the table) will occur with the same time_stamp. trans_id is unique only for very small time ranges: over time it repeats remaining_lives (for a given user) can both increase and decrease over time example: time_stamp|lives_remaining|usr_id|trans_id ----------------------------------------- 07:00 | 1 | 1 | 1 09:00 | 4 | 2 | 2 10:00 | 2 | 3 | 3 10:00 | 1 | 2 | 4 11:00 | 4 | 1 | 5 11:00 | 3 | 1 | 6 13:00 | 3 | 3 | 1 As I will need to access other columns of the row with the latest data for each given usr_id, I need a query that gives a result like this: time_stamp|lives_remaining|usr_id|trans_id ----------------------------------------- 11:00 | 3 | 1 | 6 10:00 | 1 | 2 | 4 13:00 | 3 | 3 | 1 As mentioned, each usr_id can gain or lose lives, and sometimes these timestamped events occur so close together that they have the same timestamp! Therefore this query won't work: SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM (SELECT usr_id, max(time_stamp) AS max_timestamp FROM lives GROUP BY usr_id ORDER BY usr_id) a JOIN lives b ON a.max_timestamp = b.time_stamp Instead, I need to use both time_stamp (first) and trans_id (second) to identify the correct row. I also then need to pass that information from the subquery to the main query that will provide the data for the other columns of the appropriate rows. This is the hacked up query that I've gotten to work: SELECT b.time_stamp,b.lives_remaining,b.usr_id,b.trans_id FROM (SELECT usr_id, max(time_stamp || '*' || trans_id) AS max_timestamp_transid FROM lives GROUP BY usr_id ORDER BY usr_id) a JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id ORDER BY b.usr_id Okay, so this works, but I don't like it. It requires a query within a query, a self join, and it seems to me that it could be much simpler by grabbing the row that MAX found to have the largest timestamp and trans_id. The table "lives" has tens of millions of rows to parse, so I'd like this query to be as fast and efficient as possible. I'm new to RDBM and Postgres in particular, so I know that I need to make effective use of the proper indexes. I'm a bit lost on how to optimize. I found a similar discussion here. Can I perform some type of Postgres equivalent to an Oracle analytic function? Any advice on accessing related column information used by an aggregate function (like MAX), creating indexes, and creating better queries would be much appreciated! P.S. You can use the following to create my example case: create TABLE lives (time_stamp timestamp, lives_remaining integer, usr_id integer, trans_id integer); insert into lives values ('2000-01-01 07:00', 1, 1, 1); insert into lives values ('2000-01-01 09:00', 4, 2, 2); insert into lives values ('2000-01-01 10:00', 2, 3, 3); insert into lives values ('2000-01-01 10:00', 1, 2, 4); insert into lives values ('2000-01-01 11:00', 4, 1, 5); insert into lives values ('2000-01-01 11:00', 3, 1, 6); insert into lives values ('2000-01-01 13:00', 3, 3, 1);

    Read the article

  • Spooling in SQL execution plans

    - by Rob Farley
    Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

    Read the article

  • How to salvage SQL server 2008 query from KILLED/ROLLBACK state?

    - by littlegreen
    I have a stored procedure that inserts batches of millions of rows, emerging from a certain query, into an SQL database. It has one parameter selecting the batch; when this parameter is omitted, it will gather a list of batches and recursively call itself, in order to iterate over batches. In (pseudo-)code, it looks something like this: CREATE PROCEDURE spProcedure AS BEGIN IF @code = 0 BEGIN ... WHILE @@Fetch_Status=0 BEGIN EXEC spProcedure @code FETCH NEXT ... INTO @code END END ELSE BEGIN -- Disable indexes ... INSERT INTO table SELECT (...) -- Enable indexes ... Now it can happen that this procedure is slow, for whatever reason: it can't get a lock, one of the indexes it uses is misdefined or disabled. In that case, I want to be able kill the procedure, truncate and recreate the resulting table, and try again. However, when I try and kill the procedure, the process frequently oozes into a KILLED/ROLLBACK state from which there seems to be no return. From Google I have learned to do an sp_lock, find the spid, and then kill it with KILL <spid>. But when I try to kill it, it tells me SPID 75: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 554 seconds. I did find a forum message hinting that another spid should be killed before the other one can start a rollback. But that didn't work for me either, plus I do not understand, why that would be the case... could it be because I am recursively calling my own stored procedure? (But it should be having the same spid, right?) In any case, my process is just sitting there, being dead, not responding to kills, and locking the table. This is very frustrating, as I want to go on developing my queries, not waiting hours on my server sitting dead while pretending to be finishing a supposed rollback. Is there some way in which I can tell the server not to store any rollback information for my query? Or not to allow any other queries to interfere with the rollback, so that it will not take so long? Or how to rewrite my query in a better way, or how kill the process successfully without restarting the server?

    Read the article

  • SQL Query to delete oldest rows over a certain row count?

    - by Casey
    I have a table that contains log entries for a program I'm writing. I'm looking for ideas on an SQL query (I'm using SQL Server Express 2005) that will keep the newest X number of records, and delete the rest. I have a datetime column that is a timestamp for the log entry. I figure something like the following would work, but I'm not sure of the performance with the IN clause for larger numbers of records. Performance isn't critical, but I might as well do the best I can the first time. DELETE FROM MyTable WHERE PrimaryKey NOT IN (SELECT TOP 10,000 PrimaryKey FROM MyTable ORDER BY TimeStamp DESC)

    Read the article

  • MS Access CrossTab query - across 3 tables

    - by Prembo
    Hi, I have the following 3 tables: 1) Sweetness Table FruitIndex CountryIndex Sweetness 1 1 10 1 2 20 1 3 400 2 1 50 2 2 123 2 3 1 3 1 49 3 2 40 3 3 2 2) Fruit Name Table FruitIndex FruitName 1 Apple 2 Orange 3 Peaches 3) Country Name Table CountryIndex CountryName 1 UnitedStates 2 Canada 3 Mexico I'm trying to perform a CrossTab SQL query to end up with: Fruit\Country UnitedStates Canada Mexico Apple 10 20 400 Orange 50 123 1 Peaches 49 40 2 The challenging part is to label the rows/columns with the relevant names from the Name tables. I can use MS Access to design 2 queries, create the joins the fruit/country names table with the Sweetness table perform crosstab query However I'm having trouble doing this in a single query. I've attempted nesting the 1st query's SQL into the 2nd, but it doesn't seem to work. Unfortunately, my solution needs to be be wholly SQL, as it is an embedded SQL query (cannot rely on query designer in MS Access, etc.). Any help greatly appreciated. Prembo.

    Read the article

  • How to salvage SQL server 2008 query from KILLED/ROLLBACK state without waiting half a day?

    - by littlegreen
    I have a stored procedure that inserts batches of millions of rows, emerging from a certain query, into an SQL database. It has one parameter selecting the batch; when this parameter is omitted, it will gather a list of batches and recursively call itself, in order to iterate over batches. In (pseudo-)code, it looks something like this: CREATE PROCEDURE spProcedure AS BEGIN IF @code = 0 BEGIN ... WHILE @@Fetch_Status=0 BEGIN EXEC spProcedure @code FETCH NEXT ... INTO @code END END ELSE BEGIN -- Disable indexes ... INSERT INTO table SELECT (...) -- Enable indexes ... Now it can happen that this procedure is slow, for whatever reason: it can't get a lock, one of the indexes it uses is misdefined or disabled. In that case, I want to be able kill the procedure, truncate and recreate the resulting table, and try again. However, when I try and kill the procedure, the process frequently oozes into a KILLED/ROLLBACK state from which there seems to be no return. From Google I have learned to do an sp_lock, find the spid, and then kill it with KILL <spid>. But when I try to kill it, it tells me SPID 75: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 554 seconds. I did find a forum message hinting that another spid should be killed before the other one can start a rollback. But that didn't work for me either, plus I do not understand, why that would be the case... could it be because I am recursively calling my own stored procedure? (But it should be having the same spid, right?) In any case, my process is just sitting there, being dead, not responding to kills, and locking the table. This is very frustrating, as I want to go on developing my queries, not waiting hours on my server sitting dead while pretending to be finishing a supposed rollback. Is there some way in which I can tell the server not to store any rollback information for my query? Or not to allow any other queries to interfere with the rollback, so that it will not take so long? Or how to rewrite my query in a better way, or how kill the process successfully without restarting the server?

    Read the article

  • Left Join works with table but fails with query

    - by Frank Martin
    The following left join query in MS Access 2007 SELECT Table1.Field_A, Table1.Field_B, qry_Table2_Combined.Field_A, qry_Table2_Combined.Field_B, qry_Table2_Combined.Combined_Field FROM Table1 LEFT JOIN qry_Table2_Combined ON (Table1.Field_A = qry_Table2_Combined.Field_A) AND (Table1.Field_B = qry_Table2_Combined.Field_B); is expected by me to return this result: +--------+---------+---------+---------+----------------+ |Field_A | Field_B | Field_A | Field_B | Combined_Field | +--------+---------+---------+---------+----------------+ |1 | | | | | +--------+---------+---------+---------+----------------+ |1 | | | | | +--------+---------+---------+---------+----------------+ |2 |1 |2 |1 |John, Doe | +--------+---------+---------+---------+----------------+ |2 |2 | | | | +--------+---------+---------+---------+----------------+ [Table1] has 4 records, [qry_Table2_Combined] has 1 record. But it gives me this: +--------+---------+---------+---------+----------------+ |Field_A | Field_B | Field_A | Field_B | Combined_Field | +--------+---------+---------+---------+----------------+ |2 |1 |2 |1 |John, Doe | +--------+---------+---------+---------+----------------+ |2 |2 |2 | |, | +--------+---------+---------+---------+----------------+ Really weird is that the [Combined_Field] has a comma in the second row. I use a comma to concatenate two fields in [qry_Table2_Combined]. If the left join query uses a table created from the query [qry_Table2_Combined] it works as expected. Why does this left join query not give the same result for a query and a table? And how can i get the right results using a query in the left join?

    Read the article

  • jpa join query on a subclass

    - by Brian
    I have the following relationships in JPA (hibernate). Object X has two subclasses, Y and Z. Object A has a manyToOne relationship to object X. (Note, this is a one-sided relationship so object X cannot see object A). Now, I want to get the max value of a column in object A, but only where the relationship is of a specific subtype, ie...Y. So, that equates to...get the max value of column1 in object A, across all instances of A where they have a relationship with Y. Is this possible? I'm a bit lost as how to query it. I was thinking of something like: String query = "SELECT MAX(a.columnName) FROM A a join a.x; Query query = super.entityManager.createQuery(query); query.execute(); However that doesn't take account of the subclass of X...so I'm a bit lost. Any help would be much appreciated.

    Read the article

  • How can I improve my select query for storing large versioned data sets?

    - by Jason Francis
    At work, we build large multi-page web applications, consisting mostly of radio and check boxes. The primary purpose of each application is to gather data, but as users return to a page they have previously visited, we report back to them their previous responses. Worst-case scenario, we might have up to 900 distinct variables and around 1.5 million users. For several reasons, it makes sense to use an insert-only approach to storing the data (as opposed to update-in-place) so that we can capture historical data about repeated interactions with variables. The net result is that we might have several responses per user per variable. Our table to collect the responses looks something like this: CREATE TABLE [dbo].[results]( [id] [bigint] IDENTITY(1,1) NOT NULL, [userid] [int] NULL, [variable] [varchar](8) NULL, [value] [tinyint] NULL, [submitted] [smalldatetime] NULL) Where id serves as the primary key. Virtually every request results in a series of insert statements (one per variable submitted), and then we run a select to produce previous responses for the next page (something like this): SELECT t.id, t.variable, t.value FROM results t WITH (NOLOCK) WHERE t.userid = '2111846' AND (t.variable='internat' OR t.variable='veteran' OR t.variable='athlete') AND t.id IN (SELECT MAX(id) AS id FROM results WITH (NOLOCK) WHERE userid = '2111846' AND (t.variable='internat' OR t.variable='veteran' OR t.variable='athlete') GROUP BY variable) Which, in this case, would return the most recent responses for the variables "internat", "veteran", and "athlete" for user 2111846. We have followed the advice of the database tuning tools in indexing the tables, and against our data, this is the best-performing version of the select query that we have been able to come up with. Even so, there seems to be significant performance degradation as the table approaches 1 million records (and we might have about 150x that). We have a fairly-elegant solution in place for sharding the data across multiple tables which has been working quite well, but I am open for any advice about how I might construct a better version of the select query. We use this structure frequently for storing lots of independent data points, and we like the benefits it provides. So the question is, how can I improve the performance of the select query? I assume the nested select statement is a bad idea, but I have yet to find an alternative that performs as well. Thanks in advance. NB: Since we emphasize creating over reading in this case, and since we never update in place, there doesn't seem to be any penalty (and some advantage) for using the NOLOCK directive in this case.

    Read the article

  • Query Execution Plan - When is the Where clause executed?

    - by Alex
    I have a query like this (created by LINQ): SELECT [t0].[Id], [t0].[CreationDate], [t0].[CreatorId] FROM [dbo].[DataFTS]('test', 100) AS [t0] WHERE [t0].[CreatorId] = 1 ORDER BY [t0].[RANK] DataFTS is a full-text search table valued function. The query execution plan looks like this: SELECT (0%) - Sort (23%) - Nested Loops (Inner Join) (1%) - Sort (Top N Sort) (25%) - Stream Aggregate (0%) - Stream Aggregate (0%) - Compute Scalar (0%) - Table Valued Function (FullTextMatch) (13%) | | - Clustered Index Seek (38%) Does this mean that the WHERE clause ([CreatorId] = 1) is executed prior to the TVF ( full text search) or after the full text search? Thank you.

    Read the article

  • SQL: Interrupting a query

    - by NoozNooz42
    I've worked on a project using a proprietary non-SQL DB where queries could be interrupted and in the codebase there were quite some spots where that functionnality was used and made perfect sense (for example to stop a long running query that gets cancelled by the user, or when a more recent query takes place and renders the previous query obsolete, etc.) and I realized I never really saw that kind of "interrupted queries" previously and thought it could make a good SO question (several questions, but they're all related to exactly the same thing): can SQL queries be interrupted? is this part of the SQL standard? if it's not part of the SQL standard, which SQL DBs allow queries to be interrupted (any example most welcome)? is it common to interrupt a DB query (SQL or not) which you'll know you won't care about the result anymore? (in the codebase I've worked on, it sure helps lighten the server's load)

    Read the article

  • Migrating SQL Server Databases – The DBA’s Checklist (Part 1)

    - by Sadequl Hussain
    It is a fact of life: SQL Server databases change homes. They move from one instance to another, from one version to the next, from old servers to new ones.  They move around as an organisation’s data grows, applications are enhanced or new versions of the database software are released. If not anything else, servers become old and unreliable and databases eventually need to find a new home. Consider the following scenarios: 1.     A new  database application is rolled out in a production server from the development or test environment 2.     A copy of the production database needs to be installed in a test server for troubleshooting purposes 3.     A copy of the development database is regularly refreshed in a test server during the system development life cycle 4.     A SQL Server is upgraded to a newer version. This can be an in-place upgrade or a side-by-side migration 5.     One or more databases need to be moved between different instances as part of a consolidation strategy. The instances can be running the same or different version of SQL Server 6.     A database has to be restored from a backup file provided by a third party application vendor 7.     A backup of the database is restored in the same or different instance for disaster recovery 8.     A database needs to be migrated within the same instance: a.     Files are moved from direct attached storage to storage area network b.    The same database is copied under a different name for another application Migrating SQL Server database applications is a complex topic in itself. There are a number of components that can be involved: jobs, DTS or SSIS packages, logins or linked servers are only few pieces of the puzzle. However, in this article we will focus only on the central part of migration: the installation of the database itself. Unless it is an in-place upgrade, typically the database is taken from a source server and installed in a destination instance.  Most of the time, a full backup file is used for the rollout. The backup file is either provided to the DBA or the DBA takes the backup and restores it in the target server. Sometimes the database is detached from the source and the files are copied to and attached in the destination. Regardless of the method of copying, moving, refreshing, restoring or upgrading the physical database, there are a number of steps the DBA should follow before and after it has been installed in the destination. It is these post database installation steps we are going to discuss below. Some of these steps apply in almost every scenario described above while some will depend on the type of objects contained within the database.  Also, the principles hold regardless of the number of databases involved. Step 1:  Make a copy of data and log files when attaching and detaching When detaching and attaching databases, ensure you have made copies of the data and log files if the destination is running a newer version of SQL Server. This is because once attached to a newer version, the database cannot be detached and attached back to an older version. Trying to do so will give you a message like the following: Server: Msg 602, Level 21, State 50, Line 1 Could not find row in sysindexes for database ID 6, object ID 1, index ID 1. Run DBCC CHECKTABLE on sysindexes. Connection Broken If you try to backup the attached database and restore it in the source, it will still fail. Similarly, if you are restoring the database in a newer version, it cannot be backed up or detached and put back in an older version of SQL. Unlike detach and attach method though, you do not lose the backup file or the original database here. When detaching and attaching a database, it is important you keep all the log files available along with the data files. It is possible to attach a database without a log file and SQL Server can be instructed to create a new log file, however this does not work if the database was detached when the primary file group was read-only. You will need all the log files in such cases. Step 2: Change database compatibility level Once the database has been restored or attached to a newer version of SQL Server, change the database compatibility level to reflect the newer version unless there is a compelling reason not to do so. When attaching or restoring from a previous version of SQL, the database retains the older version’s compatibility level.  The only time you would want to keep a database with an older compatibility level is when the code within your database is no longer supported by SQL Server. For example, outer joins with *= or the =* operators were still possible in SQL 2000 (with a warning message), but not in SQL 2005 anymore. If your stored procedures or triggers are using this form of join, you would want to keep the database with an older compatibility level.  For a list of compatibility issues between older and newer versions of SQL Server databases, refer to the Books Online under the sp_dbcmptlevel topic. Application developers and architects can help you in deciding whether you should change the compatibility level or not. You can always change the compatibility mode from the newest to an older version if necessary. To change the compatibility level, you can either use the database’s property from the SQL Server Management Studio or use the sp_dbcmptlevel stored procedure.   Bear in mind that you cannot run the built-in reports for databases from SQL Server Management Studio if you keep the database with an older compatibility level. The following figure shows the error message I received when trying to run the “Disk Usage by Top Tables” report against a database. This database was hosted in a SQL Server 2005 system and still had a compatibility mode 80 (SQL 2000).     Continues…

    Read the article

  • The SSIS tuning tip that everyone misses

    - by Rob Farley
    I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

    Read the article

  • SQL Server service accounts and SPNs

    - by simonsabin
    Service Principal Names (SPNs) are a must for kerberos authentication which is a must when using sharepoint, reporting services and sql server where you access one server that then needs to access another resource, this is called the double hop. The reason this is a complex problem is that the second hop has to be done with impersonation/delegation. For this to work there needs to be a way for the security system to make sure that the service in the middle is allowed to impersonate you, after all you are not giving the service your password. To do this you need to be using kerberos. The following is my simple interpretation of how kerberos works. I find the Kerberos documentation rediculously complex so the following might be sligthly wrong but I think its close enough. Keberos works on a ticketing system, the prinicipal is that you get a security token from AD and then you can pass that to the service in the middle which can then use that token to impersonate you. For that to work AD has to be able to identify who is allowed to use the token, in this case the service account.But how do you as a client know what service account the service in the middle is configured with. The answer is SPNs. The SPN is the mapping between your logical connection to the service account. One type of SPN is for the DNS name for the server and the port. i.e. MySQL.mydomain.com and 1433. You can see how this maps to SQL Server on that server, but how does it map to the account. Well it can be done in two ways, either you can have a mapping defined in AD or AD can use a default mapping (this is something I didn't know about). To map the SPN in AD then you have to add the SPN to the user account, this is documented in the first link below either directly or using a tool called SetSPN. You might say that is complex, well it is and thats why SQL Server tries to do it for you, at start up it tries to connect to AD and set the SPN on the account it is running as, clearly that can only happen IF SQL is running as a domain account AND importantly it has permission to do so. By default a normal domain user account doesn't have the correct permission, and is why so many people have this problem. If the account is a domain admin then it will have permission, but non of us run SQL using domain admin accounts do we. You might also note that the SPN contains the port number (this isn't a requirement now in sql 2008 but I won't go into that), so if you set it manually and you are using dynamic ports (the default for a named instance) what do you do, well every time the port changes you need to change the SPN allocated to the account. Thats why its advised to let SQL Server register the SPN itself. You may also have thought, well what happens if I change my service account, won't that lead to two accounts with the same SPN. Possibly. Having two accounts with the same SPN is definitely a problem. Why? Well because if there are two accounts Kerberos can't identify the exact account that the service is running as, it could be either account, and so your security falls back to NTLM. SETSPN is useful for finding duplicate SPNs Reading this you will probably be thinking Oh my goodness this is really difficult. It is however I've found today in investigating something else that there is an easy option. Use Network Service as your service account. Network Service is a special account and is tied to the computer. It appears that Network Service has the update rights to AD to set an SPN mapping for the computer account. This then allows the SPN mapping to work. I believe this also works for the local system account. To get all the SPNs in your AD run the following, it could be a large file, so you might want to restrict it to a specific OU, or CN ldifde -d "DC=<domain>" -l servicePrincipalName -F spn.txt You will read in the links below that you need SQL to register the SPN this is done how to use Kerberos authenticaiton in SQL Server - http://support.microsoft.com/kb/319723 Using Kerberos with SQL Server - http://blogs.msdn.com/sql_protocols/archive/2005/10/12/479871.aspx Understanding Kerberos and NTLM authentication in SQL Server Connections - http://blogs.msdn.com/sql_protocols/archive/2006/12/02/understanding-kerberos-and-ntlm-authentication-in-sql-server-connections.aspx Summary The only reason I personally know to use a domain account is when you can't get kerberos to work and you want to do BULK INSERT or other network service that requires access to a a remote server. In this case you have to resort to using SQL authentication and the SQL Server uses its service account to access the remote service, and thus you need a domain account. You migth need this if using some forms of replication. I've always found Kerberos awkward to setup and so fallen back to this domain account approach. So in summary to get Kerberos to work try using the network service or local system accounts. For a great post from the Adam Saxton of the SQL Server support team go to http://blogs.msdn.com/psssql/archive/2010/03/09/what-spn-do-i-use-and-how-does-it-get-there.aspx 

    Read the article

  • MS-SQL: How to get a subitem of sp_helplanguage ?

    - by Quandary
    Question: I can get the MS-SQL database language by querying: SELECT @@language And I can get further info via EXEC sp_helplanguage How can I query for a column of sp_helplanguage where name= @@language I do SELECT * FROM sp_helplanguage WHERE name='DEUTSCH' but that obviously doesn't work. What's the correct way to query it ?

    Read the article

  • How to extract custom tokens in SQL Server NVarChar/VarChar field by using RegEx?

    - by Kthurein
    Is there any way to extract the matched strings by using Regex in T-SQL(SQL Server 2005)? For example: Welcome [CT Name="UserName" /], We hope that you will enjoy our services and your subscription will be expired on [CT Name="ExpiredDate" /]. I would like to extract the custom tokens in tabular format as follows: [CT Name="UserName" /] [CT Name="ExpiredDate" /] Thanks for your suggestion!

    Read the article

  • How do I find out the expiry date of a SQL Server 2008 trial-install instance?

    - by Peter Mounce
    So I installed a trial of SQL Server 2008 enterprise edition while waiting for MSDN licenses to come through - I now want to uninstall the trial and replace it with a developer edition installation. However, I'd like to first know how long I have left on the trial. Is there a way to do this programmatically with SQL? I looked at create_date in sys.databases, but these give dates that are in 2003 (which is, I guess, when master and model were originally created).

    Read the article

< Previous Page | 165 166 167 168 169 170 171 172 173 174 175 176  | Next Page >