SQL SERVER – Concurrency Basics – Guest Post by Vinod Kumar
- by pinaldave
This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions from SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect.
Let us read the blog post in Vinod’s own voice.
Learning is always fun when it comes to SQL Server and learning the basics again can be more fun. I did write about Transaction Logs and recovery over my blogs and the concept of simplifying the basics is a challenge. In the real world we always see checks and queues for a process – say railway reservation, banks, customer supports etc there is a process of line and queue to facilitate everyone. Shorter the queue higher is the efficiency of system (a.k.a higher is the concurrency). Every database does implement this using checks like locking, blocking mechanisms and they implement the standards in a way to facilitate higher concurrency. In this post, let us talk about the topic of Concurrency and what are the various aspects that one needs to know about concurrency inside SQL Server. Let us learn the concepts as one-liners:
Concurrency can be defined as the ability of multiple processes to access or change shared data at the same time.
The greater the number of concurrent user processes that can be active without interfering with each other, the greater the concurrency of the database system.
Concurrency is reduced when a process that is changing data prevents other processes from reading that data or when a process that is reading data prevents other processes from changing that data.
Concurrency is also affected when multiple processes are attempting to change the same data simultaneously.
Two approaches to managing concurrent data access:
Optimistic Concurrency Model
Pessimistic Concurrency Model
Concurrency Models
Pessimistic Concurrency
Default behavior: acquire locks to block access to data that another process is using.
Assumes that enough data modification operations are in the system that any given read operation is likely affected by a data modification made by another user (assumes conflicts will occur).
Avoids conflicts by acquiring a lock on data being read so no other processes can modify that data.
Also acquires locks on data being modified so no other processes can access the data for either reading or modifying.
Readers block writer, writers block readers and writers.
Optimistic Concurrency
Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying.
Default behavior of optimistic concurrency is to use row versioning to allow data readers to see the state of the data before the modification occurs.
Older versions of the data are saved so a process reading data can see the data as it was when the process started reading and not affected by any changes being made to that data.
Processes modifying the data is unaffected by processes reading the data because the reader is accessing a saved version of the data rows.
Readers do not block writers and writers do not block readers, but, writers can and will block writers.
Transaction Processing
A transaction is the basic unit of work in SQL Server.
Transaction consists of SQL commands that read and update the database but the update is not considered final until a COMMIT command is issued (at least for an explicit transaction: marked with a BEGIN TRAN and the end is marked by a COMMIT TRAN or ROLLBACK TRAN).
Transactions must exhibit all the ACID properties of a transaction.
ACID Properties
Transaction processing must guarantee the consistency and recoverability of SQL Server databases.
Ensures all transactions are performed as a single unit of work regardless of hardware or system failure.
A – Atomicity C – Consistency I – Isolation D- Durability
Atomicity: Each transaction is treated as all or nothing – it either commits or aborts.
Consistency: ensures that a transaction won’t allow the system to arrive at an incorrect logical state – the data must always be logically correct. Consistency is honored even in the event of a system failure.
Isolation: separates concurrent transactions from the updates of other incomplete transactions. SQL Server accomplishes isolation among transactions by locking data or creating row versions.
Durability: After a transaction commits, the durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on data.
Transaction Dependencies
In addition to supporting all four ACID properties, a transaction might exhibit few other behaviors (known as dependency problems or consistency problems).
Lost Updates: Occur when two processes read the same data and both manipulate the data, changing its value and then both try to update the original data to the new value. The second process might overwrite the first update completely.
Dirty Reads: Occurs when a process reads uncommitted data. If one process has changed data but not yet committed the change, another process reading the data will read it in an inconsistent state.
Non-repeatable Reads: A read is non-repeatable if a process might get different values when reading the same data in two reads within the same transaction. This can happen when another process changes the data in between the reads that the first process is doing.
Phantoms: Occurs when membership in a set changes. It occurs if two SELECT operations using the same predicate in the same transaction return a different number of rows.
Isolation Levels
SQL Server supports 5 isolation levels that control the behavior of read operations.
Read Uncommitted
All behaviors except for lost updates are possible.
Implemented by allowing the read operations to not take any locks, and because of this, it won’t be blocked by conflicting locks acquired by other processes. The process can read data that another process has modified but not yet committed.
When using the read uncommitted isolation level and scanning an entire table, SQL Server can decide to do an allocation order scan (in page-number order) instead of a logical order scan (following page pointers). If another process doing concurrent operations changes data and move rows to a new location in the table, the allocation order scan can end up reading the same row twice.
Also can happen if you have read a row before it is updated and then an update moves the row to a higher page number than your scan encounters later.
Performing an allocation order scan under Read Uncommitted can cause you to miss a row completely – can happen when a row on a high page number that hasn’t been read yet is updated and moved to a lower page number that has already been read.
Read Committed
Two varieties of read committed isolation: optimistic and pessimistic (default).
Ensures that a read never reads data that another application hasn’t committed.
If another transaction is updating data and has exclusive locks on data, your transaction will have to wait for the locks to be released. Your transaction must put share locks on data that are visited, which means that data might be unavailable for others to use. A share lock doesn’t prevent others from reading but prevents them from updating.
Read committed (snapshot) ensures that an operation never reads uncommitted data, but not by forcing other processes to wait. SQL Server generates a version of the changed row with its previous committed values. Data being changed is still locked but other processes can see the previous versions of the data as it was before the update operation began.
Repeatable Read
This is a Pessimistic isolation level.
Ensures that if a transaction revisits data or a query is reissued the data doesn’t change.
That is, issuing the same query twice within a transaction cannot pickup any changes to data values made by another user’s transaction because no changes can be made by other transactions. However, this does allow phantom rows to appear.
Preventing non-repeatable read is a desirable safeguard but cost is that all shared locks in a transaction must be held until the completion of the transaction.
Snapshot
Snapshot Isolation (SI) is an optimistic isolation level.
Allows for processes to read older versions of committed data if the current version is locked.
Difference between snapshot and read committed has to do with how old the older versions have to be.
It’s possible to have two transactions executing simultaneously that give us a result that is not possible in any serial execution.
Serializable
This is the strongest of the pessimistic isolation level.
Adds to repeatable read isolation level by ensuring that if a query is reissued rows were not added in the interim, i.e, phantoms do not appear.
Preventing phantoms is another desirable safeguard, but cost of this extra safeguard is similar to that of repeatable read – all shared locks in a transaction must be held until the transaction completes.
In addition serializable isolation level requires that you lock data that has been read but also data that doesn’t exist. Ex: if a SELECT returned no rows, you want it to return no. rows when the query is reissued. This is implemented in SQL Server by a special kind of lock called the key-range lock.
Key-range locks require that there be an index on the column that defines the range of values. If there is no index on the column, serializable isolation requires a table lock.
Gets its name from the fact that running multiple serializable transactions at the same time is equivalent of running them one at a time.
Now that we understand the basics of what concurrency is, the subsequent blog posts will try to bring out the basics around locking, blocking, deadlocks because they are the fundamental blocks that make concurrency possible.
Now if you are with me – let us continue learning for SQL Server Locking Basics.
Reference: Pinal Dave (http://blog.sqlauthority.com)
Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Concurrency