Friday, June 29, 2012

Database Transactions

A recent topic on the mailing lists asked about when it makes sense to use Commit with your database code.  The answer is "it depends'.  To go into more detail, you need to know a bit about transactions.

A transaction is a collection of changes to the database.  It can be a single change, such as an INSERT statement, or it can be many changes such as a combination of INSERT and UPDATE statements.

When changes are made within a transaction, they are not permanent (and not visible to others looking at the database) until you commit them.  In order to use transactions effectively, you have to understand how they work with the database engine you are using.  Some databases start a transaction for you automatically.  And some will also automatically commit for you after each change so you don't have to actually commit manually.

Generally, you do not want to commit after every database change as seen by the following Banking example.


Banking Example

You want to transfer $10 from a savings account to a checking account.

Here are the starting amounts:

Savings: $100
Checking: $50

If you remove $10 from savings and commit, you have a database that looks like this:

Savings: $90
Checking: $50

There is now $10 in limbo. The next step is to add the $10 to checking:

Savings: $90
Checking: $60

Having that $10 in limbo is a big gamble.  If your app crashes before it can do the last step, you end up with the $10 being lost and your database integrity compromised.

Transactions prevent this problem.  Look at the example again.

Here are the starting amounts:

Savings: $100
Checking: $50

You remove $10 from savings and you add $10 to checking and then you commit.  The database now looks like this:

Savings: $90
Checking: $60

The data was never permanently in the database in its intermediate form.  If something bad happened before you were able to commit then the database would have remained in its original state with $100 in savings and $50 in checking.

Transactions give you database integrity.  But there is another good reason to use them: performance.


Performance

If you commit after every database change, you force your database to do a lot of work behind the scenes to make the data permanently available.  This is not a big problem when dealing with small amounts of data, but it can really add up when dealing with lots of data.

For example, if you are importing thousands of rows of data into a specific table, committing after each row could cause your import to take several minutes because of all the overhead.  Switching to a transaction that only commits at the end (or even every 1000 rows) could result in a tremendous improvement.  I've seen times drop from several minutes down to several seconds when using a transaction in this manner.

Transactions give you better performance.


Using Transactions

So now you know that transactions are often the way to go, but how do you use them? Unfortunately, that varies depending on the database engine.

With SQLite (RealSQLDatabase), a transaction is started for you automatically when you issue your first SQL command that changes the database.  You can also explicitly specify a transaction using the "BEGIN TRANSACTION" command.  Other database engines have similar commands.

When you are finished with the transaction and want to make the changes permanent, you send the "COMMIT" command.  Or if you need to cancel the transaction, you send the "ROLLBACK" command.*

Realbasic code might look like this:

// Assuming there is a property called db that is 
// connected to a SQLite (RealSQLDatabase) database.

db.SQLExecute("BEGIN TRANSACTION")
db.SQLExecute("UPDATE Savings SET Type = 'Withdrawal', Amount = -10 WHERE AccountNum = '123456';")

If db.Error Then
  MsgBox(db.ErrorMessage)
  db.Rollback
  Return
End If

db.SQLExecute("UPDATE Checking SET Type = 'Deposit', Amount = 10 WHERE AccountNum = '123456';")

If db.Error Then
  MsgBox(db.ErrorMessage)
  db.Rollback
  Return
End If

db.Commit

----
* If you have AutoCommit = True then SQLite issues a commit after each command that changes the database.  As discussed above, this is usually bad for both data integrity and performance.  Don't do it.

2 comments:

olivier vidal said...

Hi Paul,

Thank you.

To never forget, I use a transaction for ALL my treatments database, EVEN if they do not change database. Is this a problem?

For example, with SQLITE/RealSQLdatabase, I am showing a webpage that contains data:

- Begin transaction
- Select ...
- Select ....

- if db.error then
commit
else
rollback
end

there is no advantage to be here a rollback, but is it really a problem?

Pat Delaney said...

Nice article Paul - thanks.

One point I picked up on...
"When changes are made within a transaction, they are not permanent (and not visible to others looking at the database) until you commit them."

In MS SQL that depends on the Transaction Isolation Level set - these determine how concurrent transactions are handled.

MS SQL supports 5 isolation levels
1) Read Uncommitted
2) Read Committed
3) Repeatable Read
4) Snapshot
5) Serializable

It is possible to read uncommitted data within a transaction depending on the Transaction Isolation Level set. MySQL also uses Isolation Levels...

I heartily recommend people read the documentation on Isolation Levels if their DB platform supports them. You can get some unexpected results if you 'assume' that you can only read committed data.

I only mention this for clarity - not criticism.

Cheers
Pat Delaney
www.PandaTechServices.co.uk
Pat@PandaTechServices.co.uk