DataSet to Database Persistance, Examined
If you already know allot about it, go on an skip this blog. I thought I would cover this topic because I think it is not fully understood by allot of developers, and it is an essential step on the road to developing a full-fledged Data Access Layer (can you guess where this series of posts is headed?).
Negative Indexes
One of the first steps to writing a clean DAL starts with the DataSets themselves. It is a good idea to set all your auto-increment columns to IncrementSeed=-1, IncrementStep=-1. This way, as you add rows to your tables, their PK's will count: -1, -2, -3. This prevents the rows you are adding in your DataSet from conflicting with the Primary Keys of any new rows added back on the Database server.
Concurrency Checks
The SqlDataAdapter class has a wonderful feature, which is enabled by default, where it examines the number of affected rows reported by SQL. If the number of rows actually updated (always zero or one), doesn't match the number the DataAdapter was expecting (always one), it throws an exception! This should prevent unexpected results when you update query fails. You can also use it to deliberately throw a ConcurrencyException if the data has changed by adding a where clause to your update query:
set @NewDate=getdate()
update Employees set
FirstName=@FirstName,
LastName=@LastName,
LastUpdatedDate=@NewDate
where EmployeeId=@EmployeeId
and LastUpdateDate=@LastUpdatedDate
In this case, if the Employee.LastUpdateDate in the DB doesn't equal your DataRow.LastUpdateDate, then the query will return zero rows, and the DataAdapter will throw an exception.
Transactions
It might be obvious by now, but since the DataAdapter might throw a ConcurrencyException half-way through your update process, you may want to wrap the process in a client-side transaction. That way if any of your updates (or inserts or deletes for that matter) fail, then everything will be rolled back. This is pretty simple in ADO.NET 2.0: Just create a new SqlTransaction object, and pass it to all of your SqlCommands.
Output Parameters
You may have noticed in the above example, that we are updating the DB with the @NewDate parameter, but how might we get that back into our DataSet? If we don't, then our next update will fail, but surely we don't want to manually keep a list of all updated rows, and select() them all back from the DB? Well, the DataAdapter makes this easy for us as well. When you add your SqlParameter object to the SqlCommand, you can set it update as an "output" parameter (or even InputOutput). Then you can define the parameter as "output" in your stored procedure, and whatever value it has will be inserted back into your DataRow when the sproc completes. (Same thing goes for non-sproc commands). This is especially handy on inserts. For your insert queries you can simply append: "set @OutputId = @@Identity", and your -12345 PK in the DataSet will be automatically updated to 3725 or whatever the auto-increment in the DB happens to be set to. Pretty cool huh? Now you see why we use negative keys on the client-side.
UpdateAction = Cascade
We've reviewed some pretty cool stuff, but this last bit is killer. Going back to one more DataSet based thing, on all your relationships, you'll want to set UpdateRule = Cascade. The reason for this, is that when your Employee.EmployeeId=-1 is inserted into the DB, and comes back as an output parameter as Employee.EmployeeId=3725, the UpdateRule=Cascade will go around to all related tables, and change the FK from -1 to 3725 on those as well. So you can insert an Employee into your DataSet, then insert a bunch of EmployeeRole records, then persist them to the DB all at the same time, and when you persist is finished, they will have all gone from fairy-land-style negative keys to real-world values auto-magically! And all your DataSet constraints will remain intact! How cool is that?
DataRowVersion
Ok, although maybe not as cool as the last one, the SqlParameter.DataRowVersion is sometimes handy. Lets say you don't have a handy Employee.LastUpdatedDate field in your table to check for concurrency, but you still want to make sure that no one has updated the record in the DB while you were looking at it. One interesting feature of DataRows is that they track not only what value they have now, but also what value they originally had when you loaded them from the DB. You can create SqlParameters with DataRowVersion=Original, and pass them into your sproc in the where clause. You will be checking many more fields than is probably necessary, but if any of them don't match, then you can still get the same optimistic concurrency as with UpdatedDate (or Timestamp), but without changing your DB schema.
Persistence Order
I might be going a little too far here, but for the sake of completeness, I will mention it. With typed DataSets, you don't even need to worry about persistence order (i.e. insert Employees first, then insert EmployeeRoles). Since all the relations should have been brought over from the database into your DataSet by the designer, you should have all the info you need to determine parent/child relations, and insert/update/delete in the correct order: Insert parent rows first, update parent rows second (* update, see below), then delete child rows last. As easy as this sounds, there is actually a good bit of work to it, which I won't get into here. Instead I will focus on both loading and persistence in my next post, as well as post sample code for a Data Acess Layer that should make both a piece of cake. What's that? You don't want to use proprietary code in your application's DAL? Well if you can sit tight for a little while, ADO.NET 3.5 should include a new feature where it can automatically loop through an entire DataSet persisting rows in the correct order!
Sorry, no code this time :-(
Until next time,
- Brent
Update 6/13/2007:
In the article above I said to "update parent rows second". This was how I had my code working until I ran into a situation where two rows were each other's parents. This threw everything into an infinite loop. If updating PrimaryKeys were possible in DataSets, then there would be problems, but since the ADO guys had the foresight not to allow this, modified rows should be able to be persisted in any order (and should be to avoid infinite loops). More correctly worded, the aforementioned sentence should omit the word "parent" to read: "update modified rows second".