Differences Between the Relational and jCache Models

Much of Cameron’s presentation echoes my own experience: the Castor JDO cache, the cluster work at the BPM company, my recent work on distributed caches, and even my reading of the Transaction Processing book that I have mentioned a few times.

But he reminded me of an old problem I dealt with in Castor JDO. The cache (and its locks) was local, yet we were trying to respect isolation: if another machine made an incompatible change, data integrity should not be compromised, and the transaction should roll back instead. After all these years, I now understand the problem better, and I know that I didn’t achieve it. To be specific, I didn’t achieve the Serializable isolation level, the one that prevents phantom reads.

Let us use class registration as an example. A student is allowed to register for at most 18 credits per quarter. So we first run one query, and insert the new course only if the result of that query satisfies our rule. First,
<quote>SELECT sum(credits) FROM student_course_table WHERE student=? AND quarter=this</quote>
Now, if the sum returned by the query plus the credits of the new course does not exceed 18, we allow the new course to be added.
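
To make the danger concrete, here is a minimal sketch that replays one unlucky interleaving of two registration transactions by hand. Everything in it is hypothetical: `RegistrationTable` is an in-memory stand-in for student_course_table, and the credit values are made up for illustration. It is not how Castor JDO or any database works internally; it only shows why the check and the insert must be isolated together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for student_course_table (one student, one quarter).
final class RegistrationTable {
    static final int MAX_CREDITS = 18;
    private final List<Integer> credits = new ArrayList<>();

    // Mirrors: SELECT sum(credits) FROM student_course_table WHERE student=? AND quarter=this
    int sumCredits() {
        int sum = 0;
        for (int c : credits) {
            sum += c;
        }
        return sum;
    }

    void insert(int courseCredits) {
        credits.add(courseCredits);
    }
}

final class PhantomDemo {
    // Replays one interleaving of transactions A and B and returns the final credit total.
    static int runInterleaving() {
        RegistrationTable table = new RegistrationTable();
        table.insert(12); // courses the student already has

        int seenByA = table.sumCredits(); // A reads 12
        int seenByB = table.sumCredits(); // B also reads 12, before A has inserted

        // Each transaction checks the rule against the sum it saw, then inserts.
        if (seenByA + 4 <= RegistrationTable.MAX_CREDITS) {
            table.insert(4); // A is within the limit: 12 + 4 = 16
        }
        if (seenByB + 5 <= RegistrationTable.MAX_CREDITS) {
            table.insert(5); // B's stale sum says 17, but the real total is now 21
        }
        return table.sumCredits();
    }

    public static void main(String[] args) {
        System.out.println("total credits = " + runInterleaving());
    }
}
```

Both transactions pass their own check, yet together they break the 18-credit rule. Locking only the row being inserted cannot prevent this, because the conflict is with rows that did not exist when the sum was read.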

In this case, we either prevent other threads from inserting another course for the same student, or we cause one of the transactions to fail.
The solution is pretty hard to implement efficiently, that is, while still allowing parallelism. Because we read a range of values to get the result, we need to lock more than just the new row being inserted to ensure the result stays correct. So we need a lock set.

1/ A simple solution is for every read to hold a shared lock on both the table and the item. When an insert is issued, the lock on the table is upgraded to an exclusive lock.

2/ A more efficient implementation is for each transaction to hold only an IS (intent shared) or IX (intent exclusive) lock on the table, depending on whether it intends to read or write items beneath it.

3/ More efficient yet is to use IS or IX predicate locks (locks on a range).
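
As a rough illustration of the third option, the sketch below models a predicate lock as one lock per "student=? AND quarter=?" key and performs the check and the insert while holding it. This is a deliberate simplification and an assumption on my part: it uses a single exclusive lock per predicate rather than real IS/IX modes, and `PredicateLockedRegistrar` and its methods are hypothetical names, not an API from Castor JDO or any database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class PredicateLockedRegistrar {
    static final int MAX_CREDITS = 18;

    // One exclusive lock per predicate "student=? AND quarter=?" (simplified: no IS/IX modes).
    private final ConcurrentHashMap<String, ReentrantLock> predicateLocks = new ConcurrentHashMap<>();
    // Rows matching each predicate; only touched while the predicate lock is held.
    private final ConcurrentHashMap<String, List<Integer>> rows = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String predicate) {
        return predicateLocks.computeIfAbsent(predicate, k -> new ReentrantLock());
    }

    // Runs the sum query and the insert as one unit under the predicate lock.
    boolean register(String student, String quarter, int courseCredits) {
        String predicate = student + "|" + quarter;
        ReentrantLock lock = lockFor(predicate);
        lock.lock();
        try {
            List<Integer> credits = rows.computeIfAbsent(predicate, k -> new ArrayList<>());
            int sum = 0;
            for (int c : credits) {
                sum += c;
            }
            if (sum + courseCredits > MAX_CREDITS) {
                return false; // would break the rule, so reject the insert
            }
            credits.add(courseCredits);
            return true;
        } finally {
            lock.unlock();
        }
    }

    int totalCredits(String student, String quarter) {
        String predicate = student + "|" + quarter;
        ReentrantLock lock = lockFor(predicate);
        lock.lock();
        try {
            int sum = 0;
            for (int c : rows.getOrDefault(predicate, new ArrayList<>())) {
                sum += c;
            }
            return sum;
        } finally {
            lock.unlock();
        }
    }
}

final class PredicateLockDemo {
    // Starts several threads that all try to add a course for the same student and quarter.
    static int runConcurrently(int threads, int creditsEach) {
        PredicateLockedRegistrar registrar = new PredicateLockedRegistrar();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> registrar.register("s1", "fall", creditsEach));
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registrar.totalCredits("s1", "fall");
    }
}
```

Registrations for different students or quarters use different locks, so they still proceed in parallel; only transactions whose predicates collide are serialized. That is the parallelism the coarser table-level lock gives up.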

Cameron didn’t mention lock sets in Coherence. I thought the only way to get isolation right was to use a lock set, so I had a discussion with him.

It turned out the problem spaces are different. jCache works through get() and put(), which frees it from caring about the inter-dependencies among data. So we don’t need a lock set. The specification is different.

So, does that mean the jCache model is easier? Not necessarily. The two are difficult in different ways. Cameron explained to me why even locks cannot be guaranteed to be enough (because of out-of-order messages and the absolute time problem). The database, on the other hand, has a log (journal) that essentially defines the absolute time, or the order of events.

However, ORM product designers should be aware of the differences between the relational model and the jCache model when they use jCache to scale out, especially if the Serializable isolation level is desired. One way is to pick (or let the user pick) the right level of granularity. In the course registration example, choosing the student as the unit of locking and the registrations as dependent objects will work (assuming courses are stable). But in some cases these are difficult problems that require analysis of the trade-offs.
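
The granularity idea can be sketched as follows, assuming the whole schedule of a student is stored as a single cache value under the student key. A plain `ConcurrentHashMap` stands in for the cache here, and its atomic `compute()` plays the role of a per-key lock; a real jCache deployment would need an equivalent per-entry atomic operation, which this sketch does not attempt to model. `StudentSchedule` and `StudentKeyedCache` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Immutable value cached under the student key; registrations are its dependent objects.
final class StudentSchedule {
    final List<Integer> courseCredits;

    StudentSchedule(List<Integer> courseCredits) {
        this.courseCredits = Collections.unmodifiableList(new ArrayList<>(courseCredits));
    }

    int total() {
        int sum = 0;
        for (int c : courseCredits) {
            sum += c;
        }
        return sum;
    }

    // Returns a new schedule with one more course; the original is left unchanged.
    StudentSchedule plus(int credits) {
        List<Integer> copy = new ArrayList<>(courseCredits);
        copy.add(credits);
        return new StudentSchedule(copy);
    }
}

final class StudentKeyedCache {
    static final int MAX_CREDITS = 18;

    // Stand-in for the distributed cache: one entry per student.
    private final ConcurrentHashMap<String, StudentSchedule> cache = new ConcurrentHashMap<>();

    // The check and the update happen atomically on the single student entry.
    boolean register(String studentId, int credits) {
        boolean[] accepted = {false};
        cache.compute(studentId, (id, current) -> {
            StudentSchedule schedule =
                    current == null ? new StudentSchedule(new ArrayList<>()) : current;
            if (schedule.total() + credits > MAX_CREDITS) {
                return schedule; // rule would be broken: keep the entry as it was
            }
            accepted[0] = true;
            return schedule.plus(credits);
        });
        return accepted[0];
    }

    int total(String studentId) {
        StudentSchedule schedule = cache.get(studentId);
        return schedule == null ? 0 : schedule.total();
    }
}
```

Because the rule only depends on data living under one key, no lock set is needed. The trade-off is that every registration for a student rewrites the whole entry, and the approach stops working once a rule spans several keys.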

Tag: clustering, cluster cache, distributed cache, object relational, orm