The Big Grid

HSQL and Memory Database

The quest continues on HSQL.

Creating a new database was a snap. I simply obtained a connection by passing in the JDBC string with the filename appended at the end (e.g., jdbc:hsqldb:file:db). The database files were created relative to the "working directory".
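A minimal sketch of that connection step. The class and method names are hypothetical; it assumes the HSQLDB jar is on the classpath with a driver that registers itself with DriverManager, and that the database still has the default "sa" account with an empty password.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of HSQLDB.
final class HsqlFileDb {
    /** Builds the JDBC URL; a relative name such as "db" resolves against the working directory. */
    static String url(String dbName) {
        return "jdbc:hsqldb:file:" + dbName;
    }

    /**
     * Opens (and, on first use, creates) the file database.
     * Assumption: the HSQLDB driver is on the classpath and the default
     * "sa" user with an empty password has not been changed.
     */
    static Connection open(String dbName) throws SQLException {
        return DriverManager.getConnection(url(dbName), "sa", "");
    }
}
```

Calling HsqlFileDb.open("db") from the project directory would correspond to the jdbc:hsqldb:file:db example above.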

I wrote the database table schema in SQL. The SQL syntax was pretty standard. It was also nice that HSQL supports IDENTITY on a column, which gives a sequence-generated primary key.

The doc explains clearly how to use its SQL tools interactively or non-interactively. Tables were created without any problem. After a restart, the non-interactive sqltool still saw the tables I had created.

However, when it came to the next step, making sample data, the behaviour was unexpected. The rows I inserted disappeared after a restart. The table was created as "MEMORY" type by default. According to the doc, that should be fine: the whole data set is kept in memory but, unlike the TEMP type, the data is persisted and reconstructed when the database is restarted. However, when I launched sqltool again, a select-all found no rows. Switching the table to CACHED or TEXT type didn't help. After a couple of hours, I still hadn't gotten any further.

OK, tight schedule. Next in line would be Derby from Apache.

Tag: concurrency, database, transaction

HSQL and Read Uncommitted

As a database person, my second step is to design the database schema. I am going to push much of the work to the database. The JSP will be a thin presentation layer that presents different views of the database.

The initial deployment will not require high volume, but I expect it will in the future. After a brief sketch of the ER diagram, I started picking a test database.

I came across HSQL, the Java embedded database, a few times when I was working on Castor JDO, so it was my first choice. The User Guide is very good: clean, direct, and straightforward, without any marketing talk. It is a small download and it claims good performance.

As I was reading it, I found myself a little disappointed with the transaction support. It supports only the READ_UNCOMMITTED level, level 0, which means that users who want to guarantee data integrity have to access data in a more restricted way or do quite a lot of work themselves.

Whether it matters depends on the application. Ensuring data integrity with weak isolation from the database can be quite challenging as the complexity of the database schema grows. However, stronger isolation incurs a performance penalty and decreases concurrency. It might not be feasible for a high-volume app.

I would start with stronger isolation first and have the database do more work for me. I would also want the flexibility to tune by changing the isolation level. So, I know I will need to switch databases before my web app is deployed to the real world.
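The "start strong, relax later" idea can be sketched with the standard JDBC isolation constants. The class name and the predicate parameter are hypothetical; the predicate stands in for a call to DatabaseMetaData.supportsTransactionIsolationLevel on a real connection.

```java
import java.sql.Connection;
import java.util.function.IntPredicate;

// Hypothetical helper: picks the strongest isolation level a database reports as supported.
final class IsolationPicker {
    // Standard JDBC isolation levels, strongest first.
    private static final int[] STRONGEST_FIRST = {
        Connection.TRANSACTION_SERIALIZABLE,
        Connection.TRANSACTION_REPEATABLE_READ,
        Connection.TRANSACTION_READ_COMMITTED,
        Connection.TRANSACTION_READ_UNCOMMITTED
    };

    /** Returns the strongest supported level, or TRANSACTION_NONE if none is supported. */
    static int strongestSupported(IntPredicate supports) {
        for (int level : STRONGEST_FIRST) {
            if (supports.test(level)) {
                return level;
            }
        }
        return Connection.TRANSACTION_NONE;
    }
}
```

With a real connection, the predicate would wrap DatabaseMetaData.supportsTransactionIsolationLevel (which throws SQLException, so it needs a try/catch), and the result would be passed to Connection.setTransactionIsolation. A database that only offers READ_UNCOMMITTED, as described above, would make this return TRANSACTION_READ_UNCOMMITTED.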

Tag: database, transaction

Web Project

After years of working on very backend components (O/R mapper, clustering framework), I recently started a web project of my own.

Of course, I picked the Java camp. JSP/Servlet, Tomcat, and Eclipse WTP are my technology, container, and developer tool. I expect to incorporate other J2EE technologies later, but it is always good to start simple, with only what you need.

I first coded up some login logic. For my application, it is important to roll my own. I am going to reuse authentication done by others, such as Microsoft Passport, AOL Screen Name, or Yahoo, when it is available.

The main lesson from the login logic is that I need redirection a lot. I learned the differences between a few of the HTTP 30x responses. I hit a bug in IE: if it is redirected by HTTP 302 (moved temporarily) and then redirected back (with user input) by HTTP 301 (moved permanently), the URL in the “Address” bar is not updated correctly and still points to the temporary address.
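For reference, the redirect codes in question can be summarized in a small lookup. This is only an illustrative sketch with a hypothetical class name; the descriptions paraphrase the HTTP/1.1 status code definitions and say nothing about the IE bug itself.

```java
// Hypothetical lookup of the common HTTP 30x redirect codes.
final class Redirects {
    /** Returns a one-line summary of a redirect status code. */
    static String describe(int status) {
        switch (status) {
            case 301: return "Moved Permanently: the resource has a new permanent URL";
            case 302: return "Found (formerly 'Moved Temporarily'): keep using the original URL in future";
            case 303: return "See Other: fetch the target with GET, typically after a form POST";
            case 307: return "Temporary Redirect: repeat the request with the same method at the target";
            default:  return "not a redirect covered by this sketch";
        }
    }

    /** True for the codes that tell the client to keep the original URL. */
    static boolean isTemporary(int status) {
        return status == 302 || status == 303 || status == 307;
    }
}
```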

Yahoo’s own login doesn’t seem to suffer from the problem. I need to learn its workaround. :-)

Tag: identity, http forward

Virtualized Linux/BSD distribution with Java and Tomcat

I have been searching for a Linux/BSD distribution with Java and Tomcat support that is suitable for virtualization. I spent a few weekends on it (over the last few months), but I haven't found any that suits the task.

Both the installation size and the runtime memory footprint should be very small, so that as many instances as possible can fit into the same machine, and a VM instance can be activated or passivated quickly.

1/ The kernel should boot really fast (less than a minute on a P2-500MHz-class machine),
2/ It loads as few drivers as possible,
3/ A firewall is optional but welcome,
4/ A few file transfer protocols and SSH are essential,
5/ Support for popular SAN and NAS clients,
6/ To certify for full Java support, it must also be able to support Swing (so it requires XWin of some sort). Ideally, the distribution has only the XWin client, not the server part, to save space,
7/ A total installation of 100MB with JDK 1.5.x and Tomcat 5.5.x would be ideal,
8/ Ant, CVS client, and SVN client support (to obtain source or binaries for app deployment),
9/ A kernel working set footprint of 32MB or less,
10/ Out-of-the-box support for Java, Tomcat (and even an open source JMS), and Type 4 JDBC drivers for popular databases.

BEA JRockit's BareMetal sounds very interesting in that respect, but very little information has been released, and it is hard to guess its availability. I think I had better put my hopes on a Linux/BSD distribution with Java, at least for now.

I think such a distribution, if available, would be an enabling technology that can change the game: it would make Java much more popular compared with PHP, Ruby, etc. Java has been focused on scaling big, and it has been successful at it. But it is losing ground as the development platform for weekend projects. It really shouldn’t be. Most projects start small, and simple projects are the ground for bigger ones. Java hosting has always been limited and a few years behind in terms of availability, features, and price. The hosting offerings are even worse than those for .Net, which only became suitable for web programming a few years after Java/Servlet became popular.

I don’t think the demand for Java hosting is low to begin with. The uncompetitive hosting options reflect that a Java system is hard to maintain cheaply. Indeed, individuals, corporations, and system administrators face the same problem. Because of it, the Java hosting market has never matured.

I believe that when such a Java distribution is available, together with VMWare (or Microsoft Virtual Server, or XenSource), the game will change in favor of Java.

I tried to resist it, and I often prefer writing code to doing integration. But maybe it is time to roll my own Linux/BSD distribution. I am reading up on the T2 Project and the Debian Developers' Corner.

Tag: virtualization

Visa Gift Card

I saw a banner ad on a news site for “Visa Gift Card” a few days ago. Oh yeah. It is a neat idea. Why didn't they come up with this idea earlier?

The answer probably goes back fifteen years. At the time, the majority of merchants used physical devices to make an imprint of a client’s credit card in order to charge it. The process didn’t involve electronics at all. It was probably a few days before the physical imprint was sent to the bank for deposit and verification. A merchant might call in to verify a card, but they could not always do it. I once saw a cashier actually check a client's card number against a thick book with thousands of counterfeit numbers to protect the store against fraud. Even just a few years ago, credit card companies still made you pay a big penalty if you spent beyond your limit.

They certainly can’t make the purchaser of a Visa gift card pay a penalty for going over the limit, or no one would buy it. The arrival of the Visa gift card signifies that the physical imprint device is totally obsolete. Chick-Cuck.

Links on Clustering Design Docs

Oracle Cluster File System

Oracle released a cluster file system implementation for Linux as an open source project in late 2003. Its design document unveils many typical clustering concerns and solutions: Oracle Cluster File System Design.

Among all, I found the file system header design most interesting:

OCFS assumes a shared storage architecture to host the databases in the same cluster. The file header is the data structure that nodes use to determine which chunk of data they can access, to perform isAlive checks, to vote among nodes, etc.

MySQL Cluster Architecture

MySQL has a MySQL Cluster Architecture Overview document on its site (it requires your email). The interesting part is the separation of Data nodes and Server nodes. Each machine is assumed to have its own storage. The Data nodes keep as much information in memory as possible and communicate with each other over the network. It looks appropriate to be tailored to the "Database is the app server" model.

Tag: clustering, database

Database is the App Server?

A few weeks ago, at ISC2005 (Supercomputer Conference), Bill Gates mentioned his vision of Grid computing. According to news.com, his vision was to bring the computation closer to the data. The article didn’t mention how and why. Google didn’t yield much else on Gates’s speech.

Even though I don’t know more about Bill’s version of the data grid, I tend to agree.

Sun’s Grid
----------
For example, Sun’s current utility offering ($1/cpu day) is rather limiting. It is only suitable for low-I/O, computation-intensive applications. It rules out most applications that require a database, which most enterprise applications and research analyses do. There is no option to rent long-term storage, such as a SAN, that is local to the grid. Does the fact that the machines are rented mean the software must be reinstalled every time? What if I want to form a cluster with a lot of machines? What speed can I expect from the inter-machine connections? Will they share the same LAN (switch and router)? Is the network shared with computers that other people rented? In fact, the white paper I read a few weeks ago on Sun’s site suggested something about secure connections to and from your company and didn’t even mention clustering. It worried me.

It is true that owning and maintaining machines is expensive and a large capital investment. However, Sun’s value-add is limited to leasing and maintaining the physical hardware and the lower-level OS, which is hardly a big part of the TCO. The simplistic view of computation power, the limitations of remote administration (bandwidth, for example), and the temporary nature of renting sound like they add a lot to the system maintenance cost. Sun and Jonathan simply need to come up with a more convincing story.

EGA
---
In contrast to Sun's current offering, the Enterprise Grid Alliance's "Reference Model" better captures the complexity of what is required to make Grid a reality for the enterprise. (To be fair, Sun is also on board. The current offering is weak on its own and doesn't necessarily capture Sun's vision for the future.)

Data Grid
---------
Now, back to Gates’ vision of the data grid. Over the weekend, I read a few articles by Jim Gray, the authority on transaction processing, who now works for Microsoft Research. They unveil what had gone into Gates’ mind.

Distributed Computing Economics by Jim Gray.
And,
A Call to Arms -- Avalanche of Information by Jim Gray and Mark Compton.

Active Database
---------------
My hobby of implementing distributed locks and a cache has also made me aware of how hard it is to ensure data integrity all the way up to the phantom level. Together with Jim’s articles, it means my vision of future high-volume enterprise computing calls for modification. Maybe the database will take a much more active role: applications live inside the database instead of being split into different tiers. It is a dangerous thought.

I am also surprised that it is Jim Gray from Microsoft who has this vision, instead of marketing from Oracle. Oracle has been an active advocate of database triggers; it put a JVM into the database in the early days and added CLI to it recently. But if Jim Gray represents the unified vision of Microsoft, it is more database-centric than anyone else.

Tag: clustering, database, grid, virtualization

IS, IX and SIX

Deadlock and IS, IX and SIX
---------------------------
Occasionally, I hit a deadlock when developing a database application. When I looked up the Oracle error code, a page about Oracle lock modes came up: IS, IX, SIX, S, and X. Most people recognize S as Share and X as eXclusive; they map well to read and write locks.
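For readers who have not met the intention modes, here is a sketch of the textbook multiple-granularity compatibility matrix for these five modes. The enum is hypothetical, and it is an assumption that a given database's table locks follow the textbook matrix in every detail.

```java
// Hypothetical enum encoding the textbook multiple-granularity lock compatibility matrix.
enum LockMode {
    IS,   // intention share: plans to take S locks on finer-grained items
    IX,   // intention exclusive: plans to take X locks on finer-grained items
    S,    // share (read)
    SIX,  // share plus intention exclusive
    X;    // exclusive (write)

    // COMPATIBLE[held][requested]; true means both can be granted at the same time.
    private static final boolean[][] COMPATIBLE = {
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false },
    };

    /** True if a lock in this mode can coexist with a lock in the other mode on the same item. */
    boolean compatibleWith(LockMode other) {
        return COMPATIBLE[ordinal()][other.ordinal()];
    }
}
```

For example, LockMode.IX.compatibleWith(LockMode.IX) is true (two writers may touch different rows of one table), while LockMode.IX.compatibleWith(LockMode.S) is false, which is one common way two transactions end up waiting on each other.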

LockSet
-------
On other occasions, I developed an in-memory lock set. Read/write locks, and even update locks, are really easy, and I used them as a starting block. Maintaining a set of them, on the other hand, is more difficult to do efficiently. The main difficulty lies in obtaining the specified read/write lock struct from the lock set. If the specified lock doesn’t exist, a new struct representing an individual lock needs to be added to the lock set. Two threads trying to acquire the same lock must resolve to the same struct instance, so obtaining a lock struct from the lock set must be guarded by a semaphore S(t). After a thread obtains the lock struct from the set, it then tries to acquire the lock. If the thread is acquiring the lock in a mode that conflicts with what has been granted to another thread, it waits on the lock. The acquisition is protected by another semaphore S(r) to allow concurrency; in this way, threads acquiring different locks wait on different semaphores. Similarly, when a thread is finished with the lock, it goes into S(t) again to see if the lock can be removed from the set. Based on this thinking, I developed this algorithm (of course, the actual code looks different):

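    Lock lock;
    // S(t): look up the lock struct, creating it on first use
    synchronized (lockSet) {
        lock = lockSet.get(id);
        if (lock == null) {
            lock = new Lock();
            lockSet.add(lock);
        }
        // every thread registers as a visitor, not only the one that created the struct
        lock.incrementVisitor();
    }
    // S(r): acquire in the requested mode; may wait on this lock only
    synchronized (lock) {
        lock.acquire(id, mode);
    }
    // S(t) again: when finished, drop the struct if nobody else needs it
    synchronized (lockSet) {
        lock.decrementVisitor();
        boolean free = false;
        if (lock.hasNoVisitor()) {
            synchronized (lock) {
                free = lock.isFree();
            }
        }
        if (free) {
            lockSet.remove(lock);
        }
    }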

I believe this is working code. However, it takes 4 synchronized blocks to achieve it. This is pretty inefficient: there must be a better way.
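One possible direction, sketched here as an assumption rather than as the actual code: let ConcurrentHashMap.compute do the lookup, creation, and visitor counting in one atomic step per key, so the set itself needs no global synchronized block. The class and field names are hypothetical, and a plain ReentrantReadWriteLock stands in for the multi-mode lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical lock set; entries are created on demand and removed when unused.
final class LockSetSketch {
    /** One entry per lock id; visitors counts threads that hold or wait for the lock. */
    private static final class Entry {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        int visitors; // only touched inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

    /** Blocks until the lock for id is granted in the requested mode. */
    void lock(String id, boolean exclusive) {
        // Get-or-create and register as a visitor in one atomic step.
        Entry e = entries.compute(id, (key, old) -> {
            Entry cur = (old == null) ? new Entry() : old;
            cur.visitors++;
            return cur;
        });
        if (exclusive) {
            e.rw.writeLock().lock();
        } else {
            e.rw.readLock().lock();
        }
    }

    /** Releases a lock taken by lock(id, exclusive) on the same thread. */
    void unlock(String id, boolean exclusive) {
        Entry e = entries.get(id);
        if (e == null) {
            throw new IllegalStateException("not locked: " + id);
        }
        if (exclusive) {
            e.rw.writeLock().unlock();
        } else {
            e.rw.readLock().unlock();
        }
        // Deregister; returning null removes the entry once the last visitor leaves.
        entries.compute(id, (key, cur) -> (--cur.visitors == 0) ? null : cur);
    }

    /** Number of lock entries currently kept in the set. */
    int size() {
        return entries.size();
    }
}
```

Because a thread registers as a visitor before it starts waiting, an entry cannot be removed while another thread is blocked on it. Threads working on different ids contend only when their keys fall into the same hash bin.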

Tag: database, transaction, cluster cache, distributed cache

Microsoft always thought the hardware should be free. Intel always thought the software should be free...

Economics is interesting. Consumers seek cheaper and better products. Producers seek bigger markets.

Barney Pell posted “meeting minutes” on his blog, with an interesting quote from Sam Jadallah: “I talked to CEO of GM, trying to get media inside the car. He said: 'at $250/month of ads or services, I can actually give you a car'. That's believable! These devices, phones, cars, are just vehicles for delivering advertising.”

Car manufacturers and dealers have been the big advertisers. They account for virtually a third of TV ads at peak hours. Now if GM's CEO “thinks” it is better off competing with Google for the ad dollars, who is going to buy the ads? Maybe Toyota? Or, maybe Google?

Tag: software

Future (On Cluster Part VIII of VII :-)

The future of high-volume computing lies in flexibility and integration. Grid computing and virtualization will come into play.

The utility computing model today is very limited. While Sun is trying very hard to convince people that computing power can be purchased as easily as electricity, its offering is based on a much simplified model. The current model only applies to computing-intensive tasks, such as simulation and data analysis.

It is not a suitable model for most enterprise applications, which generate very high network I/O. It is critical for these applications to stay close to the data, and the amount of data is also so large that it cannot be moved easily. Those applications might rely on a large array of other applications or supporting systems. Security concerns also make it more feasible to put those applications and supporting systems behind the same corporate firewall for access control. Once network and disk I/O performance, backup, large amounts of configuration, security, and data have to be kept in sync between a company and a computing power provider, it is no longer as simple as measuring megawatts of electric power. With these constraints, the leasing model simply doesn’t make sense.

Grid computing and virtualization that remain inside the corporate firewall will result in very significant savings, without the cost and disadvantages of the utility model. Virtualization products like VMWare provide a lot of flexibility. Each service can be set up in its own virtual machine. Depending on demand, VMs can be moved into or out of a single physical machine, suspended, or activated. The three server tiers of a four-tier system (Web Server, Application (business logic) Server, and Database Server) can begin in one physical machine with 3 VM instances. In fact, multiple four-tier systems serving different departments can begin in one physical machine. As the traffic increases, the most heavily loaded tier can be moved out to live on its own machine. A service can also occupy more machines in peak hours. The application can shrink to fewer machines during off-peak hours, leaving more machines in the Grid pool for offline reporting and data analysis. (VMWare has the functionality of moving a VM instance from one machine to another in real time, with less than 5 seconds of downtime, even with a heavy system like Microsoft SQL Server running and constantly taking more than 50% of CPU time. The machine to which the VM moves can also have a different configuration. It is wickedly cool. Check it out at your next trade show.)

Today, Oracle already claims to support Grid computing in the database tier (previously known as Oracle Parallel Server). Machines can easily be added to an existing cluster to support higher loads.

In the future, application servers will support similar Grid computing capacity. Such an application server will be able to spawn new instances of itself onto other physical machines on demand. For that to happen, a scalable distributed cache and lock will be integrated into the application server. Applications will need to be developed on a slightly different programming model, in which the application accesses the database indirectly (through an ORM layer that is integrated with the distributed cache and lock), locks resources through the provided lock framework, and subscribes to incoming message endpoints indirectly via the service provided by the application server. Today's component models already provide most of the abstraction, so the change to the programming model is minor. The model neither incurs a performance penalty for single-node operation nor limits programming flexibility. In single-node operation, remote functionality is turned off. A small synchronization overhead is incurred only when the application server resides on multiple machines. The Web tier or Web Service tier can pick a random instance of the application server from the pool to make a request. Each application server instance is able to fulfill the request or forward it, by considering the state of the cache and locks, so that the pool works in a completely parallel manner.
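The "fulfill or forward" step can be sketched with a toy in-memory model. Everything here is hypothetical: instances are just integers, and a shared map stands in for the distributed lock and cache directory that says which instance currently holds a resource.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a pool of application server instances.
final class AppServerPool {
    /** Stand-in for the distributed lock/cache directory: resource key -> owning instance id. */
    private final Map<String, Integer> owners = new ConcurrentHashMap<>();
    private final int instances;
    private final Random random;

    AppServerPool(int instances, Random random) {
        this.instances = instances;
        this.random = random;
    }

    /** Web tier: hand the request to a random instance; returns the id of the instance that served it. */
    int request(String resourceKey) {
        return handle(random.nextInt(instances), resourceKey);
    }

    /**
     * Instance 'self' serves the request if it already owns the resource or can
     * claim it; otherwise the request is forwarded to the current owner.
     */
    int handle(int self, String resourceKey) {
        int owner = owners.computeIfAbsent(resourceKey, key -> self);
        return owner; // owner == self means served locally, anything else means forwarded
    }
}
```

In this toy model ownership never moves once claimed; a real design would also release or migrate ownership as locks are freed and cache entries expire.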

Tag: clustering, database, grid, virtualization