Virtualized Linux/BSD distribution with Java and Tomcat

I have been searching for a Linux/BSD distribution with Java and Tomcat support that is suitable for virtualization. I have spent a few weekends on it over the last few months, but I haven't found one that suits the task.

Both the installation size and the runtime memory footprint should be very small, so that as many instances as possible can fit on the same machine, and a VM instance can be activated or passivated quickly.

1/ The kernel should boot really fast (less than a minute on a P2-500MHz-class machine),
2/ It loads as few drivers as possible,
3/ A firewall is optional but welcome,
4/ A few file transfer protocols and SSH are essential,
5/ It supports popular SAN and NAS clients,
6/ To qualify as full Java support, it must also be able to support Swing (so it requires an X Window System of some sort). Ideally, the distribution only has the X client side, not the X server, to save space,
7/ A total installation of 100MB with JDK 1.5.x and Tomcat 5.5.x would be ideal,
8/ Ant, CVS client, and SVN client support (to obtain source or binaries for app deployment),
9/ A kernel working-set footprint of 32MB or less,
10/ Out-of-the-box support for Java, Tomcat (and even an open source JMS), and Type 4 JDBC drivers for popular databases.

BEA JRockit's Bare Metal sounds very interesting in that respect, but very little information has been released, and it is hard to guess when it will be available. I think I had better pin my hopes on a Linux/BSD distribution with Java, at least for now.

I think such a distribution, if available, would be an enabling technology that could change the game: it would make Java much more popular compared with PHP, Ruby, etc. Java has focused on scaling big, and it has been successful at it. But it is losing ground as a development platform for weekend projects, and it really shouldn't. Most projects start small, and simple projects are the breeding ground for bigger ones. Java hosting has always been limited and a few years behind in terms of availability, features, and price. The hosting offerings are even worse than those for .NET, which only became suitable for web programming a few years after Java servlets became popular.

I don’t think the demand for Java hosting is low to begin with. The uncompetitive hosting options reflect that Java systems are hard to maintain cheaply. Individuals, corporations, and system administrators all face the same problem. Because of it, the Java hosting market has never matured.

I believe that when such a Java distribution is available, together with VMware (or Microsoft Virtual Server, or XenSource), the game will change in Java's favor.

I tried to resist it, and I usually prefer writing code to doing integration. But maybe it is time to roll my own Linux/BSD distribution. I am reading up on the T2 Project and the Debian Developers' Corner.


Visa Gift Card

I saw a banner ad on a news site for the “Visa Gift Card” a few days ago. Oh yeah, it is a neat idea. Why didn't they come up with it earlier?

The answer probably goes back fifteen years. At the time, the majority of merchants used physical devices to make an imprint of a customer’s credit card in order to charge it. The process didn’t involve electronics at all. The imprint was probably sent to the bank for deposit and verification a few days later. A merchant might call in to verify a card, but they could not always do so. I have seen a cashier actually check a customer's card number against a thick book listing thousands of counterfeit numbers to protect against fraud. Even just a few years ago, credit card companies still made you pay a big penalty if you spent beyond your limit.

They certainly can’t make the purchaser of a Visa gift card pay a penalty for going over the limit, or no one would buy it. The arrival of the Visa gift card signifies that the physical imprint device is totally obsolete. Chick-Cuck.

Links on Clustering Design Docs


Oracle Cluster File System


Oracle released a cluster file system implementation for Linux as an open source project in late 2003. Its design document covers many typical clustering concerns and solutions: Oracle Cluster File System Design

Of everything in it, I found the design of the file system header the most interesting:

[Figure: OCFS file system header design]

OCFS assumes a shared-storage architecture for hosting a database in the same cluster. The file header is the data structure that nodes use to determine which node has access to which chunk of data, to perform isAlive checks, to vote between nodes, etc.
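The isAlive idea can be sketched as a heartbeat in the shared header. This is a hypothetical, simplified model and not OCFS's actual on-disk format: each node owns one slot and periodically bumps a counter in it, and an observer declares a node dead once that counter has not moved for a given number of its own polling rounds. The names `SharedHeader`, `LivenessMonitor`, `beat` and `poll`, and the timeout of three rounds, are assumptions for illustration; voting is not covered.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical shared header: one heartbeat slot per node (stands in for a disk block). */
class SharedHeader {
    private final AtomicLongArray slots;

    SharedHeader(int nodes) { slots = new AtomicLongArray(nodes); }

    /** Called periodically by node {@code self}: bump its own counter. */
    void beat(int self) { slots.incrementAndGet(self); }

    long read(int node) { return slots.get(node); }

    int size() { return slots.length(); }
}

/** One observer node's view of who is alive, based on whether counters keep moving. */
class LivenessMonitor {
    private final SharedHeader header;
    private final long[] lastSeen;
    private final int[] missedRounds;
    private final int timeoutRounds;

    LivenessMonitor(SharedHeader header, int timeoutRounds) {
        this.header = header;
        this.timeoutRounds = timeoutRounds;
        this.lastSeen = new long[header.size()];
        this.missedRounds = new int[header.size()];
    }

    /** One polling round: a node whose counter did not change accumulates a miss. */
    void poll() {
        for (int n = 0; n < header.size(); n++) {
            long now = header.read(n);
            if (now != lastSeen[n]) {
                lastSeen[n] = now;
                missedRounds[n] = 0;
            } else {
                missedRounds[n]++;
            }
        }
    }

    boolean isAlive(int node) { return missedRounds[node] < timeoutRounds; }
}

public class HeartbeatDemo {
    public static void main(String[] args) {
        SharedHeader header = new SharedHeader(2);
        LivenessMonitor monitor = new LivenessMonitor(header, 3);
        for (int round = 0; round < 5; round++) {
            header.beat(0);   // node 0 keeps beating; node 1 never writes its slot
            monitor.poll();
        }
        // prints "node0 alive=true, node1 alive=false"
        System.out.println("node0 alive=" + monitor.isAlive(0) + ", node1 alive=" + monitor.isAlive(1));
    }
}
```

Counting missed rounds, rather than comparing wall-clock timestamps, avoids depending on synchronized clocks across nodes, which is one reason a counter is a convenient thing to put in a shared header.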



MySQL Cluster Architecture


MySQL has a MySQL Cluster Architecture Overview document on its site (it requires your email). The interesting part is the separation of Data nodes and Server nodes. Each machine is assumed to have its own storage. The Data nodes keep as much information in memory as possible and communicate with each other over the network. It looks well suited to being tailored to the "database is the app server" model.

[Figure: MySQL Cluster architecture]
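The separation of Server nodes and Data nodes described above can be pictured with a toy model: a Server node hashes a row's primary key to pick a node group, and each node group keeps the row in memory on every replica. MySQL Cluster does distribute rows across node groups by hashing the primary key, but the classes `DataNode` and `ServerNode`, the use of `String.hashCode`, and the simple write-to-all-replicas scheme below are simplifying assumptions, not the NDB implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory Data node: keeps its share of rows in a map. */
class DataNode {
    final String name;
    private final Map<String, String> rows = new HashMap<>();

    DataNode(String name) { this.name = name; }

    void store(String key, String value) { rows.put(key, value); }

    String fetch(String key) { return rows.get(key); }
}

/** Hypothetical Server node: routes each key to a node group and writes every replica. */
class ServerNode {
    private final List<List<DataNode>> nodeGroups;

    ServerNode(List<List<DataNode>> nodeGroups) { this.nodeGroups = nodeGroups; }

    private List<DataNode> groupFor(String key) {
        // Math.floorMod keeps the index non-negative even for negative hash codes.
        return nodeGroups.get(Math.floorMod(key.hashCode(), nodeGroups.size()));
    }

    void insert(String key, String value) {
        for (DataNode replica : groupFor(key)) {
            replica.store(key, value);   // write to every replica in the group
        }
    }

    String select(String key) {
        return groupFor(key).get(0).fetch(key);   // any replica can answer a read
    }
}

public class ClusterDemo {
    public static void main(String[] args) {
        List<List<DataNode>> groups = new ArrayList<>();
        groups.add(List.of(new DataNode("d1"), new DataNode("d2")));
        groups.add(List.of(new DataNode("d3"), new DataNode("d4")));
        ServerNode server = new ServerNode(groups);
        server.insert("customer:42", "Alice");
        System.out.println(server.select("customer:42"));   // prints Alice
    }
}
```

Because the Server node holds no data, it can be added or removed freely; only the Data nodes carry state, which is what makes the model attractive for running application logic close to the data.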



Database is the App Server?

A few weeks ago, at ISC2005 (a supercomputing conference), Bill Gates described his vision of grid computing. According to news.com, his vision is to bring computation closer to the data. The article didn’t mention how or why, and Google didn’t turn up much else on Gates’s speech.

Even though I don’t know more about Bill’s vision of the data grid, I tend to agree.

Sun’s Grid
----------
For example, Sun’s current utility offering ($1 per CPU-hour) is rather limiting. It is only suitable for computation-intensive applications with low I/O. It rules out most applications that require a database, which includes most enterprise applications and research analyses. There is no option to rent long-term storage, such as a SAN, that is local to the grid. Does the fact that the machines are rented mean the software must be reinstalled every time? What if I want to form a cluster with a lot of machines? What speed can I expect from the inter-machine connections? Will the machines share the same LAN (switches and routers)? Is the network shared with computers that other people have rented? In fact, the white paper I read a few weeks ago on Sun’s site talked about secure connections to and from your company and didn’t even mention clustering. That worried me.

It is true that owning and maintaining machines is expensive and requires a large capital investment. However, Sun’s added value is limited to leasing and maintaining the physical hardware and the lower-level OS, which is hardly a big part of the TCO. The simplistic view of computation power, the limitations of remote administration (bandwidth, for example), and the temporary nature of renting sound like they add a lot to the cost of maintaining the system. Sun and Jonathan simply need to come up with a more convincing story.

EGA
---
In contrast to Sun's current offering, the Enterprise Grid Alliance's "Reference Model" better captures the complexity of what is required to make the grid a reality for enterprises. (To be fair, Sun is also on board. The current offering is weak on its own and doesn't necessarily capture Sun's vision for the future.)

Data Grid
---------
Now, back to Gates’s vision of the data grid. Over the weekend, I read a few articles by Jim Gray, the authority on transaction processing who now works for Microsoft Research. They reveal what had gone into Gates’s mind:

1/ Distributed Computing Economics, by Jim Gray.
2/ A Call to Arms -- Avalanche of Information, by Jim Gray and Mark Compton.
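The idea of bringing computation to the data boils down to one comparison: shipping data to a remote CPU only pays off if the CPU cost saved there exceeds the cost of moving the bytes. The sketch below makes that comparison explicit. The method name `shipToRemote` and all prices are placeholder parameters chosen for illustration, not figures taken from the articles above.

```java
/** Hypothetical cost model: move data to a remote CPU, or compute where the data lives? */
public class ComputeNearData {
    /**
     * @param gigabytes               data that would have to cross the WAN
     * @param dollarsPerGbWan         assumed WAN transfer price per GB
     * @param cpuHours                CPU time the job needs
     * @param localDollarsPerCpuHour  assumed price of CPU time next to the data
     * @param remoteDollarsPerCpuHour assumed price of CPU time on the remote grid
     * @return true if shipping the data to the remote grid is cheaper overall
     */
    static boolean shipToRemote(double gigabytes, double dollarsPerGbWan, double cpuHours,
                                double localDollarsPerCpuHour, double remoteDollarsPerCpuHour) {
        double remoteCost = gigabytes * dollarsPerGbWan + cpuHours * remoteDollarsPerCpuHour;
        double localCost = cpuHours * localDollarsPerCpuHour;
        return remoteCost < localCost;
    }

    public static void main(String[] args) {
        // Placeholder numbers: remote CPU is half the price, but bytes cost money to move.
        System.out.println(shipToRemote(1000, 1.0, 10, 2.0, 1.0));   // scan-heavy job: false
        System.out.println(shipToRemote(1, 1.0, 1000, 2.0, 1.0));    // CPU-heavy job: true
    }
}
```

With any prices, a job that touches many bytes per CPU-hour (a typical database query) ends up on the data-local side, which is the case Sun's low-I/O utility offering does not serve.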

Active Database
---------------
My hobby project of implementing distributed locks and a cache also made me aware of how hard it is to ensure data integrity all the way up to the phantom level. Together with Jim’s articles, it makes me think my vision of future high-volume enterprise computing needs modification. Maybe the database will take a much more active role: applications would live inside the database instead of being split into different tiers. It is a dangerous thought.
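Why the phantom level is hard can be shown with a toy lock table. Locking the rows a query has read does not stop another transaction from inserting a new row that the same query would have matched (a phantom). A range (predicate) lock closes that gap by making every insert check against the key ranges that readers have locked. The `RangeLockTable` class below is a hypothetical, single-threaded illustration of that conflict check, not how any particular database implements it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical table of range locks held by readers (single-threaded illustration only). */
class RangeLockTable {
    private static final class Range {
        final String owner;
        final int low;
        final int high;   // inclusive key range [low, high]

        Range(String owner, int low, int high) { this.owner = owner; this.low = low; this.high = high; }
    }

    private final List<Range> held = new ArrayList<>();

    /** A reader locks every key in [low, high], including keys that do not exist yet. */
    void lockRange(String owner, int low, int high) { held.add(new Range(owner, low, high)); }

    /** An insert of {@code key} by {@code owner} conflicts with another owner's range lock. */
    boolean canInsert(String owner, int key) {
        for (Range r : held) {
            if (!r.owner.equals(owner) && key >= r.low && key <= r.high) {
                return false;   // this insert would create a phantom for r.owner
            }
        }
        return true;
    }

    void releaseAll(String owner) { held.removeIf(r -> r.owner.equals(owner)); }
}

public class PhantomDemo {
    public static void main(String[] args) {
        RangeLockTable locks = new RangeLockTable();
        // T1 reads "all rows with key BETWEEN 10 AND 20" and locks the whole range.
        locks.lockRange("T1", 10, 20);
        System.out.println(locks.canInsert("T2", 15));   // false: would be a phantom for T1
        System.out.println(locks.canInsert("T2", 25));   // true: outside T1's range
        locks.releaseAll("T1");
        System.out.println(locks.canInsert("T2", 15));   // true once T1 has finished
    }
}
```

In a distributed setting, every insert on every node would have to consult the ranges locked anywhere in the cluster, which is a large part of why keeping the logic inside the database looks tempting.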

I am also surprised that it is Jim Gray from Microsoft who has this vision, rather than Oracle's marketing. Oracle has been an active advocate of database triggers; it put a JVM into the database early on, and recently added the CLI to it. But if Jim Gray represents Microsoft's unified vision, it is more database-centric than anyone else's.


IS, IX and SIX

Deadlock and IS, IX and SIX
---------------------------
Occasionally, I hit a deadlock when developing a database application. Looking up the Oracle error code brought up a page about Oracle lock modes: IS, IX, SIX, S and X. Most people recognize S as Shared and X as eXclusive; they map well to read and write locks.
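The other three modes come from multiple-granularity locking. Before taking S or X on a row, a transaction takes IS (intent shared) or IX (intent exclusive) on the containing table, and SIX means shared on the whole table plus the intent to update some of its rows. The standard compatibility matrix for these five modes is sketched below as a Java enum; `LockMode` and `compatibleWith` are hypothetical names of my own, not an Oracle API.

```java
/** Standard multiple-granularity lock modes and their compatibility (hypothetical helper). */
enum LockMode {
    IS, IX, S, SIX, X;

    // COMPATIBLE[a][b] is true if mode a held by one transaction can coexist with
    // mode b requested by another. Rows and columns follow the declaration order.
    private static final boolean[][] COMPATIBLE = {
        //            IS     IX     S      SIX    X
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false },
    };

    boolean compatibleWith(LockMode other) {
        return COMPATIBLE[ordinal()][other.ordinal()];
    }
}

public class LockModeDemo {
    public static void main(String[] args) {
        // Two transactions updating different rows of the same table: IX with IX is fine.
        System.out.println(LockMode.IX.compatibleWith(LockMode.IX));   // true
        // A whole-table reader (S) blocks a transaction that intends to update rows (IX).
        System.out.println(LockMode.S.compatibleWith(LockMode.IX));    // false
    }
}
```

The intent modes let the lock manager detect a table-versus-row conflict by looking at the table lock alone, without scanning every row lock; a deadlock appears when two transactions each hold a mode that is incompatible with what the other is waiting for.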

LockSet
-------
On other occasions, I developed an in-memory lock set. Read/write locks, and even update locks, are really easy, and I used them as a starting block. Maintaining a set of locks efficiently, on the other hand, is more difficult. The main difficulty lies in obtaining the specified read/write lock struct from the lock set. If the specified lock doesn’t exist, a new struct representing the individual lock needs to be added to the lock set. Two threads that try to acquire the same lock must resolve to the same struct instance. So obtaining a lock struct from the lock set must be guarded by a semaphore S(t). After a thread obtains the lock from the set, it then tries to acquire the lock. If the thread is acquiring the lock in a mode that conflicts with what has been granted to another thread, it waits on the lock. The acquisition is protected by a separate semaphore S(r) to allow concurrency; in this way, threads acquiring different locks wait on different semaphores. Similarly, when a thread is finished with the lock, it enters S(t) again to see whether the lock can be removed from the set. Based on this thinking, I developed this algorithm (of course, the actual code looks different):

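Lock lock;
synchronized (lockSet) {           // S(t): find or create the lock struct for this id
    lock = lockSet.get(id);        // lockSet maps each id to its Lock struct
    if (lock == null) {
        lock = new Lock();
        lockSet.put(id, lock);
    }
    lock.incrementVisitor();       // count every thread using the struct, not just its creator
}
synchronized (lock) {              // S(r): wait here until the requested mode is compatible
    lock.acquire(id, mode);
}

// ... work under the lock, then release the mode on the lock ...

synchronized (lockSet) {           // S(t) again: drop the struct once nobody uses it
    lock.decrementVisitor();
    if (lock.hasNoVisitor()) {
        boolean free;
        synchronized (lock) {
            free = lock.isFree();
        }
        if (free) {
            lockSet.remove(id);
        }
    }
}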

I believe this is working code. However, it takes four synchronized blocks to achieve it, which is pretty inefficient: there must be a better way.
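One possible alternative, assuming Java 8 or later, is to let `ConcurrentHashMap.compute` do the find-or-create and the visitor counting atomically per key, so that there is no global monitor on the lock set at all. This is a swapped-in technique, not the algorithm above, and it only covers shared/exclusive modes. The names `RefCountedLockSet`, `Entry`, `acquire` and `release` are hypothetical; each id gets a `ReentrantReadWriteLock` plus a visitor count, and the entry is removed when the last visitor leaves.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical lock set: one reference-counted read/write lock per id. */
class RefCountedLockSet<K> {
    private static final class Entry {
        final ReadWriteLock rw = new ReentrantReadWriteLock();
        int visitors;   // only read or written inside ConcurrentHashMap.compute for this key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();

    /** Finds or creates the entry and counts this thread as a visitor, atomically per key. */
    private Entry checkOut(K id) {
        return entries.compute(id, (k, e) -> {
            if (e == null) e = new Entry();
            e.visitors++;
            return e;
        });
    }

    /** Drops this thread's visit; returning null from compute removes the entry. */
    private void checkIn(K id) {
        entries.compute(id, (k, e) -> (--e.visitors == 0) ? null : e);
    }

    void acquire(K id, boolean exclusive) {
        Entry e = checkOut(id);
        // Blocking happens on the per-id lock, outside any map operation (the role of S(r)).
        (exclusive ? e.rw.writeLock() : e.rw.readLock()).lock();
    }

    void release(K id, boolean exclusive) {
        Entry e = entries.get(id);   // still present: this thread's visit keeps it in the map
        (exclusive ? e.rw.writeLock() : e.rw.readLock()).unlock();
        checkIn(id);
    }

    int size() { return entries.size(); }
}

public class LockSetDemo {
    public static void main(String[] args) throws InterruptedException {
        RefCountedLockSet<String> locks = new RefCountedLockSet<>();
        Runnable reader = () -> {
            locks.acquire("account-1", false);
            try {
                // ... read the shared record ...
            } finally {
                locks.release("account-1", false);
            }
        };
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(locks.size());   // prints 0: the entry is removed after the last visitor
    }
}
```

Here the two S(t) sections become per-key atomic map updates, so threads working on different ids never contend on a shared monitor. Supporting update locks or intent modes would still need a custom lock class in place of `ReentrantReadWriteLock`.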
