tag:blog.thebiggrid.com,2013:/posts The Big Grid 2013-10-08T17:16:24Z tag:blog.thebiggrid.com,2013:Post/537676 2010-11-29T08:37:00Z 2013-10-08T17:16:24Z Microsoft Windows' franchise is getting weak at attracting new apps

Three screens

Despite repeatedly vowing to be the best across the three screens (computer, TV, mobile), Microsoft keeps getting out-innovated in all three. And developers are moving away.

Windows Media Center

Windows Media Center sells on the promise of handling typical TV operations with ease and polish, plus occasional web browsing, including YouTube and some peer-to-peer video. I picked a WMC over an Xbox to keep Blu-ray in the same box.

I have been an advocate of "Windows Media Center", but I considered moving away after the power supply of my WMC died, because the experience had not been smooth. Playing a Blu-ray always requires me to use three remotes and, almost every time, a mouse. That is partially the fault of the third-party software, but WMC didn't give me a choice, because Microsoft didn't make its own.


A friend showed me the VUDU.com app on his Samsung Blu-ray player. VUDU's movies in HDX at $5.99 are what I wanted.

I watch very few movies at home (less than once a month), so I am usually not deterred by a couple of dollars' difference in price.

Launching a VUDU demo video there was faster than launching the Blu-ray disc already in my drive. And I am not forced to watch the five-plus minutes of junk that each Blu-ray disc makes me play.

Windows Media Center

Now, back home: Windows Media Center doesn't support VUDU.

Nor does VUDU run on Windows itself.

It has been a recurring theme that Windows is the last platform to get anything cool: Hulu, Netflix streaming, Amazon streaming, and now VUDU.com.


tag:blog.thebiggrid.com,2013:Post/537681 2010-05-30T15:55:00Z 2013-10-08T17:16:24Z Model Action View

[I am still deciding whether to call it MAV, or MRAWV (Model Relation Action Widget View). The latter is more accurate, but doesn't sound any better.]

This MAV pattern gives a few advantages over MVC.

The controller-less pattern is more restrictive but allows better encapsulation, composability and reusability (ECR).

ECR is possible because the pattern greatly simplifies the message flow each component is responsible for. Coupled with jQuery event binding, each component is self-contained: it receives a single set of messages and sends messages to a single sink type.

When a widget is initialized, it binds itself to the known model. It also binds itself to any relevant jQuery object events.

From the model, a widget receives well-defined CRUD-like events. From the DOM, it receives well-defined DOM events (click, mouseover, change, etc.).

It is completely reactive, making it easy to write and predictable.

View is defined as a collection of widgets.


This pattern also adopts a strict convention from "Action" to "Model", and from "Model" to listener. It enables common (perhaps open-source) components to be used with any model, especially in a quasi-typed language (read: JavaScript).
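The post has JavaScript with jQuery in mind; as a minimal, language-neutral sketch of that convention (all class and method names here are mine, not from the post), the Action → Model → listener flow looks like this in Java:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of the MAV flow: Action -> Model -> listeners (widgets).
// Widgets never talk to each other; they only receive CRUD-like
// events from the model and send messages toward a single sink.
class Model {
    private final Map<String, String> entries = new HashMap<>();
    private final List<BiConsumer<String, String>> onCreate = new ArrayList<>();

    void addCreateListener(BiConsumer<String, String> l) { onCreate.add(l); }

    // The "Action" side calls this; listeners are notified afterwards.
    void create(String id, String value) {
        entries.put(id, value);
        for (BiConsumer<String, String> l : onCreate) l.accept(id, value);
    }
}

class LabelWidget {
    final List<String> rendered = new ArrayList<>(); // stands in for DOM updates

    LabelWidget(Model model) {
        // On initialization the widget binds itself to the known model.
        model.addCreateListener((id, value) -> rendered.add(id + "=" + value));
    }
}
```

A widget never mutates the model's state directly; it only triggers an action, and it re-renders when the model notifies it.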

Notice that a model can be chained to an upstream model. An upstream model can be either a superset or an equal set of the downstream model.

The beauty of the design is that the upstream model can be located remotely. They can be connected to an HTTP server over a RESTful service.

While "Representational State Transfer" is elegant, having no way to conceptualize a state change sucks big time (for lack of better words).

It makes much more sense to call
   picture.tag("Expo 2010");
than
   if (picture.tags === undefined) picture.tags = [];
   picture.tags.push("Expo 2010");

The MAV pattern allows both the Action and the Model to reside on a server remote from the widgets.

Does it work?
I was able to build a 6000-line "mobapp 2.0" app without violating the pattern. It is not entirely easy in situations where some components want to talk to multiple models. I worked around that by introducing the concept of a Relation, where a Relation is a model holding entries that are pairs of ids.

So far so good.]]>
tag:blog.thebiggrid.com,2013:Post/537685 2010-05-28T08:48:00Z 2013-10-08T17:16:24Z (new Wiki) The Quest for a Cutting-Edge Cross-Platform Mobile App

I am adding a new wiki.

You will be able to follow major updates of the wiki by following me here under the "mobapp" label. (Updates will be manual blog posts by me, so expect a low noise level.)]]>
tag:blog.thebiggrid.com,2013:Post/537688 2009-08-11T18:53:00Z 2013-10-08T17:16:24Z Friendfeed and aggregator

Friendfeed accepts Facebook's friend request. link

I think it is the best thing to happen to Facebook in a long while. I said long ago (two years?) that FriendFeed completes my social-network needs.

I was excited when FriendFeed made its first Facebook app. Now I could aggregate my online identity into a single place for the friends who care. And I wished to see the deeper side of my friends too (what books they put in their Amazon wishlist, what they blog, what they digg, etc.). The critical mass of Facebook would make it so useful, if only Facebook's platform were friendlier to FriendFeed applications.

Technically, aggregation is the way to go for the ultimate network effect. However large you are, you cannot cover everything.

Sure, if everyone plugs in to you, it is easiest (Facebook apps, Windows apps, iPhone apps).

And some of the major Waterloos in technology happen when a large company believes it can make the existing best of breed change for it, and makes that the only option. WinFS is an example: it was designed to become useful only once everyone changed their file formats to join WinFS. Oracle's database file system is another. The same goes for CardSpace and other identity services.

Google search (and desktop search) is the most representative successful aggregation. Instead of dictating how a webpage should look, Google, the aggregator, invests in the bridging, plays go-fer, and pulls in everything. It works better for standards-conforming pages, but still works for the rest. Desktop search (from both companies) is much less than the vision of WinFS, but it works because it doesn't require you to change Microsoft Word's file format for it to be useful.

I predict Google Wave is another major Waterloo. The UX of Google Wave is superb, and the scenario is simply convincing. But instead of aggregating, it requires you to embed the Wave app and dictates Google as the only storage. It would be nice for them and for users if they could pull it off, but I doubt it. Unless they figure out a way to include the best of breed, already out there or still to come, I don't see them going very far.]]>
tag:blog.thebiggrid.com,2013:Post/537693 2009-02-22T07:37:00Z 2013-10-08T17:16:24Z Dreams of an "Engine Company" Maybe Dreams of Engineers too!?

It is a well-made "commercial"; I feel bad calling it that, but I don't know what else to call it. It definitely improves my perspective on Honda. Totally inspirational. I love it.

I think I am failing every day. Since the day I started the ideas on leafsoft.com, there has never been a day when I felt I finished enough. Sometimes I lose to distraction; lose to urgent things that are not as important; lose to tiredness; sometimes I just want to finish more. I struggle to become more productive, to ignore unimportant things, to find the right balance of time between tools and the end product, to stay focused (which is one of the hardest).

Every day I doubt whether I am smart enough for my goals. I think the difference between stupidity and admirable endurance is hair-thin. One is doing exactly the same thing and never stopping; the other is doing almost exactly the same thing and never stopping.

The video gave me a powerful push today. I know I won't finish as much as I wanted today. I know I am not going to settle for less. I tried to ask less of myself; I couldn't.

Thanks, Honda!]]>
tag:blog.thebiggrid.com,2013:Post/537697 2009-01-18T07:55:00Z 2013-10-08T17:16:24Z A computer in everyone's hands

The 500 million app downloads milestone is amazing. And it is not even two years since the iPhone came out.

It reminds me of the days when Windows 3.0 came out. All the little utilities flourished. There were even CDs selling with something like 300 shareware programs. I hope the innovation at Apple continues no matter what happens with Steve's health...

tag:blog.thebiggrid.com,2013:Post/537700 2009-01-14T15:15:00Z 2013-10-08T17:16:24Z A new year reflection

Welcome to 2009. A year's just started! Time to renew the blog habit, and do some new year reflection.

This blog was most active at the beginning of 2006. At that time, I was working for a mid-size software company on their Eclipse IDE product in Fremont, Seattle. (I miss the days when I walked to work, and the coffee shops around there.) While I loved the tools, working on UI wasn't exactly where I wanted to build my career long-term at the time.

I liked to dig deep, and I was interested in transactions, caching, O/R-mapping kinds of stuff, which is what I learned at my first paying jobs.

I wanted to continue in that direction, so I registered http://cacheca.com (it was something like my 8th idea) and worked a bit on my own on distributed caching. I read and thought a lot about the topic, which is why I had much to write about it.

I joined a large software company, a clustering project, on the manageability team. I was hoping my knowledge from my previous work would be useful. In the beginning, I really wished to join the engine side of the team. (I am still on manageability, but no longer under the stealth project. The org structure makes sense: manageability at scale has much broader application than a scaled-up or scaled-out server. Manageability at scale is the manageability problem.) It is only natural for a critical server product to work on scalability, but it was a stealth project, so I didn't blog about work. A lot of inspiration comes from work; without blogging about that, I simply blogged much less.

After joining the company, on my own time, I shifted a little and developed a hobby project related to social networks. I thought an open platform could be game-changing even against fierce competitors. I was looking at online identity (e.g., openid.net) for that idea. On a thick stack of loose-leaf paper I drew out the idea, algorithms, page flows, etc. In code, I didn't go much beyond the login-authentication logic (a separate login server, passing an obfuscated token via page redirection, etc. It was interesting to understand what all those HTTP 3xx codes are about). Well, I have witnesses to the open-platform social-network idea. When Facebook came out with their own, and I looked at their initial API, I knew they got it. (I only had the idea. I wasn't close enough to any result to feel sour about it. Results are everything.) Online identity actually became less relevant because of it, imo. Had the killer app of the time used online identity, the landscape might look quite different today. Success is path dependent. It is also the network effect.

Busy at times for different reasons, and phew! Three years have passed.

Looking forward, there are a lot more interesting problems calling (and even screaming) for solutions.

1) Phone hardware (and even the OS) is certainly getting ripe. The time has come for a computer in everyone's hand.

2) As more interesting web apps emerge independently, there is also a data-scattering problem.

3) We are also tearing ourselves apart by letting too many irrelevant notifications interrupt us, for fear of missing an important one.

4) Alright, this is the last one, I need to throw some food for those curious minds:
I am starting to think that the last piece of the puzzle of the Turing Test is not AI (Artificial Intelligence). It is not a problem about intelligence. (Intelligence is like a cocky joke at the right moment that makes everyone laugh.) The last piece of the puzzle is something much more predictable and readily extracted from the memory we have. Someone can seem very human even when he doesn't make any cocky jokes, but shares some common memory with you. You will feel closer (and he will seem more human) if he reminds you that you two shared the feelings of an anecdote.

This year, I am working on solutions to these problems. They are not as complicated to solve as they appear, and they overlap a lot. Those are going to be the blog topics this year. Stay tuned!]]>
tag:blog.thebiggrid.com,2013:Post/537702 2008-06-25T03:53:00Z 2013-10-08T17:16:24Z Google vs. Live Search

With English as my second language, I occasionally use "Internet search" to check an expression that I am about to write down. I am aware that a phrase I come up with might not be the way a native speaker would express it. Sometimes, what feels a bit odd to me might be quite regular to him/her.

This time, the phrase is
“Knock on the heart”



I tried it on Google and “Microsoft’s Live Search”.
It demonstrates again that Google has superior linguistic analysis of the query (and of what is indexed) compared to Live Search.

With Google, the first link is a video that is an exact match. The other links have titles like "Knock Outs Sweet Heart", "Knock Out My Heart" and "Knock Against My Heart", which *mean* something similar to my input query. With Live Search, all the links are random pages containing the words "knock" and "heart".

Imagine it is a Turing test and I ask, "show me something that has the expression 'knock on the heart'".

Google, as a robot wrapped in human-like skin, replies: I know a video on YouTube with the exact same title. I know Skechers has a line of shoes called "Knock Outs Sweet Heart". I know Deidre wrote a blog entry about an auto show in Geneva, titled "K.O. Cars Knock Out My Heart".

It would be remarkable. It would be almost scary. I would say, “Wow, you’re so knowledgeable!!”

Live Search, as a pretty lady, replies: I know a joke, "Knock-Knock Jokes for the young-at-heart." I also know "Maisie is the heart of Knock Knock." My reply? Eh? Conversation ended!]]>
tag:blog.thebiggrid.com,2013:Post/537704 2007-11-08T06:27:00Z 2013-10-08T17:16:24Z on Google Android

I won’t bet on Google Android yet.

Has Google convinced me that it understands the messy aspects of building a platform that allows third-party development? For a platform to gain momentum, it really needs some killer apps. The killer apps and the platform are a chicken-and-egg problem that must be solved at the same time. I am not saying Android cannot succeed, and Google may be cooking something right under the covers. But nothing has been shown. Also, observe that those who could build a killer app are not on board: none of Sony, Nokia, RIM, MSFT, or Apple.

Sun tried it with Java. It was a great platform, and Sun really knew how to write good APIs and docs. But…

Apple's iPhone has the most proof so far. The iPhone has succeeded as a killer app, and it is inevitably becoming a great platform, even though Apple tried to resist becoming one. The iPhone even has a killer app using the Google Maps service. If Android gains any momentum, the iPhone just needs to drop its price.

Windows Mobile has always had the old standbys for killer apps: Pocket Word and Excel, and most importantly, deep integration with Outlook (Contacts, Calendar, corporate mail). Its ability to stay in the game cannot be questioned.

RIM, Nokia and Sony are still making products that interest certain segments of customers without Android. They're probably going to stay in the game until they make very big mistakes of their own.]]>
tag:blog.thebiggrid.com,2013:Post/537706 2006-12-11T16:52:00Z 2013-10-08T17:16:24Z Switched to google blog

Labels? Not Tags? :-)

tag:blog.thebiggrid.com,2013:Post/537708 2006-10-18T23:26:00Z 2013-10-08T17:16:24Z How can I talk to Kim?

Well, to get across the message about "portability", first I have to suffer the lack of it.

I was trying to add a link or a trackback to Kim's blog thread on
<quote>BBAuth and OpenID move identity forward</quote>

First, it wasn't a fault of CardSpace. I sent him a message using the message-post page on his site on September 20. The message was not answered, and I had no way to tell whether the problem was 2idi.com or a spam filter. (I hope he wasn't trying to ignore me. Even if I didn't ask the question in the right way, I think my idea was pretty original. He'd have to give me credit for saying something new, I bet.)

Now, his relevant post about BBAuth reminded me to try again. The private way didn't work; maybe it should have been a blog-to-blog discussion to begin with anyway. He would read user comments on his blog, I figured.

Argh, it required another login (not the 2idi.com one required to send him a message). Maybe that was better, since 2idi.com didn't work for me anyway. It is an annoying fact of life on a web without a federated identity system.

Now, trying to post a comment, I got this:
<quote> https://www.identityblog.com/wp-login.php</quote>

Alright, I found no link for creating a new account. I tried Firefox first. It tried to fetch info for the required plugin, but didn't suggest how to get a CardSpace plugin for it.

Alright, let's try IE then. It didn't work. Hmm, I thought maybe IE 7 would. I downloaded it, gave my trust to a beta, and restarted my computer. (It was pretty scary indeed. The download page asked me to back up all the data I had before proceeding, to avoid losing all my data.) Going to the site again, this is what I got:
<quote>To install Windows CardSpace, install .NET Framework Runtime 3.0.</quote>

Another download (and .NET is a big one), another restart, another risk of losing all my data?

All I want to say is that, at least until the new system is widely adopted, I wish there were an easier way to get a message across.

]]>
tag:blog.thebiggrid.com,2013:Post/537709 2006-10-18T20:30:00Z 2013-10-08T17:16:24Z Questions to Kim Cameron on Identity


I appreciate your work on identity and the way you share it with the public.

I have a few questions (and some scattered ideas) about CardSpace. I have briefly read most of the documents/demos/examples on your site, but beyond that I am new to CardSpace.

Problem Space
I am looking at it because I am investigating aggregating information for the same user from multiple sites that each use different authentication. (It is a personal project I had been working on prior to joining my current company. :-)

Fixing Passport
My first question is the following:
What do you think about fixing Microsoft Passport instead of introducing CardSpace? Please see my post on my blog:


The Laws
Of the seven "laws" that you defined, many can be satisfied without the radical move from Passport to CardSpace.

For example, “User Control and Consent” and be built, so does “Minimal Disclosure for a Constrained Use”, “Justifiable Parties”, “Pluralism of Operations”.

Adoption of CardSpace
While I see CardSpace as a good solution in theory, I remain doubtful about adoption, even though I am aware that Firefox and Safari demos have been shown.

Accessibility that I am not willing to give up
I access my web email at work, on my home desktop, laptop, cell phone, and friends' computers. Even though all of them run on a Microsoft platform (including my cell phone), I don't foresee all of them supporting CardSpace soon enough. For example, a friend of mine still uses Windows 95, and my Windows smartphone is not upgradeable. I don't think it is convincing for a user to move to a new mechanism and lose accessibility he already enjoys.

Passport-like mechanism is not unique to Microsoft
In fact, other major portals use a similar authentication mechanism (forward to an id server, request user/pass, forward back). They do so in a more controlled manner and didn't cause as much bad publicity as Microsoft did. For example, Flickr.com uses Yahoo's id server to authenticate. I am not saying they don't have security problems of their own. But an authentication mechanism like Passport is already there, and it is worth the effort to fix it instead of scrapping it altogether.

Spoofing and Key Trapping
You mentioned a few times that spoofing is a major problem. However, the concept of keeping my CardSpace cards on a USB drive concerns me much more than spoofing. How can I trust a computer (in an internet café, for example) not to steal the entire set of CardSpace cards on my USB drive once I plug it in? If it requires a master password to open my CardSpace cards, then I need to worry about key-trapping software in the internet café.

To me, the key-trapping problem can be safely solved by disposable passwords like those generated by an RSA token. But CardSpace doesn't address this problem, which is also part of the adoption problem. (Of course, RSA tokens have an adoption problem of their own… because of the cost?)

What do you think about adoption?

]]>
tag:blog.thebiggrid.com,2013:Post/537710 2006-09-10T12:02:00Z 2013-10-08T17:16:24Z Flexible Rails (New Book on Ruby on Rails and Macromedia Flex)

I am very excited to relay this news from Peter Armstrong:
Flexible Rails Alpha Version Released!

This book is about using Macromedia's Flex 2.0 and Ruby on Rails 1.1 together. The book presents the technologies as a tutorial. It gives a brief introduction and covers the entire Web 2.0 application-development stack: front end (Flex), web tier (Rails), database, and installation. It goes beyond typical tutorial books in that you actually get a working, usable application at the end.

Peter is a great friend of mine. He graduated from UVic a bit earlier than me. He went to work in the Bay Area (and came back to the Northwest) a bit earlier than me. But he learned to appreciate Macallan a lot earlier than me. He is an early adopter of technologies and a very passionate software engineer.

Excellent job, Peter!

tag:blog.thebiggrid.com,2013:Post/537711 2006-09-06T18:10:00Z 2013-10-08T17:16:24Z Reading List -- Sept 06 -- Google Research Links in my reading list:

]]>
tag:blog.thebiggrid.com,2013:Post/537712 2006-09-06T16:41:00Z 2013-10-08T17:16:24Z Identity Crisis

A couple of weeks ago, I spent a couple of days learning about login/identity. I didn't get around to blogging it until today.

Revisits of Microsoft Passport
Although Microsoft has all but declared the Passport Network a defeat, I think it can be made useful with a few twists. I feel that what it needs most is actually not technical. First, it should improve its transparency to users about how it works. It should require the user's explicit consent before allowing a third-party site to identify a user id, and allow easy modification of the allow list. It should not assume a user has only one identity. It should also get rid of the centralized data store on MSN sites.

Otherwise, the login-delegation mechanism and the ability to use the same login for multiple sites are worth keeping, at least until something else comes along.
The login delegation works like this:
1/ The user visits a site that supports login delegation (let's call it the action site for now).
2/ The site forwards the user to the login site (such as Passport).
3/ The login site shows a login page requesting the user's password (if the user is not yet logged in).
4/ The login site forwards the user back to the action site with some information that identifies the user as successfully logged in.
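Step 4 is the interesting one. As a sketch of what that "information" could look like (this is my illustration, not Passport's actual wire format), the login site can hand back an expiring, HMAC-signed token that the action site verifies with a shared key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative login-delegation token: "userId|expiry|signature".
// The login site mints it; the action site verifies it on the redirect back.
class LoginToken {
    private static String hmac(String data, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Login site: mint the token appended to the redirect URL (step 4).
    static String mint(String userId, long expiryEpochSec, byte[] sharedKey) {
        String payload = userId + "|" + expiryEpochSec;
        return payload + "|" + hmac(payload, sharedKey);
    }

    // Action site: re-compute the signature and check expiry.
    // Returns the user id, or null if the token is invalid.
    static String verify(String token, long nowEpochSec, byte[] sharedKey) {
        String[] parts = token.split("\\|");
        if (parts.length != 3) return null;
        String payload = parts[0] + "|" + parts[1];
        if (!hmac(payload, sharedKey).equals(parts[2])) return null; // tampered
        if (Long.parseLong(parts[1]) < nowEpochSec) return null;     // expired
        return parts[0];
    }
}
```

Because only the signature travels with the user, the action site never sees the password, which is the whole point of the delegation.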

If a user has previously logged in to the Passport network with the same browser and visits an action site (one that supports Passport), the site can ask the Passport network to identify him by forwarding him to the login site.

A user might expect to browse a site anonymously, yet be identified. The problem can be fixed by letting the user specify which sites should be able to identify him and which shouldn't. When the user is forwarded by an action site for the first time, he should be shown an agreement and a warning, and be allowed to choose between "Always Allow", "Allow for this session only", and "Disallow". If the user chooses Disallow, then every time that action site requests the user be identified, the login site should forward back as if no user were logged in on that computer. The user should be able to modify his/her choice later by going to the login site directly.

It should not assume a user has only one identity. This problem is also helped by the above modification. When a user is logged in to work, an action site shouldn't be able to identify his work id without his consent. When the same user is logged in to his Hotmail account, if the user approves, the action site may identify him.

New to InfoCard
Microsoft positions InfoCard as the next step after the Passport Network.

Folks in the group identified a few critical rules for identity and authentication in general: the LAWS OF IDENTITY. While all seven of them are totally critical, I don't think they cover everything.

The omission was portability (or should I say accessibility). Without total portability, users are bound to two choices: first, not accessing the feature they want where they need it (for example, don't check email on a public computer, don't use a cell phone to check WAP mail); or second, using a different mechanism wherever the new one isn't supported. The first choice is painful. The second choice… the strength of a chain is its weakest link. Either way, it doesn't solve the original problem. It either restricts access, or leaves the user alone when he needs the security most.

It is a chicken-and-egg problem. The lack of portability limits adoption, and the lack of adoption keeps new devices from supporting the mechanism, which in turn limits portability.

The omission leaks from the laws into InfoCard, and I think it is fatal.

]]>
tag:blog.thebiggrid.com,2013:Post/537714 2006-09-06T16:37:00Z 2013-10-08T17:16:24Z Very interesting conference about Grid Computing

Too bad that I have to miss it this year.

The "Topics in Grid Management" sessions interest me most. I hope they will make the papers/PowerPoint slides/videos available later.

tag:blog.thebiggrid.com,2013:Post/537716 2006-06-24T16:46:00Z 2013-10-08T17:16:24Z Economy of Scale vs. Scaling Economy

Sometimes, large scale can bring disadvantages. Let's start with one of the largest clusters: Yahoo! Mail.

When Gmail came out, I realized how deeply I had integrated with Y! Mail:
- Pop mail,
- web mail,
- spam filter,
- another yahoo account for spam
- disposable email address,
- address book synchronization with outlook,
- email notification to Yahoo Messenger,
- message archive,
- mobile email alert,
- stock alert,
- weather alert,
- custom email address support,
- multiple account support,
- color coded email by account,
- ads free Yahoo Plus,
- WAP Yahoo mail, and
- Support reading multiple mails using browser’s tabs.

Yes, I used all of the listed features on a daily basis.

I have a lot of sympathy for Yahoo when people mistakenly assume Gmail is better. No, Gmail is years behind.

When Gmail switched from 1 GB of storage to 2 GB, it hurt Y! Mail badly. At the time, the ratio of Yahoo Mail users to Gmail's was probably 100:1. (The ratio these days is maybe closer to 15:1.)

To match Gmail's average storage per user (most users don't use anything close to 2 GB), Yahoo pays 100 times more. Each additional MB, multiplied by 300 million users, is substantial money in hardware-acquisition cost and operational costs like electricity.

Scaling to 300 million users takes much more than adding machines. It takes a lot of tricks and R&D to get close to linear scaling, and anything short of linear scaling costs disproportionately more as you grow. It makes or breaks the economics and feasibility of providing the service.
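As a toy illustration of that point (the exponent values are made up, not Yahoo's or Google's numbers): if total cost grows as users^a, per-user cost stays flat only at a = 1.0, and even a mildly super-linear exponent multiplies per-user cost dramatically at hundreds of millions of users.

```java
// Toy cost model: totalCost(n) = n^a machine-units.
// a = 1.0 is perfect linear scaling; a = 1.2 is mildly super-linear.
class ScalingCost {
    static double perUserCost(double users, double exponent) {
        return Math.pow(users, exponent) / users;
    }
}
```

At 300 million users, a = 1.2 makes each user roughly 50 times as expensive to serve as under perfectly linear scaling, which is exactly the kind of gap that makes or breaks the economics.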

Werner Vogels [Amazon's CTO] has talked a lot about the dark art of scaling and its pain.

In the famous "we [Google] are a $100 billion company" financial conference, Eric Schmidt [Google's CEO] said that the know-how, software, and infrastructure to scale to a massive number of users is one of Google's key strengths, one they are expanding their lead on. However, Gmail (and Yahoo too) has periods when performance slows down badly. That drives away users, not just growth.

Then Steve Ballmer [Microsoft's CEO] used the analogy of "data centers" built everywhere in the world like power stations. Later, Microsoft announced the billions in forgone future profit it would spend to build the infrastructure to compete.

The dot-com race has reached the point where an idea alone is far from enough. It is also about the ability to make economic sense of a service provided to a massive number of users. The ability to scale computing power plays a big part in determining the success of a business.

]]>
tag:blog.thebiggrid.com,2013:Post/537718 2006-06-24T16:02:00Z 2013-10-08T17:16:24Z Kelowna

I took two days off. Adding a weekend and friends, I got a great four-day trip to Kelowna, BC, Canada. Okanagan Lake was amazing, and so were the scenic wineries along it. Warm weather, clear sky, a mild breeze from the lake: life is good! It was four hours from Vancouver, but the highway was great: two lanes in each direction for almost the whole trip (or will be). If you like Napa, you "must" give Kelowna a chance. It could become one of your favorites too.

]]>
tag:blog.thebiggrid.com,2013:Post/537719 2006-05-29T18:52:00Z 2013-10-08T17:16:24Z Quest to a more efficient LockSet (volatile field, semaphore, reentrant lock)

While staying up late, I recalled an article about anti-sleep pills that I read a couple of weeks ago. Somehow, I tend to have a much clearer mind late at night (I mean, after being awake for many hours :-P). It explains why many of my blog posts were written late at night.

The feeling that I can stay up as late as I want on a Sunday is very good. A long weekend means I get two really long late nights to myself. Labor Day wasn't only an extra weekend day in the week; it also doubled my productive time for the project.

And I spent it coding a small part of the cluster cache project: the quest for a more efficient LockSet. It actually started while I was driving 100 miles north on Friday. I was driving alone and turned off the stereo in my car to think about the problem. After about an hour of running my 4-step synchronized-block LockSet through my head, I realized that the only way to get a more efficient (in terms of how much synchronization I need to do) lock set requires the ability to enter one semaphore before leaving another.

I read the pseudocode for lock acquisition in the Gray/Reuter book. The sequence looks something like this:
1/ semaphore-get on the data structure for the lock set (i.e., spinlock S),
2/ find the node representing the lock, or create one if it doesn't exist,
3/ semaphore-get on the node (i.e., spinlock N),
4/ semaphore-give on the lock set (i.e., unlock S),
5/ acquire the lock (may block the thread)
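A minimal sketch of this hand-over-hand sequence, using AtomicBoolean.compareAndSet as the spinlock primitive (the class and its names are mine; a real lock set would also block the thread in step 5 when the lock is contended, and garbage-collect idle nodes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hand-over-hand locking: grab the node's spinlock *before* releasing
// the set's spinlock, so the node cannot vanish in between.
class LockSet {
    private final AtomicBoolean setLatch = new AtomicBoolean(false); // spinlock S
    private final Map<String, LockNode> nodes = new HashMap<>();

    static class LockNode {
        final AtomicBoolean latch = new AtomicBoolean(false); // spinlock N
        Thread owner;
        int holders;
    }

    private static void spinAcquire(AtomicBoolean latch) {
        while (!latch.compareAndSet(false, true)) Thread.yield();
    }

    public void lock(String key) {
        spinAcquire(setLatch);                                        // 1/ get S
        LockNode n = nodes.computeIfAbsent(key, k -> new LockNode()); // 2/ find or create
        spinAcquire(n.latch);                                         // 3/ get N
        setLatch.set(false);                                          // 4/ release S
        // 5/ acquire the lock proper; a real implementation would park
        // the thread here if the lock is held by another thread.
        n.owner = Thread.currentThread();
        n.holders++;
        n.latch.set(false);
    }

    public void unlock(String key) {
        spinAcquire(setLatch);
        LockNode n = nodes.get(key);
        spinAcquire(n.latch);
        setLatch.set(false);
        n.holders--;
        if (n.holders == 0) n.owner = null;
        n.latch.set(false);
    }

    public boolean isHeld(String key) {
        spinAcquire(setLatch);
        LockNode n = nodes.get(key);
        boolean held = n != null && n.holders > 0;
        setLatch.set(false);
        return held;
    }
}
```

The key property is in steps 3-4: the node latch is acquired while the set latch is still held, so no other thread can delete the node between "find" and "lock".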

It sounds simple, but it cannot be done “efficiently” with Java’s synchronized blocks.

A spinlock (compare-and-swap) is such a primitive pillar of the concurrent world that it cannot be reduced further. Of course, you can simulate a spinlock using a Java synchronized block, but synchronized blocks are themselves built from spinlocks.

Java 1.5 provides a set of new concurrency utilities. I knew it has a Lock interface that would allow me to do what the Gray/Reuter lock implementation did. So, I dug deep into it.

After digging into the code, I found that ReentrantLock is itself pretty expensive. Semaphore is a bit lighter, because it doesn't maintain a linked list of threads, but it still maintains more state than I need. In the process, though, I found what I wanted: the spinlock.

It is exposed in the API through AbstractQueuedSynchronizer/AtomicXYZ.compareAndSetState(int, int). The "spin-unlock" is achieved by AbstractQueuedSynchronizer.setState(). The javadoc didn't mention the memory-model constraints. However, the way Semaphore is implemented using those methods implies that compareAndSetState() and setState() act as a read barrier and a write barrier, respectively.

I would have expected setState() to call a method in another class declared native. To my surprise, it simply sets the volatile field that compareAndSetState() sets using a native method.

Why does setting a volatile field suffice? I remembered volatile as a read/write-ordering guarantee on that specific field only. But a write barrier is a different guarantee.

That was because I rarely found a useful case for volatile fields. Because the guarantee was on the field only, you could not use it to guard other data. While I was aware of the JMM changes in Java 5, and followed the mailing list for a few good months, I didn't pay much attention to the changes to volatile fields.

Brian Goetz explains it very well in his developerWorks article.

The semantics of volatile fields were updated in Java 5 to carry a memory-barrier guarantee.
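A tiny illustration of what the updated guarantee buys (the class and field names here are made up): the volatile write to ready acts as a release barrier, so a reader that observes ready == true is also guaranteed to see the earlier plain write to payload. Pre-Java-5, volatile alone did not give you this.

```java
// Illustrative only: a volatile flag guarding a plain field under the Java 5 JMM.
class OneShotBox {
    private int payload;             // plain field, published via the volatile flag
    private volatile boolean ready;  // volatile write = release; volatile read = acquire

    void publish(int value) {
        payload = value; // this plain write happens-before...
        ready = true;    // ...this volatile write
    }

    Integer tryRead() {
        // A volatile read that observes true guarantees the payload write is visible.
        return ready ? payload : null;
    }
}
```

This is exactly the pattern setState() relies on: the unlatching thread's plain writes become visible to the next thread whose compareAndSetState() observes the new state.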

Now, after this quest for a more efficient lock set, I have gained an understanding of not only how to implement it efficiently, but also what was missing in older versions of Java, why the JMM change was needed, and the JMM itself.

It is a good feeling when what I need has already been created by someone else before I need it.

Now, I think I know the JMM very well; come challenge me with tough questions! :-P

Tag:
tag:blog.thebiggrid.com,2013:Post/537720 2006-05-24T14:17:00Z 2013-10-08T17:16:24Z Good load tests to expose the vulnerabilities of the cache

In response to a user's question on load testing, I think it is worth a blog post by itself. Those are excellent questions. :-)

Coherence is a pretty mature product. I would think it should work pretty well for read-only cases.

I can think of 4 areas where the cached system can choke:
a/ network load for cache synchronization,
b/ cpu load for cache management and synchronization,
c/ cpu load for deserialization of your data,
d/ and database access

Depending on the way the application accesses the data, the system might still choke on the last two before the cache-management overhead becomes a problem.

Database might still be the bottleneck
For example, if I have an application that needs to scale to a high number of users who do not share much data among them (an HR application where most users care mostly about their own data), I would watch the CPU, file I/O, and network utilization of the database as I add more machines to the cache cluster, especially if it is a single database (or a cluster of databases) that all machines connect to. It is good to do a little projection on how many cached machines the single database can support.

If I have an application where most machines share the same set of data, then I would watch the time each machine spends on deserialization. If each machine requests the same cache, the data will be sent over the wire and deserialized on each machine. The time spent on deserialization might be significant. I am not sure whether Coherence's near-cache stores entries as objects or in serialized form; it would be good to check. Even if the near-cache is kept as objects, with moderate changes to the data you might still see quite a bit of deserialization, because the cache will need to be re-fetched.
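To get a feel for the per-machine cost, one can time a round trip through standard Java serialization for a representative cache value. This is an illustrative harness only (the class name and the sample data are made up), not anything Coherence-specific:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Illustrative: measure the serialize/deserialize cost of a representative cache value.
public class SerdeCost {
    public static void main(String[] args) throws Exception {
        HashMap<String, String> value = new HashMap<>();
        for (int i = 0; i < 1000; i++) value.put("key" + i, "some cached value " + i);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value); // what goes over the wire
        }
        byte[] wire = buf.toByteArray();

        long start = System.nanoTime();
        Object copy;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            copy = in.readObject(); // what each receiving machine must pay for
        }
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(wire.length + " bytes, deserialized in " + micros + " us, equal=" + value.equals(copy));
    }
}
```

Multiplying that per-entry time by the request rate per machine gives a rough ceiling on how much of the CPU budget deserialization alone will consume.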

Really large cluster
If you have a really large cluster (say 64 machines or more), then you might need to profile the first two as well. The overhead is believed to be small, but the total time spent on communication grows at least on the order of O(n^2), where n is the number of machines. Overhead that is unnoticeable with 4 machines might show up as significant when you have 64, for example.

Tag: , , ,
tag:blog.thebiggrid.com,2013:Post/537660 2006-05-23T15:33:00Z 2013-10-08T17:16:24Z Economy of Scaling

Echo everywhere.

Friendster as a counter example.

Google's financial statement. Names scaling as its core strength.

Amazon's Werner Vogels interview. Scaling economically.

Microsoft's incentive. Putting data centers everywhere. The missing $2 billion.

Yahoo Mail. 2GB of storage. 10 times the users.

Tag:
tag:blog.thebiggrid.com,2013:Post/537662 2006-05-08T16:34:00Z 2013-10-08T17:16:24Z Questions about Context Switch in VM

Blogging is often about opinions, solutions, and feedback. But, what if I have questions?

While watching the MySQL video, it surprised me when Stewart Smith said the storage node daemon runs on a single thread. It has its own context switching, which is more efficient than using threads from the OS.

Speaking of situations where OS context switching doesn't work best, I think of another one: when the OS runs inside a VM like VMware. Even when the primary OS is mainly idle, the guest VM is still not very responsive.

Could it be because too many context switches happen in the primary OS, making context switches in the guest OS happen at bad times?

What are VM systems doing to help this situation? Will we have a configuration flag for Linux (or other OSes) to let the OS context-switch differently when it is a guest? (Of course, the guest machine is not supposed to know it is a guest, unless you configure it as such.)

Tag: ,
tag:blog.thebiggrid.com,2013:Post/537664 2006-05-08T16:23:00Z 2013-10-08T17:16:24Z MySQL Cluster

Relaying the news from Ramblings. :-)

Tag:
tag:blog.thebiggrid.com,2013:Post/537667 2006-04-29T19:19:00Z 2013-10-08T17:16:24Z Relational and jCache Model's differences

Much of Cameron’s presentation also echoes my experience (with the Castor JDO cache, the cluster work at the BPM company, the recent distributed-cache work I have been doing, and even my reading of the Transaction Processing book that I have mentioned a few times).

But, he reminded me of an old problem I dealt with in Castor JDO. The cache [and lock] was local, but we were trying to respect isolation such that if another machine made an incompatible change, data integrity would not be compromised; instead, the transaction would roll back. After years, I now understand the problem better; I know that I didn’t achieve it. To be specific, I didn’t achieve the Serializable isolation level (no phantom reads).

Let's use class registration as an example. A student is allowed to add at most 18 credits per quarter. So, we do it in one query, and insert only if the first query returns a result that meets our rule. First,
<quote>SELECT sum(credits) FROM student_course_table WHERE student=? AND quarter=this</quote>
Now, if the sum returned by the query plus the credits of the new course does not exceed 18, we let the new course be added.

In this case, we want either to disallow other threads from inserting another course, or to cause this transaction to fail.
The solution is pretty hard to implement efficiently (while still allowing parallelism). Because we read a range of values to get the result, we need to lock more than just the new row to insert, to ensure the result stays correct. So, we need a lock set.
  • 1/ A simple solution: every read also holds a shared lock on the table and the item, and if an insert is issued, the table lock is upgraded to an exclusive lock.

  • 2/ A more efficient implementation is for readers to hold IS (intent share) or IX (intent exclusive) locks on the table.

  • 3/ More efficient yet is to use IS or IX predicate locks (locks on a range).
Cameron didn’t mention lock sets with Coherence. And, I thought the only way to get isolation right was to use a lock set. So, I had a discussion with him.

It turned out the problem spaces are different. Because jCache uses get() and put(), it is exempted from caring about the inter-dependencies of the data. So, we don’t need a lock set. The specification is different.

So, does that mean the jCache model is easier? Not necessarily. They are difficult in different ways. Cameron explained to me why even locks cannot be guaranteed to be enough (because of out-of-order messages and the absolute-time problem). A database, on the other hand, has a log (journal) that essentially defines absolute time (the order of events).

However, ORM product designers should be aware of the differences between the relational model and the jCache model when they utilize jCache to scale out, especially if the Serializable isolation level is desired. One way is to pick (or let the user pick) the right level of granularity. In the course-registration example, choosing the student as the lock and the relationships as dependent objects will work (assuming courses are stable). But in some cases these are difficult problems that require analysis of the trade-offs.
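The student-granularity choice can be sketched with an in-memory stand-in for the database. The Registrar class, its tables, and the 18-credit constant are hypothetical; the synchronized block plays the role of the per-student lock:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the registration example: locking on the student
// serializes the "sum the credits, then insert" sequence, closing the phantom window.
final class Registrar {
    private static final int MAX_CREDITS = 18;
    private final Map<String, List<Integer>> courses = new ConcurrentHashMap<>();

    /** Returns true if the course was added without exceeding the credit limit. */
    boolean addCourse(String student, int credits) {
        List<Integer> taken = courses.computeIfAbsent(student, s -> new ArrayList<>());
        synchronized (taken) { // the student-granularity lock
            int sum = taken.stream().mapToInt(Integer::intValue).sum(); // SELECT sum(credits)
            if (sum + credits > MAX_CREDITS) return false;              // rule violated
            taken.add(credits);                                         // INSERT
            return true;
        }
    }
}
```

Because the read and the insert happen under one lock, no other thread can sneak an insert in between them; registrations for different students still proceed in parallel.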

Tag: , , , ,
    tag:blog.thebiggrid.com,2013:Post/537670 2006-04-29T17:10:00Z 2013-10-08T17:16:24Z Distributed Caching: Essential Lessons

The deadline was looming. My most productive (day)time in a week is often late afternoon on Wednesday and Thursday. Fremont’s Peet’s Coffee was giving out free coffee to celebrate the store's one-year anniversary, and I was a bit over-caffeinated. :-P Under the temptation of getting more work done, I had almost forgone the ISTA meeting. It was a talk about distributed caching by Cameron Purdy of Tangosol.

Glad that I was there! Besides scaring me with a poor joke at the beginning, the talk was great. (I can no longer remember the joke.)

I remember his presentation as four parts.
  • 1/ An introduction of himself, the company, the problem space, and the product name (ie, what “coherence cache” means technically).

  • 2/ The evolution (what, how, why) of the distributed cache (from replicated cache, partitioned cache, failover cache, local cache, and standalone cache server, to write-behind cache).

  • 3/ Highlights of the technical challenges in the product implementation (a finite state machine to model the communication; edge cases in the partitioned local cache; the gap between loading the data and propagating it to the cache; no absolute time in a distributed system; network constraints: 12 ms latency, out-of-order delivery; cannot be proved correct, but can't be found incorrect; leaving the cluster, etc.).

  • 4/ Cluster system design guidelines (13 of them: [Java] serialization/externalization, identity, defining equals, idempotence, etc.).

With the great wealth of experience he has with real-life systems, and full knowledge of the product since its beginning, the talk was vastly interesting. (And no one fell off his/her chair even though the talk ran pretty long. :-)

Tag: , , , ,
tag:blog.thebiggrid.com,2013:Post/537673 2006-04-24T18:24:00Z 2013-10-08T17:16:24Z Summary of On Clustering Articles

A summary of my previous blog entries on Clustering:

High-volume computing (On Clustering Part I)

Cluster (On Clustering Part II)

Database Driven and Entity Tier (On Clustering III of VII)

Stateful Session (On Clustering Part IV of VII)

Cache (On Clustering Part V of VII)

In Depth look at Data-Driven Cluster (On Clustering Part VI of VII)

Future (On Cluster Part VIII of VII :-)

Tag: , , ,
    tag:blog.thebiggrid.com,2013:Post/537677 2006-04-24T18:00:00Z 2013-10-08T17:16:24Z In Depth look at Data-Driven Cluster (On Clustering Part VI of VII)

In Depth look at Data-Driven Cluster
Let’s focus on the scalability of data-driven applications. The demand for scaling this kind of application is increasing, but solutions remain expensive.

To understand why scaling out this kind of application is challenging, let's start with a nominal view of data operations and categorize them into two kinds: read and update (including create and remove). A system's performance (P) can be represented as the sum of the rates of Read (V) and Update (U):
P = V + U

Ideally, we would like the performance (P[n]) of a system to grow linearly as the number (n) of machines increases:
P[n] = n(V + U) = nP -- ideally

However, that is not possible. To ensure data integrity, each update must be propagated to all machines, so that subsequent relevant reads obtain the newest values.

Now, consider a two-machine cluster; the performance is theoretically limited to
P[2] = 2P - 2U

This is because, for every data update on one machine, the second machine needs to be updated as well. It is the penalty we pay for scaling.

The Equation
Similarly, for n machines, the (simplified) performance can be defined as
P[n] = nP - n(n-1)U

For a small U (close to zero), scaling can be very nearly linear. With load balancers, round-robin DNS, and co-location data replication, we indeed achieve nearly linear scalability for read-only data in the real world.

However, for a larger U, the performance peaks quickly. The penalty grows on the order of O(n^2).

(Note that the equation above is simplified: as more capacity is spent on updates, we actually have less left for reads as well. The proper equation rebalances the V:U ratio, and the actual penalty is slightly lower for larger n, so that the performance never becomes negative.)
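Plugging numbers into the model makes the peak-off concrete. With hypothetical values P = 100 and U = 1, P[n] = nP - n(n-1)U peaks around n = 50 and falls to zero at n = 101:

```java
// Evaluate the simplified scaling model P[n] = n*P - n*(n-1)*U from the text.
final class ScalingModel {
    static double perf(int n, double p, double u) {
        return n * p - (double) n * (n - 1) * u;
    }

    public static void main(String[] args) {
        double p = 100, u = 1; // hypothetical per-machine throughput and update rate
        for (int n : new int[] {1, 2, 10, 50, 100, 101}) {
            System.out.printf("n=%3d  P[n]=%8.0f%n", n, perf(n, p, u));
        }
        // Throughput rises, peaks near n=50, then collapses to 0 at n=101.
    }
}
```

Adding the 51st machine buys nothing, and every machine beyond that makes the cluster strictly slower, which is the whole point of the O(n^2) penalty.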

IP Multicast
Some may be tempted to think that using IP multicast will eliminate the O(n^2) performance hit. It is not true. Even if multicast is used, each machine receiving an update packet from another machine needs to apply the update to its own copy of the data. Consider a cluster of 100 machines, and assume each machine makes 1 update per second. Every second, each of the 100 machines sends out 1 multicast about its update, and receives and processes 99 multicasts from the others (100 × 99 updates cluster-wide). The O(n^2) term doesn't go away.

Reduce U
It does, however, reduce U, in some cases significantly. Similarly, there are other fancy techniques to reduce U, though none eliminates the O(n^2) term. These techniques include data invalidation (instead of a full update on each node), lock tables, voting, and centralized updates. All of these techniques are important, but each comes with its own trade-offs. For example, centralized updates basically force us to rely on a single massive machine (exactly what we wanted to replace with commodity hardware).

Not Just Data Replication
The equation might appear to apply only to data-replication setups. That is not true. Invalidating data often means we need to read all the data from a centralized machine. In that case, we are just pushing the updates and the scaling problem onto a single machine, which is the opposite of the goal of scaling out.

Ad-hoc Cluster Cache
Caching might appear to relieve the single-machine problem for non-replication setups. However, the same cannot be said for clustering. Applying the invalidation technique to a non-cluster-aware cache (some might call it a clustered cache) works for small numbers of machines and low frequencies of updates. When either or both values get large, the hit rate of the cache quickly approaches zero, because many more machines are trying to invalidate the cache, rendering it empty most of the time. (Of course, a true clustered-cache design is aware of this problem and tries to do better.)

Reduce N
If the O(n^2) term cannot be reduced, the next best thing is to reduce n. In fact, this is an important consideration in real-life tuning. To reduce n, we want to spin out any reads that are not relevant to the data application. To ensure the serializable level of data integrity, we must either keep track of relevant reads to avoid read-to-write dependencies (Chapter 7.6 of the Gray book), or use exclusive locks on tables. That makes the performance penalty of spinning out relevant data very high. Only data that has no dependencies on other data can be spun out.

Two in Parallel
If a parallel set of data can be isolated, we can run two cluster systems instead of one. For example, if the two sets are about equally intensive, we get
2 × P[n/2] = nP - 2(n/2)(n/2 - 1)U

This approach does not apply to all data. It takes symmetry out of the system, which increases design and administration complexity and cost.

Similarly, we might be able to use data partitioning to reduce n as well. Partitioning can be done by data range, hash code, lookup table, or another algorithm. This approach also depends on the data schema, and it increases administration complexity and cost. For instance, it might require a periodic administrative task to load-balance between the partitions (or require other software).
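Hash-code partitioning, for instance, can be as small as the sketch below (illustrative only; real products layer rebalancing on top of it, which is exactly the administrative cost mentioned above):

```java
// Illustrative: route a record to one of k partitions by its hash code.
final class HashPartitioner {
    private final int partitions;

    HashPartitioner(int partitions) {
        this.partitions = partitions;
    }

    int partitionFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```

Note the catch: changing the partition count remaps most keys, which is why growing or shrinking a hash-partitioned cluster needs that periodic load-balancing work.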

The ideas of division and partitioning are the same: to exploit parallelism. The concept of exploiting parallelism can go even further, much further. In some cases it can be automated, with restrictions that are acceptable to most applications. I will share some of them in a later post.

There are no silver bullets, either. But putting them together helps. I tend to think that good-enough solutions have already been discovered. The challenging problem of scaling data-driven applications awaits cost-effective implementations. The execution matters!

Tag: , , ,
    tag:blog.thebiggrid.com,2013:Post/537682 2006-04-12T16:11:00Z 2013-10-08T17:16:24Z Do not format your harddrive

I have been reading a book about "information theory". This idea came to my mind:

Do not format your hard drive,
because erasing memory always increases entropy,
and increasing entropy is a bad thing.

Tag:
    tag:blog.thebiggrid.com,2013:Post/537686 2006-03-15T15:37:00Z 2013-10-08T17:16:24Z Java / Tomcat / Virtualization

I was talking about a virtualized Linux/BSD distribution with Java and Tomcat.

And, I am glad to discover that it is out there.

I noticed eApps.com before I made the previous blog posts in December. However, only recently did they reach a price point that is very interesting: $20 a month.

For the price, I get my own virtual server, set up to run as many domain names as I want. To my surprise, it meets almost all my criteria. The HD footprint was about 63M with core Linux, the JDK and JRE, iptables, Tomcat, mail, FTP, MySQL, SSH, and various other software. Additional software can be installed by simply checking a checkbox. To my surprise, they also provide the XFree86 X11 libraries. I verified that the JRE is able to utilize them: I was able to run a Swing app on the virtual server and display it on my home desktop.

The performance is rather unacceptable for interactive UI applications, but I don’t expect any kind of UI performance from a hosted server anyway. It is also considered slow for shell or FTP operations. However, it seems to work reasonably well for serving web pages.

The management software of eApps is provided by SWsoft. The control panel, HSPComplete, is very intuitive. The virtual-server infrastructure is obviously also licensed from SWsoft.

eApps rocks! In terms of features, it certainly beat my expectations. Highly recommended!

[Update June 24, 06] I upgraded to the $30 plan, and the performance of eApps is getting pretty good. I am not sure whether it is because of the plan upgrade, or because of their ongoing performance improvements in general.

Tag: , ,
    tag:blog.thebiggrid.com,2013:Post/537687 2006-03-14T09:59:00Z 2013-10-08T17:16:24Z Moved!

I am dedicating this blog to Clustering, Grid Computing, and Virtualization.

I am moving this blog to its own domain: TheBigGrid.com.

Please update your links and site feed!