How can I talk to Kim?

Well, to get across the message about “portability”, first I have to suffer the lack of it.

I was trying to add a link or a trackback to Kim's blog thread on
<quote>BBAuth and OpenID move identity forward</quote>

First, it wasn't the fault of CardSpace. I sent him a message using the message post page on his site on September 20. The message was never answered, and I had no way to tell whether the problem was 2idi.com or a spam filter. (I hope he wasn't trying to ignore me. Even if I didn't ask the question in the right way, I think my idea was fairly original. He has to give me credit for saying something new, I bet.)

Now, his relevant post about BBAuth reminded me to try again. The private way didn't work, and maybe it should have been a blog-to-blog discussion to begin with anyway. He would read user comments on his blog, I figured.

Ah, it required another login (not the 2idi.com one needed to send him a message). Maybe that was better, since 2idi.com didn't work for me anyway. It's an annoying fact of life on a web without a federated identity system.

Now, trying to post a comment, I got this:
<quote> https://www.identityblog.com/wp-login.php</quote>

Alright, I found no link for creating a new account. I tried Firefox first. It tried to fetch info for the required plugin, but didn't suggest how to get the CardSpace plugin with it.

Alright, let's try IE then. It didn't work. Hmm, I thought maybe IE 7 would. I downloaded it, gave my trust to a beta, and restarted my computer. (It was pretty scary indeed. The download page asked me to back up all my data before proceeding, to avoid losing it all.) Going to the site again, this is what I got:
<quote>To install Windows CardSpace, install .NET Framework Runtime 3.0.</quote>

Another download (.NET is a big thing), another restart, another risk of losing all my data?

At least until the new system is widely adopted, all I want to say is that I wish there were an easier way to get a message across.

Tag: ,

Questions to Kim Cameron on Identity

Kim,


I appreciate your work on identity and the way you share it with the public.


Introduction
--------------
I have a few questions (and some scattered ideas) about CardSpace. I have briefly read most of the documents/demos/examples on your site, but other than that I am new to CardSpace.


Problem Space
------------------
I am looking at it because I am investigating aggregating information for the same user from multiple sites that each use different authentication. (It is a personal project that I had been working on prior to joining my current company. :-)


Fixing Passport
------------------
My first question is this: what do you think about fixing Microsoft Passport instead of introducing CardSpace? Please see the post on my blog:

identity-crisis


The Laws
------------
Of the seven “laws” that you defined, many can be satisfied without the radical move from Passport to CardSpace.

For example, “User Control and Consent” can be built in, and so can “Minimal Disclosure for a Constrained Use”, “Justifiable Parties”, and “Pluralism of Operators and Technologies”.


Adoption of CardSpace
----------------------------
While I see that CardSpace is a good solution in theory, I remain doubtful about adoption, even though I am aware a Firefox and Safari demo was shown.


Accessibility that I am not willing to give up
----------------------------------------------------
I access my web email at work, on my home desktop, laptop, cell phone, and friends' computers. Even though all of them run on Microsoft platforms (including my cell phone), I don't foresee all of them supporting CardSpace soon enough. For example, a friend of mine still uses Windows 95, and my Windows smartphone is not upgradeable. I don't think it is convincing for a user to move to a new mechanism and lose accessibility he already enjoys.


A Passport-like mechanism is not unique to Microsoft
---------------------------------------------------------------
In fact, other major portals use a similar authentication mechanism (forward to an id server, request user/pass, forward back). They do so in a more controlled manner and haven't caused as much bad publicity as Microsoft has. For example, Flickr.com uses the Yahoo id server to authenticate. I am not saying they don't have security problems of their own. But an authentication mechanism like Passport is already there, and it is worth the effort to fix it instead of scrapping it altogether.


Spoofing and Key Trapping
---------------------------------
You mentioned a few times that spoofing is a major problem. However, the concept of using a USB drive to store my CardSpace cards concerns me much more than spoofing. How can I trust a computer (in an internet café, for example) not to steal all of the CardSpace cards on my USB drive once I plug it in? If it requires a master password to open my CardSpace cards, then I need to worry about key-trapping software in the internet café.

To me, the key-trapping problem can be safely solved by disposable passwords like those generated by an RSA token. But CardSpace doesn't address this problem, which is also part of the adoption problem. (Of course, RSA tokens have an adoption problem of their own… because of the cost?)

What do you think about adoption?


Tag: ,

Flexible Rails (New Book on Ruby on Rails and Macromedia Flex)

I am very excited to relay this news from Peter Armstrong:
Flexible Rails Alpha Version Released!

This book is about using Macromedia's Flex 2.0 and Ruby on Rails 1.1 together. The book presents the technologies as a tutorial. It gives a brief introduction and covers entire Web 2.0 application development: front end (Flex), web tier (Rails), database, and installation. It goes beyond typical tutorial books in that you actually get a working and usable application at the end.

Peter is a great friend of mine. He graduated from UVic a bit earlier than me. He went to work in the Bay Area (and came back to the Northwest) a bit earlier than me. And he learned to appreciate Macallan a lot earlier than me. He is an early adopter of technologies and a very passionate software engineer.

Excellent job, Peter!

Tag: ,
,

Reading List -- Sept 06 -- Google Research

Links in my reading list:
http://labs.google.com/papers/gfs.html
http://labs.google.com/papers/bigtable.html

Tag:

Identity Crisis

A couple of weeks ago, I spent a couple of days learning about login/identity. Didn't come around to blogging it until today.


Revisiting Microsoft Passport
-------------------------------------
Although Microsoft has almost declared Passport Network a defeat, I think it can be made useful with a few twists. I feel that what it needs most is actually not technical. First, it should improve its transparency to users about how it works. It should require a user's explicit consent before allowing a third-party site to identify a user id, and allow easy modification of the allow list. It should not assume a user has only one identity. It should also get rid of the centralized data store on MSN sites.

Otherwise, the login delegation mechanism and the ability to use the same login across multiple sites are worth keeping, at least until something else comes along.
The login delegation works like this:
1/ The user visits a site that supports login delegation (let's call it the action site for now).
2/ The site forwards the user to the login site (such as Passport).
3/ The login site shows a login page requesting the user's password (if not yet logged in).
4/ The login site forwards the user back to the action site with some information that identifies the user as successfully logged in.
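The forwarding in steps 2 and 4 can be sketched as plain URL construction. This is only an illustration under my own assumptions: the site names and the `return`/`token` parameter names are made up, and the real Passport protocol uses its own parameters and signing.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the two redirects in login delegation.
// URLs and parameter names here are illustrative, not Passport's.
public class LoginDelegation {

    // Step 2: the action site forwards the user to the login site,
    // passing its own return URL so step 4 knows where to come back.
    static String buildLoginRedirect(String loginSite, String returnUrl) {
        return loginSite + "?return="
               + URLEncoder.encode(returnUrl, StandardCharsets.UTF_8);
    }

    // Step 4: the login site forwards back with a token that marks the
    // user as successfully logged in (signing/validation omitted here).
    static String buildReturnRedirect(String returnUrl, String token) {
        return returnUrl + "?token=" + token;
    }
}
```

In a real deployment the token would be signed by the login site so the action site can verify it was not forged; that part is deliberately left out of this sketch.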

If a user has already logged in to the Passport network previously with the same browser and then visits an action site (one that supports Passport), the site can ask the Passport network to identify him by forwarding him to the login site.

A user might expect to browse the site anonymously but end up being identified. The problem can be fixed by letting the user specify which sites may identify him and which may not. The first time a user is forwarded by an action site, he should be shown an agreement and a warning, and be allowed to choose between “Always Allow”, “Allow for this session only”, and “Disallow”. If the user chooses Disallow, then every time that action site requests the user to be identified, the login site should forward back as if no user were logged in on that computer. The user should be able to modify his/her choice later by going to the login site directly.
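The three choices and the Disallow behavior might be modeled roughly like this. This is a sketch with made-up class and method names, not any real Passport API; the point is only that persisted and session-only consent are kept separately, and an unknown site defaults to "not identified" (which is when the agreement screen would be shown).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-site consent rules described above (names are mine).
public class ConsentPolicy {
    enum Choice { ALWAYS_ALLOW, ALLOW_THIS_SESSION, DISALLOW }

    private final Map<String, Choice> persisted = new HashMap<>(); // "Always Allow" / "Disallow"
    private final Map<String, Choice> session = new HashMap<>();   // "Allow for this session only"

    void record(String site, Choice choice, boolean rememberAcrossSessions) {
        (rememberAcrossSessions ? persisted : session).put(site, choice);
    }

    // On DISALLOW (or no recorded choice), the login site should answer
    // as if no user were logged in on this computer.
    boolean mayIdentify(String site) {
        Choice c = persisted.getOrDefault(site, session.get(site));
        return c == Choice.ALWAYS_ALLOW || c == Choice.ALLOW_THIS_SESSION;
    }
}
```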

It should not assume a user has only one identity. This problem can also be helped by the above modification. When a user is logged into his work account, an action site shouldn't be able to identify his work id without his consent. When the same user is logged into his Hotmail account, the action site might identify him if he approves.


New to InfoCard
--------------------
Microsoft positions InfoCard as the next step after the Passport Network.

Folks in the group identified a few critical rules for identity and authentication in general: the Laws of Identity. While all seven of them are critical, I don't think they cover everything.

The omission is portability (or should I say accessibility). Without total portability, users are bound to two choices: either not access the feature they want where they need it (for example, don't check email on a public computer, don't use a cell phone to check WAP mail), or fall back to a different mechanism wherever the new one isn't supported. The first choice is painful. As for the second… the strength of a chain is its weakest link. Either way, it doesn't solve the original problem. It either restricts access, or leaves the user alone when he needs the security most.

It is a chicken-and-egg problem. The limited portability limits adoption. And the limited adoption discourages new devices from supporting the mechanism, which in turn limits portability.

The omission leaks from the laws into InfoCard, and I think it is fatal.

Tag: ,

Economy of Scale vs. Scaling Economy

Sometimes, large scale can bring disadvantages. Let's start with one of the largest clusters: Yahoo! Mail.

When Gmail came out, I realized how deeply I was integrated with Y! Mail:
- Pop mail,
- SMTP,
- web mail,
- spam filter,
- another Yahoo account for spam,
- disposable email address,
- address book synchronization with outlook,
- email notification to Yahoo Messenger,
- message archive,
- mobile email alert,
- stock alert,
- weather alert,
- custom email address support,
- multiple account support,
- color coded email by account,
- ads free Yahoo Plus,
- WAP Yahoo mail, and
- support for reading multiple mails in browser tabs.

Yes, I use all of the listed features on a daily basis.

I have a lot of sympathy for Yahoo when people mistakenly think Gmail is better. No, Gmail is years behind.

When Gmail switched from 1 GB of storage to 2 GB, it hurt Y! Mail badly. At the time, the ratio of Yahoo Mail users to Gmail's was probably 100:1. (The ratio these days is maybe closer to 15:1.)

To match Gmail's average storage per user (most users don't use anything close to 2 GB), Yahoo pays 100 times more. Each additional MB for 300 million users is roughly 286 TB of extra storage, which is substantial money both in hardware acquisition cost and in operational costs like electricity.

Scaling to 300 million users takes much more than adding machines. It takes a lot of tricks and R&D to get close to linear scaling, and anything less than linear scaling costs disproportionately more. It makes or breaks the economics and feasibility of providing the service.

Werner Vogels [Amazon CTO] has talked a lot about the dark art of scaling and its pain.

In the famous “we [Google] are a $100 billion company” financial conference, Eric Schmidt [Google CEO] said that the know-how, software, and infrastructure to scale to a massive number of users is one of Google's key strengths, one they are expanding their lead on. However, Gmail (and Yahoo too) has periods when performance slows down badly. That drives away users, not just growth.

Then Steve Ballmer [Microsoft CEO] used an analogy about “data centers” built everywhere in the world like power stations. Later, Microsoft announced billions in forgone future profit to build the infrastructure to compete.

The dot-com race has reached the point where an idea alone is far from enough. It is also about the ability to make economic sense of a service provided to a massive number of users. The ability to scale computing power plays a big part in determining the success of a business.

Tag:

Kelowna

I took two days off. Adding a weekend and friends, I got a great 4-day trip to Kelowna, BC, Canada. Okanagan Lake was amazing, and so were the scenic wineries along it. Warm weather, clear sky, a mild breeze from the lake: life is good! It was 4 hours from Vancouver, but the highway was great: two lanes in each direction for almost the whole trip (well, it will be). If you like Napa, you "must" give Kelowna a chance. It can become one of your favorites too.

Tag:

Quest to a more efficient LockSet (volatile field, semaphore, reentrant lock)

While staying up late, I recalled an article about an anti-sleeping pill that I read a couple of weeks ago. Somehow, I tend to have a much clearer mind late at night (I mean, after I have been awake for many hours :-P). That explains why many of my blog posts were written late at night.

The feeling that I can stay up as late as I want on a Sunday is very good. A long weekend means I get two really long late nights for myself. Labor Day wasn't only an extra weekend day in the week; it also doubled the productive time for my project.

And I spent it coding a small part of the cluster cache project: the quest for a more efficient LockSet. It actually started while I was driving 100 miles north on Friday. I was driving alone and turned off the stereo in my car to think about the problem. After about an hour running through my 4-step synchronized-block LockSet in my head, I realized that the only way to get a more efficient lock set (in terms of how much synchronization I need to do) requires the ability to enter one semaphore before leaving the other.

I read the pseudocode for locking in the Gray/Reuter book. The sequence looks something like this:
1/ semaphore get on the data structure for the lock set (i.e., spinlock S),
2/ find the node representing the lock, or create one if it doesn't exist,
3/ semaphore get on the node (i.e., spinlock N),
4/ semaphore give on the lock set (i.e., unlock S),
5/ acquire the lock (may block the thread).
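The five steps might be sketched in Java like this. This is my own sketch under some assumptions: `AtomicBoolean` stands in for the spinlock primitive, a `ReentrantLock` plays the blocking lock of step 5, and spinlock N is released before blocking (a simplification of what Gray/Reuter describe); all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the five-step lock-set acquisition sequence.
public class LockSet {
    static class Node {
        final AtomicBoolean spin = new AtomicBoolean(false); // spinlock N
        final ReentrantLock lock = new ReentrantLock();      // the actual lock
    }

    private final AtomicBoolean setSpin = new AtomicBoolean(false); // spinlock S
    private final Map<Object, Node> nodes = new HashMap<>();

    Node acquire(Object key) {
        while (!setSpin.compareAndSet(false, true)) { }       // 1: get S (spin)
        Node n = nodes.computeIfAbsent(key, k -> new Node()); // 2: find or create node
        while (!n.spin.compareAndSet(false, true)) { }        // 3: get N before...
        setSpin.set(false);                                   // 4: ...giving back S
        n.spin.set(false); // simplification: drop N before blocking in step 5
        n.lock.lock();                                        // 5: may block the thread
        return n;
    }
}
```

The crucial part is that step 3 happens before step 4: the node is pinned before the set-level spinlock is released, so the node can't be removed out from under us in between.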

It sounds simple, but it cannot be done “efficiently” with Java's synchronized blocks.

The spinlock (compare-and-store) is a primitive pillar of the concurrent world that cannot be reduced further. Of course, you can simulate a spinlock using a Java synchronized block, but the synchronized block is itself built on a spinlock.

Java 1.5 provides a set of new concurrency utilities. I knew it has a Lock interface that would allow me to do what the Gray/Reuter lock implementation does, so I dug deep into it.

After digging into the code, I found that ReentrantLock is itself pretty expensive. Semaphore is a bit lighter, because it doesn't maintain a linked list of threads, but it still maintains more state than I need. In the process, though, I found what I wanted: the spinlock.

It is exposed in the API through AbstractQueuedSynchronizer.compareAndSetState(int, int) (and the AtomicXYZ classes). The “spin-unlock” is achieved by AbstractQueuedSynchronizer.setState(). The javadoc doesn't mention the memory model constraints. However, the way Semaphore is implemented on top of those methods implies that compareAndSetState() and setState() act as a read barrier and a write barrier respectively.

I would have expected setState() to call a method declared native in another class. To my surprise, it simply sets a volatile field, the same field that compareAndSetState() sets using a native method.

Why does setting a volatile field suffice? I remembered volatile as a read/write-ordering guarantee on that specific field only. But a write barrier is a different guarantee.

That was because I had rarely found a useful case for a volatile field. Since the guarantee was on the field only, you couldn't use it to guard other data. While I was aware of the JMM changes in Java 5, and followed the mailing list for a few good months, I didn't pay much attention to the volatile field changes.

Brian Goetz explains it very well in his developerWorks article.

The semantics of a volatile field in Java 5 were updated to provide a memory barrier guarantee: a write to a volatile field now makes all earlier writes visible to any thread that subsequently reads that field.
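A minimal illustration of the Java 5 guarantee (class and field names are mine, purely for illustration): the volatile write publishes the ordinary write that precedes it, so a reader that observes `ready == true` is guaranteed to also observe `data == 42`.

```java
// Safe publication through a volatile field under the Java 5 JMM.
public class Publication {
    static int data;               // plain field, guarded by the volatile below
    static volatile boolean ready; // the volatile "gate"

    static void writer() {
        data = 42;    // ordinary write...
        ready = true; // ...published by the volatile write (acts as a write barrier)
    }

    static Integer reader() {
        if (ready) {      // volatile read (acts as a read barrier)
            return data;  // guaranteed to see 42, not a stale value
        }
        return null;      // publication not yet visible
    }
}
```

Under the pre-Java-5 JMM, the reader could legally have seen `ready == true` but a stale `data`, which is exactly why volatile alone couldn't guard other data back then.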

Now, after this quest for a more efficient lock set, I have gained an understanding not only of how to implement it efficiently, but also of what was missing in older versions of Java, why the JMM change was needed, and a much better understanding of the JMM itself.

It is a good feeling when what I need has already been created by someone else before I need it.

Now that I think I know the JMM very well, come challenge me with tough questions! :-P

Tag:

Good load tests to expose the vulnerabilities of the cache

In response to a user question on load testing, I think it is worth a blog post by itself. Those are excellent questions. :-)

Coherence is a pretty mature product. I would think it should work pretty well for the read-only cases.

I can think of 4 areas where the cached system can choke:
a/ network load for cache synchronization,
b/ CPU load for cache management and synchronization,
c/ CPU load for deserializing your data, and
d/ database access.

Depending on the way the application accesses the data, the system might still choke on the last two before the cache management overhead becomes a problem.

Database might still be the bottleneck
--------------------------------------
For example, if I have an application that needs to scale to a high number of users who don't share much data among them (an HR application where most users care mostly about their own data), I would want to watch the CPU, file I/O, and network utilization of the database as I add more machines to the cache cluster, especially if it is a single database (or a cluster of databases) that all machines connect to. It is good to do a little projection of how many cache machines the single database can support.

Deserialization
---------------
If I have an application where most machines share the same set of data, then I would watch the time each machine spends on deserialization. If each machine requests the same cache, the data will be sent over the wire and deserialized on each machine. The time spent on deserialization might be significant. I am not sure whether Coherence's near cache stores objects or serialized form; it would be good to check. Even if the near cache is kept as objects, with moderate changes to the data you might still see quite a bit of deserialization, because the cache will need to be re-fetched.
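As a rough way to put a number on that cost in a load test, here is a sketch that round-trips a value through standard Java serialization and times the read side. The class and method names are mine, not Coherence APIs, and real cache products may use their own (faster) serialization formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch: measure how long deserializing a cached value takes.
public class DeserializationCost {

    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object timedDeserialize(byte[] bytes) {
        long start = System.nanoTime();
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Object value = in.readObject();
            // In a load test you would aggregate this per machine.
            System.out.printf("deserialized %d bytes in %d us%n",
                              bytes.length, (System.nanoTime() - start) / 1000);
            return value;
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Running this over representative cached objects (not toy ones) gives a per-object deserialization cost you can multiply by the expected cache-miss rate per machine.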

Really large cluster
--------------------
If you have a really large cluster (say 64 machines or more), then you might need to profile the first two as well. The overhead is believed to be small, but the total time spent on communication is at least on the order of O(n²), where n is the number of machines. Overhead that is unnoticeable with 4 machines might show up as significant when you have 64, for example.

Tag: , , ,